Time Series Analysis

This analysis takes a portfolio comprising of AMZN, GOOG, AAPL, MSFT and GS. First it uses ex-post data to arrive at the optimum portfolio weights. Then, several regression, smoothing and ARIMA models are fit to the data and a comparison of accuracy statistics is carried out. Afterward the volatility of returns and two other measures of volatility are estimated using ARCH/GARCH. Finally, the VAR (vector autoregressive) model is fit to the data and the impulse response function is graphed to assess the effect of a certain shock on the portfolio return.

Part 1 - Stock Selection
Part 2 - Select Weights
Part 3 - Calculate Returns
Part 4 - CAPM regression
- CAPM Results
- 3 Factor CAPM Results
Part 5 - Smoothing Techniques
- Note:
Part 6 - ARIMA
- ARIMA Forecast Results
Part 7 - ARIMA Stability
Part 8 - Volatility ARCH/GARCH
Part 9 - Volatility Proxies
Part 10 - Comparison of tecniques
Part 11 - Efficient Market Hypothesis
Part 12 - Value at Risk
Part 13 - VAR and IRF
Conclusion

Part 1

warnings = FALSE

library(corpcor)
library(tseries)

## Warning: package 'tseries' was built under R version 3.4.4

library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

## Version 0.4-0 included new data defaults. See ?getSymbols.

library(forecast)

## Warning: package 'forecast' was built under R version 3.4.4

# Download stock and bond data
start_date = "2013-01-01"
end_date = "2018-03-31"
invest_date = "2014-01-01"

getSymbols("AAPL", from=start_date, to=end_date)

## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).

## [1] "AAPL"

maapl<-to.monthly(AAPL)
aapl<-(maapl$AAPL.Adjusted)

getSymbols("AMZN", from=start_date, to=end_date)

## [1] "AMZN"

mamzn<-to.monthly(AMZN)
amzn<-(mamzn$AMZN.Adjusted)

getSymbols("GOOG", from=start_date, to=end_date)

## [1] "GOOG"

# Convert to monthly
mgoog<-to.monthly(GOOG)
goog<-(mgoog$GOOG.Adjusted)

getSymbols("GS", from=start_date, to=end_date)

## [1] "GS"

mgs<-to.monthly(GS)
gs<-(mgs$GS.Adjusted)

getSymbols("MSFT", from=start_date, to=end_date)

## [1] "MSFT"

mmsft<-to.monthly(MSFT)
msft<-(mmsft$MSFT.Adjusted)

# get the bond
getSymbols("^TNX", from=start_date, to=end_date)

## Warning: ^TNX contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.

## [1] "TNX"

mbond=to.monthly(TNX)

## Warning in to.period(x, "months", indexAt = indexAt, name = name, ...):
## missing values removed from data

bond<-na.fill((mbond$TNX.Adjusted), fill="left")

## Warning in FUN(X[[i]], ...): NAs introduced by coercion

## Warning in FUN(X[[i]], ...): NAs introduced by coercion

## Warning in FUN(X[[i]], ...): NAs introduced by coercion

rbond=matrix(bond/1200)

# Generate monthly returns
raapl=diff(log(aapl))
ramzn=diff(log(amzn))
rgoog=diff(log(goog))
rgs=diff(log(gs))
rmsft=diff(log(msft))

# Generate monthly market risk premium for each asset
rpaapl=raapl-rbond
rpamzn=ramzn-rbond
rpgoog=rgoog-rbond
rpgs=rgs-rbond
rpmsft=rmsft-rbond

# data tables
data_ret_rp = data.frame(raapl,ramzn,rgoog,rgs,rmsft,rbond,rpaapl,rpamzn,rpgoog,rpgs,rpmsft)
colnames(data_ret_rp) <- c("rAAPL", "rAMZN", "rGOOG", "rGS", "rMSFT", "bond", "rpAAPL", "rpAMZN", "rpGOOG", "rpGS", "rpMSFT")

# Define corresponding sets of data to avoid having to keep indexing. [1] is 1/2013, [62] is 3/2018 and [58] is 11/2017
data_ret_rp_2013 <- data_ret_rp[2:12,] # this is the 2/2013 through 12/2013 to calculate the optimal weights
data_ret_rp_train <- data_ret_rp[13:59,] # this is the 2014 through 11/2017 to forecast
data_ret_rp_test <- data_ret_rp[60:62,] # this is the 12/2017 to 2/2018 period

range.names = c("AAPL", "AMZN", "GOOG", "GS", "MSFT")

Part 2

# Calculate the weights for optimal risk vs. return
weights <- function(ret_calc, bond_calc) {
  covariance_matrix = matrix(c(cov(ret_calc)),nrow=5, ncol=5)
  mean_returns = matrix(colMeans(x=ret_calc, na.rm = TRUE))
  mean_bond = mean(bond_calc)
  mean_rp_returns = mean_returns - mean_bond
  w = matrix(solve(covariance_matrix) %*% mean_rp_returns)
  sw = sum(w)
  nw = w/sw
  return(nw)
}

# Calculate portfolio monthly returns
# USAGE: pfol_returns(weights, ret_calc)
pfol_returns_monthly <- function(weights, ret_calc) {
    return(t(weights) %*% t(ret_calc))
}

# Calculate the weights - will not rebalance due to large short positions suggested in 2015 and 2016
ret_13 = data_ret_rp_2013[,1:5]
bond_13 = data_ret_rp_2013[,6]
w_13 = weights(ret_13, bond_13)
# ret_14 = data_ret_rp[12:23,]
# bond_14 = rbond[12:2]
# w_14 = weights(ret_14, bond_14)

sprintf("Below are the weights in the order AAPL, AMZN, GOOG, GS and MSFT")

## [1] "Below are the weights in the order AAPL, AMZN, GOOG, GS and MSFT"

w_13

##            [,1]
## [1,]  0.1948645
## [2,]  0.3533498
## [3,]  0.1869541
## [4,] -0.1131105
## [5,]  0.3779420

Part 3

# Put all of the return, risk premium and Fama French information in one place
pfol_ret_train = pfol_returns_monthly(w_13, data_ret_rp_train[,1:5]) # pfol returns
pfol_ret_rp_train = pfol_returns_monthly(w_13, data_ret_rp_train[,7:11]) # pfol returns - risk free rate
pfol_ret_test = pfol_returns_monthly(w_13, data_ret_rp_test[,1:5]) # pfol returns
pfol_ret_rp_test = pfol_returns_monthly(w_13, data_ret_rp_test[,7:11]) # pfol returns - risk free rate
pfol_ret_train <- ts(pfol_ret_train, start=2014, frequency=12)
pfol_ret_rp_train <- ts(pfol_ret_rp_train, start=2014, frequency=12)
pfol_ret_test <- ts(pfol_ret_test, start=2017-12-01, frequency=12)
pfol_ret_rp_test <- ts(pfol_ret_rp_test, start=2014-12-01, frequency=12)

data_ret_rp_train$pfol = t(pfol_ret_train)
data_ret_rp_train$pfol_rp = t(pfol_ret_rp_train)
data_ret_rp_test$pfol = t(pfol_ret_test)
data_ret_rp_test$pfol_rp = t(pfol_ret_rp_test)

FF <- read.csv(file="F-F_Research_Data_Factors.CSV", header=TRUE, sep=",")

data_ret_rp_train$MRP = FF[13:59,2]/100
data_ret_rp_train$SMB = FF[13:59,3]/100
data_ret_rp_train$HML = FF[13:59,4]/100
data_ret_rp_train$RF = FF[13:59,5]/100

data_ret_rp_test$MRP = FF[60:62,2]/100
data_ret_rp_test$SMB = FF[60:62,3]/100
data_ret_rp_test$HML = FF[60:62,4]/100
data_ret_rp_test$RF = FF[60:62,5]/100

write.csv(data_ret_rp_train, 'out_ret_rp_train.csv')
write.csv(data_ret_rp_test, 'out_ret_rp_test.csv')

initial_investment = 100
sprintf("Cummulative portfolio returns from 2014 through 11/2017 are %#.4f", sum(pfol_ret_train))

## [1] "Cummulative portfolio returns from 2014 through 11/2017 are 0.9744"

sprintf("With an initial investment of $%#.2f, your total portfolio value as of 12/1/2017 is $%#.2f", initial_investment, 100*(1+sum(pfol_ret_train)))

## [1] "With an initial investment of $100.00, your total portfolio value as of 12/1/2017 is $197.44"

pfol_ret = ts(data_ret_rp_train$pfol, start=2014, frequency=12)

Part 4

library(forecast)
reg_CAPM <- lm(pfol_rp~MRP, data=data_ret_rp_train)
summary(reg_CAPM)

## 
## Call:
## lm(formula = pfol_rp ~ MRP, data = data_ret_rp_train)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.111924 -0.028187 -0.002317  0.028906  0.098323 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.008600   0.006957   1.236    0.223    
## MRP         1.101848   0.231090   4.768 1.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04533 on 45 degrees of freedom
## Multiple R-squared:  0.3356, Adjusted R-squared:  0.3209 
## F-statistic: 22.73 on 1 and 45 DF,  p-value: 1.983e-05

# will use the forecast +- 1.96 * standard error formula for computing the 95% confidence interval.
pred <- predict(reg_CAPM, newdata=data_ret_rp_test, se.fit = TRUE, interval="confidence", level = 0.95)
pred

## $fit
##                  fit          lwr          upr
## Dec 2017  0.02027971  0.006949242  0.033610186
## Jan 2018  0.06997306  0.044625631  0.095320488
## Feb 2018 -0.03161733 -0.056777966 -0.006456688
## 
## $se.fit
##    Dec 2017    Jan 2018    Feb 2018 
## 0.006618564 0.012584969 0.012492228 
## 
## $df
## [1] 45
## 
## $residual.scale
## [1] 0.0453323

accuracy(forecast(reg_CAPM, newdata =data_ret_rp_test,  h=3))

##                        ME      RMSE        MAE      MPE     MAPE      MASE
## Training set 9.237165e-19 0.0443573 0.03493115 29.38372 149.9925 0.7755625

The regression and the MRP coefficient are significant. The forecasts and 95% confidence interval (Lower and Upper columns) are in the table below where I remove the intercept from the forecast.

CAPM Results

The CAPM regression is \[Pfol_{risk premium} = \beta MRP = \beta Market_{return} - r_f\] so we have to add the bond back to the portfolio risk premium in order to get the portfolio returns.

Date	Forecast (RP)	Forecast Return	Lower	Upper	Actual Return	Forecast/Actual
12/2017	0.0117	0.0131	0.0002	0.0261	0.0022	5.89
1/2018	0.0614	0.0631	0.0384	0.0877	0.1290	.49
2/2018	-0.0402	-0.0385	-0.0630	-0.0140	0.0154	-2.5

reg_3fCAPM <- lm(pfol_rp~MRP+SMB+HML, data=data_ret_rp_train)
summary(reg_3fCAPM)

## 
## Call:
## lm(formula = pfol_rp ~ MRP + SMB + HML, data = data_ret_rp_train)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.061109 -0.021160 -0.002654  0.023167  0.080996 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.005169   0.005303   0.975 0.335158    
## MRP          1.279855   0.180809   7.078 9.91e-09 ***
## SMB         -0.813489   0.210100  -3.872 0.000362 ***
## HML         -0.760446   0.204254  -3.723 0.000568 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03423 on 43 degrees of freedom
## Multiple R-squared:  0.6381, Adjusted R-squared:  0.6128 
## F-statistic: 25.27 on 3 and 43 DF,  p-value: 1.395e-09

# will use the forecast +- 1.96 * standard error formula for computing the 95% confidence interval.
pred <- predict(reg_3fCAPM, newdata=data_ret_rp_test, se.fit = TRUE, interval="confidence", level = 0.95)
pred

## $fit
##                  fit         lwr         upr
## Dec 2017  0.02823026  0.01692955  0.03953096
## Jan 2018  0.11222933  0.08686621  0.13759245
## Feb 2018 -0.03469815 -0.05551599 -0.01388030
## 
## $se.fit
##    Dec 2017    Jan 2018    Feb 2018 
## 0.005603586 0.012576596 0.010322768 
## 
## $df
## [1] 43
## 
## $residual.scale
## [1] 0.03422909

accuracy(forecast(reg_3fCAPM, newdata =data_ret_rp_test,  h=3))

##                         ME       RMSE        MAE      MPE    MAPE
## Training set -1.113398e-19 0.03274015 0.02606064 124.1033 150.538
##                   MASE
## Training set 0.5786142

The regression, MRP, SMB and HML coefficients are significant.

3 Factor CAPM Results

The CAPM 3 factor regression is \[Pfol_{risk premium} = \beta_1 MRP + \beta_2 SMB + \beta_3 HML\] so we have to add the bond back to the portfolio risk premium in order to get the portfolio returns.

Date	Forecast (RP)	Forecast Return	Lower	Upper	Actual Return	Forecast/Actual
12/2017	0.0231	0.0245	0.0135	0.0355	0.0022	11.0
1/2018	0.1071	0.1087	0.0841	0.1334	0.1290	.84
2/2018	-0.0399	-0.0381	-0.0584	-0.0179	0.0154	-2.47

Neither the CAPM or 3 factor CAPM do good job of forecasting the portfolio returns. It is hard to say that the 3 factor CAPM is any better than the CAPM or the other way around. The ‘forecast/actual’ column shows this ratio and in the 12/2017 case the CAPM was less worse, in the 1/2018 case the 3 factor CAPM was better and in the 2/2018 case the 3 factor CAPM was marginally better.

Part 5

Note:

For the smoothing and ARIMA forecasts we are forecasting portoflio returns so there is no need to add back the bond.

# 5: Naive forecast
naive(pfol_ret, 3)

##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Dec 2017     0.03055546 -0.08355752 0.1446684 -0.1439653 0.2050762
## Jan 2018     0.03055546 -0.13082467 0.1919356 -0.2162541 0.2773651
## Feb 2018     0.03055546 -0.16709402 0.2282049 -0.2717233 0.3328343

plot(naive(pfol_ret, 3), xlab="Time")
legend("topleft",lty=1, col=c(1,"black"), c("data"))

accuracy(naive(pfol_ret, 3))

##                       ME       RMSE        MAE       MPE     MAPE
## Training set 0.001464876 0.08904283 0.07510783 -8.307972 431.1503
##                   MASE      ACF1
## Training set 0.9903769 -0.603905

#5: 3 period moving average
period = 3
ma3 <- ma(pfol_ret, order=period, centre=FALSE)
plot(pfol_ret, col="black", main="Portfolio 3 period moving average", xlab="Time")
lines(ma3, col="red")
legend("topleft",lty=1, col=c(1,"black"), c("data"))

predict(ma3, 4)

## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series

##          Point Forecast        Lo 80      Hi 80       Lo 95      Hi 95
## Nov 2017     0.02064034 -0.009141864 0.05042254 -0.02490761 0.06618829
## Dec 2017     0.02064034 -0.009141864 0.05042254 -0.02490761 0.06618829
## Jan 2018     0.02064034 -0.009141865 0.05042254 -0.02490761 0.06618829
## Feb 2018     0.02064034 -0.009141865 0.05042254 -0.02490761 0.06618829

accuracy(ma3, pfol_ret)

##                   ME       RMSE        MAE       MPE     MAPE       ACF1
## Test set 0.001155313 0.05345471 0.04465934 -6.410823 235.1385 -0.6793237
##          Theil's U
## Test set 0.6548385

#5: 5 period moving average
period = 5
ma5 <- ma(pfol_ret, order=period, centre=FALSE)
plot(pfol_ret, col="black", main="Portfolio 3 period moving average", xlab="Time")
lines(ma5, col="red")
legend("topleft",lty=1, col=c(1,"black"), c("data"))

predict(ma5, 5)

## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series

##          Point Forecast        Lo 80      Hi 80        Lo 95      Hi 95
## Oct 2017     0.03487265 0.0161647647 0.05358054  0.006261408 0.06348389
## Nov 2017     0.03487265 0.0112570499 0.05848825 -0.001244294 0.07098959
## Dec 2017     0.03487265 0.0072066391 0.06253866 -0.007438863 0.07718416
## Jan 2018     0.03487265 0.0036777825 0.06606752 -0.012835783 0.08258108
## Feb 2018     0.03487265 0.0005094233 0.06923588 -0.017681370 0.08742667

accuracy(ma5, pfol_ret)

##                    ME       RMSE        MAE       MPE    MAPE       ACF1
## Test set -0.001673655 0.05397095 0.04455854 -51.67693 260.391 -0.4580408
##          Theil's U
## Test set 0.8956348

# 5: Simple Exponential Smoothing
alpha1 =.1
alpha2 = .75
fit1 <- ses(pfol_ret, alpha=alpha1, initial="simple", h=3)
fit2 <- ses(pfol_ret, alpha=alpha2, initial="simple", h=3)
plot(fit2, main="Simple Exponential Smoothing (no trend)", fcol="white", type="o")
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="blue", type="o")
legend("topleft",lty=1, col=c(1,"blue","red"), c("data", expression(alpha == alpha1), expression(alpha == alpha2)))

#legend("topleft",lty=1, col=c(1,"red"), c("data", expression(alpha == .75)))
summary(fit2)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pfol_ret, h = 3, initial = "simple", alpha = alpha2) 
## 
##   Smoothing parameters:
##     alpha = 0.75 
## 
##   Initial states:
##     l = -0.0368 
## 
##   sigma:  0.0765
## Error measures:
##                       ME       RMSE        MAE       MPE     MAPE
## Training set 0.002301682 0.07650081 0.06414185 -90.71479 425.2322
##                   MASE       ACF1
## Training set 0.8457788 -0.5301315
## 
## Forecasts:
##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Dec 2017     0.04430545 -0.05373427 0.1423452 -0.1056334 0.1942443
## Jan 2018     0.04430545 -0.07824421 0.1668551 -0.1431181 0.2317290
## Feb 2018     0.04430545 -0.09861078 0.1872217 -0.1742661 0.2628770

# 5: Holt
holt1 <- holt(pfol_ret, alpha=0.8, beta=0.2, initial="simple", exponential=FALSE, h=3) 
plot(holt1, col="black", type="o")
lines(fitted(holt1), col="red") 
lines(holt1$mean, col="blue", type="o") 
legend("topleft", lty=1, col=c("black","red"), c("Data","Holt's linear trend"))

predict(holt1, n.ahead=3)

##          Point Forecast       Lo 80     Hi 80      Lo 95     Hi 95
## Dec 2017     0.04780406 -0.06357239 0.1591805 -0.1225315 0.2181396
## Jan 2018     0.05032212 -0.10718797 0.2078322 -0.1905688 0.2912130
## Feb 2018     0.05284017 -0.15373237 0.2594127 -0.2630853 0.3687656

accuracy(holt1)

##                        ME       RMSE        MAE       MPE    MAPE
## Training set -0.008856668 0.08690751 0.07432256 -189.7633 611.288
##                   MASE       ACF1
## Training set 0.9800223 -0.4887236

# 5: Holt Winters (seasonal) method
hws1 <- hw(pfol_ret,seasonal="additive")
# cannot run multiplicative model due to error: Inappropriate model for data with negative or zero values
plot(hws1, type="o", col="black")
lines(fitted(hws1), col="red", lty=1)
lines(head(hws1$mean,3), type="o", col="blue")
legend("topleft",lty=1, pch=1, col=c("black","red"), c("data","Holt Winters' Additive"))

head(hws1$mean,3)

##              Jan         Feb Mar Apr May Jun Jul Aug Sep Oct Nov
## 2017                                                            
## 2018 0.009210074 0.025340827                                    
##              Dec
## 2017 0.006891063
## 2018

accuracy(hws1)

##                         ME       RMSE        MAE       MPE     MAPE
## Training set -0.0001602817 0.04845132 0.04073225 -32.80561 297.2107
##                   MASE       ACF1
## Training set 0.5370982 -0.3147235

#states <- cbind(hws1$model$states[,1:3],hws2$model$states[,1:3])
#colnames(states) <- c("level","slope","seasonal","level","slope","seasonal")
#plot(states, xlab="Year")
#hws1$model$state[,1:3]

Part 6

Do an ARIMA model of your portfolio returns and use it for three-period ahead forecasting of the returns to portfolio. Write confidence interval. Estimate the accuracy statistics.

plot(pfol_ret, main="Monthly Portfolio Returns")

Portfolio returns look random but lets check with ACF/PACF

#Testing portfolio price for Stationarity
par(mfrow=c(1,2))
acf(pfol_ret,main="ACF")
pacf(pfol_ret,main="PACF")

#Unit Root Test on portfolio price
adf.test(pfol_ret)

## Warning in adf.test(pfol_ret): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  pfol_ret
## Dickey-Fuller = -4.993, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

According ACF/PACF and the augmented Dickey Fuller test the returns are stationary. According to the ACF/PACF and using the parsimony principle, the model is ARIMA(1,0,1) on the returns.

fit_arima <- arima(pfol_ret, order=c(1, 0, 1))
summary(fit_arima)

## 
## Call:
## arima(x = pfol_ret, order = c(1, 0, 1))
## 
## Coefficients:
##         ar1      ma1  intercept
##       0.536  -1.0000     0.0212
## s.e.  0.143   0.0935     0.0010
## 
## sigma^2 estimated as 0.00234:  log likelihood = 74.31,  aic = -140.61
## 
## Training set error measures:
##                        ME       RMSE        MAE      MPE     MAPE
## Training set -0.007854736 0.04837142 0.04077945 33.58753 294.6323
##                   MASE       ACF1
## Training set 0.5429454 -0.0688287

library(lmtest)
coeftest(fit_arima)

## 
## z test of coefficients:
## 
##             Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.5359938  0.1430450   3.747 0.0001789 ***
## ma1       -0.9999993  0.0935013 -10.695 < 2.2e-16 ***
## intercept  0.0211523  0.0010234  20.669 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Step 3, Diagnostics: Residuals Test
res <- fit_arima$residuals
par(mfrow=c(1,2))
acf(res,main="ACF of Residuals")
pacf(res,main="PACF of Residuals")

adf.test(res)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  res
## Dickey-Fuller = -4.0528, Lag order = 3, p-value = 0.01554
## alternative hypothesis: stationary

plot(res, type="o")
Box.test(res,type='Ljung',lag=log(length(res)))

## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 2.2361, df = 3.8501, p-value = 0.67

The residuals look like white noise/random. The Box-Ljung test also accepts that the residuals are random.

# Auto ARIMA
fit_auto <- auto.arima(pfol_ret)
summary(fit_auto)

## Series: pfol_ret 
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean 
## 
## Coefficients:
##           ma1     sma1    mean
##       -0.4908  -0.2152  0.0211
## s.e.   0.1731   0.1344  0.0032
## 
## sigma^2 estimated as 0.002547:  log likelihood=74.8
## AIC=-141.59   AICc=-140.64   BIC=-134.19
## 
## Training set error measures:
##                        ME       RMSE        MAE      MPE     MAPE
## Training set -0.002185112 0.04883408 0.03952355 17.01204 189.5377
##                   MASE       ACF1
## Training set 0.5211602 0.05092489

res <- fit_auto$residuals
par(mfrow=c(1,2))
acf(res,main="ACF of Residuals")
pacf(res,main="PACF of Residuals")

adf.test(res)

## Warning in adf.test(res): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  res
## Dickey-Fuller = -4.8395, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

plot(res, type="o")
checkresiduals(fit_auto)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,0,1)[12] with non-zero mean
## Q* = 21.342, df = 21, p-value = 0.4382
## 
## Model df: 3.   Total lags used: 24

Following the parsimony principle, I will chose the Auto ARIMA(0,0,1) model on portfolio returns (ARIMA(0,1,1) on price). The MA coefficient is significant at the 95% level. The residuals of the model are stationary and appear to be white noise.

forecast <- forecast(fit_auto, h=3, level = 0.95)
plot(forecast)

predict(fit_auto, newdata=data_ret_rp_test, n.ahead=3, interval="confidence", level = 0.95)

## $pred
##              Jan         Feb Mar Apr May Jun Jul Aug Sep Oct Nov
## 2017                                                            
## 2018 0.014853294 0.022296242                                    
##              Dec
## 2017 0.007256747
## 2018            
## 
## $se
##             Jan        Feb Mar Apr May Jun Jul Aug Sep Oct Nov        Dec
## 2017                                                           0.05047152
## 2018 0.05622285 0.05622285

accuracy(forecast)

##                        ME       RMSE        MAE      MPE     MAPE
## Training set -0.002185112 0.04883408 0.03952355 17.01204 189.5377
##                   MASE       ACF1
## Training set 0.5211602 0.05092489

ARIMA Forecast Results

The ARIMA forecast is slighlty better than the CAPM and 3 factor CAPM since the forecasts came closer to the actual. Though it is worth noting that the confidence intermal in the CAPM models is tighter than that of the ARIMA model.

Date	Forecast Return	Std. Error	Lower	Upper	Actual Return	Forecast/Actual
12/2017	0.007256747	0.05047152	-0.091667432	0.106180926	0.002228345	3.256564212
1/2018	0.014853294	0.05622285	-0.095343492	0.12505008	0.128970138	0.115168474
2/2018	0.022296242	0.05622285	-0.087900544	0.132493028	0.01541726	1.446187091

# Fit ARIMA(1,0,1) with the first lag of the variable
fit_arima_l1 <- arima(pfol_ret, xreg = cbind(data_ret_rp_train$MRP,data_ret_rp_train$SMB,data_ret_rp_train$HML), order=c(1, 0, 1))
summary(fit_arima_l1)

## 
## Call:
## arima(x = pfol_ret, order = c(1, 0, 1), xreg = cbind(data_ret_rp_train$MRP, 
##     data_ret_rp_train$SMB, data_ret_rp_train$HML))
## 
## Coefficients:
##           ar1     ma1  intercept
##       -0.7600  0.6852     0.0071
## s.e.   0.3922  0.4312     0.0048
##       cbind(data_ret_rp_train$MRP, data_ret_rp_train$SMB, data_ret_rp_train$HML)1
##                                                                            1.2703
## s.e.                                                                       0.1710
##       cbind(data_ret_rp_train$MRP, data_ret_rp_train$SMB, data_ret_rp_train$HML)2
##                                                                           -0.8237
## s.e.                                                                       0.2056
##       cbind(data_ret_rp_train$MRP, data_ret_rp_train$SMB, data_ret_rp_train$HML)3
##                                                                           -0.7414
## s.e.                                                                       0.1922
## 
## sigma^2 estimated as 0.001054:  log likelihood = 94.39,  aic = -174.77
## 
## Training set error measures:
##                        ME       RMSE        MAE       MPE   MAPE      MASE
## Training set 2.663951e-05 0.03247008 0.02635677 -41.85882 192.17 0.3509191
##                    ACF1
## Training set 0.05395827

coeftest(fit_arima)

## 
## z test of coefficients:
## 
##             Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.5359938  0.1430450   3.747 0.0001789 ***
## ma1       -0.9999993  0.0935013 -10.695 < 2.2e-16 ***
## intercept  0.0211523  0.0010234  20.669 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

forecast_arima_l1 <- forecast(fit_arima_l1, xreg=cbind(data_ret_rp_test$MRP,data_ret_rp_test$SMB,data_ret_rp_test$HML))
plot(forecast_arima_l1)

#predict(fit_auto, newdata=data_ret_rp_test, n.ahead=3, interval="confidence", level = 0.95)
accuracy(forecast_arima_l1)

##                        ME       RMSE        MAE       MPE   MAPE      MASE
## Training set 2.663951e-05 0.03247008 0.02635677 -41.85882 192.17 0.3475422
##                    ACF1
## Training set 0.05395827

Part 7

#7. Test your ARIMA model for the stability of the ARIMA coefficients.
#2- Coefficients of ARIMA are stable.  To test for the stability of the coefficients, split data, run two ARIMA and do an F test.  
# per https://www.stata.com/manuals13/rhausman.pdf
  #The null hypothesis is that the estimator θb2 is indeed an efficient (and consistent) estimator of the true parameters. If this is the case, there should be no systematic difference between the two estimators. 

fit1 = auto.arima(pfol_ret[1:12])
summary(fit1)

## Series: pfol_ret[1:12] 
## ARIMA(0,0,0) with zero mean 
## 
## sigma^2 estimated as 0.001527:  log likelihood=21.88
## AIC=-41.76   AICc=-41.36   BIC=-41.27
## 
## Training set error measures:
##                      ME       RMSE        MAE MPE MAPE      MASE      ACF1
## Training set 0.00504417 0.03907729 0.03304711 100  100 0.6118212 -0.414924

fit2 = auto.arima(pfol_ret[13:46])
summary(fit2)

## Series: pfol_ret[13:46] 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##           ma1    mean
##       -0.5907  0.0246
## s.e.   0.2607  0.0040
## 
## sigma^2 estimated as 0.003016:  log likelihood=51.24
## AIC=-96.48   AICc=-95.68   BIC=-91.9
## 
## Training set error measures:
##                        ME       RMSE        MAE      MPE     MAPE
## Training set 0.0004813704 0.05327457 0.04278998 145.8056 205.0899
##                   MASE      ACF1
## Training set 0.5255062 0.1016045

An F test doesn’t have to be conducted because an Auto ARIMA on an equal split of the portfolio results in a different order model. So that is the very defintion of unstable coefficients.

$F = \frac{(SSR_T – SSR_1 – SSR_2)/k}{(SSR_1 + SSR_2)/(T – 2k)}$, T is sample size and K = p + q + 1

ssr_t = sum(fit_auto$residuals^2)
ssr_1 = sum(fit1$residuals^2)
ssr_2 = sum(fit2$residuals^2)
K = 0 + 1 + 1
T = length(pfol_ret)
F = ((ssr_t - ssr_1 - ssr_2)/K) / ((ssr_1 + ssr_2)/(T - 2*K))
F

## [1] -0.5127628

# Hausman Test for endogenity of regressors: Ho: Coefs are the same,  Ha: Coefs are different 
# CANNOT SOLVE because fit1 and fit2 are different order models so second line below fails.
# cf_diff <-  coef(fit1) - coef(fit2)
# vc_diff <-  vcov(fit1) - vcov(fit2)
# z_diff <-  as.vector(t(cf_diff)%*%solve(vc_diff)%*%cf_diff)
# pchisq(z_diff, df=2, lower.tail=FALSE)

The F-value is greater than the F-statistic so I can reject the null that the coefficients are the same. Also, I cannot run a Hausman test because the order of the split ARIMA models is not the same. So clearly the coefficients are not stable.

Part 8

#8. Test your ARIMA model for the existence of ARCH and GARCH and do proper corrections, if needed.
#find the best ARIMA and take error
auto_arima_squared_residuals = fit_auto$residuals^2 #square best ARIMA error
par(mfrow=c(1,2))                                   #do ACF/PACF of square of error of ARIMA
acf(auto_arima_squared_residuals, main="ACF")       #if it is AR(1) or AR(2) then it is ARCH
pacf(auto_arima_squared_residuals, main="PACF")     #if it is ARMA(1,1) then it is GARCH

fit_auto_arima_squared_residuals <- auto.arima(auto_arima_squared_residuals)  #instead of ACF/PACF you can do auto.ARIMA
summary(fit_auto_arima_squared_residuals)

## Series: auto_arima_squared_residuals 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       0.0024
## s.e.  0.0005
## 
## sigma^2 estimated as 1.016e-05:  log likelihood=204
## AIC=-404   AICc=-403.72   BIC=-400.3
## 
## Training set error measures:
##                        ME        RMSE         MAE       MPE     MAPE
## Training set -6.92062e-19 0.003153255 0.002195095 -22646.09 22675.98
##                   MASE      ACF1
## Training set 0.6220665 0.1267859

checkresiduals(fit_auto_arima_squared_residuals)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 27.876, df = 23, p-value = 0.2205
## 
## Model df: 1.   Total lags used: 24

According to the ACF/PACF of the squared residuals the model is ARMA(0,0). The auto ARIMA method also results in an ARMA(0,0) model. Therefore, there does not appear to be ARCH or GARCH in the portoflio returns. According to the Ljung-Box test the ARIMA model’s squared residuals are white noise. White noise residuals do not directly relate to ARCH or GARCH but give further proof that the error is randomly distributed.

Part 9

#9. Find different time-series measures of volatility for your portfolio returns (see the volatility file posted on Blackboard) and do a three-period ahead forecasting of the portfolio volatility.
historical_var <-var(pfol_ret)
sprintf("Historical variance is %#.4f", historical_var)

## [1] "Historical variance is 0.0030"

r2_var <- pfol_ret^2
plot(r2_var, type="l", main="Volatility Measured by r^2", ylab="returns squared")

par(mfrow=c(1,2))
acf(r2_var)
pacf(r2_var)

# Suggest an ARIMA(0,0,0) model

#Auto ARIMA fitted to R^2 measure of variance.
fitr2v <- auto.arima(r2_var)
fitr2v

## Series: r2_var 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       0.0034
## s.e.  0.0008
## 
## sigma^2 estimated as 2.777e-05:  log likelihood=180.37
## AIC=-356.74   AICc=-356.46   BIC=-353.04

#Forecasting volatility using R^2.
#fitr2v <- arima(r2_var, order=c(1, 0, 1))
fcastr2v <- forecast(fitr2v, h=3) 
plot(fcastr2v, type="l", ylab="returns squared")

fcastr2v

##          Point Forecast        Lo 80      Hi 80        Lo 95      Hi 95
## Dec 2017    0.003389585 -0.003363558 0.01014273 -0.006938455 0.01371763
## Jan 2018    0.003389585 -0.003363558 0.01014273 -0.006938455 0.01371763
## Feb 2018    0.003389585 -0.003363558 0.01014273 -0.006938455 0.01371763

# Cannot do the ln(high/low) method since I don't have high and low data. Will substitute with ES
esv <- ses(r2_var, alpha = .8)
esvf <- esv$fitted
plot(esvf, type="l", main="Volatility Measured as Exponential Smoothing of r^2", ylab="returns squared")

#Forecasting volatility using SES of R^2.
fcastr2ses <- forecast(esv, h=3)
plot(fcastr2ses, type="l", ylab="returns squared")

fcastr2ses

##          Point Forecast        Lo 80      Hi 80       Lo 95      Hi 95
## Dec 2017    0.002978495 -0.006487764 0.01244475 -0.01149890 0.01745589
## Jan 2018    0.002978495 -0.009144232 0.01510122 -0.01556162 0.02151861
## Feb 2018    0.002978495 -0.011315243 0.01727223 -0.01888189 0.02483888

m <- cbind(r2_var, esvf)
cor(m)

##            r2_var       esvf
## r2_var  1.0000000 -0.1920711
## esvf   -0.1920711  1.0000000

For R^2 auto ARIMA suggested an (0,0,0) flat model which was very similar to the SES model. Both have a mean very close to the historical volatility of .003.

Part 10 - Comparison of tecniques

#10. Use the accuracy statistics of the different forecasting techniques to decide which technique fits the data best.
# Source:  https://www.otexts.org/fpp/2/5

Method	ME	RMSE	MAE	MPE	MAPE	MASE
CAPM	9.24E-19	0.0443573	0.03493115	29.38372	149.9925	0.7755625
CAPM 3	-1.11E-19	0.03274015	0.02606064	124.1033	150.538	0.5786142
Naïve	0.001464876	0.08904283	0.07510783	-8.307972	431.1503	0.9903769
Ma3	0.001155313	0.05345471	0.04465934	-6.410823	235.1385	-0.6793237
Ma5	-0.001673655	0.05397095	0.04455854	-51.67693	260.391	-0.4580408
SES	0.002301682	0.07650081	0.06414185	-90.71479	425.2322	0.8457788
Holt	-0.008856668	0.08690751	0.07432256	-189.7633	611.288	0.9800223
Holt Winters	-0.002269473	0.04810524	0.03989932	-51.35075	349.7016	0.5261151
ARIMA(0,0,1)	-0.002185112	0.04883408	0.03952355	17.01204	189.5377	0.5211602
ARIMA(1,0,1) + 3f CAPM	2.663951e-05	0.03247008	0.02635677	-41.85882	192.17	0.3475422

By almost all measures of accuracy but using only RMSE and MAE, the 3 factor CAPM is the best model. After that is the CAPM model and it is a close 3rd for the Holt Winters and ARIMA(0,0,1).

Part 11

#11 Test whether your portfolio conforms to the efficient market hypothesis.
pfol_ret_lag = c(pfol_ret[-1],0)
emh <- data.frame(pfol_ret, pfol_ret_lag)
reg_emh <- lm(pfol_ret~pfol_ret_lag, data=emh)
summary(reg_emh)

## 
## Call:
## lm(formula = pfol_ret ~ pfol_ret_lag, data = emh)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.104588 -0.044272  0.000775  0.030846  0.158419 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.02787    0.00825   3.378  0.00152 **
## pfol_ret_lag -0.33157    0.14232  -2.330  0.02436 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05252 on 45 degrees of freedom
## Multiple R-squared:  0.1076, Adjusted R-squared:  0.08781 
## F-statistic: 5.428 on 1 and 45 DF,  p-value: 0.02436

According to the regression, since both coefficients are significant we cannot say that the portfolio conforms to the efficient market hypothesis (that $\alpha_0 = 0$ and $\gamma = 0$).

Part 12

#12. Find 1% and 3% monthly and daily VaR of your portfolio.
as.year <- function(x) as.numeric(floor(as.yearmon(x)))
yearsum <- as.ts(aggregate(as.zoo(pfol_ret), as.year, sum))
annual_pfol_ret = mean(yearsum)
annual_pfol_stdev = sd(yearsum)
monthly_pfol_ret = annual_pfol_ret/12
monthly_pfol_stdev = annual_pfol_stdev/sqrt(12)
daily_pfol_ret = annual_pfol_ret/250
daily_pfol_stdev = annual_pfol_stdev/sqrt(250)

sprintf("Daily 3 percent VaR is %#.4f", daily_pfol_ret-(2.17009*daily_pfol_stdev))

## [1] "Daily 3 percent VaR is -0.0264"

sprintf("Daily 1 percent VaR is %#.4f", daily_pfol_ret-(2.57583*daily_pfol_stdev))

## [1] "Daily 1 percent VaR is -0.0315"

sprintf("Monthly 3 percent VaR is %#.4f", monthly_pfol_ret-(2.17009*monthly_pfol_stdev))

## [1] "Monthly 3 percent VaR is -0.1045"

sprintf("Monthly 1 percent VaR is %#.4f", monthly_pfol_ret-(2.57583*monthly_pfol_stdev))

## [1] "Monthly 1 percent VaR is -0.1279"

Assuming a normal distribution the 3% confidence lies at 2.17009 standard deviations and at the 1% that figure is 2.57583. The historical VaR results are above. For example, a 1M USD portfolio has a daily 3% VaR of -$26,400 or more.

Part 13

#13. Estimate VAR of your portfolio and S&P 500 and graph the Impulse Response Functions.
require(vars)

## Loading required package: vars

## Loading required package: MASS

## Loading required package: strucchange

## Loading required package: sandwich

## Loading required package: urca

sp <- read.csv(file="^GSPC.csv", header=TRUE, sep=",")
sp <- diff(log(sp$Close))
sp <- ts(sp, frequency=12, start=2014)
data <- cbind(pfol_ret, sp)
#head(data,5)

#Model Selection Based on AIC, Hannan Quinn (HQ), Schwarz Criteria(SC, SBC) and 
#Final Prediction Error (FPE) 
lagselect <- VARselect(data, lag.max=8, type="const")$selection
lagselect

## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      1      1      2

#Run VAR with one lag.
var <- VAR(data, p=2, type="const")
summary(var)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: pfol_ret, sp 
## Deterministic variables: const 
## Sample size: 45 
## Log Likelihood: 184.686 
## Roots of the characteristic polynomial:
## 0.4624 0.4624 0.3017 0.3017
## Call:
## VAR(y = data, p = 2, type = "const")
## 
## 
## Estimation results for equation pfol_ret: 
## ========================================= 
## pfol_ret = pfol_ret.l1 + sp.l1 + pfol_ret.l2 + sp.l2 + const 
## 
##               Estimate Std. Error t value Pr(>|t|)   
## pfol_ret.l1 -0.3893526  0.1921889  -2.026  0.04948 * 
## sp.l1       -0.0004329  0.3838760  -0.001  0.99911   
## pfol_ret.l2 -0.2138254  0.2034962  -1.051  0.29968   
## sp.l2        0.0976380  0.3784565   0.258  0.79774   
## const        0.0334989  0.0096660   3.466  0.00128 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.05412 on 40 degrees of freedom
## Multiple R-Squared: 0.136,   Adjusted R-squared: 0.0496 
## F-statistic: 1.574 on 4 and 40 DF,  p-value: 0.1999 
## 
## 
## Estimation results for equation sp: 
## =================================== 
## sp = pfol_ret.l1 + sp.l1 + pfol_ret.l2 + sp.l2 + const 
## 
##              Estimate Std. Error t value Pr(>|t|)    
## pfol_ret.l1 -0.161256   0.088124  -1.830  0.07473 .  
## sp.l1       -0.072310   0.176018  -0.411  0.68341    
## pfol_ret.l2 -0.225980   0.093309  -2.422  0.02007 *  
## sp.l2        0.012187   0.173533   0.070  0.94436    
## const        0.016025   0.004432   3.616  0.00083 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.02482 on 40 degrees of freedom
## Multiple R-Squared: 0.2252,  Adjusted R-squared: 0.1477 
## F-statistic: 2.906 on 4 and 40 DF,  p-value: 0.03348 
## 
## 
## 
## Covariance matrix of residuals:
##           pfol_ret        sp
## pfol_ret 0.0029293 0.0007888
## sp       0.0007888 0.0006159
## 
## Correlation matrix of residuals:
##          pfol_ret     sp
## pfol_ret   1.0000 0.5873
## sp         0.5873 1.0000

#Test for serial correlation of errors in VAR models: Ho: No Serial Correlation.
SerialCo_Test<- serial.test(var, lags.pt=10, type="PT.asymptotic")
SerialCo_Test

## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var
## Chi-squared = 29.164, df = 32, p-value = 0.6108

#Forecasting with VAR
fcast <- forecast(var, h=8)
par(mfrow=c(2, 1))
plot(fcast)

#Inpulse Response Function of the effect one std. dev. shock to sp on pfol_ret and vice versa.
IRF <- irf(var, impulse.variable = "sp", response.variable = "pfol_ret",  boot=TRUE, t = NULL, nhor = 20, ci=.95, scenario = 2, draw.plot = TRUE, t=NULL)
IRF

## 
## Impulse response coefficients
## $pfol_ret
##            pfol_ret            sp
##  [1,]  5.412281e-02  1.457510e-02
##  [2,] -2.107916e-02 -9.781520e-03
##  [3,] -1.938288e-03 -7.946600e-03
##  [4,]  4.310331e-03  5.531430e-03
##  [5,] -2.042068e-03 -7.538718e-04
##  [6,]  4.138298e-04 -5.228288e-04
##  [7,]  2.021401e-04  4.233516e-04
##  [8,] -2.184223e-04 -1.630974e-04
##  [9,]  8.322638e-05  6.495148e-06
## [10,] -1.627476e-06  3.348096e-05
## [11,] -1.654257e-05 -2.088687e-05
## 
## $sp
##            pfol_ret            sp
##  [1,]  0.000000e+00  2.008587e-02
##  [2,] -8.694619e-06 -1.452401e-03
##  [3,]  1.965157e-03  3.512119e-04
##  [4,] -9.052415e-04 -3.580241e-04
##  [5,] -3.329592e-05 -2.679417e-04
##  [6,]  1.716867e-04  2.249468e-04
##  [7,] -8.598582e-05 -3.969246e-05
##  [8,]  1.874834e-05 -1.932045e-05
##  [9,]  7.219112e-06  1.732110e-05
## [10,] -8.713560e-06 -6.888807e-06
## [11,]  3.543196e-06  4.829563e-07
## 
## 
## Lower Band, CI= 0.95 
## $pfol_ret
##           pfol_ret            sp
##  [1,]  0.039370046  0.0069039500
##  [2,] -0.035811638 -0.0168279889
##  [3,] -0.014831056 -0.0141206993
##  [4,] -0.005601003  0.0005388016
##  [5,] -0.011915452 -0.0051256286
##  [6,] -0.007283673 -0.0040661392
##  [7,] -0.002965369 -0.0018842492
##  [8,] -0.004485881 -0.0023237949
##  [9,] -0.001943674 -0.0012822575
## [10,] -0.001164735 -0.0006991048
## [11,] -0.001316875 -0.0007371920
## 
## $sp
##            pfol_ret            sp
##  [1,]  0.0000000000  0.0134664603
##  [2,] -0.0111678515 -0.0061724030
##  [3,] -0.0103562119 -0.0048483118
##  [4,] -0.0104187652 -0.0055090130
##  [5,] -0.0056402220 -0.0029626303
##  [6,] -0.0014680753 -0.0015163051
##  [7,] -0.0025654903 -0.0015027681
##  [8,] -0.0009596458 -0.0009555726
##  [9,] -0.0012575934 -0.0005456725
## [10,] -0.0013492590 -0.0005903336
## [11,] -0.0003835253 -0.0002138732
## 
## 
## Upper Band, CI= 0.95 
## $pfol_ret
##           pfol_ret            sp
##  [1,]  0.063695959  0.0218555714
##  [2,] -0.002370527 -0.0025746493
##  [3,]  0.010177139 -0.0009949916
##  [4,]  0.016055568  0.0127240964
##  [5,]  0.004859165  0.0035553312
##  [6,]  0.006410765  0.0028683224
##  [7,]  0.005058274  0.0031767099
##  [8,]  0.002151769  0.0016439859
##  [9,]  0.002613043  0.0016042296
## [10,]  0.001561356  0.0013609793
## [11,]  0.001033544  0.0006570860
## 
## $sp
##           pfol_ret           sp
##  [1,] 0.0000000000 0.0232530589
##  [2,] 0.0127107013 0.0038609238
##  [3,] 0.0149795723 0.0071754257
##  [4,] 0.0058754278 0.0029736292
##  [5,] 0.0034929432 0.0023494219
##  [6,] 0.0039678222 0.0039069722
##  [7,] 0.0018872165 0.0011131238
##  [8,] 0.0020945649 0.0009416756
##  [9,] 0.0009842244 0.0006702083
## [10,] 0.0007971507 0.0003904431
## [11,] 0.0008019371 0.0006089788

plot(IRF)

I chose an estimated VAR model of order 2 because when I attemped an order 1 only the ‘const’ parameter was significant. The effect of a positive shock to the portfolio shows a small positive change to the S&P 500. A shock to the S&P 500 has a small change to the portfolio and this is most likely due to the hedging effect applied via the optimization of the weights.

Conclusion

The accuracy measures tell you the in-sample accuracy and using RMSE and MAE the 3 factor CAPM model was the the best. This can be apprecciated in the second graph below. After that were the CAPM and the Holt Winters and ARIMA(0,0,1) were in third place.

In terms of the three period forecasts, none of the methods did great. The best two were the 3 factor CAPM and the ARIMA(0,0,1) method with the ARIMA being the better one as you can see in the table below. It is worth noting that the ARIMA model has a wider confidence interval than the 3 factor CAPM model. Also the ARIMA model does not have ARCH/GARCH so in the period studied there was no volatility clustering.

Finally, the portfolio does not conform to the EMH so there seems to be some stuctural movement of the portfolio returns but not to a degree where a simple model such as CAPM or 3 factor CAPM comes close enough to forecasting it well.

Date	3 Factor CAPM Forecast	ARIMA Forecast	Actual Return
12/2017	0.0231	0.007256747	0.0022
1/2018	0.1071	0.014853294	0.1290
2/2018	-0.0399	0.022296242	0.0154

par(mfrow=c(2,1))
ts.plot(pfol_ret, reg_CAPM$fitted.values, gpars = list(col = c("black", "green")), main="Portfolio vs CAPM(green)")
ts.plot(pfol_ret, reg_3fCAPM$fitted.values, gpars = list(col = c("black", "blue")), main="Portfolio vs 3 Factor CAPM(blue)")

par(mfrow=c(2,1))
ts.plot(pfol_ret, fit_auto$fitted, gpars = list(col = c("black", "orange")), main="Portfolio vs ARIMA(orange)")
ts.plot(pfol_ret, hws1$fitted, gpars = list(col = c("black", "purple")), main="Portfolio vs Holt W.(purple)")