Bootstrap aggregating (bagging) is a useful averaging method that improves accuracy and avoids overfitting when modeling time series. It also helps stabilize the variance, so we don't have to apply a Box-Cox transformation to the data.
Modeling time series data is difficult because the observations are autocorrelated. In this case, the moving block bootstrap (MBB) should be preferred, because MBB resamples the data inside overlapping blocks to imitate the autocorrelation in the data. For a time series of length n and a block size l, the number of overlapping blocks is n - l + 1.
What we mean by overlapping blocks is that observations 1 to l form block 1, observations 2 to l + 1 form block 2, and so on. For monthly data we should use a block size of at least two years (l = 24), so that any remaining seasonality is contained within a block.
From these n - l + 1 blocks, n/l blocks are drawn at random and concatenated in order to build the bootstrapped series. Since the blocks overlap, the same time series values can appear in more than one block. A minimal sketch of this mechanism is shown below.
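To make the mechanics concrete, here is a minimal sketch of the moving block bootstrap in R. The function name mbb and the variable names are illustrative, not from the original article:

```r
# Moving block bootstrap: resample overlapping blocks of length l
# from a (stationary) series x and concatenate them back to length n.
mbb <- function(x, l) {
  n <- length(x)
  n_blocks <- n - l + 1                      # number of overlapping blocks
  blocks <- sapply(seq_len(n_blocks), function(i) x[i:(i + l - 1)])
  picks <- sample(n_blocks, ceiling(n / l), replace = TRUE)
  as.vector(blocks[, picks])[1:n]            # glue chosen blocks, trim to n
}

set.seed(42)
boot_series <- mbb(rnorm(120), l = 24)       # e.g. 10 years of monthly data
```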
This bootstrap process is applied to the remainder component obtained after decomposing the time series. If the series is seasonal, the stl function is used for the decomposition (trend, seasonal, remainder); otherwise, a loess decomposition (trend, remainder) is chosen. It should not be forgotten that the data being resampled have to be stationary in the first place.
A Box-Cox transformation is applied at the beginning of the process and back-transformed at the end. As mentioned before, when we average all the bootstrapped series, which is called bagging, we can handle the variance-instability problem and improve accuracy compared to the original series.
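This whole pipeline (Box-Cox transform, STL/loess decomposition, MBB on the remainder, back-transform) is implemented in the forecast package as bld.mbb.bootstrap. A short sketch, using AirPassengers as a stand-in monthly series for illustration:

```r
library(forecast)

# Generate 10 bootstrapped versions of y: the first element of the list
# is the original series, the rest are MBB replicates built from the
# Box-Cox-transformed, decomposed remainder.
y <- AirPassengers
boot <- bld.mbb.bootstrap(y, num = 10)
length(boot)        # 10 series, each the same length as y
```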
As you may remember from the previous two articles, we have been modeling gold prices per gram in Turkey, and we determined that an ARIMA model was best for forecasting. This time, we will try to improve on it using the bagging approach described above.
To do that, we will create a function that runs the bootstrap simulations and builds the prediction intervals we want. The number of simulations, the model, and the confidence level are given as adjustable defaults. We will use the assign function to make the bagged data (simfc) a global variable, so we can access it outside the function as well.
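The embedded code is not reproduced here; below is a minimal sketch of such a function, assuming the forecast package. The name bagged_fc, the defaults, and the choice of auto.arima are illustrative; only simfc follows the article's naming:

```r
library(forecast)

bagged_fc <- function(y, h = 18, nsim = 100, fn = auto.arima, level = 95) {
  sim <- bld.mbb.bootstrap(y, nsim)       # bootstrapped replicates of y
  # Fit the chosen model to each replicate and forecast h steps ahead;
  # the result is an h x nsim matrix of point forecasts.
  fc <- sapply(sim, function(s) forecast(fn(s), h = h)$mean)
  start_f <- tsp(y)[2] + 1 / frequency(y)
  out <- list(
    mean  = ts(rowMeans(fc), start = start_f, frequency = frequency(y)),
    lower = ts(apply(fc, 1, quantile, probs = (1 - level / 100) / 2),
               start = start_f, frequency = frequency(y)),
    upper = ts(apply(fc, 1, quantile, probs = 1 - (1 - level / 100) / 2),
               start = start_f, frequency = frequency(y)),
    level = level
  )
  # Expose the bagged forecasts globally, as described above
  assign("simfc", out, envir = .GlobalEnv)
  out
}
```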
Because bagging already involves averaging, we don't use the lambda parameter of the Box-Cox transformation to stabilize the variance. The forecasting results for 18 months at a 95% confidence level, based on the training set, can be seen below. We can also change the model type or confidence level if we want.
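As an illustration, the call might look like this, where gold is a hypothetical monthly ts object of gram gold prices (the training series comes from the earlier articles and is not reproduced here):

```r
# 18-month bagged forecast at the 95% level, using the defaults above
fc <- bagged_fc(gold, h = 18, nsim = 100, level = 95)
fc$mean                                    # bagged point forecasts
cbind(lower = fc$lower, upper = fc$upper)  # 95% interval bounds
```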
We will create the ARIMA model the same way we did before and compare it with its bagged version in a graph.
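A sketch of how the comparison could be drawn with autoplot from the forecast/ggplot2 toolchain; gold and simfc are as above, and the plot details are illustrative:

```r
library(ggplot2)

fit_arima <- auto.arima(gold)          # classic ARIMA, as in earlier posts
fc_arima  <- forecast(fit_arima, h = 18)

autoplot(gold) +
  autolayer(fc_arima$mean, series = "ARIMA") +
  autolayer(simfc$mean, series = "Bagged ARIMA") +
  labs(x = "Time", y = "Gold price per gram")
```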
When we examine the plot above, we can see that the bagged ARIMA model is smoother and more accurate than the classic version; however, as the forecasting horizon increases, both models fail to capture the uptrend.
Below, we compare the accuracy of the models numerically. The difference in accuracy that we saw in the plot is easy to see here as well. The NaN values for the simulated version appear because no fitted values (one-step forecasts) are estimated on the training set.
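The numeric comparison might be produced as below, with gold_test as a hypothetical held-out test set; in the article, the training-set rows for the bagged version show NaN because it has no fitted values:

```r
# Compare out-of-sample accuracy of the two forecasts
accuracy(fc_arima, gold_test)      # classic ARIMA
accuracy(simfc$mean, gold_test)    # bagged ARIMA point forecasts
```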
Conclusion
Examining the results, we can see that bootstrap simulation with averaging (bagging) improves accuracy significantly. On the other hand, the simulation process can be very time-consuming.
The original article can be found here.
Credit: Data Science Central. By: Selcuk Disci