Background and motivation
In an earlier post I explored ways that might improve on standard methods for prediction intervals from univariate time series forecasting. One of the tools I used was a convenience function to combine forecasts from Rob Hyndman’s ets and auto.arima functions. David Shaub (with a small contribution from me) has now built and published an R package, forecastHybrid, that expands on this idea to create ensembles from other forecasting methods in Hyndman’s forecast package.
The motivation is to make it easy to improve forecasts, both their point estimates and their prediction intervals. It has been well known for many years that averaging rival forecasting methods tends to improve forecast performance. This new R package aims to make doing so as easy as fitting the individual models in the first place.
Installation
The stable version of the forecastHybrid package is on CRAN, and is installed the usual way, with install.packages("forecastHybrid").
It requires at least version 7.1 of Rob Hyndman’s forecast package, which is a recent upgrade.
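A minimal sketch of the install step, assuming a working CRAN mirror and that the forecast package is already installed. The name forecast_ok is only illustrative; it records whether the version requirement mentioned above is met:

```r
# Install forecastHybrid from CRAN only if it is not already available
if (!requireNamespace("forecastHybrid", quietly = TRUE)) {
  install.packages("forecastHybrid")
}

# Check that the installed forecast package meets the 7.1 minimum
forecast_ok <- packageVersion("forecast") >= "7.1"
forecast_ok
```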
Usage
Basic
Usage is a two-step process:
- fitting the required time series models
- forecasting them and taking a weighted average
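In code, the two steps might look like the following sketch. It assumes the built-in AirPassengers series as example data and a 24-month horizon; mod1 and fc1 are just illustrative names.

```r
library(forecastHybrid)

# Step 1: fit the component models (by default, every model type the package supports)
mod1 <- hybridModel(AirPassengers)

# Step 2: forecast each component model and combine the results
fc1 <- forecast(mod1, h = 24)

fc1$mean      # combined point forecasts
mod1$weights  # weight given to each component model
```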
Prediction intervals are based on the conservative method I set out in my earlier post on hybrid methods, which proved accurate at least for the auto.arima / ets combination on the M3 competition data. That is, at each time period of the forecast, the most extreme bounds across all the component models in the ensemble (the lowest lower bound and the highest upper bound) are used as the boundaries of the ensemble prediction interval. This method seems to give truer prediction intervals than any individual model’s, which are usually too narrow (i.e. they give a false sense of precision) because they don’t take model uncertainty into account. This is an active area of investigation; we’re not sure we’re going to keep calculating them this way.
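To make the interval rule concrete, here is a toy sketch in base R. The numbers are invented purely for illustration (they are not output from any fitted model), and the matrix layout is an assumption for the example rather than a description of how the package stores things internally:

```r
# Hypothetical 95% bounds from two component models over a 3-step horizon
lower <- cbind(arima = c(90, 85, 80), ets = c(88, 86, 75))
upper <- cbind(arima = c(110, 118, 125), ets = c(112, 115, 130))

# Ensemble interval: take the most extreme bound at each step
ens_lower <- apply(lower, 1, min)  # lowest of the lower bounds
ens_upper <- apply(upper, 1, max)  # highest of the upper bounds

cbind(ens_lower, ens_upper)
```

The resulting interval at each step is at least as wide as the widest component interval, which is why the method is conservative.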
The object returned by the forecasting step is of class forecast, so the base graphics plotting method from Hyndman’s forecast package applies.
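A sketch of the plotting step, again assuming AirPassengers as example data and restricting the ensemble to two component models only to keep the fit quick:

```r
library(forecastHybrid)

fc <- forecast(hybridModel(AirPassengers, models = "ae"), h = 24)

# Confirm the class, then use the ordinary base graphics plot method
inherits(fc, "forecast")
plot(fc, ylab = "International airline passengers")
```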
Custom combinations and weights
Which of the five available model types are used in the ensemble is controlled via the models argument of hybridModel. models is a character string containing any combination of "a", "e", "n", "s" and "t", standing for auto.arima, ets, nnetar, stlm and tbats respectively. In this example I combine just auto.arima and ets (exponential smoothing state space) models, two component models that together make up a high-performing (on average) forecasting method.
I also use this example to show how, instead of weighting the models equally, we can give greater weight to the model that fits the historical data better. This procedure isn’t recommended; it seems better just to give the models a priori weights, usually equal, rather than let the data dictate them. Why this is the case is beyond the scope of this post. Here’s the example:
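A minimal sketch of such a call, assuming AirPassengers as the data and a 24-month horizon; mod2 and fc2 are illustrative names:

```r
library(forecastHybrid)

# "ae" = auto.arima + ets; weights derived from in-sample errors rather than equal
mod2 <- hybridModel(AirPassengers, models = "ae", weights = "insample.errors")
mod2$weights  # inspect the weight each component model received

fc2 <- forecast(mod2, h = 24)
plot(fc2)
```

Leaving out the weights argument gives the equal weighting recommended above.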
External regressors
While most of the candidate models for an ensemble are univariate methods, auto.arima and nnetar models can incorporate an xreg argument. If you have actual values for the forecast period of external regressors, it’s often useful to use them in the forecasting process. The forecastHybrid approach lets you do this with the component models that support xreg, while ignoring it and fitting univariate time series models with the other component models (e.g. ets).
To pass xreg or other parameters through to the model-fitting functions, the user passes up to five lists of parameters (a.args, e.args, n.args, s.args and t.args). Here’s an example showing how to pass xreg through to auto.arima for automated ARIMA modelling and to nnetar for the feed-forward neural network model. This example tries to forecast 12 months of unemployment in Wisconsin, given known values of unemployment in surrounding states and for the USA as a whole. This isn’t a particularly realistic example in itself, but it is a type of forecast that does occur in reality, for example when one economic time series is only available after a much longer delay than other, more timely measures.
The data management and the forecast question below have been pinched from an example on the BIBA blog by joaquin.
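As a stand-in that runs without downloading anything, the following sketch uses simulated data rather than the Wisconsin series. Everything about the data is an assumption made for illustration: the single regressor called neighbour, the coefficients, and the names y, x_hist, x_future, mod3 and fc3.

```r
library(forecastHybrid)

set.seed(123)
n <- 120  # ten years of monthly history
h <- 12   # forecast horizon in months

# Simulated regressor (a hypothetical "neighbouring state" series), known for
# both the history and the forecast period, and a target series driven by it
x_all <- matrix(cumsum(rnorm(n + h)), ncol = 1,
                dimnames = list(NULL, "neighbour"))
y <- ts(5 + 0.8 * x_all[1:n, 1] + rnorm(n), frequency = 12)

x_hist   <- x_all[1:n, , drop = FALSE]
x_future <- x_all[(n + 1):(n + h), , drop = FALSE]

# xreg is passed only to the components that support it; ets is fitted without it
mod3 <- hybridModel(y, models = "aen",
                    a.args = list(xreg = x_hist),
                    n.args = list(xreg = x_hist))

# Future regressor values are supplied at forecast time
fc3 <- forecast(mod3, h = h, xreg = x_future)
plot(fc3)
```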
There’s a lot more functionality in this package, mostly just inherited from Hyndman’s forecast package. So check it out on CRAN.
Future work
Future work on the package, if / when we get around to it, is likely to include:
- Allowing the weights between the models to be set based on cross-validation performance of the component models
- Allowing weights between the models to change over different forecast horizons (e.g. some models are known to be generally better at predicting the long term than the short term, so could be given extra weight as the forecast horizon increases)
- ggplot2 graphics integration
- More models
- Improved parallelization (it already works with parallel processing, but this could be improved between models)
- Automating model selection
- Various under the hood things
I also hope to do some more work with the package, e.g. more systematic tests of how these hybrid model forecasts perform, both as point forecasts and as prediction intervals.
Bugs, issues and enhancement requests can be filed at GitHub.
Nearly all the credit for this package goes to David Shaub; although it’s hosted on my GitHub account, my contribution has been small. So thanks, David, for a great, convenient set of functionality for forecasters using R.