## Background and motivation

In an earlier post I explored ways that might improve on standard methods for prediction intervals from univariate time series forecasting. One of the tools I used was a convenience function to combine forecasts from Rob Hyndman’s `ets` and `auto.arima` functions. David Shaub (with a small contribution from myself) has now built and published an R package `forecastHybrid` that expands on this idea to create ensembles from other forecasting methods in Hyndman’s `forecast` package.

The motivation is to make it easy to improve forecasts, both their point estimates and their prediction intervals. It has been well known for many years that taking the average of rival forecast methods improves the performance of forecasts. This new R package aims to make it as easy for people to do this as to fit the individual models in the first place.

## Installation

The stable version of the `forecastHybrid`

package is on CRAN, and is installed the usual way:
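For completeness, the standard CRAN installation (and loading of the package) looks like this:

```r
# Install the stable release from CRAN, then load it
install.packages("forecastHybrid")
library(forecastHybrid)
```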

It requires at least version 7.1 of Rob Hyndman’s `forecast` package, which is a recent upgrade.

## Usage

### Basic

Usage is a two-step process:

- fitting the required time series models
- forecasting them and taking a weighted average
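A minimal sketch of those two steps, using the built-in `USAccDeaths` monthly series as a stand-in for your own `ts` object:

```r
library(forecastHybrid)

# Step 1: fit the component time series models
mod <- hybridModel(USAccDeaths)

# Step 2: forecast each component and take the weighted average
fc <- forecast(mod, h = 12)
```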

Prediction intervals are based on the conservative (and accurate, at least for the `auto.arima` / `ets` combination and the M3 competition data) method I set out in my earlier post on hybrid methods. That is, at each time period of the forecast, the most extreme points of the component models’ prediction intervals are used as the boundaries of the ensemble prediction interval. This method seems to give truer prediction intervals than any individual model’s, which are usually too narrow (i.e. give a false sense of precision) because they don’t take model uncertainty into account. This is an active area of investigation; we’re not sure we’ll keep calculating them this way.

The object created by the above procedure is of class `forecast`, and the base graphics plotting method from Hyndman’s `forecast` package applies:
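For example (a self-contained sketch, again using `USAccDeaths` as a stand-in series):

```r
library(forecastHybrid)

mod <- hybridModel(USAccDeaths)
fc <- forecast(mod, h = 12)

# Standard plot method for forecast objects: historical data,
# point forecasts and shaded prediction intervals
plot(fc)
```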

### Custom combinations and weights

Controlling which of the five available model types are used in the ensemble is done via the `models` argument of `hybridModel()`. `models` is a character string containing any combination of `a`, `e`, `n`, `s` and `t`, standing for `auto.arima`, `ets`, `nnetar`, `stlm` and `tbats` respectively. In the code below I combine just the `auto.arima` and `ets` (exponential state space smoothing) models, two component models that together make up a high-performing (on average) forecasting method.

I also use this example to show how, instead of weighting the models equally, we can give greater weight to the model that fits the historical data better. This procedure isn’t recommended; it seems better to give the models a priori weights, usually equal, than to let the data dictate them. Why this is the case is beyond my scope here. Here’s the example:
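A sketch of such a fit, assuming the `weights = "insample.errors"` and `errorMethod` arguments of `hybridModel()`, with `USAccDeaths` standing in for your own series:

```r
library(forecastHybrid)

# Combine only auto.arima ("a") and ets ("e"), weighting each component
# by its in-sample RMSE rather than equally
mod_ae <- hybridModel(USAccDeaths, models = "ae",
                      weights = "insample.errors", errorMethod = "RMSE")
fc_ae <- forecast(mod_ae, h = 12)
```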

### External regressors

While most of the candidate models for an ensemble are univariate methods, `auto.arima`

and `nnetar`

models can incorporate an `xreg`

argument. If you have actual values for the forecast period of external regressors, it’s often useful to use them in the forecasting process. The forecastHybrid approach lets you do this with the component models that support `xreg`

, while ignoring it and fitting univariate time series models with other component models (eg `ets`

).

To pass `xreg` or other parameters through to the model-fitting functions, the user supplies up to five lists of arguments (`a.args`, `e.args`, `n.args`, `s.args` and `t.args`). Here’s an example showing how to pass `xreg` through to `auto.arima` for automated ARIMA modelling and to `nnetar` for the feed-forward neural network model. The example tries to forecast 12 months of unemployment in Wisconsin, given known values of unemployment in surrounding states and for the USA as a whole. This isn’t a particularly realistic example in itself, but forecasts of this type do occur in reality, for example when one economic time series becomes available only after a much longer delay than other, more timely measures.

The data management and forecasting question below has been pinched from an example on the BIBA blog by joaquin.
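Since the original data download isn’t reproduced here, the sketch below substitutes simulated series for the real unemployment data; the names `wi`, `xreg_hist` and `xreg_future` are hypothetical stand-ins, and the shape of the `hybridModel()` and `forecast()` calls is the point:

```r
library(forecastHybrid)

# Hypothetical stand-ins for the real data: `wi` plays the role of monthly
# Wisconsin unemployment, and the xreg matrices play the role of the
# neighbouring-state and national series
set.seed(123)
n <- 120
xreg_hist <- matrix(rnorm(n * 3), ncol = 3,
                    dimnames = list(NULL, c("IL", "MN", "USA")))
wi <- ts(as.numeric(10 + xreg_hist %*% c(0.5, 0.3, 0.8) + rnorm(n)),
         frequency = 12)
xreg_future <- matrix(rnorm(12 * 3), ncol = 3,
                      dimnames = list(NULL, c("IL", "MN", "USA")))

# auto.arima and nnetar receive xreg via a.args / n.args; ets ("e")
# ignores the regressors and fits a univariate model
mod_x <- hybridModel(wi, models = "aen",
                     a.args = list(xreg = xreg_hist),
                     n.args = list(xreg = xreg_hist))

# Forecast 12 months ahead, supplying the known future regressor values
fc_x <- forecast(mod_x, h = 12, xreg = xreg_future)
```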

There’s a lot more functionality in this package, mostly inherited from Hyndman’s `forecast` package. So check it out on CRAN.

## Future work

Future work on the package, if / when we get around to it, is likely to include:

- Allowing the weights between the models to be set based on cross-validation performance of the component models
- Allowing weights between the models to change over different forecast horizons (e.g. some models are known to be generally better at predicting the long term than the short term, so could be given extra weight as the forecast horizon increases)
- `ggplot2` graphics integration
- More models
- Improved parallelization (it already works with parallel processing, but this could be improved between models)
- Automating model selection
- Various under-the-hood things

I also hope to do some more work *with* the package, e.g. more systematic tests of the performance of these hybrid forecasts, both as point forecasts and as prediction intervals.

Bugs, issues and enhancement requests can be filed at GitHub.

Nearly all the credit for this package goes to David Shaub; although it’s hosted on my GitHub account, my contribution has been small. So thanks, David, for a great, convenient set of functionality for forecasters using R.