## The experiment

A colleague at work was working with a time series where one got quite different results depending on whether one seasonally adjusted it first, or treated the seasonality as part of a SARIMA (seasonal auto-regressive integrated moving average) model. I have some theories about why this might have happened which I won’t go into in this post. Instead, I’m going to report back on an experiment with the purpose of determining, empirically, if either of these strategies dominates the other or at least is better on average.

To do this I took all the quarterly and monthly time series from the M3 and the Tourism forecasting competition data collections, available in the Mcomp and Tcomp R packages. This is 2,977 data series in total. I developed forecasts with eight different methods for the training set of each of these time series - the combinations of:

- (S)ARIMA and exponential smoothing state space (using
`auto.arima`

and`ets`

from Hyndman’s`forecast`

package); - with and without a Box-Cox transformation;
- seasonally adjusting before modelling and incorporating seasonality into the model.

I then compared the predictions of the models against the official test sets and calculated the mean absolute scaled error (MASE) with the `accuracy`

function from `forecast`

. The forecasting horizons were 8 periods for the quarterly data, 18 periods for the M3 monthly data, and 24 periods for the tourism monthly data.

For the seasonal adjustment prior to modelling I made use of the `stlm`

function in the `forecast`

package for which “forecasts of STL objects are obtained by applying a non-seasonal forecasting method to the seasonally adjusted data and re-seasonalizing using the last year of the seasonal component.”

I was interested in the impact of a Box-Cox transformation on forecasting accuracy in its own right, but also particularly in its interaction with seasonal adjustment. I had a sort of working hypothesis that modelling untransformed data that isn’t second-order stationary is made harder by seasonality. My intuition there was that if (as is common with untransformed data) the variance of a time series increases as its mean increases, the seasonality may magnify this effect, giving even more inappropriate weight than otherwise to parts of the time series with a higher mean value. This would suggest that, with ARIMA models that rely on second-order stationarity, it would be helpful to do at least one of a Box-Cox transformation or seasonal adjustment before fitting the model. *spoiler - the results aren’t completely consistent with this; seasonal adjustment helps, but Box-Cox transformations don’t seem to*.

For the models that included an automated Box-Cox transformation, the value of lambda for the transformation was chosen by Guerrero’s method implemented in `forecast::BoxCox.lambda`

, but constrained to be between 0 and 1 (or 0.5 and 1.0 if the data included values of zero, which tended to result in very erratic forecasts when values of lambda were below 0.5).

## Results

Here’s an example of how the results look for just one of the collections of data; quarterly time series from the M3 data collection. Each point on the plot below is a single data series. The horizontal axes show the error from the forecasts that incorporate seasonality into the models; the vertical axes show the same for when the series were seasonally adjusted before the modelling. The diagonal line shows equality - where a point would be if the two methods gave equally accurate forecasts:

Several things stand out:

- The results are quite different. For any individual data series, you might get much worse/better results with one of the methods rather than the other. The differences can be quite substantial.
- On average there’s no really obvious trend. The cloud of points is actually a little to the right and below the diagonal line, showing that errors are slightly lower when seasonal adjustment is done first, but not to the degree it’s obvious to the eye.
- The decision to Box-Cox transform or not first also doesn’t have any large, obvious overall average impact. The two trend lines drawn through the cloud of points are practically identical. This doesn’t mean the decision doesn’t matter for an individual series’ forecast, just that on average neither strategy is clearly better.

Here are the overall results for all data collections in a graphic designed to show overall effects:

There’s not much to tell here! For the ARIMA models (but not so much ETS), the blue boxes are a tiny bit to the left of the orange boxes, showing that on average the strategy of prior seasonally adjusting the series is a little better than incorporating seasonality into the forecasting model.

To quantify these results, I estimated a mixed-model with

- Mean Absolute Scaled Error as the response term;
- modelling type, transformation type, and seasonal strategy as fixed explanatory effects of interest;
- data origin (M3 or Tourism) and frequency (quarterly or monthly) as fixed explanatory nuisance effects; and
- randomness at the level of data series, and of data series - strategy combination.

Here’s the summary of the fixed effects for the simplest model:

Dependent variable: | |

mase | |

modelETS | -0.013^{**} |

(0.005) | |

transformNone | -0.032^{***} |

(0.005) | |

seasadjSeasonally adjusted first | -0.027^{***} |

(0.005) | |

frequencyquarterly | 0.238^{***} |

(0.031) | |

collectionTourism | 0.563^{***} |

(0.035) | |

Constant | 0.940^{***} |

(0.021) | |

Observations | 23,816 |

Log Likelihood | -17,232.660 |

Akaike Inf. Crit. | 34,481.320 |

Bayesian Inf. Crit. | 34,545.950 |

Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 |

Positive coefficients mean that factor is associated with a *higher* mean absolute scaled error ie worse predictions. Negative values, like those associated with the `ets`

models, no Box-Cox transformation, and seasonal adjustment prior to modelling, mean better predictions. Tourism data has a higher average scaled error, as does quarterly data (compared to monthly).

This table of coefficient estimates was produced with `stargazer`

, which is awesome but unfortunately doesn’t include a summary of the random effects from a mixed effects model. There are workarounds for this but I couldn’t be bothered. In this case, the size of those random effects is just a nuisance anyway. For the record, the standard deviation at the level of “model - series” combination was 0.81 and of the lowest level individual residuals was 0.40.

What we see here is that there are small but noticeable impacts:

`ets`

performs slightly better (ie lower mean absolute scaled error, on average) than`auto.arima`

- Fitting the model on untransformed data performs slightly better than using an automated Box Cox transformation
- Seasonally adjusting first performs slightly better than incorporating the seasonality into the model

### Subtlety - `ets`

versus `auto.arima`

The conclusions above are fine for a general takehome, but it’s worth noting that a better performing model to explain the experimental forecasting performance results (on the basis of AIC) includes a set of interaction effects between model type and my experimental factors of seasonal adjustment and transformation strategies. I performed a single comparison of a full model with the expected interactions versus the simple model. The interactions I looked at were between the model type (`ets`

versus `auto.arima`

) and the other explanatory variables; and between Box-Cox transformation and seasonality strategy. This superior model is quite a bit harder to interpret:

Dependent variable: | |

mase | |

modelETS | -0.052^{***} |

(0.011) | |

transformNone | -0.061^{***} |

(0.010) | |

seasadjSeasonally adjusted first | -0.065^{***} |

(0.010) | |

frequencyquarterly | 0.212^{***} |

(0.032) | |

collectionTourism | 0.604^{***} |

(0.035) | |

transformNone:seasadjSeasonally adjusted first | 0.034^{**} |

(0.015) | |

modelETS:transformNone | 0.038^{***} |

(0.015) | |

modelETS:seasadjSeasonally adjusted first | 0.059^{***} |

(0.015) | |

modelETS:frequencyquarterly | 0.052^{***} |

(0.011) | |

modelETS:collectionTourism | -0.084^{***} |

(0.012) | |

modelETS:transformNone:seasadjSeasonally adjusted first | -0.031 |

(0.021) | |

Constant | 0.965^{***} |

(0.022) | |

Observations | 23,816 |

Log Likelihood | -17,208.300 |

Akaike Inf. Crit. | 34,444.600 |

Bayesian Inf. Crit. | 34,557.690 |

Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 |

It’s basically impossible to think through the various interactions in the above single combined model, so purely for the purpose of presenting results I refit the model separately to the `ets`

results and the `auto.arima`

results.

#### ets

Dependent variable: | |

mase | |

transformNone | -0.022^{***} |

(0.009) | |

seasadjSeasonally adjusted first | -0.006 |

(0.009) | |

frequencyquarterly | 0.264^{***} |

(0.033) | |

collectionTourism | 0.521^{***} |

(0.036) | |

transformNone:seasadjSeasonally adjusted first | 0.003 |

(0.012) | |

Constant | 0.913^{***} |

(0.022) | |

Observations | 11,908 |

Log Likelihood | -8,653.195 |

Akaike Inf. Crit. | 17,322.390 |

Bayesian Inf. Crit. | 17,381.470 |

Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 |

#### auto.arima

Dependent variable: | |

mase | |

transformNone | -0.061^{***} |

(0.010) | |

seasadjSeasonally adjusted first | -0.065^{***} |

(0.010) | |

frequencyquarterly | 0.212^{***} |

(0.032) | |

collectionTourism | 0.604^{***} |

(0.035) | |

transformNone:seasadjSeasonally adjusted first | 0.034^{**} |

(0.015) | |

Constant | 0.965^{***} |

(0.022) | |

Observations | 11,908 |

Log Likelihood | -10,361.570 |

Akaike Inf. Crit. | 20,739.150 |

Bayesian Inf. Crit. | 20,798.220 |

Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 |

And here’s a graphic representation that helps (a bit):

All my attempts to get a way to easily understand those implications couldn’t improve on a simple contingency table of the average MASE values (these are trimmed means of the MASE return for all 2,977 series), using the tables of coefficients above to guide inference:

| transform | | | seasadj | | | ARIMA | | | ETS | |
---|---|---|---|

BoxCox | Seasonal in model | 1.020 | 0.987 |

BoxCox | Seasonally adjusted first | 0.987 | 0.993 |

None | Seasonal in model | 0.993 | 0.976 |

None | Seasonally adjusted first | 0.976 | 0.992 |

## Conclusion

- For an ARIMA model, the best approach on average is to seasonally adjust first, and not use an automated Box-Cox transformation.
- For an ETS model, no evidence either way on whether to seasonally adjust first or to incorporate into the model; but again, it is better to not automatically use a Box-Cox transformation.

That it’s best not to automatically use a Box-Cox transformation slightly surprised me, particularly for SARIMA models which assume second order stationarity. Something to think about.

Limitations or caveats include:

- The accuracy checks were all done on a single point, 8 (for quarterly data) and 18 or 24 (for monthly) periods after the end of the training series. The results may not generalize, particularly for forecast horizons that aren’t a relatively simple multiple of the series’ frequency.
`stlm`

is not the state of the art in seasonal adjustment. A more thorough experiment would use X13-SEATS-ARIMA to do the seasonal adjustment. My reason for*not*doing this was convenience -`stlm`

in combination with`forecast`

makes it easy to backtransform the forecasts to the original seasonal version that I did the accuracy check on.- There may be a better way to automate the choice of Box-Cox transformation.

## Code

All the code for the above (and a bit more) in one chunk today: