A broad family of fast and effective forecast methods
Exponential smoothing state space methods constitute a broad family of approaches to univariate time series forecasting that have been in use for many decades but were only placed into a systematic framework in the twenty-first century. The definitive book on the subject is Hyndman, Koehler, Ord and Snyder's Forecasting with Exponential Smoothing: The State Space Approach. Well-known models such as simple (or single) exponential smoothing, Holt's linear trend method (ingeniously implemented here in Excel - these methods are popular in the business world, where better tools are often not used) and the Holt-Winters seasonal method can all be shown to be special cases of a single family.
A must-read in this space is the taxonomy of exponential smoothing methods in Hyndman and Athanasopoulos' Forecasting: Principles and Practice. Their taxonomy is based on characterising each model against three dimensions: error, trend and seasonality (hence ets, the name of the function that implements these models in the forecast package). Each of those three can be characterised as "additive", "multiplicative" or "none". From the ets help file:
model: Usually a three-character string identifying method using the framework terminology of Hyndman et al. (2002) and Hyndman et al. (2008). The first letter denotes the error type ("A", "M" or "Z"); the second letter denotes the trend type ("N", "A", "M" or "Z"); and the third letter denotes the season type ("N", "A", "M" or "Z"). In all cases, "N"=none, "A"=additive, "M"=multiplicative and "Z"=automatically selected. So, for example, "ANN" is simple exponential smoothing with additive errors, "MAM" is multiplicative Holt-Winters' method with multiplicative errors, and so on.
By taking a fully general approach, ets is able to make the most of all the members of its family and automatically choose the most effective method for a given dataset.
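As a small illustration of that model-string interface, here is a sketch using the built-in AirPassengers series (the series is just for illustration; which model gets chosen automatically depends on the data):

```r
library(forecast)

# Let ets() search the whole family and choose automatically
# (the default is equivalent to model = "ZZZ"):
fit_auto <- ets(AirPassengers)

# Or pin down a specific family member, eg multiplicative
# Holt-Winters with multiplicative errors:
fit_mam <- ets(AirPassengers, model = "MAM")

fit_auto # the printout reports which ETS(error, trend, season) model was chosen
```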
Three relatives
I wanted to get my head around the performance with seasonal data of two close relatives of ets: stlm, and thetaf, Hyndman's implementation of the successful Theta decomposition method of forecasting.
After the Theta method made a name for itself in the forecasting competition world as a complex new algorithm, Hyndman and Billah showed that it could be seen as yet another special case of an exponential smoothing model, this one with drift. Both the thetaf and stlm functions in Hyndman's forecast package work by first seasonally adjusting a series and then applying a type of exponential smoothing model to it. The original Theta method did not deal with seasonality, but Hyndman's more recent implementations of it do.
Because Hyndman's forecast package is open source, we can inspect the source code to be sure the functions are doing what we think they are. Here are the three functions I'm looking at:
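(Inspecting them is as easy as typing their names at the R console, which prints their definitions:)

```r
library(forecast)

# Printing a function shows its source at the console:
ets
stlm
thetaf
```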
- ets: the most general and flexible version; models error, trend and seasonal elements together.
- stlm: seasonal adjustment via stl (Cleveland-style loess decomposition); then a non-seasonal ets model (ie just error and trend terms); then re-seasonalise.
- thetaf: seasonal adjustment via decompose (classical multiplicative seasonal decomposition); then ets with just an additive error model (no trend or seasonality); trend/drift estimated by an adjusted form of linear regression of the seasonally adjusted variable against time.
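To make the stlm mechanics concrete, here is a minimal hand-rolled sketch of that adjust-model-reseasonalise pipeline, using the built-in USAccDeaths series for illustration (forecast(stlm(...)) does all of this, and more carefully, in one call):

```r
library(forecast)

x <- USAccDeaths

# 1. Seasonal adjustment via loess-based STL decomposition:
dec <- stl(x, s.window = 7)
sa <- seasadj(dec)

# 2. Non-seasonal exponential smoothing on the adjusted series
#    ("ZZN": error and trend chosen automatically, no seasonal term):
fc_sa <- forecast(ets(sa, model = "ZZN"), h = 24)

# 3. Re-seasonalise by adding back the most recent year's
#    seasonal component (additive seasonality assumed here):
seas <- rep(tail(dec$time.series[, "seasonal"], 12), 2)
fc_reseasonalised <- fc_sa$mean + seas
```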
One way of looking at this is that ets uses exponential smoothing for all three of error, trend and seasonality; stlm for just error and trend; and thetaf for just error. stlm and thetaf bring in other methods for dealing with the seasonal and trend elements.
Here are those three methods in action against one of the example monthly data sets from the 2010 Tourism Forecasting Competition.
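Calls along the following lines produce that sort of plot, assuming the Tcomp R package (which holds the tourism competition data); the particular series picked out here is illustrative:

```r
library(forecast)
library(Tcomp) # tourism competition data; each element has $x (history) and $h (horizon)

the_series <- tourism[["M4"]] # an arbitrary monthly example series

par(mfrow = c(3, 1))
plot(forecast(ets(the_series$x), h = the_series$h))
plot(forecast(stlm(the_series$x), h = the_series$h))
plot(thetaf(the_series$x, h = the_series$h))
```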
We see from the plot titles in this example that the ets approach detects additive error and seasonal structures, whereas the stlm approach first removes the seasonality, so what is left is simply additive error. None of the methods is good at picking up the subtle upwards trend, and they all slightly underestimate the forward values of the series, albeit with good prediction interval coverage.
Here’s a second example, this time a quarterly series:
In this case, multiplicative error and seasonal components are detected in the original data.
A systematic comparison
To get a more systematic view of which of these variants is most effective with seasonal data of the sort I mostly deal with, I ran an experiment. I used all three methods on each of the quarterly and monthly series in the 2010 tourism forecasting competition and the 1982 "M1" forecasting competition of Makridakis et al. - well over a thousand time series in total. I calculated the mean absolute scaled error (MASE) of the forecasts against the actual future data.
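As a reminder, the MASE scales the mean absolute error of the forecasts by the in-sample mean absolute error of a one-season-back naive forecast, so values below one beat that naive benchmark:

$$\mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left|y_t - \hat{y}_t\right|}{\frac{1}{n-m}\sum_{t=m+1}^{n}\left|y_t - y_{t-m}\right|}$$

where n is the length of the training series, h is the forecast horizon and m is the seasonal period (12 for monthly and 4 for quarterly series). Here's the distribution of those values of MASE: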
Easier to take in than all those individual values of MASE is a summary statistic. Here’s the trimmed mean:
The Theta method stands out as not as consistently accurate as either ets or stlm, and there is little to tell those two apart. But here is the mean:
All three methods obviously generated a few poor values of MASE that drag their untrimmed mean scores up. Those pesky, hard-to-predict real datasets… Mostly, but not always, stlm appears a bit better than ets, which might suggest that it is a good idea to seasonally adjust the data before fitting an exponential smoothing model to it - but there isn't much in it. thetaf does seem to be systematically outperformed by its more general and flexible cousins on this seasonal data. We can't tell from this whether it is the simple seasonal adjustment method used in thetaf that is responsible, or the Theta method itself; my suspicion is the seasonal adjustment. Classical decomposition is a less flexible method than the rolling window smoothing approach used by stl, which is able to pick up changes in seasonality over time.
Here’s the code that produces all the above examples and runs the tests against the competition data:
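A condensed sketch of the testing loop would look something like this, assuming the Mcomp and Tcomp packages for the competition data (an illustration of the approach rather than the full script, which also handles the plotting and presentation):

```r
library(forecast)
library(Mcomp) # M1 (1982) competition data
library(Tcomp) # 2010 tourism competition data

# MASE on the true future data for each of the three methods:
mases <- function(the_series) {
  x  <- the_series$x   # training data
  xx <- the_series$xx  # actual future data
  h  <- the_series$h   # forecast horizon
  c(ets    = accuracy(forecast(ets(x), h = h), xx)["Test set", "MASE"],
    stlm   = accuracy(forecast(stlm(x), h = h), xx)["Test set", "MASE"],
    thetaf = accuracy(thetaf(x, h = h), xx)["Test set", "MASE"])
}

# All the monthly and quarterly series from both competitions:
all_series <- c(subset(tourism, "monthly"), subset(tourism, "quarterly"),
                subset(M1, "monthly"), subset(M1, "quarterly"))

results <- t(sapply(all_series, mases))

round(apply(results, 2, mean, trim = 0.1), 2) # trimmed means
round(colMeans(results), 2)                   # untrimmed means
```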