Comparing two timeseries-generating blackboxes
This question on Cross-Validated got me interested. I gave a fairly inadequate answer and want to explore a few of the issues. Actually, I have a plan for an effective technique, which I think is what the original post was asking for, but I need to check a few things first.
The challenge, if I understand correctly, is this:
We have one model that we are confident generates data accurately representing what will actually happen in the epidemic. For some reason (presumably because that model is expensive or slow to build and run, or because there is theoretical interest in a simple equation that generates similar results to the complex model), we have an additional model which can also generate a set of data, and we want to check whether the data generated by this second model are close to the data generated by the known-good model.
In effect, we have two blackbox time series generators. The question of interest is
Does blackbox 2 generate time series that are “similar” to those generated by blackbox 1?
My substantive approach to answering that question will come in a later post. First I want to look at an issue raised in a comment on my answer on the Cross-Validated site. In my answer, I said it was a bad idea to compare the best-fitting ARIMA models of datasets generated by the two black boxes, because there is no easy way to go from the parameterisation of those models to an estimate of similarity. Models that look different in their parameterisation can have very similar material impacts (and, probably, vice versa). The comment from user333700 suggested that comparing impulse response functions, spectral densities or autocorrelation functions might be a way around this. Today’s post is just about demonstrating that even this isn’t necessarily a get-out-of-jail-free card, at least as far as autocorrelation functions are concerned.
There’s nothing really new in here, just me clearing the decks as part of my thinking through how I’m going to answer the main question. Obviously, I’m getting beyond the scope of an answer on Cross-Validated, and I had to record my musings somewhere…
Linear transformations of a stationary time series give identical autocorrelation functions but materially different original data
The heading above says it all, and it follows directly from the definition of the autocorrelation function: because the ACF divides the autocovariance at each lag by the variance, multiplying a series by a constant scales numerator and denominator equally, and adding a constant changes neither. Here’s the demonstration. I create a stationary ARMA(2, 1) time series with 1,000 observations. series1 is just the original process; series2 is the original process multiplied by 3, with 2 added. This makes a big difference to the substance of the data - much larger observations - but gives the same autocorrelation function.
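Something like the following sketch shows this in practice, using base R’s `arima.sim` and `acf` (the seed and the AR and MA coefficients are my own illustrative assumptions, not values fixed by anything above):

```r
set.seed(123)

# Simulate a stationary ARMA(2, 1) process with 1,000 observations.
# These AR and MA coefficients are illustrative assumptions.
series1 <- arima.sim(model = list(ar = c(0.5, -0.3), ma = 0.4), n = 1000)

# An affine transformation of the same process: multiply by 3, add 2.
series2 <- series1 * 3 + 2

# The two series differ materially in location and scale...
summary(cbind(series1 = as.numeric(series1), series2 = as.numeric(series2)))

# ...but their sample autocorrelation functions are identical, because the
# ACF standardises the autocovariance at each lag by the variance.
par(mfrow = c(1, 2))
acf(series1, main = "series1")
acf(series2, main = "series2 = 3 * series1 + 2")
```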
Different instances of the same non-stationary data generating process have similar ACFs but vary materially in their original data
Again, the heading says it all. In this case, I generate two separate instances from an identical ARIMA(2, 2, 1) data generating process. One grows steadily up, the other steadily down. If they were real-life data and you had only those two series, you would say they were radically different. But the autocorrelation functions of the original two series are similar:
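A sketch of how this can be done, again with `arima.sim` - the `order = c(2, 2, 1)` element tells it to integrate the simulated ARMA(2, 1) series twice. The coefficients and seed are again my own illustrative assumptions:

```r
set.seed(42)

# Two independent draws from the same non-stationary ARIMA(2, 2, 1)
# data generating process (coefficients are illustrative assumptions).
dgp <- list(order = c(2, 2, 1), ar = c(0.5, -0.3), ma = 0.4)
instance1 <- arima.sim(model = dgp, n = 1000)
instance2 <- arima.sim(model = dgp, n = 1000)

# The levels can look radically different; whether each instance trends
# up or down depends on the particular random draw.
par(mfrow = c(1, 2))
plot(instance1, main = "instance1")
plot(instance2, main = "instance2")

# But the sample ACFs of the original, undifferenced series are similar:
# both decay very slowly, as expected for integrated series.
par(mfrow = c(1, 2))
acf(instance1, main = "ACF of instance1")
acf(instance2, main = "ACF of instance2")
```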
and so are the autocorrelation functions of the two series after we take the differences of the differences:
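Continuing the same sketch, second differencing recovers the stationary ARMA(2, 1) part of both instances:

```r
# Take the differences of the differences (second differences) and
# compare the resulting sample autocorrelation functions.
par(mfrow = c(1, 2))
acf(diff(instance1, differences = 2), main = "ACF of 2nd diff of instance1")
acf(diff(instance2, differences = 2), main = "ACF of 2nd diff of instance2")
```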
user333700 would probably say, rightly, that this is a demonstration of how comparing autocorrelation functions of two series is in fact an effective way of identifying whether they plausibly come from the same data generating process. In this instance, that is exactly what has happened. However, it raises the question of whether plausibly coming from the same DGP is really what the original post wanted. With such materially different series - one showing strong positive growth, the other strong negative growth - does it matter that both can be modelled with a similar ARIMA model? More important is something like “does blackbox1 always show strong positive growth and blackbox2 always strong negative growth?”
This second example of similar autocorrelation functions coming from different time series relates to something I’ll take up when I answer the substantive original question. A big part of the challenge is that each blackbox is a random process: each time it is run, a different set of data comes out, possibly a radically different one. So we need more than one example from each blackbox (the more the better) to form a view on whether they are similar. To answer this properly, we need more data from the blackboxes, and we need to avoid confusing the blackboxes themselves with models fitted to their output. More on that later.