## Pooling the polls with Bayesian statistics

In an important 2005 article in the Australian Journal of Political Science, Simon Jackman set out a statistically-based approach to pooling polls in an election campaign. He describes the sensible intuitive approach of modelling a latent, unobserved voting intention (unobserved except on the day of the actual election) and treats each poll as a random observation based on that latent state space. Uncertainty associated with each measurement comes from sample size and bias coming from the average effect of the firm conducting the poll, as well as of course uncertainty about the state of the unobserved voting intention. This approach allows house effects and the latent state space to be estimated simultaneously, quantifies the uncertainty associated with both, and in general gives a much more satisfying method of pooling polls than any kind of weighted average.

Jackman gives a worked example of the approach in his excellent book Bayesian Analysis for the Social Sciences, using voting intention for the Australian Labor Party (ALP) in the 2007 Australian federal election for data. He provides `JAGS`

code for fitting the model, but notes that with over 1,000 parameters to estimate (most of those parameters are the estimated voting intention for each day between the 2004 and 2007 elections) it is painfully slow to fit in general purpose MCMC-based Bayesian tools such as `WinBUGS`

or `JAGS`

- several days of CPU time on a fast computer in 2009. Jackman estimated his model with Gibbs sampling implemented directly in R.

Down the track, I want to implement Jackman’s method of polling aggregation myself, to estimate latent voting intention for New Zealand to provide an alternative method for my election forecasts. I set myself the familiarisation task of reproducing his results for the Australian 2007 election. New Zealand’s elections are a little complex to model because of the multiple parties in the proportional representation system, so I wanted to use a general Bayesian tool for the purpose to simplify my model specification when I came to it. I use Stan because its Hamiltonian Monte Carlo method of exploring the parameter space works well when there are many parameters - as in this case, with well over 1,000 parameters to estimate.

Stan describes itself as “a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.” It lets the programmer specify a complex statistical model, and given a set of data will return a range of parameter estimates that were most likely to produce the observed data. Stan isn’t something you use as an end-to-end workbench - it’s assumed that data manipulation and presentation is done with another tool such as R, Matlab or Python. Stan focuses on doing one thing well - using Hamiltonian Monte Carlo to estimate complex statistical models, potentially with many thousands of hierarchical parameters, with arbitrarily set prior distributions.

*Caveat! - I’m fairly new to Stan and I’m pretty sure my Stan programs that follow aren’t best practice, even though I am confident they work. Use at your own risk!*

## Basic approach - estimated voting intention in the absence of polls

I approached the problem in stages, gradually making my model more realistic. First, I set myself the task of modelling latent first-preference support for the ALP in the absence of polling data. If all we had were the 2004 and 2007 election results, where might we have thought ALP support went between those two points? Here’s my results:

For this first analysis, I specified that support for the ALP had to be a random walk that changed by a normally distributed variable with standard deviation of 0.25 percentage points for each daily change. Why 0.25? Just because Jim Savage used it in his rough application of this approach to the US Presidential election in 2016. I’ll be relaxing this assumption later.

Here’s the R code that sets up the session, brings in the data from Jackman’s `pscl`

R package, and defines a graphics function that I’ll be using for each model I create.

Here’s the Stan program that specifies this super simple model of changing ALP support from 2004 to 2007:

And here’s the R code that calls that Stan program and draws the resulting summary graphic. Stan works by compiling a program in C++ that is based on the statistical model specified in the `*.stan`

file. Then the C++ program zooms around the high-dimensional parameter space, moving slower around the combinations of parameters that seem more likely given the data and the specified prior distributions. It can use multiple processors on your machine and works super fast given the complexity of what it’s doing.

## Adding in one polling firm

Next I wanted to add a single polling firm. I chose Nielsen’s 42 polls because Jackman found they had a fairly low bias, which removed one complication for me as I built up my familiarity with the approach. Here’s the result:

That model was specified in Stan as set out below. The Stan program is more complex now; I’ve had to specify how many polls I have (`y_n`

), the values for each poll (`y_values`

), and the days since the last election each poll was taken (`y_days`

). This way I only have to specify 42 measurement errors as part of the probability model - other implementations I’ve seen of this approach ask for an estimate of measurement error for each poll on each day, treating the days with no polls as missing values to be estimated. That obviously adds a huge computational load I wanted to avoid.

In this program, I haven’t yet added in the notion of a house effect for Nielsen. Each measurement Nielsen made is assumed to have been an unbiased one. Again, I’ll be relaxing this later. The state model is also the same as before ie standard deviation of the day to day innovations is still hard coded as 0.25 percentage points.

Here’s the R code to prepare the data and pass it to Stan. Interestingly, fitting this model is noticeably faster than the one with no polling data at all. My intuition for this is that now the state space is constrained to being reasonably close to some actually observed measurements, it’s an easier job for Stan to know where is good to explore.

## Including all five polling houses

Finally, the complete model replicating Jackman’s work:

As well as adding the other four sets of polls, I’ve introduced five house effects that need to be estimated (ie the bias for each polling firm/mode); and I’ve told Stan to estimate the standard deviation of the day-to-day innovations in the latent support for ALP rather than hard-coding it as 0.25. Jackman specified a uniform prior on `[0, 1]`

for that parameter, but I found this led to lots of estimation problems for Stan. The Stan developers give some great practical advice on this sort of issue and I adapted some of that to specify the prior distribution for the standard deviation of day to day innovation as `N(0.5, 0.5)`

, constrained to be positive.

Here’s the Stan program:

Building the fact there are 5 polling firms (or firm-mode combinations, as Morgan is in there twice) directly into the program must be bad practice, but seeing as there are different numbers of polls taken by each firm and on different days I couldn’t work out a better way to do it. Stan doesn’t support ragged arrays, or objects like R’s lists, or (I think) convenient subsetting of tables, which would be the three ways I’d normally try to do that in another language. So I settled for the approach above, even though it has some ugly bits of repetition.

Here’s the R code that sorts the data and passes it to Stan

## Estimates of polling house effects

Here’s the house effects estimated by me with Stan, compared to those in Jackman’s 2009 book:

Basically we got the same results - certainly close enough anyway. Jackman writes:

“The largest effect is for the face-to-face polls conducted by Morgan; the point estimate of the house effect is 2.7 percentage points, which is very large relative to the classical sampling error accompanhying these polls.”

Interestingly, Morgan’s phone polls did much better.

Here’s the code that did that comparison:

So there we go - state space modelling of voting intention, with variable house effects, in the Australian 2007 federal election.