Sankey charts based on individual level survey data are a good way of showing change from election to election. I demonstrate this, via some complications with survey-reweighting and missing data, with the New Zealand Election Study for the 2014 and 2011 elections.
21 May 2017
Continuing my examination of the individual level voting behaviour from the New Zealand Election Study, I wanted to look at the way individuals swap between parties, and between “did not vote” and a party, from one election to another. How much and how this happens is obviously an important question for both political scientists and for politicians.
Vote transition visualisations
I chose a Sankey chart as a way of showing the transition matrix from self-reported party vote in the 2011 election to the 2014 election. Here’s a static version:
And here is the more screen-friendly interactive version, with mouseover tooltips to give actual estimates:
The point with these graphics is to highlight the transitions. For example, what were the implications of turnout being higher in 2014 than 2011 (77.9% of enrolled voters in 2014 compared to 74.2% in 2011)? Judging from this survey data, the National Party gained 6.6% of the enrolled population in 2014 by converting them from a 2011 “did not vote” and lost only 3.6% in the other direction. This net gain of three percentage points was enough to win the election for the National-led coalition. In contrast, the Labour party had a net gain from “did not vote” in 2011 of only 0.2 percentage points. Remember though that these are survey-based estimates, and subject to statistical error.
I find setting up and polishing Sankey charts - controlling colours for example - a bit of a pain, so the code at the bottom of this post on how this was done might be of interest.
Weighting, missing data, population and age complications
Those visualisations have a few hidden fishhooks, which careful readers would find if they compare the percentages in the tooltips of the interactive version with percentages reported by the New Zealand Electoral Commission.
The 2014 percentages are proportions of the enrolled population. As the 2014 turnout of enrolled voters was 77.9%, the numbers here are noticeably less than the usually cited percentages which were used to translate into seat counts (for example, National Party reported party vote of 47.0% of votes becomes 36.6% of enrolled voters)
The 2011 percentages are even harder to explain, because I’ve chosen not only to scale the party vote and “did not vote” to the 2011 enrolled population as reported by the Commission, but also to add in around 5% of the 2014 population that were too young to vote in 2011.
Two things I would have liked to have taken into account but wasn’t able to were:
The “leakage” from the 2011 election of people who were deceased or had left the country by 2014
Explicit recognition of people who voted in 2014 but not in 2011 because they were out of the country. There is a variable in the survey that picks up the year the respondent came to live in New Zealand if not born here, but for only 10 respondents was this 2012 or later (in contrast to age - there were 58 respondents aged 20 or less in 2014).
I re-weighted the survey so the 2014 and 2011 reported party votes matched the known totals (with the addition of people aged 15 to 17 in 2011). One doesn’t normally re-weight a survey based on answers provided by the respondents, but in this case I think it makes perfect sense to calibrate to the public totals. The biggest impact is that for both years, but particularly 2011, relying on the respondents’ self-report and the published weighting of the NZES, totals for “did not vote” are materially understated, compared to results when calibrated to the known totals.
When party vote in 2011 had been forgotten or was an NA, and this wasn’t explained by being too young in 2011, I used multiple imputation based on a subset of relevant variables to give five instances of probable party vote to each such respondent.
Taken together, all this gives the visualisations a perspective based in 2014. It is better to think of it as “where did the 2014 voters come from” than “where did the 2011 voters go”. This is fairly natural when we consider it is the 2014 New Zealand Election Study, but is worth keeping in mind in interpretation.
Age (and hence the impact new young voters coming in, and of older voters passing on) is important in voting behaviour, as even the most casual observation of politics shows. In New Zealand, the age distribution of party voters in 2014 is seen in the chart below:
Non-voters, Green voters and to a certain extent Labour voters are young; New Zealand First voters are older. If this interests you though, I suggest you look at the multivariate analysis in this blog post or, probably more fun, this fancy interactive web app which lets you play with the predicted probabilities of voting based on a combination of demographic and socio-economic status variables.
Here’s the R code that did that imputation, weighting, and the graphics:
My day job is Chief Data Scientist at Nous Group, an international management consultancy with over 400 people working across Australia, the UK and Canada. Contact me if you are interested working with us on a grand challenge or broad agenda.
I'm pleased to be aggregated at R-bloggers, the one-stop shop for blog posts featuring R.