I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
I use Elo ratings based on either the past 12 months or the full 120 years of AFL results to predict the results in the next round. Predictions based on just the past 12 months do better than those using the full history.
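For readers unfamiliar with Elo ratings, here is a minimal sketch of the standard update rule in Python. It is not the code from the post: the K-factor of 20, the starting rating of 1500, the absence of any home-ground adjustment and the function names are all assumptions made for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score (win probability) for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=20.0):
    """Return both ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=20 is an assumed K-factor, not a value taken from the post.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change


def rate_results(results, k=20.0, start=1500.0):
    """Run through (home, away, home_score) tuples in order, updating ratings.

    Teams not seen before start at the assumed rating of 1500.
    """
    ratings = {}
    for home, away, home_score in results:
        r_home = ratings.get(home, start)
        r_away = ratings.get(away, start)
        ratings[home], ratings[away] = update_elo(r_home, r_away, home_score, k)
    return ratings
```

Restricting `results` to the last 12 months of games rather than the full history is then just a matter of filtering the input before calling `rate_results`.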
I explore the data on two-party-preferred voting swings in Australian federal elections and tentatively introduce the ozfedelect R package.
I tidy up Australian polling data back to 2007 and produce a statistical model of two-party-preferred vote for the coming election.
I update the nzelect R package with the latest New Zealand polling data, and use a generalized additive model to look for a seasonal impact on support for the current government.
I load wave 6 of the World Values Survey into a database so it's possible to analyse more questions and countries at once, and find some interesting variations in what people agree with in different parts of the world.
Persian Monarchs, described by P. G. Wodehouse in one of his funniest novels, is an extremely simple fictional card game, but the gambling makes it a game of skill, and we can even construct different plausible strategies for winning. A good strategy involving card-counting beats a non-counting alternative by about 4% and random wagering by 36%.
It turns out that about 10 billion people were born in total in the twentieth century.
I have a play with counting how often particular digits turn up in numbers, starting with page numbers of a book based on a training exercise, and moving on to the so-called Benford's law or first-digit law.
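As a rough illustration of the first-digit idea, the sketch below compares the share of each leading digit in a set of numbers with the shares Benford's law predicts, P(d) = log10(1 + 1/d). The function names are my own, and the 199-page book in the usage note is a made-up example rather than the one in the post.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: expected share of each leading digit 1 to 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(numbers):
    """Observed share of each leading digit among positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers if n > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}
```

For example, `first_digit_shares(range(1, 200))` gives the leading-digit shares for the page numbers of a hypothetical 199-page book. Page numbers like these are heavily weighted towards 1 but are not expected to follow Benford's law closely, since they are evenly spread over a narrow range.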
A more systematic comparison of different ways of dealing with cells in a cross tab that have been suppressed for confidentiality. For the particular model tested here, the best thing to do is the simple method of replacing all suppressed cells with 5; this works even better than using the original unsuppressed data, which is very unstable when many cell counts are near zero.
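The simple replacement method can be sketched in a few lines of Python. This assumes the cross tab is held as a list of rows and that suppressed cells are marked as `None`; both are assumptions for illustration, not how the post stores its data.

```python
def fill_suppressed(table, fill_value=5):
    """Return a copy of a cross tab with suppressed cells replaced.

    Suppressed cells are assumed to be marked as None. The default of 5
    is the replacement value described above.
    """
    return [
        [fill_value if cell is None else cell for cell in row]
        for row in table
    ]
```

The original table is left untouched, so the filled version can be compared against other ways of handling the same suppressed cells.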
I experiment with some different ways of handling counts in tables that have been suppressed for confidentiality, and come up in favour of multiple imputation. The mice R package helpfully lets you define your own imputation algorithm.