free range statistics

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts

What the world agrees with

26 January 2019

I load wave 6 of the World Values Survey into a database so it's possible to analyse more questions and countries at once, and find some interesting variations in what people agree with in different parts of the world.

Simulating Persian Monarchs gameplay

23 December 2018

Persian Monarchs, described by P. G. Wodehouse in one of his funniest novels, is an extremely simple fictional card game, but the gambling makes it a game of skill, and we can even construct different plausible strategies for winning. A good strategy involving card-counting beats a non-counting alternative by about 4% and random wagering by 36%.

Number of births in the twentieth century

01 December 2018

It turns out that about 10 billion people were born in total in the twentieth century.

Counting digits

24 November 2018

I have a play with counting how often particular digits turn up in numbers, starting with the page numbers of a book (based on a training exercise) and moving on to Benford's law, also known as the first-digit law.
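As a rough illustration of the first-digit law mentioned above, here is a minimal sketch in Python (not the code from the post itself). It assumes the standard statement of Benford's law, under which a leading digit d occurs with probability log10(1 + 1/d). The function names and the choice of powers of 2 as test data are mine, picked only because powers of 2 are a well-known sequence that follows the law closely.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(numbers):
    """Observed share of each leading digit among non-zero integers."""
    counts = Counter(int(str(abs(n))[0]) for n in numbers if n != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative data only: the first 1,000 powers of 2.
expected = benford_expected()
observed = first_digit_shares(2 ** k for k in range(1, 1001))

for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

The printed table puts the theoretical share next to the observed share for each digit, so any data set of integers can be swapped in for the powers of 2.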

A more systematic look at suppressed data

18 November 2018

A more systematic comparison of different ways of dealing with cells in a cross tab that have been suppressed for confidentiality. For the particular model tested here, the best thing to do is the simple method of replacing all suppressed cells with 5; this works even better than using the original unsuppressed data, which is very unstable when many cell counts are near zero.

Suppressed data (left-censored counts)

06 November 2018

I experiment with some different ways of handling counts in tables that have been suppressed for confidentiality, and come down in favour of multiple imputation. The mice R package helpfully lets you define your own imputation algorithm.

Simulating simple dice games

27 October 2018

I play around with simulating some dice games.

Understanding the limitations of group-level inequality data

07 October 2018

Cross-sectional country-level data will show a relationship between income inequality and life expectancy even if inequality itself has no direct impact on life expectancy, so long as there is a changing marginal impact of individual income on individual life expectancy (as of course there is).
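The argument above can be sketched with a small simulation. This is an illustration under stated assumptions, not the model from the post: individual life expectancy is assumed to be 40 + 4 × log(income), incomes are assumed to be lognormal, and the two hypothetical countries share the same mean income and differ only in how spread out incomes are. Inequality has no direct effect on anyone's life expectancy here, yet the more unequal country ends up with a lower average.

```python
import math
import random


def mean_life_expectancy(mean_income, sigma, n=50_000, seed=1):
    """Average life expectancy in a simulated country.

    Assumptions (illustrative only): incomes are lognormal with the given
    mean and log-scale spread `sigma`; each person's life expectancy depends
    only on their own income, through a concave (log) function.
    """
    rng = random.Random(seed)
    # Choose mu so that the mean income is the same whatever sigma is.
    mu = math.log(mean_income) - sigma ** 2 / 2
    total = 0.0
    for _ in range(n):
        income = rng.lognormvariate(mu, sigma)
        total += 40 + 4 * math.log(income)  # hypothetical individual-level curve
    return total / n


# Two hypothetical countries with identical mean income, different inequality.
equal = mean_life_expectancy(20_000, sigma=0.3)
unequal = mean_life_expectancy(20_000, sigma=0.9)

print(round(equal, 2), round(unequal, 2))
```

Because the individual-level curve flattens as income rises, moving income from poorer to richer people lowers the average even though nobody's outcome depends on inequality as such.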

Sri Lanka visitor arrivals

26 September 2018

Sri Lanka has a rapidly growing tourism industry, two international tourism seasons, and seasonality patterns in arrivals that vary according to country of origin.

Rents in Melbourne

31 August 2018

Rents in Melbourne have on average grown fastest in suburbs that were the cheapest in 2000, at least for two- and three-bedroom flats and for two-bedroom houses. Also, scatterplots are awesome.