free range statistics

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts


World Economic Outlook

26 April 2025

I have a quick look at the latest World Economic Outlook released by the IMF, with a particular eye on the economic growth forecasts for Pacific island countries. The Pacific countries that have had the biggest downward revisions in their growth prospects over the six months since the last Outlook are the three in the Compact of Free Association with the USA (Palau, Marshall Islands, and Federated States of Micronesia), plus Fiji.


Revisiting depression incidence by county and vote for Trump

03 January 2025

I expand on my last post, to see if the relationship between depression and voting for Trump at the county level persists when you control for the racial composition of counties (it doesn't).


Depression incidence by county and vote for Trump

23 December 2024

Multi-level modelling with spatial auto-correlation! I look at county-level data on incidence of depression in 2020, and voting for Trump in the 2024 US Presidential election, and conclude that there's something there, but of course there are lots of potential explanations of what is behind the relationship.


Death rates by cause of death

14 December 2024

I explore death rates by cause of death with OECD data, for the USA and other countries. Causes of death that are relatively high in the USA include assaults, accidents, and suicides; diseases of the nervous system (including Alzheimer's); and diseases of the circulatory system (including heart attacks).


Simulating Ponzi schemes

30 November 2024

I write a function to simulate Ponzi schemes with various types of 'investor' growth, withdrawal rates, and extraction by the scammer / owner of the scheme.
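
As a rough illustration of the idea (not the function from the post), here is a minimal Python sketch of a period-by-period Ponzi simulation. The function name `simulate_ponzi`, its parameters, the default rates, and the collapse rule (the scheme fails when withdrawal requests exceed cash on hand) are all assumptions made for this example.

```python
def simulate_ponzi(periods, new_investors, deposit=1000.0,
                   promised_return=0.10, withdrawal_rate=0.05,
                   extraction_rate=0.02):
    """Simulate a toy Ponzi scheme (all names and defaults are hypothetical).

    new_investors: function mapping the period index to the number of
    new investors joining in that period.
    Returns a list of per-period dicts; the last one has collapsed=True
    if the scheme could not meet its withdrawal requests.
    """
    cash = 0.0          # real money actually held by the scheme
    paper_value = 0.0   # what investors are told their accounts are worth
    history = []
    for t in range(periods):
        inflow = new_investors(t) * deposit
        cash += inflow
        paper_value += inflow
        paper_value *= 1 + promised_return    # fictitious return, on paper only
        cash -= extraction_rate * cash        # the scammer's cut
        withdrawals = withdrawal_rate * paper_value
        collapsed = withdrawals > cash        # cannot pay out: scheme fails
        if not collapsed:
            cash -= withdrawals
            paper_value -= withdrawals
        history.append({"period": t, "cash": cash,
                        "paper_value": paper_value, "collapsed": collapsed})
        if collapsed:
            break
    return history


# 100 investors join at the start, then recruitment stops entirely.
no_growth = simulate_ponzi(200, lambda t: 100 if t == 0 else 0)
```

Passing a different `new_investors` function (constant, linear, or exponential recruitment, say) changes how long the scheme survives, since fresh deposits are the only real source of cash.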


Design effects for stratified sub-populations

16 November 2024

I look at the two different sorts of design effects that Stata will report for estimates from sub-populations of a complex survey. They differ according to whether or not the hypothetical simple random sample we are comparing the complex survey to has the same sub-population sample sizes as the actual sample.


Regressions where the coefficients are a simplex

06 November 2024

I compare some different ways of forcing the coefficients in a regression to form a simplex, that is, to be all greater than zero and add up to exactly one. Two methods (quadratic programming, and explicit modelling of the coefficients from a Dirichlet distribution) give essentially identical results that match the data generating process well.
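
To make the constraint concrete, here is a minimal pure-Python sketch of least squares with coefficients restricted to the simplex. It uses projected gradient descent, which is a different technique from the two methods named above, chosen here only because it needs no extra libraries. The function names, the noiseless simulated data, and the iteration count are assumptions for illustration. The projection also allows coefficients of exactly zero, so it enforces "non-negative" rather than "strictly positive".

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the shift from the last k that still qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_simplex_regression(X, y, iterations=500):
    """Minimise 0.5 * ||Xw - y||^2 subject to w lying on the simplex."""
    n_coef = len(X[0])
    # The squared Frobenius norm bounds the largest eigenvalue of X'X,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(x * x for row in X for x in row)
    w = [1.0 / n_coef] * n_coef
    for _ in range(iterations):
        residuals = [sum(x * b for x, b in zip(row, w)) - y_i
                     for row, y_i in zip(X, y)]
        gradient = [sum(row[j] * r for row, r in zip(X, residuals))
                    for j in range(n_coef)]
        w = project_to_simplex([b - step * g for b, g in zip(w, gradient)])
    return w


# Simulated data with known simplex coefficients and, for simplicity, no noise.
random.seed(1)
true_w = [0.2, 0.5, 0.3]
X = [[random.gauss(0, 1) for _ in true_w] for _ in range(200)]
y = [sum(x * b for x, b in zip(row, true_w)) for row in X]
estimate = fit_simplex_regression(X, y)
```

With noisy data the estimate would only approximate `true_w`, and a dedicated quadratic programming solver would normally be a better tool for the job than this hand-rolled loop.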


Git, peer review, tests and toil

28 September 2024

I was honoured to give the third and final Ihaka Lecture in the 2024 series. My talk had the theme "Making R work in government".


Prime numbers as sums of three squares

21 September 2024

I explore the number of ways to make a prime number as the sum of squares of three positive integers.
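
A minimal Python sketch of the counting problem follows. It assumes that order does not matter, so each way is listed once with a ≤ b ≤ c; the post may count differently. The function name `three_square_ways` is hypothetical.

```python
from math import isqrt


def three_square_ways(n):
    """All (a, b, c) with 1 <= a <= b <= c and a^2 + b^2 + c^2 == n."""
    ways = []
    # Since a is the smallest of the three, 3 * a^2 <= n.
    for a in range(1, isqrt(n // 3) + 1):
        rest = n - a * a
        # Since b <= c, 2 * b^2 <= rest.
        for b in range(a, isqrt(rest // 2) + 1):
            c_squared = rest - b * b
            c = isqrt(c_squared)
            if c * c == c_squared:  # c >= b follows from the bound on b
                ways.append((a, b, c))
    return ways


print(three_square_ways(41))  # [(1, 2, 6), (3, 4, 4)]
print(three_square_ways(7))   # []
```

Here 41 = 1 + 4 + 36 = 9 + 16 + 16, while 7 has no such representation.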


Stepwise selection of variables in regression is Evil

14 September 2024

Stepwise variable selection is bad and dangerous, and you shouldn't do it. It increases false positives. It drops variables that should be in the model. It gives biased estimates for regression coefficients. The problems are worse for smaller samples; higher correlation between the X variables; and models with weaker explanatory power for the y (i.e. lower R-squared).