free range statistics

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts


Design effects for stratified sub-populations

16 November 2024

I look at the two different sorts of design effects that Stata reports for estimates from sub-populations of a complex survey. They differ depending on whether or not the hypothetical simple random sample that the complex survey is compared to has the same sub-population sample sizes as the actual sample.


Regressions where the coefficients are a simplex.

06 November 2024

I compare several ways of forcing the coefficients in a regression to form a simplex: all non-negative and summing to exactly one. Two of the methods, quadratic programming and explicitly modelling the coefficients with a Dirichlet distribution, give essentially identical results that closely match the data generating process.


Git, peer review, tests and toil

28 September 2024

I was honoured to give the third and final Ihaka Lecture in the 2024 series. My talk had the theme "Making R work in government".


Prime numbers as sums of three squares.

21 September 2024

I explore how many ways a prime number can be written as the sum of the squares of three positive integers.
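
As a rough illustration of the counting involved, here is a brute-force sketch in Python (not the post's own code, which is in R). It assumes that "ways" means unordered triples a ≤ b ≤ c of positive integers, so 3 = 1² + 1² + 1² counts once; the post may count ordered triples instead. The helper names `is_prime` and `three_square_ways` are hypothetical.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def three_square_ways(n: int) -> list[tuple[int, int, int]]:
    """All unordered triples a <= b <= c of positive integers with a^2 + b^2 + c^2 == n."""
    ways = []
    a = 1
    while 3 * a * a <= n:          # a is the smallest of the three
        b = a
        while a * a + 2 * b * b <= n:  # leaves room for c >= b
            rest = n - a * a - b * b
            c = isqrt(rest)
            if c >= b and c * c == rest:
                ways.append((a, b, c))
            b += 1
        a += 1
    return ways


# Print the count and the triples for each prime below 60.
for p in [k for k in range(2, 60) if is_prime(k)]:
    print(p, len(three_square_ways(p)), three_square_ways(p))
```

Primes of the form 8k + 7, such as 7 and 23, never have any representation: by Legendre's three-square theorem, no number of the form 4^a(8b + 7) is a sum of three squares, even when zeros are allowed.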


Stepwise selection of variables in regression is Evil.

14 September 2024

Stepwise variable selection is bad and dangerous, and you shouldn't do it. It increases false positives. It drops variables that should be in the model. It gives biased estimates of regression coefficients. The problems are worse with smaller samples, with higher correlation between the X variables, and with models that have weaker explanatory power for y (i.e. lower R-squared).


Gender and sexuality in Australian surveys and census

08 September 2024

I familiarise myself with the Australian Bureau of Statistics' statistical standard on sex and gender, and play around with some data from the Australian General Social Survey, which reports outputs by respondents' sexual orientation.


Sampling without replacement with unequal probabilities

31 August 2024

I play around with sampling from finite populations with unequal probabilities, where the R sample() function turns out not to work the way I had expected it to.
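
R's documentation for sample() says that when replace = FALSE the prob weights are applied sequentially: each draw picks among the remaining units with probability proportional to their weights. Assuming that is the behaviour at issue, the sketch below (in Python, with a hypothetical helper `sequential_inclusion_probs`) enumerates every ordered draw for a tiny population and shows that the resulting inclusion probabilities are not simply n times each unit's share of the total weight.

```python
from itertools import permutations


def sequential_inclusion_probs(weights, n):
    """Exact probability that each unit ends up in a sample of size n drawn one at a
    time without replacement, where each draw chooses among the remaining units with
    probability proportional to their weights."""
    N = len(weights)
    incl = [0.0] * N
    for seq in permutations(range(N), n):  # every possible ordered sequence of draws
        prob = 1.0
        remaining = sum(weights)
        for i in seq:
            prob *= weights[i] / remaining
            remaining -= weights[i]
        for i in seq:
            incl[i] += prob
    return incl


weights = [1, 2, 3, 4]
n = 2
# What "inclusion probability proportional to weight" would give.
naive = [n * w / sum(weights) for w in weights]
actual = sequential_inclusion_probs(weights, n)
for w, a, b in zip(weights, naive, actual):
    print(f"weight {w}: n*w/sum(w) = {a:.3f}, sequential = {b:.3f}")
```

With weights 1 to 4 and n = 2, the heaviest unit's inclusion probability is about 0.716 rather than 0.8, while the three lighter units are all included slightly more often than proportionality would suggest. Brute-force enumeration like this is only practical for very small populations.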


Ratios of indexed line charts

30 August 2024

I draw some quick charts of Australian economic indicators and ponder the implications. A ratio of two indexes can be a useful part of exploratory analysis.
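
The arithmetic behind a ratio of indexes is simple, and a small sketch shows a useful property: the ratio of two indexed series equals the index of the ratio of the raw series. The series names and numbers below are made up purely for illustration; they are not the Australian indicators in the post, and `index_series` is a hypothetical helper.

```python
def index_series(values, base=0):
    """Rebase a series so that the value at position `base` equals 100."""
    return [100 * v / values[base] for v in values]


# Hypothetical, made-up series for illustration only (not real data).
wages = [50.0, 52.0, 55.0, 57.0]
prices = [80.0, 84.0, 92.0, 97.0]

wage_idx = index_series(wages)    # 100, 104, 110, 114
price_idx = index_series(prices)  # 100, 105, 115, 121.25

# Ratio of the two indexes, scaled so the base period is again 100.
ratio_of_indexes = [100 * w / p for w, p in zip(wage_idx, price_idx)]

# The same thing computed as the index of the ratio of the raw series.
index_of_ratio = index_series([w / p for w, p in zip(wages, prices)])

for r1, r2 in zip(ratio_of_indexes, index_of_ratio):
    print(f"{r1:.2f} {r2:.2f}")
```

Because each raw series is divided by its own base-period value, the ratio does not depend on the units either series is measured in, but it does depend on the choice of base period. A ratio falling below 100, as it does here, means the first series has grown more slowly than the second since the base period.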


Polar-centred maps

24 August 2024

I draw maps of the largest settlements closest to the North Pole and to the South Pole, based on an idea by 'Brilliant Maps'.


Perturbing a non-symmetrical probability distribution

20 August 2024

Inspired by a toot from Thomas Lumley, I explore a situation where adding mean-zero random noise to a non-symmetrical distribution changes its median but not its mean.
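
A quick simulation shows the effect. This is a sketch under my own assumptions, not the post's code: the skewed distribution is taken to be exponential with rate 1 (mean 1, median ln 2, about 0.693) and the noise is standard normal. Because the noise has mean zero, the mean of the sum is unchanged, but the sum is less skewed, so its median moves up toward the mean.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 200_000

# Assumed right-skewed distribution: exponential with rate 1 (mean 1, median ln 2).
x = [rng.expovariate(1.0) for _ in range(n)]
# Assumed noise: symmetric, mean zero, standard deviation 1.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [a + b for a, b in zip(x, noise)]

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
median_x, median_y = statistics.median(x), statistics.median(y)
print(f"mean:   {mean_x:.3f} -> {mean_y:.3f}")
print(f"median: {median_x:.3f} -> {median_y:.3f}")
```

With these particular choices the theoretical median of the sum is roughly 0.88, so the simulated median should rise by nearly 0.2 while the mean stays close to 1.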