free range statistics

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts


Essentially random isn't the same as actually random

09 August 2020

An observational study that claims to be a randomised controlled trial (RCT) might have something to say, but its researchers made far too many discretionary choices for its findings to be believed. I use it as a chance to play with statistical inference after estimating a regression via the lasso.


Visualisation options to show growth in occupations in the Australian health industry

02 August 2020

An exploration of changes in occupations in the Australian health industry, and in the economy more broadly, from 1986 to the present.


Estimating Covid-19 reproduction number with delays and right-truncation

18 July 2020

There is a fast-growing body of knowledge and tools to help estimate the effective reproduction number of an epidemic in real time; I have a go at applying the latest version of the EpiNow2 R package to Covid-19 case data from Victoria, Australia.


Fixing scientific publishing and peer review

13 June 2020

Science isn't broken, but journals are. A joint solution is emerging for disparate problems of access, quality control and replicability in scientific publishing.


Forecasts for the 2020 New Zealand elections using R and Stan

06 June 2020

My forecasts for the 2020 New Zealand general election are out, and predict a comfortable win for Jacinda Ardern's Labour Party either alone or in coalition.


A health data firm making extraordinary claims about its data

30 May 2020

Surgisphere, a tiny startup that claims to provide large-scale real-world data for scientific health studies, is probably fabricating data at scale.


Ordering bars within their clumps in a bar chart

23 May 2020

It turns out to be quite easy in R to reorder your bars within each clump, to produce a bad bar chart like the unfortunate example from Georgia that is doing the rounds.


Incidence of COVID-19 in Texas after adjusting for test positivity

17 May 2020

Even when you adjust for test-positivity rates, the number of new COVID-19 cases per day in Texas is going up, although not as rapidly as the unadjusted numbers imply.


Test positivity rates and actual incidence and growth of diseases

09 May 2020

I look at several different ways of accounting for the information that high COVID-19 test positivity rates give us, and examine their impact on estimates of the effective reproduction number at a point in time.


Pragmatic prediction intervals from a quasi-likelihood GLM

18 April 2020

A pragmatic way of generating prediction intervals from a generalized linear model with a quasi-likelihood response, if you're prepared to make an additional assumption about the distribution of the response.