free range statistics

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts


A health data firm making extraordinary claims about its data

30 May 2020

Surgisphere, a tiny startup that claims to provide large-scale real-world data for scientific health studies, is probably fabricating data at scale.


Ordering bars within their clumps in a bar chart

23 May 2020

It turns out to be quite easy in R to reorder your bars within each clump, to produce a bad bar chart like the unfortunate example from Georgia doing the rounds.


Incidence of COVID-19 in Texas after adjusting for test positivity

17 May 2020

Even when you adjust for test-positivity rates, the number of new COVID-19 cases per day in Texas is going up, although not as rapidly as the unadjusted numbers imply.


Test positivity rates and actual incidence and growth of diseases

09 May 2020

I look at several different ways of accounting for the information that high test-positivity rates for COVID-19 give us, and at the impact on estimates of the effective reproduction number at a point in time.


Pragmatic prediction intervals from a quasi-likelihood GLM

18 April 2020

A pragmatic way of generating prediction intervals from a generalized linear model with a quasi-likelihood response, if you're prepared to make an additional assumption about the distribution of the response.


How to make that crazy Fox News y axis chart with ggplot2 and scales

06 April 2020

I demonstrate the power of the transformation functionality in the scales R package by re-creating an eccentric Fox News chart.


Impact of a country's age breakdown on COVID-19 case fatality rate

21 March 2020

I have a go at quantifying how important different demographic profiles will be for country average case fatality rates for COVID-19.
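
As a rough illustration of why the age breakdown matters, a country's average case fatality rate can be thought of as its age-specific rates weighted by the age mix of its cases. The sketch below is in Python rather than R, and every number in it (the age bands, the rates and the two age mixes) is made up for illustration; it is not the data or the method used in the post.

```python
def crude_cfr(age_shares, age_specific_cfr):
    """Average case fatality rate as a weighted mean of age-specific rates.

    age_shares: share of cases in each age band (must sum to 1).
    age_specific_cfr: fatality rate for each of the same age bands.
    """
    if abs(sum(age_shares.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(share * age_specific_cfr[age] for age, share in age_shares.items())


# Hypothetical age-specific case fatality rates (illustrative only)
cfr_by_age = {"0-39": 0.002, "40-69": 0.01, "70+": 0.10}

# Hypothetical age mixes of cases for a younger and an older population
younger = {"0-39": 0.70, "40-69": 0.25, "70+": 0.05}
older = {"0-39": 0.40, "40-69": 0.40, "70+": 0.20}

print(f"younger population: {crude_cfr(younger, cfr_by_age):.4f}")
print(f"older population:   {crude_cfr(older, cfr_by_age):.4f}")
```

With identical age-specific rates, the older hypothetical population ends up with a noticeably higher average rate purely because more of its cases fall in the high-risk band.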


COVID-19 cumulative observed case fatality rate over time

17 March 2020

I have a quick look at how the observed case fatality rate of COVID-19 has evolved over time so far.


New Zealand Election Study webtool

07 March 2020

I release an improved and updated version of my crosstab webtool for exploring the New Zealand Election Study data, now covering 2017 as well as 2014, and letting the user explore the relationship between party vote and a range of attitudes, experiences and demographics.


Log transform or log link? And confounding variables.

01 March 2020

I check the robustness of last week's analysis of height -> weight by trying a different method of specifying and fitting the model, and checking to see if socioeconomic status is acting as a confounder (because better-off people are both taller and healthier).
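
The title's distinction comes down to what is being averaged: a model fitted to a log-transformed response describes the mean of log(y), while a log link describes the log of the mean of y. The minimal Python sketch below shows that these are different quantities, using a handful of made-up weights that are purely illustrative and not taken from the post.

```python
import math
import statistics

# Hypothetical weights in kg, invented for illustration
weights = [50.0, 60.0, 75.0, 90.0, 140.0]

# What a log-transformed response targets: the mean of log(y)
mean_of_logs = statistics.fmean(math.log(w) for w in weights)

# What a log link targets: the log of the mean of y
log_of_mean = math.log(statistics.fmean(weights))

# Back on the original scale these are the geometric and arithmetic means
geometric_mean = math.exp(mean_of_logs)
arithmetic_mean = math.exp(log_of_mean)

print(f"exp(mean of logs) = {geometric_mean:.2f} kg (geometric mean)")
print(f"exp(log of mean)  = {arithmetic_mean:.2f} kg (arithmetic mean)")
```

For any set of positive values that are not all equal, the geometric mean is below the arithmetic mean, so back-transformed predictions from the two approaches need not agree.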