I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
I demonstrate the power of the transformation functionality in the scales R package by re-creating an eccentric Fox News chart.
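The core idea of a scales transformation is a paired transform and inverse; the real constructor is `scales::trans_new()`, but a base-R sketch of the concept (with a cube-root transformation chosen purely for illustration, not the one in the post) looks like this:

```r
# Conceptual sketch in base R of what a scales transformation object bundles:
# a transform and its inverse (the real package adds break/label helpers).
# scales::trans_new() is the actual constructor; this just illustrates the pair.
cube_root <- list(
  transform = function(x) sign(x) * abs(x)^(1/3),
  inverse   = function(x) sign(x) * abs(x)^3
)

x <- c(-8, 0, 27)
transformed <- cube_root$transform(x)   # -2, 0, 3
back <- cube_root$inverse(transformed)  # recovers the original values
```

Passing such a pair to `trans_new()` lets ggplot2 draw the data on the transformed scale while labelling the axis in the original units.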
I have a go at quantifying how much countries' different demographic profiles will matter for their average COVID-19 case fatality rates.
I have a quick look at how the observed case fatality rate of COVID-19 has evolved over time so far.
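The naive calculation behind that post can be sketched in a couple of lines; the numbers here are made up for illustration, not actual COVID-19 figures:

```r
# Observed (naive) case fatality rate over time:
# cumulative deaths divided by cumulative confirmed cases at each time point.
cases  <- c(100, 500, 2000, 8000)   # hypothetical cumulative confirmed cases
deaths <- c(  2,  15,   80,  300)   # hypothetical cumulative deaths
cfr <- deaths / cases
round(100 * cfr, 1)                 # CFR as a percentage at each time point
```

The naive version ignores lags between confirmation and death, and under-ascertainment of mild cases, which is why its evolution over time is interesting.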
I release an improved and updated version of my crosstab webtool for exploring the New Zealand Election Study data, now covering 2017 as well as 2014 and letting the user explore the relationship between party vote and a range of attitudes, experiences and demographics.
I check the robustness of last week's analysis of the height-weight relationship by trying a different method of specifying and fitting the model, and by checking whether socioeconomic status is acting as a confounder (because better-off people tend to be both taller and healthier).
I test the traditional BMI calculation against the actual distribution of height and weight in US adults in 2018. I decide BMI is quite a good metric. I find that one prominent critique of the BMI gets the direction wrong about who has their weight exaggerated by it.
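For reference, the traditional calculation being tested is simple; a minimal R sketch with hypothetical example values:

```r
# The standard BMI formula: weight in kilograms / (height in metres)^2
bmi <- function(weight_kg, height_m) weight_kg / height_m^2

bmi(70, 1.75)  # about 22.9, within the conventional "normal" range of 18.5-25
```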
I put to the test a method of running a tennis tournament suggested by Lewis Carroll. It performs OK at allocating prizes fairly, although it takes about twice as many matches as a standard modern single-elimination tournament. With realistic randomness in match results, it doesn't perform as well as Carroll argued it would under his unrealistic assumption of deterministic outcomes.
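The kind of simulation involved can be sketched roughly as below. This is not the post's actual code: the eight players, their skill ratings and the logistic win-probability model are all made-up assumptions, just to show the shape of a tournament simulation with random match outcomes.

```r
# Hypothetical sketch: a single-elimination tournament where player i beats
# player j with a probability driven by the gap in their (made-up) skills.
set.seed(123)
skills <- sort(rnorm(8), decreasing = TRUE)  # player 1 is the strongest

play_match <- function(i, j) {
  p_i_wins <- plogis(skills[i] - skills[j])  # logistic win probability
  if (runif(1) < p_i_wins) i else j
}

run_tournament <- function() {
  players <- 1:8
  while (length(players) > 1) {
    # pair off adjacent players and keep the winners
    players <- sapply(seq(1, length(players), by = 2), function(k) {
      play_match(players[k], players[k + 1])
    })
  }
  players
}

champions <- replicate(2000, run_tournament())
mean(champions == 1)  # how often does the best player win the tournament?
```

Comparing that proportion under deterministic versus random outcomes is the essence of testing Carroll's claim.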
I have a go at quantifying how much giving a special draw to the top 32 seeds in a tennis tournament affects who makes it to the finals and who wins, based on simulations of a hypothetical matchup of the 128 top women players in 1990.
SQL Server and R work fine together for analysing 200 GB of New York City taxi data. Even relatively tidy data needs a lot of preparation before analysis. And you can't analyse big data without aggregating and summarising it somehow.
I try to show that cost-benefit analysis is easy to perform in R, that R lets you build in uncertainty much more clearly than is generally done, and I have a go at demystifying the internal rate of return.
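The idea behind the internal rate of return can be sketched in a few lines of base R: it is the discount rate at which the net present value of a cash-flow series equals zero. The cash flows below are hypothetical, not from the post:

```r
# Net present value of a cash-flow vector (first element at time zero)
npv <- function(rate, cashflows) {
  sum(cashflows / (1 + rate)^(seq_along(cashflows) - 1))
}

# Hypothetical project: an upfront cost followed by four years of returns
cashflows <- c(-1000, 300, 400, 500, 200)

# The IRR is the root of npv() in the discount rate
irr <- uniroot(npv, interval = c(0, 1), cashflows = cashflows,
               tol = 1e-8)$root
```

Framing the IRR as a root-finding problem, rather than a formula to memorise, is what takes the mystery out of it.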