I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
Countries with higher income or consumption inequality tend to have more homicides per population. But looking at the data within each country, there does not seem to be a consistent relationship between inequality and homicide rates (that is, while there is a relationship in some countries, on average there is no evidence of one). This is an interesting multilevel or mixed-effects modelling problem that could easily trip one up.
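To make the between-country versus within-country distinction concrete, here is a minimal sketch with made-up numbers. The three countries and their values are hypothetical, chosen only for illustration, and the calculation uses separate least-squares slopes rather than the mixed-effects model the post itself is about.

```python
# Toy illustration only: the data below are invented, not real statistics.

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical (inequality index, homicides per 100,000) observations over time
countries = {
    "A": [(29, 1.5), (30, 2.0), (31, 2.5)],  # rises with inequality
    "B": [(39, 5.5), (40, 5.0), (41, 4.5)],  # falls with inequality
    "C": [(49, 8.0), (50, 8.0), (51, 8.0)],  # no relationship
}

# Slope within each country, then the average of those slopes
within_slopes = {
    name: ols_slope([x for x, _ in obs], [y for _, y in obs])
    for name, obs in countries.items()
}
mean_within_slope = sum(within_slopes.values()) / len(within_slopes)

# Slope when all observations are pooled, ignoring which country they come from
all_obs = [pair for obs in countries.values() for pair in obs]
pooled_slope = ols_slope([x for x, _ in all_obs], [y for _, y in all_obs])

print("within-country slopes:", within_slopes)
print("average within-country slope:", mean_within_slope)
print("pooled slope:", round(pooled_slope, 3))
```

In this toy data the pooled slope is clearly positive, driven by differences between the countries, while the within-country slopes cancel out on average. A model that ignores the grouping would report only the first of these.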
I draw a map of the Pacific showing the key locations associated with the disappearance of Amelia Earhart and Fred Noonan in 1937.
A reflection on ten years, 225 posts and 350,000 words of the Free Range Statistics blog. Blogging works for me because it meets my own needs in the first instance, particularly by motivating learning and giving me a structured platform to engage with things I find interesting.
Motivated by an excellent recent book, I explore Papua New Guinea's statistics for GDP, population, employment and vaccination rates.
A few specific notes on technical issues relating to a previous post. On drawing network graphs with different coloured edges; modelling strategy; different specifications of models; and accessing UN SDG and gender inequality data.
The time that men spend on domestic chores is positively related to total fertility rate. But only if you are looking at countries selected because they are rich. Overall, it's negatively related. And if you model it with both GDP per capita and gender inequality (generally, more country-level gender inequality means more children), the effect goes away altogether. At the country level, it's a statistical artefact. To look into this properly, you need individual and household-level data.
I set out to improve a Sankey plot that had been shared as an example of how bad such plots are, and hopefully show that some careful design decisions and polish can make these plots useful for purposes like seeing cohorts' progress (up, down, same) over time.
Some more comprehensive simulations of what happens to 'fragile' p values (those between 0.01 and 0.05) when the actual difference is not the same as the minimum detectable difference that an 80% power calculation relied upon to set the sample size.
What proportion of significant p values should be between 0.01 and 0.05? Turns out the answer is 'it depends'.
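Why 'it depends' can be seen with a small analytic sketch. It assumes a two-sided z-test with known variance, which is a simplification and not necessarily the setup used in the posts, and a sample size chosen to give 80% power at a planned minimum detectable difference. The function name `fragile_share` and its parameters are my own, introduced only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def fragile_share(effect_ratio, alpha=0.05, fragile_floor=0.01, planned_power=0.80):
    """Share of significant p values that fall between fragile_floor and alpha.

    effect_ratio is the true difference divided by the minimum detectable
    difference that the sample size was planned around (an assumed setup).
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)          # cutoff for p < alpha
    z_floor = STD_NORMAL.inv_cdf(1 - fragile_floor / 2)  # cutoff for p < fragile_floor
    z_power = STD_NORMAL.inv_cdf(planned_power)

    # Under this design the test statistic is Normal(shift, 1)
    shift = effect_ratio * (z_alpha + z_power)

    def prob_abs_z_exceeds(cutoff):
        upper_tail = 1 - STD_NORMAL.cdf(cutoff - shift)
        lower_tail = STD_NORMAL.cdf(-cutoff - shift)
        return upper_tail + lower_tail

    significant = prob_abs_z_exceeds(z_alpha)
    highly_significant = prob_abs_z_exceeds(z_floor)
    return (significant - highly_significant) / significant


for ratio in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"true / planned difference = {ratio:.1f}: "
          f"fragile share = {fragile_share(ratio):.3f}")
```

When there is no true difference at all, p values are uniform and the share is 0.04 / 0.05 = 0.8. As the true difference grows relative to the planned one, the share shrinks, so the 'right' proportion of fragile p values depends on how well the power calculation matched reality.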
How to produce an animation of demographic patterns in Pacific island countries and territories from 1950 to 2050, in just a few lines of code.