I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
A few months back, I set up a server on Amazon Web Services with a data sciencey toolkit on it. Amongst other things, this means I can collect data around the clock when necessary, as well as host my little RRobot Twitter bot, without having a physical machine humming in my living room. There are lots of fiddly things to sort out to make such a setup actually fit for purpose.
I explore the relationship between household income and expenditure on gasoline and motor oil in the US Bureau of Labor Statistics' Consumer Expenditure Survey.
Simulating a population with changing total fertility rate, life expectancy, infant mortality, and other parameters.
Minor updates available on CRAN for the ggseas (seasonal adjustment on the fly) and Tcomp (tourism forecasting competition data) R packages.
Life expectancy is calculated directly from age-specific death rates. Mathematically speaking, changes in infant mortality have a much greater impact on life expectancy than do changes in the death rate at any other age.
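As a rough illustration of the mechanism, here is a minimal life-table sketch in Python (not code from the post). The death probabilities are invented for illustration (a flat infant rate of 0.05, then Gompertz-style growth from age 1), and the sketch assumes anyone who dies during a year of age lives half of that year, a simplification that real life tables refine, especially at age 0. Raising the death probability by the same amount at each age in turn shows that the loss in life expectancy is largest at age 0: the whole birth cohort is still alive, and each extra infant death forfeits an entire lifetime.

```python
import math

def life_expectancy_at_birth(q):
    """Period life expectancy at birth from single-year-of-age death probabilities.

    q[x] is the probability of dying between exact ages x and x + 1.
    Assumes (simplification) that people who die in a year live half of it,
    and that q[-1] == 1 so the table closes.
    """
    alive = 1.0         # l_x, starting from a radix of 1 person
    person_years = 0.0  # running sum of L_x (person-years lived)
    for qx in q:
        deaths = alive * qx
        person_years += alive - deaths / 2
        alive -= deaths
    return person_years  # e0 = sum(L_x) / l_0, and l_0 == 1

def hypothetical_q(infant=0.05, a=1e-4, b=0.09, max_age=110):
    # Hypothetical rates only: flat infant rate, then Gompertz-style growth from age 1
    q = [infant] + [min(1.0, a * math.exp(b * x)) for x in range(1, max_age)]
    q.append(1.0)  # nobody survives past max_age
    return q

def impact_of_bump(q, age, delta=0.01):
    # Fall in e0 when the death probability at one age rises by delta (capped at 1)
    bumped = list(q)
    bumped[age] = min(1.0, bumped[age] + delta)
    return life_expectancy_at_birth(q) - life_expectancy_at_birth(bumped)

q = hypothetical_q()
e0 = life_expectancy_at_birth(q)
impacts = [impact_of_bump(q, age) for age in range(len(q) - 1)]
print("e0:", round(e0, 1))
print("loss from +0.01 at age 0:", round(impacts[0], 3))
print("loss from +0.01 at age 40:", round(impacts[40], 3))
```

Each extra death at age x costs roughly l_x times (half a year plus the remaining life expectancy at x + 1), so both factors are at their largest at birth.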
I had a brief look around New Zealand government agency websites and found 15 high-quality web apps built with Shiny.
Books, online courses and tools about surveys that I've recently come across and liked.
I show a workaround to make it (relatively) easy to work with weighted survey data in Power BI, and ruminate on how this compares to other approaches to working with weighted data.
A negative binomial model isn't adequate for modelling the number of people killed per firearm incident in the USA; the real data has more incidents with exactly one death, and also more extreme values, than the model predicts. But estimating the model was an interesting exercise in fitting a single negative binomial model to two truncated subsets of data.
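A minimal sketch of fitting one negative binomial to two truncated subsets, written in dependency-free Python rather than taken from the post. Everything specific here is hypothetical: one subset only records counts of at least 1, the other only counts of at least 4; the data are simulated from a negative binomial with mean 2 and size 1; and the maximum-likelihood estimate is found by a crude grid search instead of a proper optimiser. Each observation contributes log P(Y = y) minus the log of the probability of clearing its subset's truncation threshold, and both subsets share the same mean and size parameters.

```python
import math
import random
from collections import Counter

def nb_logpmf(y, mu, size):
    # log P(Y = y) for a negative binomial with mean mu and dispersion `size`
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def truncated_loglik(counts, lower, mu, size):
    # Counts are only observed when y >= lower, so each term is
    # log P(Y = y) - log P(Y >= lower)
    p_below = sum(math.exp(nb_logpmf(y, mu, size)) for y in range(lower))
    log_norm = math.log(1.0 - p_below)
    return sum(n * (nb_logpmf(y, mu, size) - log_norm) for y, n in counts.items())

def log_grid(lo, hi, n):
    # n points evenly spaced on the log scale between lo and hi
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]

def fit_two_truncated(subsets, n_grid=80):
    # subsets: list of (Counter of observed counts, lower truncation point).
    # One shared (mu, size) for all subsets; crude grid search for the MLE.
    best = None
    for mu in log_grid(0.2, 10.0, n_grid):
        for size in log_grid(0.05, 20.0, n_grid):
            ll = sum(truncated_loglik(c, lower, mu, size) for c, lower in subsets)
            if best is None or ll > best[0]:
                best = (ll, mu, size)
    return best[1], best[2]

def rnbinom(rng, mu, size):
    # Negative binomial as a Poisson-gamma mixture; the Poisson draw uses an
    # inverse-CDF search, which is fine for the moderate means used here
    lam = rng.gammavariate(size, mu / size)
    u, y = rng.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf and p > 0:
        y += 1
        p *= lam / y
        cdf += p
    return y

rng = random.Random(2024)
true_mu, true_size = 2.0, 1.0
draws_a = [rnbinom(rng, true_mu, true_size) for _ in range(50_000)]
draws_b = [rnbinom(rng, true_mu, true_size) for _ in range(50_000)]
subset_a = Counter(y for y in draws_a if y >= 1)  # hypothetical: only y >= 1 recorded
subset_b = Counter(y for y in draws_b if y >= 4)  # hypothetical: only y >= 4 recorded
mu_hat, size_hat = fit_two_truncated([(subset_a, 1), (subset_b, 4)])
print("mu_hat:", round(mu_hat, 2), "size_hat:", round(size_hat, 2))
```

Because both subsets share one (mu, size) pair, the lightly truncated subset mostly pins down the body of the distribution while the heavily truncated one adds information about the tail. The grid is only there to keep the sketch free of optimisation libraries.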
Two ways of fitting a model to truncated data.
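As a small, generic illustration of why truncation has to be modelled (not necessarily either of the two methods in the post), the sketch below fits a zero-truncated Poisson, where zeros are never observed. Simply averaging the observed counts overstates the rate; the truncation-aware maximum-likelihood estimate instead solves lambda / (1 - exp(-lambda)) = ybar, here by bisection. The true rate of 1.5 and the simulated data are hypothetical.

```python
import math
import random

def ztp_mean(lam):
    # Mean of a zero-truncated Poisson: E[Y | Y >= 1] = lam / (1 - exp(-lam))
    return lam / -math.expm1(-lam)

def fit_zero_truncated_poisson(ys, tol=1e-12):
    # MLE of lam from counts observed only when y >= 1. Setting the score to
    # zero gives ztp_mean(lam) == sample mean, which we solve by bisection.
    ybar = sum(ys) / len(ys)
    if ybar <= 1:
        raise ValueError("sample mean of zero-truncated data must exceed 1")
    lo, hi = 1e-12, ybar  # ztp_mean(lam) > lam, so the root lies below ybar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ztp_mean(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rpois(rng, lam):
    # Inverse-CDF Poisson draw; adequate for small lam
    u, y, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf and p > 0:
        y += 1
        p *= lam / y
        cdf += p
    return y

rng = random.Random(1)
observed = [y for y in (rpois(rng, 1.5) for _ in range(10_000)) if y >= 1]
naive = sum(observed) / len(observed)     # ignores the missing zeros, so biased upwards
mle = fit_zero_truncated_poisson(observed)
print("naive mean:", round(naive, 3), "truncation-aware MLE:", round(mle, 3))
```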