Do we need another data blog?
At last count there were 573 individual blogs related to the statistical computing language R. There are a bunch more about data science in general, many hundreds for Python users, and a smattering for commercial software like SAS, Stata, and SPSS.
When it comes to content, there are (literally) uncountable numbers of blogs covering economic issues, including several high-quality regular blogs on New Zealand economic issues. Thomas Lumley’s statschat is an excellent source of high-quality commentary and critique of statistics in the public debate in New Zealand, and the New Zealand Herald’s Data Blog provides useful case studies and examples of data journalism focusing on New Zealand.
However, I don’t see a technically oriented blog that covers the complete range of issues of interest to me - advances in statistical techniques; the links between reproducible computing analysis and high-quality scientific research; visualisation of data for non-specialist audiences; and applying high-quality, recently developed methods to new and old data to answer old and new questions, with a particular focus on New Zealand.
This blog will be an experiment, and whether I continue will depend equally on whether anyone is interested in reading it and on how much I enjoy writing it.
My aims are to use this blog:
- to explore and communicate some exciting technical issues relating to data management, analysis and presentation; and
- to highlight a range of important, publicly available data and analysis that should be more widely appreciated.
This area is going through an extraordinary revolution at the moment. Advances in computing power have contributed, but so have new, more robust statistical techniques, new sources of data, and improved collective learning through the open source movement and the internet in general. In my working hours my team and I are doing our own small bit to bring these new techniques to New Zealand’s economic data, but two main issues motivate me to create this out-of-work blog:
- many questions and fascinating topics arise that are beside the point for immediate work priorities;
- we often find ourselves developing cool techniques, or conducting new analysis, that we’d like to polish up and share, but we have no straightforward channel for doing so.
Everything I write in this blog will use publicly available data, and I’ll certainly be steering clear of politically controversial topics.
That’s all for now. The next post will be on an actual analytical topic dealing with real data. As I build up a corpus of posts, I’ll also implement keywords and other organisational tools for the blog.