Mr Warburton also argues that the results from the Household Economic Survey are misleading because they include “people who can’t afford or otherwise don’t own cars”. His own analysis from the Ministry of Transport’s Household Travel Survey shows that households including two or more children, with Māori, with unemployed people, or in the poorer regions all pay more tax per kilometre.
I’m totally convinced by Mr Warburton on the argument that poorer people are paying more tax per kilometre, which makes a fuel tax a particularly regressive tax. Even paying the same rate per kilometre would be regressive because transport costs are a higher proportion of income for poorer people; so more tax per kilometre is really rubbing salt into the wound. I’m not convinced though by the suggestion that they will pay more of the new tax even in absolute terms; I’m inclined to trust the Household Economic Survey on this one.
The USA Consumer Expenditure Survey
One good thing about this debate was it motivated me to do something that’s been on my list for a while, which is to get my toes in the water with the USA Bureau of Labor Statistics’ impressive Consumer Expenditure Survey. Huge amounts of confidentialised microdata from this mammoth program are freely available for download - without even filling in any forms! This makes it suitable to use in a blog post in a way I couldn’t with the New Zealand or Australian Household Expenditure Surveys (both of which have Confidentialised Unit Record Files for researchers, but with restrictions that get in the way of quick usage in public like this).
Big caveat for what follows - literally I looked at this survey for the first time today, and it is very complex. Just for starters, the Consumer Expenditure Survey really comprises two surveys - an Interview Survey for “major and/or recurring items” and the Diary Survey for “minor or frequently purchased items”. It is very possible that I am using not-the-best variables. Feedback welcomed.
Densities of income and fuel spend
Let’s get started by downloading the data. Here’s a couple of graphs of what I think are the main variables of interest here. These draw on:
gasmocq “Gasoline and motor oil this quarter”
fincbtxm “Total amount of family income - imputed or collected” (in the past 12 months? although the data dictionary only implies this)
fam_size “Number of Members in CU” (ie in responding household)
bls_urbn “Is this CU located in an urban or a rural area”
Both these graphics show a quantity divided by household size to get a simple estimate of the amount per person. A better approach would be to use equivalised figures, taking into account economies of scale for larger households and different cost structures for different age groups, but it probably would have taken me all morning just to work out a safe way of doing that, so I’ve stuck with the simpler method.
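To make the per-person calculation concrete, here is a minimal sketch on three made-up consumer units (not real survey records). The column names follow the variables listed above. The square-root equivalence scale is shown only as one common, simple alternative; it is not what is used in this post, and it handles economies of scale but not age structure.

```r
# Toy data only: three made-up consumer units, not real CE records
cu <- data.frame(
  gasmocq  = c(300, 450, 0),          # gasoline and motor oil this quarter
  fincbtxm = c(40000, 90000, 12000),  # total family income
  fam_size = c(1, 4, 2)               # number of members in the CU
)

# Simple per-person method: divide by household size
cu$gas_pp    <- cu$gasmocq  / cu$fam_size
cu$income_pp <- cu$fincbtxm / cu$fam_size

# Alternative for comparison (an assumption, not used in this post):
# square-root equivalence scale, dividing by sqrt(household size)
cu$income_equiv <- cu$fincbtxm / sqrt(cu$fam_size)

print(cu)
```

For the four-person household the per-person income is 22,500 while the square-root equivalised income is 45,000, which shows how much the choice of scale can matter for larger households.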
Here’s code to do that download, prepare the data for later and draw graphs:
Relationship of income and fuel spend
Here are some different ways of looking at the relationship between household income and the amount spent on fuel. They all show a lot of variation between individual households, but significant evidence of a material positive relationship between the two.
First, here’s a graph that tries to combine the binned “income classification” with the ranking of the household on income (ie its quantile, if you like). The categories aren’t as neat as might be expected, I’m pretty sure because these variables represent different states of imputation:
The collection of people who don’t spend any money on petrol drags the regression line downwards, but it’s the slope that counts; definitely upwards. The higher ranked a household is on income, the more they spend per person on gasoline and motor oil.
BTW, note that the points in these plots are different sizes. The size is mapped to the calibrated survey weight indicating how many people in the US population each sample point is representing; this is a good starting point for trying to represent weighted data in a scatter plot.
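As a rough illustration of mapping point size to survey weight, here is a base R sketch on simulated data. Everything in it is an assumption for illustration: the data are randomly generated, and `weight`, `income_pp` and `gas_pp` are hypothetical column names rather than names from the survey files. The weighted `lm()` gives weighted point estimates only; proper standard errors for a complex survey would need a design-based approach such as the survey package.

```r
set.seed(123)
n <- 200

# Simulated data for illustration only, not real survey records
sim <- data.frame(
  income_pp = rlnorm(n, meanlog = 10, sdlog = 0.8),
  weight    = runif(n, 5000, 40000)   # hypothetical survey weight column
)
sim$gas_pp <- pmax(0, 100 + 0.002 * sim$income_pp + rnorm(n, sd = 80))

# Rank households on income, scaled to (0, 1]
sim$income_rank <- rank(sim$income_pp) / n

# Weighted least squares: each point counts in proportion to its weight
mod <- lm(gas_pp ~ income_rank, data = sim, weights = weight)

# Scale cex by sqrt(weight) so point area is roughly proportional to weight
plot(sim$income_rank, sim$gas_pp,
     cex  = 2 * sqrt(sim$weight / max(sim$weight)),
     xlab = "Income rank", ylab = "Fuel spend per person")
abline(mod, col = "steelblue", lwd = 2)
```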
I’m wary of using quantiles or rankings in this sort of analysis; I don’t see much, if any, gain over other transformations, and they introduce new risks and interpretability problems. Perhaps more usefully, here are some straightforward scatterplots of income per person and vehicle fuel expenditure per person:
No doubt about that strong relationship; poorer households spend less on fuel (and nearly everything else, of course, although that’s not shown) than do richer households.
On the other hand, there’s equally no doubt that poorer households spend more on fuel as a proportion of their income:
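For reference, the proportion in question can be computed along these lines. This is a sketch on made-up values, not the code behind the graphics. It assumes the income variable covers 12 months (which, as noted earlier, the data dictionary only implies), so the quarterly fuel figure is multiplied by four to put both on an annual basis.

```r
# Made-up values for illustration; not real survey records
cu <- data.frame(
  gasmocq  = c(250, 400, 600, 300),        # quarterly fuel spend
  fincbtxm = c(10000, 40000, 120000, 0)    # income, assumed to be annual
)

# Annualise the quarterly fuel spend (assumption: income covers 12 months)
cu$gas_annual <- cu$gasmocq * 4

# The share is undefined for zero or negative income, so use NA there
cu$fuel_share <- ifelse(cu$fincbtxm > 0, cu$gas_annual / cu$fincbtxm, NA)

print(cu)
```

In this toy example the lowest-income unit spends the least on fuel in absolute terms but the largest share of its income, which is the pattern described above.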
Here’s the code for those four graphics:
Who is likely to spend nothing on petrol at all?
Finally, I was interested in who spends nothing on petrol at all. This relates to Mr Warburton’s argument that the New Zealand Household Economic Survey is flawed for these purposes because the average spend on petrol includes people who have been priced out of vehicles altogether. In fact, with the US data, there is a very strong negative relationship between household income and the probability of spending zero on gasoline and motor oil:
However, as the previous scatterplots showed, removing either or both of the zero income and zero fuel spend cases from the US Consumer Expenditure Survey doesn’t serve to remove the relationship between income and gasoline spend.
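One way to examine a relationship like this is a weighted logistic regression of a zero-spend indicator on log income. The sketch below uses simulated data, so the negative relationship is built in by construction and proves nothing about the real survey. The column names `income_pp`, `weight` and `no_fuel` are hypothetical, and this is not necessarily the method behind the graphic in this post.

```r
set.seed(42)
n <- 500

# Simulated data for illustration only, not real survey records
sim <- data.frame(
  income_pp = rlnorm(n, meanlog = 10, sdlog = 0.8),
  weight    = runif(n, 5000, 40000)   # hypothetical survey weight column
)

# Build in a falling probability of zero fuel spend as income rises
p_zero <- plogis(8 - 0.9 * log(sim$income_pp))
sim$no_fuel <- rbinom(n, size = 1, prob = p_zero)

# quasibinomial avoids warnings about non-integer weights;
# weights are rescaled to average 1
mod <- glm(no_fuel ~ log(income_pp), data = sim,
           family = quasibinomial, weights = weight / mean(weight))

# Predicted probability of zero fuel spend at two income levels
pred <- predict(mod, newdata = data.frame(income_pp = c(5000, 50000)),
                type = "response")
print(pred)
```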
Finally, here’s the code for this last bit of analysis:
Just as I was finalising this post, new discussion started suggesting the American Time Use Survey as a good source of microdata to directly analyse the question of whether poorer households travel more or less. Looks a good topic to come back to at some point.