International Household Income Inequality data

At a glance:

I explore the University of Texas Inequality Project's Estimated Household Income Inequality data, which provides modelled estimates of inequality for more than 150 countries from 1963 to 2008.

30 Jun 2016

I’m at the New Zealand Association of Economists annual conference in Auckland. The opening keynote speech was from James K. Galbraith on a global view of inequality. He showed a variety of results from the University of Texas Inequality Project’s Estimated Household Income Inequality dataset, which I hadn’t realised existed before. It’s the result of a patient and painstaking effort to make the most internationally comparable estimate possible of household inequality, and involves modelling when needed to create predicted inequality based on the best indicators available.

The data are also online, with a whole bunch of supporting material. Well done Professor Galbraith and University of Texas! Here’s a taster view.

First, download the data, bring it into R, and tidy it up from its wide format into a more analysis-friendly tidy or normalised form:

library(dplyr)
library(tidyr)
devtools::install_github("hadley/ggplot2") # dev version needed for subtitle and caption
library(ggplot2) 
library(scales)
library(openxlsx)
library(ggseas)


download.file("http://utip.lbj.utexas.edu/data/EHII-UPDATED-10-30-2013.xlsx",
              destfile = "ehii.xlsx", mode = "wb")

ehii <- read.xlsx("ehii.xlsx")[ , -1] # don't need the first column

ehii_tidy <- ehii %>%
   gather(Year, Gini, -Country, -Code) %>%
   mutate(Year = as.numeric(Year))

Let’s take a first look

ggplot(ehii_tidy, aes(x = Year, y = Gini, colour = Country)) +
   geom_line() +
   theme(legend.position = "none")

plain

OK, lots of lovely data, not a terribly attractive plot. Not informative either, having chopped off the legend. We should be able to do better than that.

One thing of interest might be which countries have seen the biggest changes over time. Restricting ourselves to just countries with data in 1963 (to make comparison valid), let’s have a go:

gini-growth

Here’s the code that constructed that plot:

full_countries <- ehii_tidy %>%
   filter(Year = min(Year) & !is.na(Gini))

final_result <- ehii_tidy %>%
   filter(Country %in% full_countries$Country & !is.na(Gini)) %>%
   group_by(Country) %>%
   mutate(Gini_index = Gini / Gini[1] * 100) %>%
   filter(Year == max(Year)) %>%
   mutate(Year = Year + 1,
          label = paste(Code, round(Gini)))

ehii_tidy %>%
   filter(Country %in% full_countries$Country) %>%
   ggplot(aes(x = Year, y = Gini, colour = Country)) +
   stat_index(index.ref = 1, alpha = 0.3) +
   theme(legend.position = "none") +
   geom_text(data = final_result, aes(label = label, y = Gini_index)) +
   ggtitle("Relative changes in inequality since 1963",
           subtitle = "(for countries with data from 1963)") +
   labs(y = "Index of Gini coefficient, set to be 100 in 1963",
        x = "", caption = "UTIP Estimated Household Income Inequality dataset") +
   xlim(1960, 2012) +
   annotate("text", x = 1977, y = 150, 
            label = "Country codes appear at their final data point; 
numbers are the latest available Gini coefficient")

This is obviously just the beginning. The countries have their ISO 3 character codes, which will make it easy to join them with other data for analysis. Maps are an obvious presentation step too, and Galbraith’s team look to make extensive use of the data for this purpose. Looking forward to a closer look when I’ve got more time.

← Previous post

Monthly Regional Tourism Estimates

Animated world inequality map

My day job is Director of the Statistics for Development Division at the Pacific Community, the principal scientific and technical organisation in the Pacific region, proudly supporting development since 1947. We are an international development organisation owned and governed by our 27 country and territory members. This blog is not part of my role there and contains my personal views only.

Follow this blog with RSS.

Find me on Bluesky or Mastodon.

I'm pleased to be aggregated at R-bloggers, the one-stop shop for blog posts featuring R.

I've never been an academic but have written a few small things and hence have a Google Scholar profile

I read a lot of books. In the unlikely event you want to see what I think about them or what I'm reading at the moment, follow me on Goodreads.

free range statistics by Peter Ellis is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Household Income Inequality data

At a glance:

Bluesky feed: