Big data and program/project evaluation

Peter Ellis

May 2016

Today’s talk

  • Why might you care
  • What is big data, anyway?
  • Thoughts from Chicago
  • Implications for evaluation work
  • Implications for evaluator skills

Why might you care

  • A new type of evaluand
  • New insight as background knowledge
  • New data for old evaluands
  • A whole new skillset

What is big data

Some unsatisfactory definitions

  • “Too big for Excel”
  • “Too big for a single SQL Server instance”
  • “Data sets that are so large or complex that traditional data processing applications are inadequate.”
  • “Velocity, volume, variety”
  • “The convergence of enterprise and consumer IT”
  • “The new tools helping us find relevant data and analyze its implications.”
  • “A new attitude … that combining data from multiple sources could lead to better decisions.”

My definition (today)

Digital traces of everyday activity, collected and stored at much finer resolution and higher frequency than pre-2000:

  • wearable devices like Fitbits
  • individual financial transactions
  • mouse events
  • motion sensors
  • CCTV
  • GPS tracking in phones and cars
  • smart electricity meters
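
To make “much finer resolution and higher frequency” concrete, here is a toy sketch in Python (all numbers invented): the same household’s electricity use as a pre-2000 billing system stored it, one annual total, versus the half-hourly trace a smart meter keeps.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)

    # Half-hourly smart meter readings for one year: a daily cycle plus noise (kWh)
    idx = pd.date_range("2015-01-01", "2015-12-31 23:30", freq="30min")
    slot = idx.hour * 2 + idx.minute // 30              # half-hour slot of the day, 0-47
    usage = 0.5 + 0.4 * np.sin(2 * np.pi * slot / 48) + rng.normal(0, 0.1, len(idx))
    readings = pd.Series(usage, index=idx)

    # Pre-2000 view: one number per household per year
    print("Annual total:", round(readings.sum()), "kWh, from", len(readings), "readings")

    # Big-data view: enough resolution to profile usage by hour of day
    print(readings.groupby(readings.index.hour).mean().round(2))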

Different types

[image: types of big data]

The data revolution

  • New statistical methods post-1970
  • Rise in computing power
  • Increasing scope of data capture
  • AI explosion post-2000

Some examples

[image: Google Maps]

Capacity planning

[image: telco]

[image: customer]

[image: Twitter]

[image: Facebook]

No longer “hype”?

Gartner hype cycle 2014

[image: Gartner hype cycle 2014]

Gartner hype cycle 2015

[image: Gartner hype cycle 2015]

Official statistics

[image: official statistics]

Thoughts from a Chicago roundtable

Opportunities

  • Build counterfactuals where none were previously possible? (a sketch follows this list)
  • Whole new sources of data?
  • Use predictive capabilities to plan quasi-experiments
  • Provide clues to what needs further examination by other methods
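
A hedged sketch of the first point (Python; the records, participation rule, and variable names are all invented, and real work would need proper propensity modelling and balance checks): match each participant to the most similar non-participant in a large administrative dataset, then compare outcomes.

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(1)
    n = 10_000

    # Simulated administrative records; 'age' and 'prior_income' are hypothetical fields
    df = pd.DataFrame({
        "age": rng.integers(18, 65, n),
        "prior_income": rng.gamma(2, 20_000, n),
    })
    df["treated"] = rng.random(n) < np.where(df["age"] < 40, 0.15, 0.05)
    df["outcome"] = df["prior_income"] + 1_500 * df["treated"] + rng.normal(0, 5_000, n)

    # Standardise covariates and match each participant to its nearest non-participant
    X = df[["age", "prior_income"]].to_numpy()
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    is_t = df["treated"].to_numpy()
    nn = NearestNeighbors(n_neighbors=1).fit(X[~is_t])
    _, match = nn.kneighbors(X[is_t])

    # The matched non-participants stand in as the counterfactual
    counterfactual = df.loc[~df["treated"], "outcome"].to_numpy()[match.ravel()]
    effect = df.loc[df["treated"], "outcome"].to_numpy() - counterfactual
    print("Estimated effect:", round(effect.mean()), "(simulated true effect: 1,500)")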

Reservations expressed

  • Populations we work with often don’t leave a digital trail
  • Satellite imagery is good for environmental work, but what is the social equivalent?
  • Do evaluators have the skills?
  • New analytical methods are better at prediction than at explaining causality…

Michael Bamberger’s ‘high applicability’ set (1)

  • Large, complex interventions
  • Conventional evaluation designs are methodologically weak
  • Easy, physical measurement
  • Indicators have construct validity

Michael Bamberger’s ‘high applicability’ set (2)

  • Long duration
  • Continue to operate beyond proof of concept, so results can be tested against predictions
  • Large number of potential variables and no clear theoretical way through
  • No political concerns about data ownership, privacy, etc.

So what might it mean in New Zealand…

New evaluands

  • Government and even NGO programs will increasingly be based on data-intensive analytics (e.g. the social investment approach)
  • The private sector is embracing this area fast, although New Zealand has a way to go
  • When the evaluand is a big-data intervention, you’ll need to construct a big-data quasi-experiment to be credible (a sketch follows this list)
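
What that might look like in practice, as a minimal sketch (Python; the regions, dates, and counts are invented): a difference-in-differences comparison on the daily counts that a data-intensive intervention generates anyway.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    days = pd.date_range("2016-01-01", periods=200, freq="D")
    post = days >= "2016-04-10"                      # invented go-live date

    df = pd.DataFrame({
        "group": np.repeat(["intervention", "comparison"], len(days)),
        "post": np.tile(post, 2),
    })

    # Both regions share a slow upward trend; only the intervention region
    # shifts by 8 counts a day after go-live
    trend = 100 + 0.05 * np.tile(np.arange(len(days)), 2)
    df["count"] = rng.poisson(trend + 8 * (df["group"] == "intervention") * df["post"])

    means = df.groupby(["group", "post"])["count"].mean().unstack()
    did = (means.loc["intervention", True] - means.loc["intervention", False]) \
        - (means.loc["comparison", True] - means.loc["comparison", False])
    print(f"Difference-in-differences estimate: {did:.1f} (true shift: 8)")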

New sources of insight outside the evaluand

  • Official statistics will increasingly be based on admin data including big data
  • Big data will change the background level of knowledge in many areas of social science
  • Other countries’ social and economic sectors will build a much more profound understanding of what works

New methods

  • Interventions that don’t think of themselves as “big data” will increasingly generate it as a side product
  • We won’t have the luxury of insisting on exactly the right indicator, but will have to get (very) good at separating the signal from the noise with proxies (a sketch follows this list)
  • We have to get on top of the accidental opportunities arising from the routine storage of sensor, administrative, and other information
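
On the proxy point, a toy illustration (Python, simulated data): a cheap first check is to correlate the noisy by-product measure against a trusted measure on whatever subsample has both, before and after smoothing.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    weeks = np.arange(104)

    true_signal = np.sin(weeks / 8) + weeks / 100         # the indicator we wish we had
    proxy = true_signal + rng.normal(0, 0.8, len(weeks))  # the noisy trace we actually get

    df = pd.DataFrame({"true": true_signal, "proxy": proxy})
    df["proxy_smoothed"] = df["proxy"].rolling(8, center=True).mean()

    print("Correlation, raw proxy:     ", round(df["true"].corr(df["proxy"]), 2))
    print("Correlation, smoothed proxy:", round(df["true"].corr(df["proxy_smoothed"]), 2))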

A whole new skillset

Selected references