Statistical graphics for communicating

Peter Ellis

July 2023

Today’s content

  • Different purposes of graphics
  • What makes for graphic excellence
  • Improving graphics

Purposes of graphics

Different parts of the workflow

Different purposes

…exploratory…

…analysis and diagnosis…

…presentation…

Try to take this in:

# Load Anscombe's quartet and print it with each x next to its y
data(anscombe)
anscombe[ , c(1,5,2,6,3,7,4,8)]
   x1    y1 x2   y2 x3    y3 x4    y4
1  10  8.04 10 9.14 10  7.46  8  6.58
2   8  6.95  8 8.14  8  6.77  8  5.76
3  13  7.58 13 8.74 13 12.74  8  7.71
4   9  8.81  9 8.77  9  7.11  8  8.84
5  11  8.33 11 9.26 11  7.81  8  8.47
6  14  9.96 14 8.10 14  8.84  8  7.04
7   6  7.24  6 6.13  6  6.08  8  5.25
8   4  4.26  4 3.10  4  5.39 19 12.50
9  12 10.84 12 9.13 12  8.15  8  5.56
10  7  4.82  7 7.26  7  6.42  8  7.91
11  5  5.68  5 4.74  5  5.73  8  6.89

Compared to this:
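
The comparison between the table and a picture can be reproduced with a few lines of base R. This is a minimal sketch, not necessarily the code behind the original slide: it first shows that the four x/y pairs have nearly identical summary statistics, then draws them as four scatter plots on shared axes. The object name `quartet_summary` and the axis limits are choices made for this illustration.

```r
# Anscombe's quartet ships with base R in the 'datasets' package.
data(anscombe)

# Summary statistics and a simple linear fit for each of the four x/y pairs.
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  data.frame(
    set       = i,
    mean_x    = mean(x),
    mean_y    = mean(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2])
  )
}))
print(round(quartet_summary, 2))

# Four scatter plots with the fitted line, on shared axes so panels compare fairly.
op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14), pch = 19,
       main = paste("Set", i), xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(y ~ x), col = "steelblue")
}
par(op)
```

The summary rows are almost indistinguishable, while the four panels show a linear cloud, a curve, a line with one outlier, and a vertical stack with one leverage point.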

Put the data in its place

Use during analysis

Or to present results

Compared to this:

                               Estimate  Std. Error      t value   Pr(>|t|)
(Intercept)                   8.8776830   0.1386138   64.0461912  0.0000000
MeanBedrooms                  0.0123833   0.0112909    1.0967507  0.2729015
PropPrivateDwellings          0.6500909   0.1107259    5.8711725  0.0000000
PropSeparateHouse            -0.1482185   0.0250748   -5.9110564  0.0000000
PropMultiPersonHH            -0.0818089   0.1046629   -0.7816417  0.4345311
PropNotOwnedHH                0.1702598   0.0327103    5.2050871  0.0000002
MedianRentHH                  0.0002120   0.0000314    6.7432821  0.0000000
PropLandlordPublic           -0.0140308   0.0175583   -0.7990966  0.4243430
PropNoMotorVehicle           -0.2738078   0.0673276   -4.0668001  0.0000498
PropOld                       0.4896176   0.0727685    6.7284284  0.0000000
PropAreChildren               0.2213041   0.0749041    2.9544985  0.0031737
PropSameResidence5YearsAgo   -0.0527033   0.0288307   -1.8280276  0.0677159
PropOverseas5YearsAgo        -0.5013942   0.0867807   -5.7777159  0.0000000
PropMaori                    -0.0740921   0.0284821   -2.6013581  0.0093640
PropPacific                  -0.2491472   0.0409452   -6.0848893  0.0000000
PropAsian                    -0.2962357   0.0306009   -9.6806355  0.0000000
PropNoReligion               -0.1370151   0.0351671   -3.8961181  0.0001014
PropSmoker                    0.0643973   0.0679729    0.9473969  0.3435676
PropSeparated                -0.3178489   0.0750838   -4.2332539  0.0000242
PropDoctorate                 1.9143327   0.2147473    8.9143524  0.0000000
PropPTStudent                 0.3340670   0.1837581    1.8179716  0.0692397
PropUnemploymentBenefit      -0.2105984   0.1723818   -1.2216977  0.2219868
PropStudentAllowance         -1.9585982   0.1886299  -10.3832867  0.0000000
PropFullTimeEmployed          1.6739377   0.0556158   30.0982411  0.0000000
PropPartTimeEmployed          0.0950533   0.1073109    0.8857751  0.3758606
PropUnemployed                0.0582842   0.2076558    0.2806771  0.7789913
PropEmployer                  0.8759442   0.0729325   12.0103356  0.0000000
PropSelfEmployedNoEmployees  -0.3325500   0.0530972   -6.2630379  0.0000000
PropTrades                   -0.4917097   0.0787759   -6.2418757  0.0000000
PropLabourers                -0.4598715   0.0569490   -8.0751436  0.0000000
PropAgForFish                 0.0293579   0.0343194    0.8554305  0.3924302
PropPubAdmin                  0.1195282   0.0470599    2.5399141  0.0111740
PropFinServices               0.9964298   0.1582853    6.2951509  0.0000000
PropProfServices              1.2353349   0.0878017   14.0695967  0.0000000
PropWorked40_49hours          0.2773584   0.0603275    4.5975426  0.0000046
PropPublicTransport          -0.0048683   0.0573232   -0.0849264  0.9323296
PropWalkJogBike              -0.0760221   0.0427188   -1.7795945  0.0753161
PropNoUnpaidActivities       -0.8419494   0.0859113   -9.8002120  0.0000000
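
One common graphical alternative to a coefficient table is a coefficient plot: point estimates with confidence intervals, sorted by size. The sketch below is illustrative only. It assumes the ggplot2 package is installed and fits a small stand-in model to the built-in `mtcars` data, because the data behind the table above is not part of this document. The names `fit` and `coefs` are chosen for the example.

```r
library(ggplot2)

# Illustrative model on a built-in dataset (not the model from the talk).
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Point estimates with 95% confidence intervals.
ci <- confint(fit)
coefs <- data.frame(
  term      = names(coef(fit)),
  estimate  = unname(coef(fit)),
  lower     = ci[, 1],
  upper     = ci[, 2],
  row.names = NULL
)

# Drop the intercept so it does not dominate the scale.
coefs <- coefs[coefs$term != "(Intercept)", ]

# Position along a common axis shows sign, size and uncertainty at a glance.
p <- ggplot(coefs, aes(x = estimate, y = reorder(term, estimate))) +
  geom_vline(xintercept = 0, colour = "grey60") +
  geom_pointrange(aes(xmin = lower, xmax = upper)) +
  labs(x = "Estimate (95% confidence interval)", y = NULL)
print(p)
```

The predictors here are on different scales, so in practice it may help to standardise them before comparing coefficient sizes.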

Illustrate concepts

Let user play with parameters

Simplify results

Grab attention

Graphic excellence

Principles

  • well-designed presentation of interesting data - substance, statistics, and design
  • complex ideas communicated with clarity, precision, and efficiency
  • greatest number of ideas in the shortest time with the least ink in the smallest space
  • telling the truth about the data

Adapted from Tufte

Some specifics

  • Comparative
  • Multivariate
  • Reveal interactions
  • Nearly all the ink is data ink
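
As a hedged illustration of "comparative, multivariate, reveal interactions", the sketch below uses small multiples. It assumes ggplot2 is installed and uses its built-in `mpg` data as a stand-in. The choice of variables is arbitrary and only meant to show four variables on one page with a comparison across panels.

```r
library(ggplot2)

# Readable panel titles for the drive-type codes in 'mpg'.
drive_labels <- c(`4` = "Four-wheel drive",
                  f   = "Front-wheel drive",
                  r   = "Rear-wheel drive")

# Four variables at once: displacement, highway mileage, cylinders, drive type.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point(alpha = 0.7) +
  # One fitted line per panel: differing slopes suggest an interaction with drive type.
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, colour = "grey40") +
  facet_wrap(~ drv, labeller = labeller(drv = drive_labels)) +
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       colour = "Cylinders")
print(p)
```

Because the panels share both axes, the eye can compare level and slope across drive types directly.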

Examples

Change this…

…to this:

This is good

But this might be better

This is good

But this might be better

More detailed examples

Perception of quantity

From best to worst

  1. Position
  2. Length
  3. Area
  4. Volume
  5. Angle and slope
  6. Colour and density
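
The ranking above can be tried out by drawing the same numbers twice. This is a sketch under assumptions: it relies on ggplot2 and uses the built-in `VADeaths` matrix as example data, which is not taken from the talk. The stacked bars force most comparisons to be made by length without a common baseline, while the dot plot puts every value at a position on one shared scale.

```r
library(ggplot2)

# VADeaths: death rates per 1,000 by age group and population group (built in).
deaths <- as.data.frame(as.table(VADeaths))
names(deaths) <- c("age", "group", "rate")

# Stacked bars: only the bottom segment starts from a common baseline.
p_stacked <- ggplot(deaths, aes(x = group, y = rate, fill = age)) +
  geom_col() +
  labs(x = NULL, y = "Deaths per 1,000", fill = "Age group")

# Dot plot: every value is read by position on a common scale.
p_position <- ggplot(deaths, aes(x = rate, y = age, colour = group)) +
  geom_point(size = 3) +
  labs(x = "Deaths per 1,000", y = NULL, colour = NULL)

print(p_stacked)
print(p_position)
```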

Typical stacked bars…

Orient for easy reading

Sequential colours

Diverging scale

Use position

Much better than

Cluttered

Minimal axis guides

Fade axis title

Remove borders

Remove boxes

Guidelines to back

Background to back

Consistent doc theme

Consistent font

Corporate colours

Direct labels

Much better than:

Remember those principles

  • Remove all unnecessary ink
  • Focus on the data
  • Well-designed
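
Several of the decluttering steps above (minimal guides, faded axis titles, no borders or legend boxes, direct labels) can be sketched in one plot. The example assumes ggplot2 version 3.3.0 or later and uses the built-in `Orange` data as a stand-in. The colours and grey levels are illustrative choices, not a house style from the talk.

```r
library(ggplot2)

# Built-in 'Orange' data: trunk circumference of five orange trees over time.
trees <- as.data.frame(Orange)
trees$Tree <- as.character(trees$Tree)

# Last observation of each tree, used to place the direct labels.
last_obs <- trees[trees$age == max(trees$age), ]

p <- ggplot(trees, aes(x = age, y = circumference, group = Tree)) +
  geom_line(colour = "steelblue") +
  # Direct labels at the end of each line replace a legend.
  geom_text(data = last_obs, aes(label = paste("Tree", Tree)),
            hjust = 0, nudge_x = 20, size = 3, colour = "grey30") +
  # Extra room on the right so the labels are not clipped.
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  labs(x = "Age (days)", y = "Circumference (mm)") +
  theme_minimal() +                                       # no panel border or grey box
  theme(
    panel.grid.minor = element_blank(),                   # fewer guides
    panel.grid.major = element_line(colour = "grey92"),   # faint guidelines
    axis.title       = element_text(colour = "grey50"),   # faded axis titles
    legend.position  = "none"
  )
print(p)
```

Setting the theme once with `theme_set()` is one way to keep every chart in a document consistent.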

Another improvement example

Original

User-friendly labels

Horizontal text

Meaningful ordering

Better shape and geom

Labels on points

Title and annotation

Another dimension

Better than:

More lessons from that:

  • Use order / position on page
  • As multivariate and comparative as possible
  • Choose right geom to make comparison easy
  • Use colour to make comparison easy
  • Avoid strobing and similar unfortunate effects
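
Ordering and orientation are easy to demonstrate in code. The sketch below assumes ggplot2 and uses ten rows of the built-in `USArrests` data as a stand-in. It sorts categories by the value being compared, keeps labels horizontal by mapping them to the y axis, and uses colour to add another dimension. The column name `state_ordered` is introduced for this example.

```r
library(ggplot2)

# Ten states from the built-in 'USArrests' data, so labels stay readable.
arrests <- data.frame(state = rownames(USArrests), USArrests)[1:10, ]

# Default: categories appear in alphabetical order, which carries no meaning.
p_alpha <- ggplot(arrests, aes(x = Murder, y = state)) +
  geom_point()

# Meaningful ordering: sort the categories by the value being compared.
arrests$state_ordered <- reorder(arrests$state, arrests$Murder)

# Categories on the y axis keep the text horizontal; colour adds a second variable.
p_ordered <- ggplot(arrests, aes(x = Murder, y = state_ordered)) +
  geom_point(aes(colour = UrbanPop), size = 3) +
  labs(x = "Murder arrests per 100,000", y = NULL, colour = "% urban")

print(p_alpha)
print(p_ordered)
```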

Another improvement example

Difficult

Use cartesian coordinates

Use height

Flip for readability

Sequence

Maximise focus on data

Labels near the data

Use like a table

Better than

More principles

  • Don’t rely on angle and slope - use position instead
  • Minimise non-data ink
  • Subtle colours to focus on the data
  • Make matching data to labels easy
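
The move from angle to position can be sketched by drawing the same counts as a pie chart and then as sorted, directly labelled bars. This assumes ggplot2 version 3.3.0 or later and uses its built-in `mpg` data purely as an example. The name `class_counts` is chosen for this illustration.

```r
library(ggplot2)

# Number of car models per class in ggplot2's built-in 'mpg' data.
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")

# Pie chart: values are encoded as angles, which are hard to compare.
p_pie <- ggplot(class_counts, aes(x = "", y = n, fill = class)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

# Cartesian coordinates, flipped for horizontal labels, sorted, labelled at the data.
p_bar <- ggplot(class_counts, aes(x = n, y = reorder(class, n))) +
  geom_col(fill = "grey40") +
  geom_text(aes(label = n), hjust = -0.3, size = 3) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = "Number of models", y = NULL) +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor   = element_blank())

print(p_pie)
print(p_bar)
```

With the value printed next to each bar, the chart can also be read like a table.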

Statistical transformations

Not just this

But this

Or this

Last set of tips

  • Don’t be afraid of using a statistical transform to make the data meaningful
  • Discreet annotations that don’t take the foreground from the data
  • Lots of subtlety in colour (shades of grey) to allow focus on data
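
A hedged sketch of these last tips: raw data drawn in a light grey, a statistical transform (here a loess smoother, chosen only for illustration) drawn on top, and a small annotation that stays out of the way. It assumes ggplot2 and its built-in `economics` data. The derived column `unemp_pct` and the span value are assumptions of this example, not details from the talk.

```r
library(ggplot2)

# Monthly US data shipped with ggplot2; express unemployment as a share of population.
econ <- as.data.frame(economics)
econ$unemp_pct <- 100 * econ$unemploy / econ$pop

p <- ggplot(econ, aes(x = date, y = unemp_pct)) +
  # Raw series in light grey so it stays in the background.
  geom_line(colour = "grey75") +
  # Statistical transform: a smoothed trend carries the main message.
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3,
              se = FALSE, colour = "steelblue") +
  # Discreet annotation in the top-left corner, in grey.
  annotate("text", x = min(econ$date), y = Inf,
           label = "Blue line: loess-smoothed trend",
           hjust = 0, vjust = 1.5, size = 3, colour = "grey40") +
  labs(x = NULL, y = "Unemployed (% of population)") +
  theme_minimal()
print(p)
```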

Final word

  • Comparative
  • Multivariate
  • Reveal interactions
  • Nearly all the ink is data ink
  • All attention to the data and to the story!

Source code available at: https://github.com/PacificCommunity/sdd-graphics-talk-2023-July/