Showing a distribution over time: how many summary stats?

I saw this nice graph today on Twitter, by Thomas Forth:

but the more I looked at it, the harder I found it to read the changes over time across the income distribution from just the Gini coefficient and the median. People started asking online for other percentiles, so I thought I would smooth each of them from the source data and plot them side by side:

[Figure: uk_income.png, "Percentiles of UK income over time" (colour indicates governing political party)]

Now, this has the advantage of showing exactly where in society the growth or contraction is, but it loses the engaging element of the nation wandering across economic space (cf. Booze Space; where do we end up? washed up on the banks of the Walbrook?), which should not be sneezed at. Being engaging matters in dataviz.

Code (as you know, I’m a nuts ‘n’ bolts guy, so don’t go recommending ggplot2 to me):


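library(foreign) # not strictly needed here: read.csv() is in utils
library(splines) # not strictly needed either: smooth.spline() is in stats
bluecol<-"#467db4" # Conservative governments
redcol<-"#b44f46" # Labour governments
uk<-read.csv("uk_income.csv")[1:53,1:22] # 53 years; columns 4 to 22 are the ones plotted
uk$Year <- as.numeric(substr(uk$Year,1,4)) # keep the leading four-digit year
sm<-apply(uk,2,function(z){smooth.spline(x=uk$Year,y=z)$y}) # smooth every column against Year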
png("uk_income.png")
par(yaxs="i")
plot(uk$Year[1:3],sm[1:3,4],type="l", # Macmillan, Douglas-Home
ylim=c(min(sm[,4:22]-1),max(sm[,4:22]+60)),
xlim=c(1960,2015),
col=bluecol,
main="Percentiles of UK income over time",
sub="(Colour indicates governing political party)",
ylab="2013 GBP",
xlab="Year")
lines(uk$Year[4:10],sm[4:10,4],col=redcol) # Wilson I
lines(uk$Year[11:14],sm[11:14,4],col=bluecol) # Heath
lines(uk$Year[15:19],sm[15:19,4],col=redcol) # Wilson II, Callaghan
lines(uk$Year[20:37],sm[20:37,4],col=bluecol) # Thatcher, Major
lines(uk$Year[38:50],sm[38:50,4],col=redcol) # Blair, Brown
lines(uk$Year[51:53],sm[51:53,4],col=bluecol) # Cameron
for(i in 5:22) {
lines(uk$Year[1:3],sm[1:3,i],col=bluecol) # Macmillan, Douglas-Home
lines(uk$Year[4:10],sm[4:10,i],col=redcol) # Wilson I
lines(uk$Year[11:14],sm[11:14,i],col=bluecol) # Heath
lines(uk$Year[15:19],sm[15:19,i],col=redcol) # Wilson II, Callaghan
lines(uk$Year[20:37],sm[20:37,i],col=bluecol) # Thatcher, Major
lines(uk$Year[38:50],sm[38:50,i],col=redcol) # Blair, Brown
lines(uk$Year[51:53],sm[51:53,i],col=bluecol) # Cam'ron
}
dev.off()

(uk_income.csv is just the trimmed down source data spreadsheet)
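
The repeated lines() calls could also be driven by a small table of row ranges, which keeps everything in base R. What follows is only a sketch under assumptions: the `fake` data frame, its three columns (`p10`, `p50`, `p90`) and the `govts` table are invented for illustration and stand in for uk_income.csv; only the row ranges and the two colours come from the code above.

```r
# Made-up stand-in for uk_income.csv: 53 yearly rows, three invented columns.
set.seed(1)
years <- 1961:2013
fake <- data.frame(
  Year = years,
  p10  = 100 + 2  * (years - 1961) + rnorm(53, sd = 5),
  p50  = 200 + 5  * (years - 1961) + rnorm(53, sd = 10),
  p90  = 400 + 11 * (years - 1961) + rnorm(53, sd = 20)
)

# Smooth each income column against Year.
# smooth.spline()$y gives fitted values at the sorted unique x values,
# so this relies on one row per year, in ascending order.
sm <- sapply(fake[-1], function(z) smooth.spline(x = fake$Year, y = z)$y)

bluecol <- "#467db4"
redcol  <- "#b44f46"

# Row ranges per government, copied from the lines() calls in the post.
govts <- data.frame(
  from = c(1,  4, 11, 15, 20, 38, 51),
  to   = c(3, 10, 14, 19, 37, 50, 53),
  col  = c(bluecol, redcol, bluecol, redcol, bluecol, redcol, bluecol),
  stringsAsFactors = FALSE
)

# Empty plot first, then one coloured segment per column and government.
plot(range(fake$Year), range(sm), type = "n",
     main = "Sketch with made-up data",
     xlab = "Year", ylab = "Made-up units")
for (j in seq_len(ncol(sm))) {
  for (g in seq_len(nrow(govts))) {
    rows <- govts$from[g]:govts$to[g]
    lines(fake$Year[rows], sm[rows, j], col = govts$col[g])
  }
}
```

As in the original, consecutive segments are not joined, so there is a small gap at each change of government. Wrap the plotting part in png() and dev.off() to write a file.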
