Tag Archives: politics

UK election cartogram in the medium of Opal Fruits

I’ve been stockpiling Opal Fruits, which young people tell me are now called Starburst, in anticipation of today’s election results.


This is like one-tenth of the stash. I don’t want to eat them though. You know what you’re going to get if you knock here at Halloween.

I took the New York Times’ hexbin cartogram, imposed a 6×8 rectangular grid and counted the most common party in each block. There was a little bit of fudging and chopping up the sweets. It is art, no? Here’s the video:

Filed under Visualization

Dataviz of the week, 7/2/2017

This is in the Brexit White Paper, so a little bit important. Lots of people have been sharing this on Twitter. I can imagine the intern thinking “this is my big chance, must pay attention now, even if it’s 4 am and I’m really tired, come on now… how do I do this bar chart thing again?”

Then they hit “Send”, followed by the comedy trombones: bwaaap bwaap bwaaaaaaap.



But on the bright side, apparently everyone in Britain has been entitled to 14 weeks of holiday each year for ages. I am due some serious back-pay.

Filed under Visualization

Futility audit

Theresa May’s “racial disparity audit”, announced on 27 August, is really just a political gesture that works best if it never delivers findings. I’m reminded of the scene in Yes, Minister (or is it The Thick Of It? Or both?) where the protagonists are all in trouble for something and when the prime minister announces that there will be a public inquiry to find out what went wrong, they are delighted. They know that inquiries are the political equivalent of long grass, with the intention being that everybody involved has retired by the time it reports*.


Larry knows better than to look for mice in 300,000 different places.

It’s not entirely clear what is meant by audit here. Not in the accountants’ sense, surely. Something more like clinical audit? Audit, done properly, is pretty cool. Timely information on performance can get fed back to professionals who run public services, and they can use those data to examine potential problems and improve what they do. But when central agencies examine the data and make the call, it is not the same thing. The trouble is that, whatever indicators you measure, indicators can only indicate; it takes understanding of the local context to see whether it really is a problem.

But there’s another, more statistical problem in this plan: it is impossible to deliver all those goals in the announcement from the prime minister’s office:

  • audit to shine a light on how our public services treat people from different backgrounds
  • public will be able to check how their race affects how they are treated on key issues such as health, education and employment, broken down by geographic location, income and gender
  • the audit will show disadvantages suffered by white working class people as well as ethnic minorities
  • the findings from this audit will influence government policy to solve these problems

So that pulls together data across the country from all providers of health services, all schools and colleges, all employers. There need to be sufficient numbers to break them down into categories by ethnicity (18 categories are used by the Census in England), location at sufficient scale to influence policy (152 local authorities, presumably), income (maybe deciles?) and gender (in this context, they probably need more than two, let’s allow four). Also, social class has been dropped into the objectives, so they will need to collect at least three categories there.

This gives about 300,000 combinations. Inside each of these, sufficient data are needed in order to give precise estimates of fairly rare (one hopes) adverse outcomes. Let’s say maybe 200 people’s data. In total, that means data from 60,000,000 people, which is just short of the entire UK population, but that includes babies etc, who are not relevant to some of the indicators above. Oh dear. Now, those data need to be collected in a consistent and comparable way, analysed and fed back, including a public-friendly league table from the sounds of it, in timely fashion, say within six months of starting.
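
If you want to check the arithmetic, here is a rough sketch in Python. The category counts are the guesses from the paragraph above (deciles for income, four genders, three social classes), and the names in the code are just labels made up for illustration.

```python
from math import prod

# Category counts as guessed in the text; only the 18 and the 152 are quoted figures.
categories = {
    "ethnicity": 18,
    "local_authority": 152,
    "income_decile": 10,
    "gender": 4,
    "social_class": 3,
}
people_per_cell = 200  # rough guess at the data needed per combination

cells = prod(categories.values())
people_needed = cells * people_per_cell

print(f"{cells:,} combinations")
print(f"{people_needed:,} people needed")
```

The exact product lands a little above the round figures quoted, which does not make the job any easier.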

I’m being fast and loose with the required sample size, because there are some efficiency savings through excluding irrelevant combinations, multilevel modeling, assumptions of linearities or conditional independence etc, but it is still hopeless. I suspect then that this was never intended actually to happen, but just to be a sop to critics who regard our current government as representing the interests of white UK citizens only, while throwing some scraps to disenchanted white working class voters who chose Brexit and might now be disappointed that police are not going door to door rounding up Johnny Foreign.

One more concern and then I’ll be done: when politicians ask experts to do something, and everybody says no, they sometimes like to look for trimmed down versions such as a simpler analysis based on previously collected data. After all, it would be embarrassing to admit that you couldn’t do a project. However, that would be a serious mistake because of the inconsistencies and problems in making the extant sources commensurate. I hope any agency or academic department approached says no to this foolish quest.

* – you might like to compare with Nick Bostrom’s criticism of the great number of twenty-year predictions for technology: close enough to be exciting, but still after the predictor’s retirement.

Filed under Uncategorized

The irresistible lure of secondary analysis

The one thing that particularly worries me about the Department of Health in its various semi-devolved guises making 40% cuts to non-NHS spending is that some of the activities I get involved in or advise on, which rely on accurate data, can appear beguilingly simple to cut by falling back on existing data sources, but the devil is in the detail. It is very hard to draw meaningful conclusions from data that were not collected for that purpose, but when the analytical team or their boss is pressed to give a one-line summary to the politicians, it all sounds hunky dory. The guy holding the purse strings might never know that the simple one-liner is built on flimsy foundations.

Filed under healthcare

UK election facts clarified with interactive graphics

I’ve been impressed with this website (constituencyexplorer.org.uk) put together by Jim Ridgway and colleagues at Durham, with input from the House of Commons Library and dataviz guru Alan Smith from the ONS. In part, it is aimed at members of parliament, so they can test their knowledge of facts about constituencies and learn more along the way. But it makes for a fun quiz for residents too. Everything is realised in D3, so it runs everywhere, even on your phone. There are a few features I really like: the clean design, the link between map, list and dotplot in the election results:


… the animation after choosing a value with the slider, highlighting the extra/shortfall icons and the numbers dropping in: nice!


… the simple but quite ambitious help pop-up:


… and the way that the dotplots are always reset to the full width of the variable, so you can’t be misled by small differences appearing bigger than they are. The user has to choose to zoom after seeing the full picture.

All in all, a very nice piece of work. I must declare that I did contribute a few design suggestions in its latter stages of development but I really take no credit for its overall style and functionality. Budding D3 scripters could learn a lot from the source code.

And while we’re on the topic, here is some more innovative electoral dataviz:



And finally, take a moment to ask election candidates to commit to one afternoon of free statistical training, a great initiative from the RSS – and frankly, not much to ask. Unfortunately, none of my local (Croydon Central) would-be lawmakers have been bothered to write back yet. But here’s the parties that are most interested in accurate statistics, in descending order (by mashing up this and this):

  1. National Health Alliance: 4/13
  2. Pirate Party: 1/6
  3. Green Party: 47/568
  4. Labour Party: 51/647
  5. Plaid Cymru: 3/40
  6. Liberal Democrats: 47/647
  7. Ulster Unionist Party: 1/15
  8. Christian People’s Alliance: 1/17
  9. Conservative and Unionist Party: 27/647
  10. Scottish National Party: 2/59
  11. United Kingdom Independence Party: 15/624
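
The ordering is nothing cleverer than sorting fractions. A minimal sketch, taking the two numbers listed for each party at face value as numerator and denominator (the variable names are labels made up for illustration):

```python
from fractions import Fraction

# (party, numerator, denominator), exactly as listed above
pledges = [
    ("National Health Alliance", 4, 13),
    ("Pirate Party", 1, 6),
    ("Green Party", 47, 568),
    ("Labour Party", 51, 647),
    ("Plaid Cymru", 3, 40),
    ("Liberal Democrats", 47, 647),
    ("Ulster Unionist Party", 1, 15),
    ("Christian People's Alliance", 1, 17),
    ("Conservative and Unionist Party", 27, 647),
    ("Scottish National Party", 2, 59),
    ("United Kingdom Independence Party", 15, 624),
]

# Exact fractions avoid any floating-point ties when sorting.
ranked = sorted(pledges, key=lambda p: Fraction(p[1], p[2]), reverse=True)

for place, (party, num, den) in enumerate(ranked, start=1):
    print(f"{place:2d}. {party}: {num}/{den} = {num / den:.1%}")
```

Bear in mind that the parties at the top are standing very few candidates, so one or two replies move them a long way.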

Filed under Visualization

Boris Johnson and the statistics of IQ

I’ve been putting off this post – I don’t really get a kick out of demolishing the statistical claims of politicians – but it intersects with league tables, an old area of interest, so here goes.

Boris Johnson, the mayor of London who enjoys some popularity by cultivating his lovable-eccentric persona, said last week:

Whatever you may think of the value of IQ tests, it is surely relevant to a conversation about equality that as many as 16% of our species have an IQ below 85, while about 2% have an IQ above 130.

You can read the whole speech here; the IQ bit is on page 7. Others have been alarmed by various aspects of this and the surrounding flurry of strange metaphors: society as breakfast cereal, society as centrifuge… but I will confine myself to the stats. “IQ” could mean any number of test procedures and questions. There is considerable debate about what it actually measures. There is also considerable debate about what we might want to measure, even if we could measure it. Bear that in mind, because Johnson sweeps those concerns aside.

What you do is to run your test on a bunch of normal people (small alarm bell should be ringing here about who is normal). Their results might come up with a certain mean and standard deviation, and you want your results to be comparable (read “indistinguishable”) to others, so you add or subtract something to bring the mean to 100, and multiply or divide to bring the SD to 15. Now you have an IQ score. So far, we have uncertainty arising from defining the construct, the validity of the measure, and the differences between. But with a nice normal distribution like that, it’s quite tempting to ignore those issues and move straight on to the number-crunching.
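
The shifting and stretching step can be sketched in a few lines of Python. The raw marks are invented and `to_iq_scale` is a name made up for illustration; real test publishers may well do something more elaborate.

```python
from statistics import mean, pstdev

def to_iq_scale(raw_scores):
    """Shift and stretch raw test scores so they have mean 100 and SD 15."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # SD of the reference group itself
    return [100 + 15 * (x - m) / s for x in raw_scores]

raw = [12, 15, 19, 22, 22, 25, 30, 31, 36, 40]  # invented raw marks
iq_scores = to_iq_scale(raw)

print(round(mean(iq_scores), 6), round(pstdev(iq_scores), 6))
```

Whatever the raw marks were, the output has mean 100 and SD 15, because that is what the transformation is built to produce.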

If the mean is 100 and the SD 15, then 16% of the distribution will be less than 85.1, and 2% of the distribution will be above 130.8. What Johnson is describing is the normal distribution’s shape. Why is it like that? Because you made it that way, remember: you squashed and shoved it into standardised shape. What else creates a normal distribution in nature? Adding unrelated things together (central limit theorem) and random noise. Saying that 16% are below 85 is no more meaningful than saying half the population is below the average. Everybody now knows to laugh at politicians and their apparatchiks who say things like that. I think that’s a good thing, because we are learning little by little not to have the wool pulled over our eyes by “dilettantes and heartless manipulators”. Don’t fall for Johnson’s nonsense either.
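
You can reproduce those figures from nothing but the normal curve, using Python’s standard library and no IQ data at all. A minimal sketch (the variable names are made up for illustration):

```python
from statistics import NormalDist

iq_dist = NormalDist(mu=100, sigma=15)

below_85 = iq_dist.cdf(85)        # share of the curve below 85
above_130 = 1 - iq_dist.cdf(130)  # share of the curve above 130
cut_16 = iq_dist.inv_cdf(0.16)    # score with 16% of the curve below it
cut_98 = iq_dist.inv_cdf(0.98)    # score with 2% of the curve above it

print(f"below 85: {below_85:.1%}, above 130: {above_130:.1%}")
print(f"16% cut-off: {cut_16:.1f}, top 2% cut-off: {cut_98:.1f}")
```

Swap in any other mean and SD and you get the same percentages at one SD below and two SDs above, whatever it was you measured.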

A final note: why have the bigger portion at the low end and the smaller at the top? Clearly he is trying to appeal to a listener who thinks of themselves in the top 2%, and considers with distaste the rabble far below. Yet you could measure any old nonsense and talk with the same air of scientific veracity about quantiles of the normal distribution.

PS: I usually put a picture in, because blog readers like pictures. But on this occasion I fear it would only encourage him. You’ll note I don’t call him by his near-trademarked first name either.

Filed under Uncategorized

Things to do in Wokingham when you’re prematurely dead

The recent release by Public Health England of a ranking of local authorities in terms of premature death rates has attracted lots of media attention. I have to say that the interactive presentation is really clear and nice, but there is one big problem for me: no acknowledgement of uncertainty. The number one spot for Wokingham with 200.3 deaths per 100,000 population is presented as fact, and certainly it is true for 2009-2011 when the data were collected. Some of you may be interested in blaming your local politicians / health service for the last three years, but others will look at this chart and wonder what it tells you about the place where you live, or are thinking of moving to, now and indeed in the future. Surprisingly little, I would suggest.

Ideally, we would have a time series of premature deaths in successive years over a long time, so we could see the direction of travel. Are the rates coming down in one place or up in another? That would be more interesting than these single numbers. Then we could also tell whether the poor rates in Blackpool and Manchester in 2009-11 were just a statistical blip or a long-term trend. But even without long-term data we can do better than a single number.

It’s not an ideal thing to do (see Leckie and Goldstein “The limitations of school league tables to inform school choice”), but we can use the same tools we have for inference from a random sample. There is an amazingly simple formula for the uncertainty around a rate and I just quickly ran off 95% confidence intervals for each LA and plotted them in the graph below. The 95% confidence interval is based on the idea that, if everything stayed the same (that’s the unrealistic bit), future years would see the LAs’ rates fluctuating, sometimes a bit higher due to bad luck and sometimes a bit lower due to good luck. This particular interval should contain 95% of those future hypothetical rates. Or, to flip it around (and not every statistician would agree with me that this is equivalent), you can also say that there is a 95% chance that the rate in your LA will be within that interval this year.
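
One simple version of such a formula (not necessarily the one behind the graph) treats the number of deaths as a Poisson count, so the standard error of the rate is the rate divided by the square root of the deaths. The sketch below uses invented figures for a single LA, and `rate_ci` is a name made up for illustration. If the published rates are age-standardised, this crude version ignores that.

```python
from math import sqrt
from statistics import NormalDist

def rate_ci(deaths, person_years, level=0.95, per=100_000):
    """Crude rate per `per` people with a normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    rate = deaths / person_years * per
    se = rate / sqrt(deaths)  # Poisson: the SD of a count is sqrt(count)
    return rate, rate - z * se, rate + z * se

# Invented example: 900 premature deaths over 450,000 person-years
rate, lower, upper = rate_ci(deaths=900, person_years=450_000)
print(f"{rate:.1f} per 100,000 (95% CI {lower:.1f} to {upper:.1f})")
```

The fewer deaths an LA has, the wider its interval, so small places with good rates are exactly the ones whose position is least certain.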

This "caterpillar plot" has vertical lines that show the degree of uncertainty in each local authority's premature death rate. When they overlap, that suggests they are more likely to have swapped places in 2012 or 2013.

The overlap among the authorities is not an ideal way to judge the uncertainty in the league table, but it’s not a bad place to start. If your authority overlaps with its neighbour, then you could swap places simply by chance, without any change in the population or the public health activities. A more intuitive way of looking at it is to consider what this uncertainty in the rates could do to the ranks in the league table, and we can easily do this by simulating possible future rates from the 95% confidence intervals for each LA. I ran this for 10,000 probable futures and here’s the 95% confidence intervals for the ranks:

In this version of the graph, the rank in the league table is shown vertically. Uncertainty is much greater because LAs are very similar and could easily swap places.
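
A minimal sketch of this kind of rank simulation, assuming each LA’s possible future rate is drawn from a normal distribution centred on its observed rate, with the standard error implied by its confidence interval. The five rates and standard errors are invented, and `simulate_rank_intervals` is a name made up for illustration.

```python
import random

def simulate_rank_intervals(rates, std_errors, n_sims=10_000, seed=1):
    """Approximate 95% intervals for each area's rank (1 = lowest rate)."""
    rng = random.Random(seed)
    n = len(rates)
    ranks_seen = [[] for _ in range(n)]
    for _ in range(n_sims):
        # One possible future: a new rate for every area
        draws = [rng.gauss(r, se) for r, se in zip(rates, std_errors)]
        order = sorted(range(n), key=lambda i: draws[i])
        for rank, i in enumerate(order, start=1):
            ranks_seen[i].append(rank)
    intervals = []
    for ranks in ranks_seen:
        ranks.sort()
        # Roughly the 2.5th and 97.5th percentiles of the simulated ranks
        intervals.append((ranks[int(0.025 * n_sims)], ranks[int(0.975 * n_sims)]))
    return intervals

rates = [200, 205, 210, 215, 400]  # invented rates per 100,000
std_errors = [6, 6, 6, 6, 6]       # invented standard errors
intervals = simulate_rank_intervals(rates, std_errors)
for rate, (low, high) in zip(rates, intervals):
    print(f"rate {rate}: rank between {low} and {high}")
```

The four similar areas trade places freely, while the outlier stays put at the bottom, which is the same pattern as in the real table.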

That’s a lot of uncertainty. So here is an alternative table that shows where your LA is in the Longer Lives table and also how low or high it might be this year, just by chance (remember – this assumes nothing has changed!). As an example of how to read it, my home is in Croydon (number 58 in 2009-11) but we could be as good as number 45 (currently occupied by Havering) or as bad as 70 (Hounslow). That span of 25 local authorities is not unusual and covers one-sixth of the local authorities in England.

Row LA Rank Lower 95% CI Upper 95% CI
1 Wokingham 1 1 8
2 Richmond upon Thames 2 1 11
3 Dorset CC 3 1 12
4 Surrey CC 4 1 12
5 South Gloucestershire 5 1 12
6 Rutland 6 1 14
7 Harrow 7 2 14
8 Kensington and Chelsea 8 2 16
9 Bromley 9 2 16
10 Hampshire CC 10 3 16
11 Kingston upon Thames 11 4 18
12 West Berkshire 12 5 24
13 Buckinghamshire CC 13 6 24
14 Windsor and Maidenhead 14 6 25
15 Cambridgeshire CC 15 6 27
16 Barnet 16 6 28
17 Suffolk CC 17 8 29
18 Bath and North East Somerset 18 8 29
19 Devon CC 19 11 31
20 Wiltshire 20 11 33
21 Hertfordshire CC 21 12 33
22 Oxfordshire CC 22 12 33
23 West Sussex CC 23 14 34
24 Poole 24 14 37
25 Solihull 25 14 37
26 Somerset CC 26 15 37
27 Bexley 27 17 38
28 Sutton 28 18 40
29 Merton 29 18 44
30 Leicestershire CC 30 18 45
31 Enfield 32 18 47
32 Gloucestershire CC 31 20 47
33 Central Bedfordshire 33 21 49
34 North Yorkshire CC 34 24 49
35 Essex CC 35 24 50
36 Shropshire 36 25 50
37 Bracknell Forest 37 25 50
38 Cheshire East 38 27 51
39 Norfolk CC 39 27 52
40 Redbridge 40 27 52
41 Warwickshire CC 41 27.98 54
42 Worcestershire CC 42 28 54
43 East Riding of Yorkshire 43 28 54
44 Herefordshire, County of 44 29 55
45 Havering 45 29 57
46 Cornwall 47 31 57
47 Westminster 46 33 58
48 East Sussex CC 48 33 59
49 Isle of Wight 49 36.98 60
50 North Somerset 50 37 60
51 Hillingdon 51 37 61
52 Brent 52 37 61
53 Kent CC 53 40 61
54 York 54 40 65
55 Staffordshire CC 55 40 65
56 Derbyshire CC 56 42 67
57 Swindon 57 44 68
58 Croydon 58 45 70
59 Cheshire West and Chester 59 45 71
60 Wandsworth 60 47 71
61 Trafford 61 49 73
62 Nottinghamshire CC 62 50 73
63 Lincolnshire CC 63 52 74
64 Milton Keynes 64 54 75
65 Camden 65 57 77
66 Northumberland 66 57 78
67 Bournemouth 67 57 78
68 Southend-on-Sea 68 57.98 79
69 Ealing 69 58 79
70 Hounslow 70 60 79
71 Thurrock 71 61 80
72 Northamptonshire CC 72 61 82
73 Waltham Forest 73 64 83
74 Dudley 74 64 84
75 Stockport 75 65 84
76 Cumbria CC 76 65 86
77 Bedford 77 65 88
78 Reading 78 67 90
79 Haringey 79 67 94
80 Medway 80 68 95
81 Sheffield 81 69 97
82 Warrington 82 70 99
83 North Lincolnshire 83 71 102
84 Torbay 84 74 102
85 Greenwich 85 77 104.02
86 Plymouth 86 77 107
87 Peterborough 87 78 107
88 Hammersmith and Fulham 88 78 108
89 Rotherham 89 79 108
90 Bristol, City of 90 80 109
91 Kirklees 91 80 109
92 Sefton 92 82 109
93 Redcar and Cleveland 93 82 110
94 Darlington 94 83 111
95 Southampton 95 83 111
96 Telford and Wrekin 96 84 112
97 North Tyneside 97 84 112
98 Brighton and Hove 98 84 113
99 Bury 99 85 114
100 Leeds 100 85 115
101 Derby 101 85 115
102 Stockton-on-Tees 102 86 115
103 Lancashire CC 103 86 116
104 Portsmouth 104 87 116
105 County Durham 105 87 117
106 Lewisham 106 88 117
107 North East Lincolnshire 107 88 118
108 Luton 108 88 120
109 Slough 109 88 121
110 Wakefield 110 88 123
111 Walsall 111 91 125
112 St. Helens 112 93 125
113 Wirral 113 94 126
114 Doncaster 114 94 127
115 Southwark 115 96 128
116 Newham 116 96 128
117 Calderdale 117 99 128
118 Islington 118 102 129
119 Birmingham 120 104 129
120 Barnsley 119 106 130
121 Lambeth 121 107 131
122 Bradford 122 108 131
123 Gateshead 123 109 131
124 Bolton 124 109 133
125 Wolverhampton 125 111 133
126 Coventry 126 112 133
127 Wigan 127 115 134
128 Hackney 128 115 135
129 South Tyneside 129 116 137
130 Newcastle upon Tyne 130 118 139
131 Hartlepool 131 118 140
132 Sunderland 132 120 142
133 Barking and Dagenham 133 123 142.02
134 Halton 134 127 143
135 Leicester 135 128 143
136 Sandwell 136 129 143
137 Tower Hamlets 137 129 143
138 Stoke-on-Trent 138 130 144
139 Oldham 139 131 144
140 Rochdale 140 131 144
141 Nottingham 141 132 144
142 Tameside 142 132 145
143 Blackburn with Darwen 143 133 145
144 Knowsley 144 134 146
145 Middlesbrough 145 138 147
146 Kingston upon Hull, City of 146 143 148
147 Salford 147 145 148
148 Liverpool 148 145 148
149 Blackpool 149 149 150
150 Manchester 150 149 150

Filed under Uncategorized