Dataviz of the week, 15/6/17

It’s Clean Air Day in the UK. Air pollution interests me, partly as I worked in medical stats for many years, partly because I don’t want to breathe in a lot of crap, and partly because I don’t want my baby to breathe in a lot of crap. London is really bad, the worst place in Europe. Not Beijing, sure, but really bad, and it’s hard to imagine that Brexit will lead to anything but a relaxation of the rules.

Real World Visuals (formerly CarbonVisuals, who made the amazing mountain of CO2 balls looming over New York) have made a series of simple, elegant but powerful images about volumes of air and what they contain, and about the volume of air saturated with pollution that one car leaves behind over one kilometre travelled.


The tweet is accidentally poetic as it can’t accommodate more than the first four images, which leaves you on a cliffhanger with the massive stack looming behind the mother and girl. You know what it is but you can’t see its enormity yet.

The crowd visualisation of 9,416 dead Londoners as dots is not bad, though I like physical images of numbers of people, like this classic (adapted from http://www.i-sustain.com/old/CommuterToolkit.htm):


Here’s a picture of what is apparently 8,000 to 9,000 people marching in Detroit:


All dead by Christmas. And then some.

You might like to compare and contrast with higher-profile causes of death, like terrorism.

Filed under Visualization

Dataviz of the week, 25/1/17

We had guideline-bustin’, kiddie-stiflin’, grandparent-over-the-threshold-usherin’ pollution in London at the beginning of the week. This is fairly standard nowadays, sadly. It’s not quite so bad out where I live in the Cronx, but in town it’s the worst in Europe. At the same time, Cameron Beccario pointed out the Beijing effect in his wonderful globe of carbon monoxide levels – far worse than anywhere else in the world, though there are some petrochemical hot spots. I’ve praised this live viz before, but that was before I started having a pick of the week on my office door (then, when the door went, here on the blog), so I’ll mention it again. Nice.


Filed under Visualization

Noise pollution map of London (part 1)

I’m working on a noise pollution map of central London. Noise is an interesting public health topic, overlooked and of debatable cause and effect but understandable to everyone. To realise it as interactive online content, I get to play around with Mapbox as well as D3 over Leaflet [1] and some novel forms of visualisation, audio delivery and interaction.

The basic idea is that, whenever the need arises to get from A to B, and I could do it by walking, I record the ambient sound and also capture a detailed GPS trail. Then, I process those two sets of data back at bayescamp and run some sweet tricks to make them into the map. I have about 15 hours of walking so far, and am prototyping the code to process the data. The map doesn’t exist yet, but in a future post on this subject, I’ll include a sketch of what it might look like. The map below shows some of my walks (not all). As I collect and process the files, I will update the image here, so it should be close to live.


I’d like it to become crowd-sourced, in the sense that someone else could follow my procedure for data capture, copy the website and add their own data before sharing it back. GitHub feels like the ideal tool for this. Then, the ultimate output is a tool for people to assemble their own noise-pollution data.

As I make gradual progress in my spare time, I’ll blog about it here with the ‘noise pollution’ tag. To start with, I’ll take a look at:

The equipment

Clearly, some kind of portable audio recorder is needed. For several years, when I made the occasional bit of sound art, I used a minidisc recorder [2], but I now have a Roland R-05 digital recorder. This has excellent battery life and enough storage for at least a couple of long walks. At present, you can get one from Amazon for GBP 159. When plugged into USB, it looks and behaves just like a memory stick. I have been saving CD-quality audio in .wav format, mindful that you can always degrade it later but you can’t get the lost quality back. That is pretty much the lowest quality the R-05 will capture anyway (barring .mp3, which I decided against because I don’t want the recorder to spend computing power compressing the sound data), so it occupies as little space on the device as possible. It tucks away easily in a jacket pocket, so there’s no need to be encumbered by kit like you’re Chris Watson.
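For a sense of scale, here is the storage arithmetic for uncompressed audio as a small Python sketch. It assumes that “CD quality” means the standard 44.1 kHz, 16-bit stereo PCM, and it ignores the few bytes of WAV header; the function name is just for illustration.

```python
def wav_bytes_per_hour(sample_rate=44_100, bits_per_sample=16, channels=2):
    """Approximate size in bytes of one hour of uncompressed PCM audio.

    Defaults assume CD quality: 44,100 samples/s, 16 bits, stereo.
    The WAV header (a few dozen bytes) is ignored.
    """
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


size = wav_bytes_per_hour()
print(f"{size:,} bytes per hour (about {size / 1e6:.0f} MB)")
```

That works out at roughly 635 MB per hour, so 15 or so hours of walking comes to something like 9.5 GB of .wav files.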

Pretty much any decent microphone, plus serious wind shielding, would do, but my personal preference is for binaurals, which are worn in the ear like earphones and capture a very realistic stereo image. Mine are Roland CS-10EMs, which you can get for GBP 76. The wind-shielding options are more limited for binaurals than for a hand-held mic, because they are so small. I am still using the foam covers that come with the mics (pic below), and wind remains something of a consideration in the procedure of capturing data, which I’ll come back to another time.


On the GPS side, there are loads of options and they can be quite cheap without sacrificing quality. I wanted something small that gave me access to the data in a generic format, and chose the Canmore GT-730FL. This looks like a USB stick, recharges when plugged in, can happily log (every second!) for about 8 hours on a single charge, and lets you plug it in and download your trail in CSV or KML format. The precision of the trail was far superior to my mobile phone’s at the time I got it, though the difference is less marked now, even with a Samsung J5 (J stands for Junior (not really)). There is a single button on the side, which adds a flag to the current location datum when you press it. That flag shows up in its own field in the KML format, but is absent from the CSV. They cost GBP 37 at present. There are two major drawbacks: the documentation is awful (remember when you used to get appliances from Japan in the 80s and none of the instructions made sense? Get ready for some nostalgia), and the data transfer is by virtual serial port, which is straightforward on Windows with the manufacturer’s Canway software but a whole weekend’s worth of StackOverflow and swearing on Linux/OS X. Furthermore, I have not been able to get the software working on anything but an ancient Windows Vista PC (can you imagine the horror). Still, it is worth it to get that trail. There is a nice blog post by Peter Dean which details what to do with the Canmore and its software, and compares it empirically with other products. The Canway software is quite neat in that it shows you a zoomable map of each trail, and is only a couple of clicks away from exporting to CSV or KML.

Having obtained the .kml file for the trail plus starting point, the .csv file for the trail in simpler format, and the .wav file for the sound, the next step is synchronising them, trimming to the relevant parts and then summarising the sound levels. For this, I do a little data-focussed programming, which is the topic for next time.
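The actual processing is left for next time, so what follows is not my pipeline, just a minimal Python sketch of the summarising and synchronising steps under stated assumptions: the .wav is 16-bit PCM, the GPS trail has already been parsed into a dict mapping whole seconds since the log started to (lat, lon), and the offset between the two clocks is known. The function names, the offset and the coordinates are all hypothetical. The level it computes is in dBFS (relative to the recorder’s full scale), not calibrated dB SPL.

```python
import io
import math
import struct
import wave


def per_second_dbfs(wav_file):
    """Yield (second, level) for a 16-bit PCM WAV, one pair per whole second.

    The level is the RMS of all samples in that second (all channels pooled),
    in dB relative to full scale. It is NOT a calibrated sound pressure level.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expects 16-bit PCM audio")
        rate, channels = w.getframerate(), w.getnchannels()
        second = 0
        while True:
            frames = w.readframes(rate)        # up to one second of audio
            n = len(frames) // 2               # number of 16-bit samples
            if n < rate * channels:            # stop at a partial final second
                break
            samples = struct.unpack(f"<{n}h", frames)
            rms = math.sqrt(sum(s * s for s in samples) / n)
            yield second, (20 * math.log10(rms / 32768) if rms > 0 else -math.inf)
            second += 1


def join_to_trail(levels, fixes, offset_s=0):
    """Attach each one-second level to the GPS fix logged at the same moment.

    `fixes` maps whole seconds since the GPS log started to (lat, lon).
    `offset_s` (hypothetical) is how many seconds after the GPS log the
    recording started. Seconds with no matching fix are dropped.
    """
    joined = []
    for second, level in levels:
        fix = fixes.get(second + offset_s)
        if fix is not None:
            joined.append((fix[0], fix[1], level))
    return joined


# Usage with a synthetic two-second stereo WAV held in memory:
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(struct.pack("<h", 16384) * (8000 * 2 * 2))  # constant half scale
buf.seek(0)
levels = list(per_second_dbfs(buf))
trail = join_to_trail(levels, {10: (51.50, -0.12), 11: (51.51, -0.12)}, offset_s=10)
```

One level per second matches the GPS logging interval, which keeps the join a simple lookup; trimming amounts to choosing which seconds to keep before joining.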


1 – these are JavaScript libraries that are really useful for flexible representations of data and maps. If you aren’t interested in that part of the process, just ignore them. There will be plenty of other procedural and analytic considerations to come that might tickle you more.

2 – unfairly maligned; I heard someone on the radio say recently that, back around 2000, if you dropped a minidisc on the floor, it was debatable whether it was worth the effort to pick it up.


Filed under Visualization

More peas, dear?

As soon as I woke I knew it was going to be one of those days. The first words I heard were from the BBC: researchers had discovered that eating five portions of fruit and vegetables a day (optimistic already) was not enough – we must all eat seven. I was suspicious. I said as much to Mrs Grant as I stumbled towards the kitchen: “residual confounding, socio-economic status”. She ignored me.

Eventually I got round to printing the paper in JECH and read it on the train into town (to hear the wise and insightful Sir David Cox talk at the RSS), with increasing alarm. Every day I see bad stats, of course, but the press coverage for this one makes it potentially very harmful by putting people off even attempting any increased fruit/veg consumption. I’ve no doubt that fruit & veg is good, but I don’t believe there is any decent evidence for 5 or 7 or 10 portions. Now, to be fair, the paper itself expresses some caution. Regular readers will know what’s coming next. UCL’s press release spins it a little bit, mentioning the 7 portions quite a lot. And then the press picked it up and spun it a bit more, into killjoy-docs-say-eat-two-pounds-of-broccoli-or-face-certain-death. It’s kind of nobody’s fault, but it went wrong anyway. Like the Iraq War.

Well, those are my kind words of comfort for the authors of the paper. From here on, it’s going to hurt.

I think there are six major flaws that make this study close to totally uninformative:

  1. Residual confounding, particularly by socio-economic status. SES is measured in the data source, the Health Survey for England (HSE), only as whether the “head of household” has a manual or non-manual job. That’s all there is, and to put that into a regression as a covariate and pretend that SES has been taken out of the equation is sheer nonsense. For me, there is a smoking gun: eating more frozen or tinned fruit & veg is associated with a significantly higher hazard of death. That just doesn’t make sense unless it is actually a confounded association. It is so ludicrous that they should have stopped at that point and considered things very carefully.
  2. The fruit & veg consumption in HSE relates to the 24 hours prior to the survey. We know that will balance out over the population, and also that it is no worse than other self-reported measures, but it remains biased.
  3. Several subgroup analyses (but not an exhaustive list) appear and get repeatedly quoted, with little or no rationale whatever for their selection. In every subgroup analysis that appears in the paper, the effect is significant and stronger than in the whole dataset. This may be above board – I don’t know – but it looks very much like cherry-picking.
  4. At one point, a linear relationship is assumed, with the same hazard ratio for each additional portion, without theoretical justification or reference to the data. Presumably that’s where the idea of ten portions came from: we can just extrapolate a linear trend off the end of the observed data. Eat enough veg and you live forever; eat enough frozen veg and you die immediately.
  5. These are people who have chosen, of their own accord, to eat an awful lot of fruit & veg, or at least say they do. That’s not the same as the UK population being encouraged and cajoled into eating X portions per day.
  6. Some variables are missing for as many as 62% of the participants. The missing values are included as a category of their own, which we have known since Rubin (1976) is a very bad idea.
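A toy simulation can make point 1 concrete. This is not a reanalysis of the HSE: every number is made up. A continuous SES drives both veg intake and risk, veg has no effect at all, and only a binary manual/non-manual proxy is “measured”. Stratifying on the proxy stands in for putting it into a regression, and a correlation with a continuous risk score stands in for a survival model, to keep the sketch short.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20_000, seed=1):
    rng = random.Random(seed)
    strata = {True: ([], []), False: ([], [])}  # manual -> (veg list, risk list)
    all_veg, all_risk = [], []
    for _ in range(n):
        ses = rng.gauss(0, 1)          # true, continuous socio-economic status
        manual = ses < 0               # all that is recorded: a binary proxy
        veg = ses + rng.gauss(0, 1)    # higher SES -> more fruit & veg
        risk = -ses + rng.gauss(0, 1)  # higher SES -> lower risk; veg has NO effect
        strata[manual][0].append(veg)
        strata[manual][1].append(risk)
        all_veg.append(veg)
        all_risk.append(risk)
    crude = pearson(all_veg, all_risk)
    within = {manual: pearson(v, r) for manual, (v, r) in strata.items()}
    return crude, within


crude, within = simulate()
print(f"crude correlation: {crude:.2f}")
for manual, r in within.items():
    print(f"within {'manual' if manual else 'non-manual'} stratum: {r:.2f}")
```

With these settings the crude correlation comes out at about -0.5 and each within-stratum correlation at roughly -0.27, so “adjusting” for the binary proxy removes only about half of an association that is entirely spurious.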

I don’t relish going out on a limb and attacking other people’s work that I’m not intimately acquainted with, but I do so here because it is potentially very harmful. If people are discouraged from even attempting to eat more veg because the bar has been set unrealistically high, that is a massive public health own-goal. Even if points 2-6 turn out to be fine, point 1 certainly isn’t, and that is the worst one.

Here is my own transcript of the BBC Radio 4 Today programme interview, and you will see the researcher is not entirely blameless in encouraging a certain bold interpretation of linear and universal benefit. Success and fame is a corrupting influence (or so I’m told).

JH: We’re used to being told that eating five portions of fruit and veg a day was good for us, and we should try to do it, although apparently two thirds of adults in this country don’t. Now researchers say that the benefits are even greater than we thought, and that eating seven or even more portions a day may have considerable benefits. They link high consumption of fruit and veg to longer life; it’s as simple as that. Lola Oyebode is the lead author of the research which is published today in the Journal of Epidemiology and Community Health, and she joins us now – good morning.
LO: Good morning.
JH: Now, tell us what you found, just putting it as simply as you can.
LO: We looked at the general population of England, and we grouped them by how many portions of fruit and vegetables they ate a day, so we looked at people who ate less than one, one to three, three to five, five to seven, and seven plus, and what we found was in each group, the more fruit and vegetables you ate, the better the benefit to your health, with the group who were eating seven or more portions a day having the lowest risk of mortality.
JH: Ah, well, lowest risk of mortality – but you seem to be suggesting there are wide benefits, that it’s a general prescription for good health.
LO: What we looked at was mortality and we looked at mortality from any cause, death from cancer and death from heart disease and stroke, so those were our outcomes.
JH: So, fruit and vegetables are enemies of the big killers?
LO: That’s right.
JH: Now, when you say seven portions, what do you mean? Seven carrots?
LO: The advice is that you have a variety of fruit and vegetables, so not to eat seven of the same sort.
JH: No…I’m not suggesting that most people would like to eat seven carrots!
LO: Well, actually, I would, but…
JH: Right, well, we’ll keep your personal habits out of it!
LO: A portion is about eighty grams, so that’s one large fruit, or a handful of smaller fruit or veg.
JH: Let’s just talk about the difference between fruit and vegetables. Is there any difference?
LO: Yes, what we found was that vegetables had a greater benefit than equivalent amounts of fruit, but we did still find that fruit gave significant benefit to health.
JH: What is it that causes this benefit to happen?
LO: Well, what we think is that the sugar content in fruit makes it not quite as good as vegetables, and that both fruit and vegetables have lots of micro-nutrients, which are important for the body to work properly, and also lots of fibre, which is good for health.
JH: So, in other words, what you’re saying is that if you want to increase your chances of living a long life rather than an artificially short life, if you cut down on red meat, fatty food, and all the rest of it, and increase your intake of fruit and vegetables, that’s the best thing you can do?
LO: Yes, that’s right.
JH: Does that sum it up?
LO: It does. Well, so we found that all additional portions of fruit and vegetables were of benefit, so even those eating one to three portions were doing significantly better than the people eating less than one portion. So, however many you’re eating now, eat more!
JH: And what about the age of the people, and the impact that it has? In other words, I mean if you are seventy, is it still worth increasing the amount of fruit and veg that you eat?
LO: Well, we included seventy year olds in our study – we looked at adults aged thirty five and over in the general population.
JH: So it’s true for everybody – get stuck in. Good news for greengrocers. Thank you very much indeed, Lola Oyebode.



Filed under Uncategorized

Things to do in Wokingham when you’re prematurely dead

The recent release by Public Health England of a ranking of local authorities in terms of premature death rates has attracted lots of media attention. I have to say that the interactive presentation is really clear and nice, but there is one big problem for me: no acknowledgement of uncertainty. The number one spot for Wokingham, with 200.3 deaths per 100,000 population, is presented as fact, and certainly it is true for 2009-2011, when the data were collected. Some of you may be interested in blaming your local politicians / health service for the last three years, but others will look at this chart and wonder what it tells them about the place where they live, or are thinking of moving to, now and indeed in the future. Surprisingly little, I would suggest.

Ideally, we would have a time series of premature deaths in successive years over a long time, so we could see the direction of travel. Are the rates coming down in one place or up in another? That would be more interesting than these single numbers. Then we could also tell whether the poor rates in Blackpool and Manchester in 2009-11 were just a statistical blip or a long-term trend. But even without long-term data we can do better than a single number.

It’s not an ideal thing to do (see Leckie and Goldstein “The limitations of school league tables to inform school choice”), but we can use the same tools we have for inference from a random sample. There is an amazingly simple formula for the uncertainty around a rate and I just quickly ran off 95% confidence intervals for each LA and plotted them in the graph below. The 95% confidence interval is based on the idea that, if everything stayed the same (that’s the unrealistic bit), future years would see the LAs’ rates fluctuating, sometimes a bit higher due to bad luck and sometimes a bit lower due to good luck. This particular interval should contain 95% of those future hypothetical rates. Or, to flip it around (and not every statistician would agree with me that this is equivalent), you can also say that there is a 95% chance that the rate in your LA will be within that interval this year.

This “caterpillar plot” has vertical lines that show the degree of uncertainty in each local authority’s premature death rate. When they overlap, that suggests they are more likely to have swapped places in 2012 or 2013.
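For the curious, here is a minimal Python sketch of one common “simple formula” for a rate’s uncertainty: treat the death count as Poisson and use the normal approximation, so the standard error of the count is its square root. It may not be exactly the formula behind the plot, and if the published rates are age-standardised, a weighted variance would be needed instead. The inputs are made up, not the real Wokingham figures.

```python
import math


def rate_ci(deaths, population, per=100_000, z=1.96):
    """Approximate 95% CI for a crude rate, treating deaths as Poisson.

    Normal approximation: SE(count) = sqrt(deaths), so
    SE(rate) = sqrt(deaths) / population * per.
    `population` can be person-years when several years are pooled.
    """
    rate = deaths / population * per
    se = math.sqrt(deaths) / population * per
    return rate, rate - z * se, rate + z * se


# Hypothetical inputs, not real local authority data:
rate, lo, hi = rate_ci(deaths=400, population=200_000)
print(f"{rate:.1f} per 100,000 (95% CI {lo:.1f} to {hi:.1f})")
```

With these inputs it prints 200.0 per 100,000 (95% CI 180.4 to 219.6). Note that the interval narrows only with the square root of the number of deaths, which is why smaller authorities get wider bars.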

The overlap among the authorities is not an ideal way to judge the uncertainty in the league table, but it’s not a bad place to start. If your authority overlaps with its neighbour, then you could swap places simply by chance, without any change in the population or the public health activities. A more intuitive way of looking at it is to consider what this uncertainty in the rates could do to the ranks in the league table, and we can easily do this by simulating possible future rates from the 95% confidence intervals for each LA. I ran this for 10,000 simulated futures, and here are the 95% confidence intervals for the ranks:

In this version of the graph, the rank in the league table is shown vertically. Uncertainty is much greater because LAs are very similar and could easily swap places.
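The rank simulation can be sketched in a few lines of Python. This is a minimal version under stated assumptions, not necessarily the exact procedure behind the graph and table: each LA’s rate is drawn from a normal distribution centred on its observed value, with a standard error backed out of its 95% CI, the LAs are re-ranked in every simulated future, and the 2.5th and 97.5th percentiles of each LA’s ranks are reported. The helper names and the four areas in the example are made up, and this version uses plain nearest-rank percentiles, so its limits are always whole ranks.

```python
import random


def se_from_ci(lower, upper, z=1.96):
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def rank_intervals(rates, ses, n_sims=10_000, seed=1):
    """Monte Carlo 95% intervals for league-table ranks.

    In each simulated future every area's rate is drawn from Normal(rate, se),
    the areas are ranked (1 = lowest rate, as in the league table) and each
    area's rank is recorded. Returns (lower, upper) rank limits per area.
    """
    rng = random.Random(seed)
    k = len(rates)
    sims = [[] for _ in range(k)]
    for _ in range(n_sims):
        draws = [rng.gauss(r, s) for r, s in zip(rates, ses)]
        order = sorted(range(k), key=lambda i: draws[i])
        for rank, i in enumerate(order, start=1):
            sims[i].append(rank)
    limits = []
    for ranks in sims:
        ranks.sort()
        limits.append((ranks[int(0.025 * (n_sims - 1))],
                       ranks[int(0.975 * (n_sims - 1))]))
    return limits


# Four made-up areas: three close together, one far behind.
rates = [200.0, 202.0, 205.0, 300.0]
ses = [se_from_ci(180.4, 219.6)] * 3 + [1.0]
print(rank_intervals(rates, ses))
```

The three close areas can each land anywhere from first to third, while the outlier is pinned at fourth: overlapping rate intervals turn into wide rank intervals, which is exactly what happens in the middle of the real table.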

That’s a lot of uncertainty. So here is an alternative table that shows where your LA is in the Longer Lives table and also how low or high it might be this year, just by chance (remember – this assumes nothing has changed!). As an example of how to read it, my home is in Croydon (number 58 in 2009-11), but we could be as good as number 45 (currently occupied by Havering) or as bad as 70 (Hounslow). That span of 25 places is not unusual, and covers one-sixth of the 150-place table.

No. Local authority Rank (2009-11) Lower 95% limit Upper 95% limit
1 Wokingham 1 1 8
2 Richmond upon Thames 2 1 11
3 Dorset CC 3 1 12
4 Surrey CC 4 1 12
5 South Gloucestershire 5 1 12
6 Rutland 6 1 14
7 Harrow 7 2 14
8 Kensington and Chelsea 8 2 16
9 Bromley 9 2 16
10 Hampshire CC 10 3 16
11 Kingston upon Thames 11 4 18
12 West Berkshire 12 5 24
13 Buckinghamshire CC 13 6 24
14 Windsor and Maidenhead 14 6 25
15 Cambridgeshire CC 15 6 27
16 Barnet 16 6 28
17 Suffolk CC 17 8 29
18 Bath and North East Somerset 18 8 29
19 Devon CC 19 11 31
20 Wiltshire 20 11 33
21 Hertfordshire CC 21 12 33
22 Oxfordshire CC 22 12 33
23 West Sussex CC 23 14 34
24 Poole 24 14 37
25 Solihull 25 14 37
26 Somerset CC 26 15 37
27 Bexley 27 17 38
28 Sutton 28 18 40
29 Merton 29 18 44
30 Leicestershire CC 30 18 45
31 Enfield 32 18 47
32 Gloucestershire CC 31 20 47
33 Central Bedfordshire 33 21 49
34 North Yorkshire CC 34 24 49
35 Essex CC 35 24 50
36 Shropshire 36 25 50
37 Bracknell Forest 37 25 50
38 Cheshire East 38 27 51
39 Norfolk CC 39 27 52
40 Redbridge 40 27 52
41 Warwickshire CC 41 27.98 54
42 Worcestershire CC 42 28 54
43 East Riding of Yorkshire 43 28 54
44 Herefordshire, County of 44 29 55
45 Havering 45 29 57
46 Cornwall 47 31 57
47 Westminster 46 33 58
48 East Sussex CC 48 33 59
49 Isle of Wight 49 36.98 60
50 North Somerset 50 37 60
51 Hillingdon 51 37 61
52 Brent 52 37 61
53 Kent CC 53 40 61
54 York 54 40 65
55 Staffordshire CC 55 40 65
56 Derbyshire CC 56 42 67
57 Swindon 57 44 68
58 Croydon 58 45 70
59 Cheshire West and Chester 59 45 71
60 Wandsworth 60 47 71
61 Trafford 61 49 73
62 Nottinghamshire CC 62 50 73
63 Lincolnshire CC 63 52 74
64 Milton Keynes 64 54 75
65 Camden 65 57 77
66 Northumberland 66 57 78
67 Bournemouth 67 57 78
68 Southend-on-Sea 68 57.98 79
69 Ealing 69 58 79
70 Hounslow 70 60 79
71 Thurrock 71 61 80
72 Northamptonshire CC 72 61 82
73 Waltham Forest 73 64 83
74 Dudley 74 64 84
75 Stockport 75 65 84
76 Cumbria CC 76 65 86
77 Bedford 77 65 88
78 Reading 78 67 90
79 Haringey 79 67 94
80 Medway 80 68 95
81 Sheffield 81 69 97
82 Warrington 82 70 99
83 North Lincolnshire 83 71 102
84 Torbay 84 74 102
85 Greenwich 85 77 104.02
86 Plymouth 86 77 107
87 Peterborough 87 78 107
88 Hammersmith and Fulham 88 78 108
89 Rotherham 89 79 108
90 Bristol, City of 90 80 109
91 Kirklees 91 80 109
92 Sefton 92 82 109
93 Redcar and Cleveland 93 82 110
94 Darlington 94 83 111
95 Southampton 95 83 111
96 Telford and Wrekin 96 84 112
97 North Tyneside 97 84 112
98 Brighton and Hove 98 84 113
99 Bury 99 85 114
100 Leeds 100 85 115
101 Derby 101 85 115
102 Stockton-on-Tees 102 86 115
103 Lancashire CC 103 86 116
104 Portsmouth 104 87 116
105 County Durham 105 87 117
106 Lewisham 106 88 117
107 North East Lincolnshire 107 88 118
108 Luton 108 88 120
109 Slough 109 88 121
110 Wakefield 110 88 123
111 Walsall 111 91 125
112 St. Helens 112 93 125
113 Wirral 113 94 126
114 Doncaster 114 94 127
115 Southwark 115 96 128
116 Newham 116 96 128
117 Calderdale 117 99 128
118 Islington 118 102 129
119 Birmingham 120 104 129
120 Barnsley 119 106 130
121 Lambeth 121 107 131
122 Bradford 122 108 131
123 Gateshead 123 109 131
124 Bolton 124 109 133
125 Wolverhampton 125 111 133
126 Coventry 126 112 133
127 Wigan 127 115 134
128 Hackney 128 115 135
129 South Tyneside 129 116 137
130 Newcastle upon Tyne 130 118 139
131 Hartlepool 131 118 140
132 Sunderland 132 120 142
133 Barking and Dagenham 133 123 142.02
134 Halton 134 127 143
135 Leicester 135 128 143
136 Sandwell 136 129 143
137 Tower Hamlets 137 129 143
138 Stoke-on-Trent 138 130 144
139 Oldham 139 131 144
140 Rochdale 140 131 144
141 Nottingham 141 132 144
142 Tameside 142 132 145
143 Blackburn with Darwen 143 133 145
144 Knowsley 144 134 146
145 Middlesbrough 145 138 147
146 Kingston upon Hull, City of 146 143 148
147 Salford 147 145 148
148 Liverpool 148 145 148
149 Blackpool 149 149 150
150 Manchester 150 149 150

Filed under Uncategorized

Booze space – article on Significance website

Happy new year everybody! Just before Christmas I wrote an article for Significance, which is on their website. It took the HMRC booze data that Andrew McCulloch had previously analysed as time series and turned them into an animation (the R code to make it is also online). Apart from the pretty pictures, an interesting angle is the difference between litres of booze (which is what HMRC count in order to levy the tax) and units of alcohol, particularly at Christmas, when we as a nation drink a lot of wine.
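The litres-versus-units gap comes straight from the definition of a UK unit, 10 ml of pure alcohol, so units = volume in ml × ABV in % / 1000. A tiny Python sketch, with illustrative strengths (4% beer, 12% wine) rather than figures from the HMRC data:

```python
def uk_units(volume_ml, abv_percent):
    """UK units of alcohol: one unit is 10 ml of pure alcohol,
    so units = volume (ml) x ABV (%) / 1000."""
    return volume_ml * abv_percent / 1000


# Illustrative strengths only: one litre of 4% beer vs one litre of 12% wine.
print(uk_units(1000, 4))   # 4.0
print(uk_units(1000, 12))  # 12.0
print(uk_units(750, 12))   # 9.0, a standard bottle of 12% wine
```

Same litre, three times the units, which is why a wine-heavy Christmas looks much bigger in units than in litres.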

Filed under Uncategorized