Category Archives: noticeboard

I’m writing a dataviz book

Today I am starting work on a major new project, writing a book on data visualisation for the CRC-ASA series on statistical reasoning in science and society. There are several excellent dataviz books out there but I’m excited to be adding something new. This will be a brief, affordable overview that does not assume any previous training in statistics, or design, or coding. A lot of techniques will get described, but rather than just a baffling gallery, I want to make this a tour that shows the reader how to think through the options critically and justify their choices.


Procrastinating by taking a selfie in my secret hideout

The series should be a great collection for just this reason. More people than ever before have to work with data, and not all are experts or intend to be. I was inspired by the popularity of short, simple books on various business topics that you see in airport & railway station bookshops, and hope to provide something like that. I picture as my readers the manager in charge of risk analysis at a credit card company, or starting up a new modeling department in an insurance company, or the charity boss who wants to know what to ask for from the design team so their publications are more compelling (with apologies to any friends who see their own images there). You won’t see this in bookshops for a little while, but I’ll keep you posted on progress.


Filed under learning, noticeboard, Visualization

Lectures at Conway Hall, London

Some public lectures coming up soon that may interest anyone with a social/medical/political focus. These are at Conway Hall, Red Lion Square, London.

28/10/2012 Our Public Relations, their Propaganda Graham Bell Jay Ginn
04/11/2012 8 Principles for successful optimists Mark Stevenson Andrew Copson
18/11/2012 Pharmageddon Prof David Healy
25/11/2012 The History and Future of Bioethics Prof Richard Ashcroft
02/12/2012 Preventative medicine? Are screening tests about science or politics? Dr Margaret McCartney
09/12/2012 The Ethics of Open Borders Prof Phil Cole
16/12/2012 Invisible England: Holding Therapy Practices in the UK

Thanks to Jay Ginn for posting these on the Radstats list.

Filed under noticeboard

Converting continuous to binary outcomes for meta-analysis

I was intrigued by a paper just out in the International Journal of Epidemiology by da Costa et al. They look into the difficult situation where you are carrying out a meta-analysis and some papers report odds ratios or relative risks for achieving a certain threshold of response to treatment (odds or risk of being a “responder”), while others report mean changes in outcomes. For example, some blood pressure studies might report mean changes in millimeters of mercury (mmHg) while others count how many people got down to the normal range. How does one then combine these studies without having the original data? The authors identify five different techniques for approximating an odds ratio from the continuous outcomes. They go on to compare how these perform on real-life data where both the odds ratio and the mean change were known, using studies in osteoarthritis of the knee or hip.

These are the five methods:

  • Hasselblad and Hedges (1995): multiply the standardised mean difference (SMD) and its standard error by 1.81 (that is, π/√3): that’s the log-odds ratio and its standard error! (On average, if the mean scores follow a logistic distribution in all treatment groups)
  • Cox and Snell (1989): as above but multiply by 1.65 (assumes a normal distribution rather than logistic)
  • Furukawa & Leucht (2011): estimate a control group risk (or find it buried in the paper), then estimate the treatment risk using the SMD and probit transformations
  • Suissa (1991): similar to Furukawa & Leucht but using group-specific means, standard deviations and sample sizes; this should be superior if the group sizes are quite different to each other
  • Kraemer and Kupfer (2006): calculate a risk difference from an estimated area under the curve (AUC), which is just the CDF of the normal distribution at SMD/1.414
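
To make the arithmetic concrete, here is a minimal Python sketch of four of the five conversions as I have described them above. It is my own illustration, not code from da Costa et al. The function names are made up, the sign convention (a positive SMD means the treatment group is more likely to respond) is an assumption, and the step from AUC to risk difference (2 × AUC − 1) is my understanding of Kraemer and Kupfer rather than something spelled out above. Suissa’s method is left out because it needs the group-specific summaries and the response threshold.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def log_or_hasselblad_hedges(smd, se_smd):
    """Log-odds ratio and its SE, assuming logistic outcome distributions."""
    factor = math.pi / math.sqrt(3)  # roughly 1.81
    return factor * smd, factor * se_smd


def log_or_cox_snell(smd, se_smd):
    """Log-odds ratio and its SE, assuming normal outcome distributions."""
    factor = 1.65
    return factor * smd, factor * se_smd


def odds_ratio_furukawa_leucht(smd, control_risk):
    """Odds ratio from a control group risk and the SMD via probit transforms.

    Assumption: a positive SMD shifts the treatment group towards responding.
    """
    z_control = STD_NORMAL.inv_cdf(control_risk)
    treatment_risk = STD_NORMAL.cdf(z_control + smd)
    odds_treatment = treatment_risk / (1 - treatment_risk)
    odds_control = control_risk / (1 - control_risk)
    return odds_treatment / odds_control


def risk_difference_kraemer_kupfer(smd):
    """Risk difference from the AUC, where AUC is the normal CDF at SMD / sqrt(2).

    Assumption: the risk difference is taken as 2 * AUC - 1.
    """
    auc = STD_NORMAL.cdf(smd / math.sqrt(2))
    return 2 * auc - 1


# Illustrative inputs only, not values from any of the studies discussed.
log_or, se_log_or = log_or_hasselblad_hedges(smd=0.5, se_smd=0.1)
odds_ratio = math.exp(log_or)
```

The first two functions differ only in the multiplier, which is the whole point of the comparison: the choice between 1.81 and 1.65 is a choice between a logistic and a normal assumption about the outcome scores.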

Their conclusion is that all the methods are good enough except Kraemer & Kupfer, which in fact gave estimated odds ratios significantly different to the true ones, and so they recommend not using the method. I noticed in their Table 2 that the 4 recommended methods all showed an underestimated odds ratio when the baseline risk was less than 20%, although this was not a significant trend for any of them. I wonder how the techniques behave for small risks (0.01% to 1%)… that would be a nice project for somebody to try out.

The moral of the story is a familiar one to many statisticians: David Cox got there first. Seriously though, a simple heuristic method is usually good enough, because our aim is to help people see the pattern in the data, right? Somehow my generation of statisticians are much more fixated on fancy methods that work in every situation and have proven properties (and I am a bit guilty of that too), but it is sobering to remember the lessons of the days before immensely powerful computers on every desk: if you draw a histogram or quantile plot and then just multiply the SMD by 1.65, you will often get the same result.

Filed under noticeboard

Videos of all talks and Q&A at 2012 Radical Statistics conference

Radical Statistics have now uploaded all the videos from this year’s conference and you can view them here. Themes were medical and financial mis-management of the figures, and there were some cracking good presentations, Aubrey Blumsohn’s being my favourite! Check them out.

Filed under noticeboard

Doh! Nut

Spotted at

Nooooo! That’s not how it works.

Filed under noticeboard

London hire bike journeys mapped and animated

This short animation by Jo Wood at City University aims to help us see the patterns in the mass of data arising from London’s bicycle hire scheme (often referred to as Boris Bikes, although the scheme was devised by the previous mayor Ken Livingstone). For those unfamiliar with this scheme, you can walk up to a bike rack, put in your credit card or pre-paid details, take a bike and then leave it at another rack somewhere else. Little trucks nip around the city redistributing the bikes to make sure they don’t all end up in one place.

Screenshot from the animation

At first glance I was baffled by the time aspect. What was changing over time? Were these real bike journeys at different times of the day? I was confused because I always click “play” before I read the text (also the reason why I can’t understand our TV remote control at home). Eventually I realised that it starts off showing all journeys, though the individual trails are simulated and not real people on bikes, and these accumulate over time until about 15 seconds in, when it gradually gets filtered down to showing the more popular routes and ends up with just the key “hubs” illuminated. Prof Wood says this is like “a graphic equaliser”, which is a concept much more familiar to my generation.

It’s a novel approach in quite a subtle way: time is used to show density. Imagine having loads of bivariate normal data and wanting to show the distribution. You could draw a contour plot, but this gets nasty as the distribution gets more complex. So why not have an animation showing all the data in a scatterplot, and gradually remove the dots from the less populous regions, moving in by convex hulls until only the mode is still populated? Here’s a rough animation I made with uncorrelated bivariate normal data (n=10,000).

Now, for simple distributions like this, it’s not very useful. But when you get into weird shapes, it could be quite useful. Another way you could imagine it is a 3-D surface with density on the vertical axis, which gradually gets submerged below an opaque “water level” until only the highest peaks are visible.
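
The “water level” idea can be sketched in a few lines of Python. This is a rough illustration and not the code behind my animation: instead of peeling convex hulls it uses a crude binned density (counts in square bins), which is a swapped-in technique that also copes with several peaks. The function names, bin width and number of frames are all arbitrary choices of mine.

```python
import random
from collections import Counter


def bin_of(point, bin_width):
    """Index of the square bin that a point falls into."""
    x, y = point
    return (int(x // bin_width), int(y // bin_width))


def frames_by_water_level(points, bin_width=0.25, n_frames=10):
    """Return one list of surviving points per animation frame.

    The density at each point is approximated by the count in its bin.
    The water level rises in equal steps from zero towards the highest
    count, and only points sitting above the level survive in each frame.
    """
    counts = Counter(bin_of(p, bin_width) for p in points)
    density = [counts[bin_of(p, bin_width)] for p in points]
    peak = max(density)
    frames = []
    for k in range(n_frames):
        level = peak * k / n_frames
        frames.append([p for p, d in zip(points, density) if d > level])
    return frames


# Uncorrelated bivariate normal data, n = 10,000, as in the text.
random.seed(1)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
frames = frames_by_water_level(points)
```

Each element of `frames` can then be drawn as a scatterplot, one per frame, with whatever plotting tool you like. The first frame contains every point and later frames shrink towards the densest bins.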

Filed under noticeboard

Tails you win – a new documentary on BBC4

Airing on 18 October 2012 on BBC4, there will be a new documentary on the role that chance plays in our lives, from the team that made The Joy of Stats. “Professor Risk” David Spiegelhalter will be presenting this time, clearly not put off by his previous TV role.

Spiegelhalter holding a pair of big fuzzy dice

Prof Spiegelhalter was looking forward to decorating his Ford Capri

Filed under noticeboard