I was intrigued by a paper just out in the International Journal of Epidemiology by da Costa et al. They look into the difficult situation where you are carrying out a meta-analysis and some papers report odds ratios or relative risks for achieving a certain threshold of response to treatment (odds or risk of being a “responder”), while others report mean changes in outcomes. For example, some blood pressure studies might report mean changes in millimeters of mercury (mmHg) while others count how many people got down to the normal range. How does one then combine these studies without having the original data? The authors identify five techniques for approximating an odds ratio from the continuous outcomes. They go on to compare how these perform on real-life data where both the odds ratio and the mean change were known, using studies in osteoarthritis of the knee or hip.

These are the five methods:

- Hasselblad and Hedges (1995): multiply the standardised mean difference (SMD) and its standard error by 1.81 (which is π/√3) – that’s the log-odds ratio and its standard error! (On average, if the scores follow a logistic distribution in all treatment groups)
- Cox and Snell (1989): as above but multiply by 1.65 (assumes a normal distribution rather than logistic)
- Furukawa & Leucht (2011): estimate a control group risk (or find it buried in the paper), then estimate the treatment risk using the SMD and probit transformations
- Suissa (1991): similar to Furukawa & Leucht but using group-specific means, standard deviations and sample sizes; this should be superior if the group sizes are quite different to each other
- Kraemer and Kupfer (2006): calculate a risk difference from an estimated area under the curve (AUC), where the AUC is just the CDF of the standard normal distribution at SMD/1.414 (that is, SMD/√2)
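
Here is a minimal Python sketch of four of these conversions, using only the standard library. It is my own illustration, not code from the paper. The function names and the example numbers are made up, the sign convention in the Furukawa & Leucht function (a positive SMD means more responders under treatment) is an assumption, and the step from AUC to risk difference (2 × AUC − 1) is my understanding of Kraemer and Kupfer rather than something spelled out above. Suissa's method is left out because it needs the group-specific means, standard deviations and sample sizes.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def hasselblad_hedges(smd, se):
    """Log odds ratio and its SE: scale by pi/sqrt(3), roughly 1.81."""
    k = math.pi / math.sqrt(3)
    return k * smd, k * se


def cox_snell(smd, se):
    """Log odds ratio and its SE: scale by 1.65."""
    k = 1.65
    return k * smd, k * se


def furukawa_leucht(smd, control_risk):
    """Odds ratio from a control group risk and the SMD via probits.

    Assumption: a positive SMD means more responders under treatment.
    """
    treated_risk = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(control_risk) + smd)
    odds_treated = treated_risk / (1 - treated_risk)
    odds_control = control_risk / (1 - control_risk)
    return odds_treated / odds_control


def kraemer_kupfer_risk_difference(smd):
    """Risk difference from AUC = Phi(SMD / sqrt(2)).

    Assumption: risk difference = 2 * AUC - 1.
    """
    auc = STD_NORMAL.cdf(smd / math.sqrt(2))
    return 2 * auc - 1


# Hypothetical inputs, for illustration only.
smd, se, control_risk = 0.5, 0.1, 0.3

log_or_hh, se_hh = hasselblad_hedges(smd, se)
log_or_cs, se_cs = cox_snell(smd, se)
print("Hasselblad & Hedges OR:", math.exp(log_or_hh))
print("Cox & Snell OR:", math.exp(log_or_cs))
print("Furukawa & Leucht OR:", furukawa_leucht(smd, control_risk))
print("Kraemer & Kupfer risk difference:", kraemer_kupfer_risk_difference(smd))
```

The first two functions return values on the log-odds scale, so exponentiate to get an odds ratio. The Kraemer & Kupfer function stops at the risk difference, and turning that into an odds ratio would need a control group risk as well.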

Their conclusion is that all the methods are good enough except Kraemer & Kupfer, which gave estimated odds ratios significantly different to the true ones, and so they recommend against using it. I noticed in their Table 2 that the four recommended methods all underestimated the odds ratio when the baseline risk was less than 20%, although this was not a significant trend for any of them. I wonder how the techniques behave for small risks (0.01% to 1%)… that would be a nice project for somebody to try out.

The moral of the story is a familiar one to many statisticians: David Cox got there first. Seriously though, a simple heuristic method is usually *good enough*, because our aim is to help people see the pattern in the data, right? Somehow my generation of statisticians are much more fixated on fancy methods that work in every situation and have proven properties (and I am a bit guilty of that too), but it is sobering to remember the lessons of the days before immensely powerful computers on every desk: if you draw a histogram or quantile plot and then just multiply the SMD by 1.65, you will often get the same result.