Here’s a collection of horror stories published on Halloween. James McCormack, Ben Vandermeer and Michael Allan, all evidence-based medicine enthusiasts from Canada, have written a wonderful paper on misleading conclusions in medical research. Essentially, there is a spectre haunting medical (and other) stats: the spectre of significance. This is an old story, such as would make M. R. James blush, but you know what? It needs to be told again. And this paper tells it really well, with three compelling examples.
You think this can’t happen in five-star research designs: meta-analyses of large drug trials? Think again! Here are three collections of meta-analyses in which authors got very similar results, pooling results from mostly the same trials. Some results landed just on the non-significant side, and the authors declared the drugs to be of no benefit, or no harm. In one case they felt so sure of this that they went on to trash their rivals, whose meta-analysis came down on the significant side (the correspondence is worth reading if you like watching grown men and women getting angry at each other; the methodologists Deeks and Higgins make some interesting observations I won’t go into here). Of course, the rivals, investigating harms, declared the drugs to be downright dangerous. Countless others, investigating benefits and landing just on the significant side, have pronounced their favoured potion to be the best thing since sliced bread.
McCormack and colleagues make the point that people get twitchy about confidence intervals in just the same way as p-values close to 0.05, despite the idea that CIs prevent dichotomania by presenting a range of likely values. In one meta-analysis by Granger et al, comparing the anticoagulants apixaban and warfarin, the upper limit of the CI for the relative risk of death was 0.998. All the other stats were presented to 2 decimal places, but this one, without explanation, went to three (unbelievably, not even both sides of the CI: “0.80 – 0.998”). In the abstract, it appears to 2 dp, incorrectly rounded down to 0.99. Oops-a-daisy! And their conclusion in the abstract, probably the only bit most people read?
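The rounding slip is easy to reproduce. Here is a minimal sketch in Python; the 0.998 figure is the one quoted above, and everything else is illustrative:

```python
import math

# Upper limit of the relative-risk CI as reported in the paper's body text
upper = 0.998

# At three decimal places the interval just excludes 1.0, so the result
# reads as "significant"
print(upper < 1.0)  # True

# Correctly rounded to two decimal places, the limit becomes 1.00, and
# the interval now appears to touch the null value of 1
print(round(upper, 2))  # 1.0

# The abstract instead gave 0.99, which is 0.998 rounded *down*
# (truncated), not rounded to the nearest hundredth
print(math.floor(upper * 100) / 100)  # 0.99
```

The point is not the arithmetic itself but that the choice between 0.99 and 1.00 changes whether the interval visibly excludes the null value, which is exactly the dichotomania the paper is complaining about.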
“In patients with atrial fibrillation, apixaban was superior to warfarin in preventing stroke or systemic embolism, caused less bleeding, and resulted in lower mortality.”
Now, you’d be forgiven for thinking that no decent journal would have let that slip through the net. Granger et al published in the New England Journal of Medicine, the highest-impact medical journal in the world. Journals increasingly require CIs, and some even forbid p-values (a bold move of which I approve), but they can’t stop authors from making over-confident verbal interpretations. That’s where reviewers are needed, but also careful editing (remember the case of the Million Women study on alcohol).
Even way back when Fisher proposed 0.05, it was just a rule of thumb. In my experience, it does map pretty well to the point where I look at the data and start to feel there might be something real going on, but that feeling involves a certain amount of grey area. It’s a natural part of weighing up information and making decisions about how you think the world works, although, to judge by how jumpy some people get in the grey area, you’d think it was the DMZ between saying hello to your Nobel prize and saying goodbye to your career. To reduce your research findings to this dichotomy is to profoundly miss the point. Stats is a set of tools for encoding and quantifying those human feelings about whether there is something real going on.
The other great gift McCormack and colleagues have given us is a collection of published meta-analyses where the truth seems to be rather uncertain, and small differences tip the authors into very different conclusions. That is the sort of thing methodological researchers like me hunt high and low for. It’s hard to find these really nice examples, and I look forward to getting to grips with them in more detail!