Category Archives: healthcare

The sad ossification of Cochrane reviews

Cochrane reviews made a huge difference to evidence-based medicine by forcing consistent analysis and writing on systematic reviews, but now I find them losing the plot in a rather sad way. I wanted to write a longer critique while still indemnified by being a university employee and after the publication of a review I have nearly completed with colleagues (all of whom say “never again”). But those two things will not overlap. So, I’ll just point you to some advice on writing a Summary Of Findings table (the only bit most people read) from the Musculo-skeletal Group:

  • “Fill in the NNT, Absolute risk difference and relative percent change values for each outcome as well as the summary statistics for continuous outcomes in the comments column.”

“Summary”, you say? Well, I’m all for relative + absolute measures, but the NNT is a little controversial nowadays (cf Stephen Senn everywhere) and are all those stats going to have appropriate measures of uncertainty, or will they be presented as gospel truth? With continuous outcomes, we were required to state means, SDs, % difference, and % change in either arm, which seems a bit over the top to me, and, crucially, relies on some pretty bold assumptions about distributions: assumptions that are not necessary elsewhere in the review.
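For the record, the arithmetic linking these summaries is simple enough; it's the uncertainty that bites. A minimal sketch (every number invented, nothing from any actual review):

```python
# Absolute and relative effect measures from two event proportions.
# Risks here are made up for illustration only.
def effect_measures(risk_control, risk_treat):
    ard = risk_control - risk_treat              # absolute risk difference
    rr = risk_treat / risk_control               # relative risk
    nnt = 1 / ard if ard != 0 else float("inf")  # number needed to treat
    return ard, rr, nnt

ard, rr, nnt = effect_measures(0.20, 0.15)
# Here ard is 0.05, rr is 0.75 and nnt is 20. The catch: if the CI for
# the ARD crosses zero, the corresponding NNT interval is two disjoint
# pieces running off to infinity, which is one reason the NNT makes a
# poor headline statistic.
```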

  • “When different scales are used, standardized values are calculated and the absolute and relative changes are put in terms of the most often used and/or recognized scale.”

I can see the point of this but that requires a big old assumption about the population mean and standard deviation of the most often used scale, as well as assumption of normality. Usually, these scales have floor/ceiling effects.
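To make the problem concrete, here is a sketch of the back-conversion such advice implies (all numbers invented):

```python
# Back-converting a standardized mean difference (SMD) to the most-used
# scale's own units. The SD of the reference scale is exactly the big
# external assumption at issue: it comes from outside the meta-analysis.
smd = -0.45                # pooled SMD, hypothetical
sd_reference_scale = 12.0  # assumed population SD of the familiar scale
points = smd * sd_reference_scale
# points comes out at -5.4 on the familiar scale, but only if that scale
# is roughly normal with that SD; floor and ceiling effects break this.
```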

  • “there are two options for filling in the baseline mean of the control group: of the included trials for a particular outcome, choose the study that is a combination of the most representative study population and has a large weighting in the overall result in RevMan. Enter the baseline mean in the control group of this study. […or…] Use the generic inverse variance method in RevMan to determine the pooled baseline mean. Enter the baseline mean and standard error (SE) of the control group for each trial”

This is an invitation to plug in your favourite trial and make the effect look bigger or smaller than it came out. Who says there is going to be one trial that is most representative and has a precise baseline estimate? There will be fudges and trade-offs aplenty here.

  • “Please note that a SoF table should have a maximum of seven most important outcomes.”

Clearly, eight would be completely wrong.

  • “Note that NNTs should only be calculated for those outcomes where a statistically significant difference has been demonstrated”

Jesus wept. I honestly can’t believe I have to write this in 2017. Reporting only statistically significant findings lets both genuine effects and noise through, and the noise can account for far more than 5% of results (cf. John Ioannidis on why most published research findings are false, and Andrew Gelman on type M and type S errors).

On calculating some absolute changes in % terms (all under 10%), reviewers then came back and told us that they should all be described as “slight improvement”, the term “slight” being reserved for absolute changes under a certain size. They also recommend applying Cohen’s small-medium-large classification quite strictly, via a handy spreadsheet for authors called Codfish. I thought Cohen’s d and his classification had been thrown out long ago in favour of, you know, thinking. This is rather sad: we are watching the systematic approach being ossified into a rigid set of rules. I suspect that the really clever methodologists involved in Cochrane are not aware of this, nor would they approve, but it is happening little by little in the specialist groups.


[Image: Archaeopteryx lithographica (Eichstätter specimen). Photo: H. Raab, CC-BY-SA-3.0]

This advice for reviewers does not appear on their public website, but it needs proper statistical review and revision. We shouldn’t be going backwards in this era of the Crisis Of Replication.


Filed under healthcare

Is standing up good for your health?

Is sitting the new smoking? The excellent Tony “Dr K” Komaroff says yes, and I have great respect for his work on that website. I read the original paper (standing up and pacing about the office, of course) and felt that it was all just a bit too linear for my taste. What I’d really like to see on these data are additive / semi-parametric models of some form or other, graphically presented for us. I expect that the benefit is not a straight line function of time spent standing/stepping. Also, I suspect that activity at one time of the day is not the same as activity at another and would like to see that explored. And all in all, it’s such an interesting and important topic, it just seemed to me to be obscured unnecessarily by the stats, viz:

“Associations are described as regression coefficients (beta) or relative rates for log-transformed outcomes with 95% confidence intervals, and are plotted on a log scale, with beta rescaled as (beta+mean)/mean”

  1. mean? what mean?
  2. what log scale? the axis isn’t labelled
  3. log-transforms are cool because you get a multiplicative effect; why not use that to your advantage and describe a 10% reduction in triglycerides rather than RR=0.90?

The effects are for a 2 hour change per day (every day!) from sitting to standing, or sitting to stepping. Is that a meaningful change for most people? It sounds pretty ambitious! So, if I do 1 hour, do I get a 5% reduction (or, let’s be more mathematically aware, a 100*(1-sqrt(0.9))=5.1% reduction)? And we come back to the semi-parametric model of some form or other. There are so many cool models you can use, Generalized Additive Models being the most obvious candidate that comes to mind, why not do that sort of thing next time you face the Curse Of Linearity?
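If you want to check that back-of-envelope number, here is a sketch, assuming (boldly) that the relative rate compounds multiplicatively per hour, which is really just the linearity assumption wearing a different hat:

```python
import math

# If a 2 h/day shift from sitting to standing gives relative rate 0.90,
# and the effect is multiplicative in hours (a strong assumption), then
# 1 hour gives the square root of 0.90, not half the percentage.
rr_2h = 0.90
rr_1h = math.sqrt(rr_2h)
halved_pct = 100 * (1 - rr_2h) / 2  # naive "just halve it": 5.0
scaled_pct = 100 * (1 - rr_1h)      # multiplicative: about 5.13
# The gap is small here but grows with bigger effects; a GAM would let
# the data decide the dose-response shape instead of us assuming it.
```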

Now, for me, the real problem here is the disconnection between understanding the context of the analysis and then actually doing it. The experts who conducted the research certainly know that health benefits and biochemistry changes do not carry on and on and on as you pile in more minutes of standing up. Of course they know that! So then why do they go off and do totally dumb-arse things like this? At what point between starting up the computer and submitting the paper did they disengage the brain and go into a sort of auto-pilot torpor? I find it incredible. You want to know about effective feature selection in statistical modelling? Try thinking!


Filed under healthcare

Every sample size calculation

A brief one this week, as I’m working on the dataviz book.

I’m a medical statistician, so I get asked about sample size calculations a lot. This is despite them being nonsense much of the time (wholly exploratory studies, no hypothesis, pilot study, feasibility study, qualitative study, validating a questionnaire…). In the case of randomised, experimental studies, they’re fine, especially if there’s a potentially dangerous intervention (or lack thereof). But we now have a culture where reviewers, ethics committees and the like ask to see one for any quantitative study. No sample size, no approval.

So, I went back through six years of e-mails (I throw nothing out) and found all the sample size calculations. Others might have been on paper and lost forever, and there are many occasions where I’ve argued successfully that no calculation is needed. If it’s simple, I let students do it themselves. Those do not appear here, but what we do have (79 numbers from 21 distinct requests) gives an idea of the spread.

[Histogram of the calculated sample sizes]

You see, I am so down on these damned things that I started thinking I could just draw sizes from the distribution in the above histogram like a prior, given that I think it is possible to tweak the study here and there and make it as big or as small as you like. If the information the requesting person lavishes on me makes no difference to the final size, then the sizes must be identically distributed even conditional on the study design etc., and so a draw from this prior will suffice. (Pedants: this is a light-hearted remark.)
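If you want to play along at home, the “empirical prior” gag is one line of code. The sizes below are made-up stand-ins, not my actual 79 numbers:

```python
import random

random.seed(2017)  # any seed; this is a joke, not a method

# Hypothetical stand-ins for the sizes in the histogram (the real
# numbers stay in my inbox). Draw one whenever a new request arrives.
past_sizes = [24, 31, 40, 64, 65, 80, 120, 150, 200, 250, 384, 500]

def sample_size_oracle():
    """Return a 'calculated' sample size, drawn from the empirical prior."""
    return random.choice(past_sizes)

n = sample_size_oracle()
```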

You might well ask why there are multiple, and often very different, sizes for each request. That is because there are usually unknowns among the values required for calculating error rates, so we try a range of values. We could get Bayesian! Then it would be tempting to include another level of uncertainty: the colleague’s or student’s desire to force the number down by any means available to them. Of course I know the tricks but don’t tell them. Sometimes people ask outright, “how can we make that smaller?”, to which my reply is “do a bad job”.
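For the curious, the standard normal-approximation formula behind many of these calculations, and the range-of-values ritual, looks like this (a generic sketch, not any particular request I have handled):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two means."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

# Nobody knows the true effect size, so we tabulate a few guesses and
# watch n swing by an order of magnitude.
grid = {d: n_per_arm(d) for d in (0.2, 0.35, 0.5, 0.8)}
# A standardized difference of 0.5 gives 63 per arm under this
# approximation; 0.2 gives 393.
```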

And in those occasions where I argue that no calculation is relevant, and the reviewers still come back asking for one, I just throw in any old rubbish. Usually 31. (I would say 30 but off-round numbers are more convincing.) It doesn’t matter.

If you want to read people (other than me) saying how terrible sample size calculations are, start with “Current sample size conventions: Flaws, harms, and alternatives” by Peter Bacchetti, in BMC Medicine 2010, 8:17 (open access). He pulls his punches, minces his words, and generally takes mercy on the calculators:

“Common conventions and expectations concerning sample size are deeply flawed, cause serious harm to the research process, and should be replaced by more rational alternatives.”

In a paper called “Sample size calculations: should the emperor’s clothes be off the peg or made to measure?”, which wasn’t nearly as controversial as it deserved to be, Geoffrey Norman, Sandra Monteiro and Suzette Salama (no strangers to the ethics committee) point out that the calculations involve so much guesswork that we might as well spare people the anxiety, the delays waiting for a reply from the near-mythical statistician, and the brain work, and let them pick from a few standard numbers. 65! 250! These sound like nice numbers to me; why not? In fact, their paper backs them up pretty well.

In the special case of ex-post “power” calculations, see “The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis” by John M. Hoenig and Dennis M. Heisey, in The American Statistician (2001); 55(1): 19-24.

This is not a ‘field’ of scientific endeavour, it is a malarial swamp of steaming assumptions and reeking misunderstandings. Apart from multiple testing in its various guises, it’s hard to think of a worse problem in biomedical research today.


Filed under healthcare

How to assess quality in primary care

Jim Parle, of the University of Birmingham, and I have an editorial just out in the BMJ responding to the recent Health Foundation report on quality indicators in primary care. There’s a lot one could say about this subject but we had to be brief and engaging. Hopefully the references serve as a springboard for readers who want to dig in more. In brief:

  • We think it’s great that composite indicators received a strongly worded ‘no’; remember that Jeremy Hunt (and probably Jennifer Dixon too) started this process quite keen on global measures of quality reducing all the complexity of primary care organisation and care to a traffic light.
  • We agree that a single port of call would be invaluable. Too much of this information is scattered about online
    • but along with that, there’s a need for standardised methods of analysis and presentation; this is not talked about much but it causes a lot of confusion. At NAGCAE, my role is to keep banging on about this to make analysts learn from the best in the business and to stop spending taxpayers’ money reinventing wheels via expensive private-sector agencies
    • and interactive online content is ideally suited to this, viz ‘State of Obesity’
  • We think they should have talked about the value of accurate communication of uncertainty, arising from many different sources. Consider Elizabeth Ford’s work on GP coding, or George Leckie and Harvey Goldstein on school league tables (google them).
  • We also think they should have talked about perverse incentives and gaming. It never hurts to remind politicians of Mr Blair’s uncomfortable appearance on Question Time


Filed under healthcare

The irresistible lure of secondary analysis

The one thing that particularly worries me about the Department of Health, in its various semi-devolved guises, making 40% cuts to non-NHS spending is this: some of the activities I get involved in or advise on rely on accurate data, and they can look beguilingly simple to cut by falling back on existing data sources. But the devil is in the detail. It is very hard to draw meaningful conclusions from data that were not collected for that purpose, yet when the analytical team or their boss is pressed to give a one-line summary to the politicians, it all sounds hunky-dory. The guy holding the purse strings might never know that the simple one-liner is built on flimsy foundations.


Filed under healthcare

Can statistics heal?

(A rather florid headline, I’ll admit.)

I was reading this article on survivors of the 2005 London bombings and was struck by the story of someone who had been involved in two acts of terrorism but found it helpful to learn the stats from a disinterested third party:

“One patient – so unlucky as to have also been caught up in a previous terrorist attack – was only reassured after being taken to a bookmakers to see how long the odds were of being in a third.”

It makes me wonder how statisticians can help provide this sort of information to clinicians caring for people with PTSD and related problems. It is so easy to construct comparisons that are not quite right (horse-riding=ecstasy, for example, or Sally Clark, or lightning = cricket balls), so it is unfair to expect the doctor or psychologist to put them together as required. Perhaps future generations will have a more advanced grasp of statistics from school, but I wouldn’t count on it being able to overcome pathological distortions of risk perception. There may be a role for the quantified-self sort of data here; consider the popularity of pedometers and GPS for knowing how much exercise you really are doing.


Filed under healthcare

Roman dataviz and inference in complex systems

I’m in Rome at the International Workshop on Computational Economics and Econometrics. I gave a seminar on Monday on the ever-popular subject of data visualization. Slides are here. In a few minutes, I’ll be speaking on Inference in Complex Systems, a topic of some interest from practical research experience my colleague Rick Hood and I have had in health and social care research.

Here’s a link to my handout for that: iwcee-handout

In essence, we draw on realist evaluation and mixed-methods research to emphasise understanding the complex system and how the intervention works inside it. Unsurprisingly for regular readers, I try to promote transparency around subjectivities, awareness of philosophy of science, and Bayesian methods.


Filed under Bayesian, healthcare, learning, R, research, Stata, Visualization