This is just a brief post to elaborate on my response to this tweet:
Everybody appeals to significance, but few have stopped to think about what it really means. You have to get a little philosophical (only a very little). I’m not sure Spiegelhalter actually used the word in this context, except as a rough shorthand when quickly answering questions, because I’m pretty sure he knows what it means!
Significance, along with confidence intervals and p-values, is one of the trappings of inference. In fact, significance just divides the p-values into those above and those below some threshold, making it the least informative of the three. Anyway, the important point is that you have a sample and you are trying to say something about the population from which it was drawn*. If you have data on every patient the surgeons operated on last year (as we increasingly do), then inference to a population is meaningless. Your sample is the population. On the other hand, if you have a sample of last year’s patients, then you can make inferences (if you believe you truly know the sampling mechanism) about the population of last year’s patients. But that almost certainly is not what you want to know. You want to know what next year will be like, and whether you should go to Mr X or Miss A to have some odd growth chopped off. And, as Leckie and Goldstein showed us with school league tables, the accumulation of changes in the system makes comparisons based on past data almost completely uninformative. The effect is at least as strong in healthcare, certainly in the UK where the National Health Service has been “liberated” by a “no top-down re-organisation” re-organisation, and in the USA where Obamacare has come into being (and the murmurs suggest waking up to cost-effectiveness next, if we’ve all got over Sarah Palin’s death panel).
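To make the earlier point about thresholds concrete, here is a minimal sketch in Python. The p-values are invented purely for illustration, the function name is my own, and the 0.05 cut-off is just the conventional choice, not something taken from any surgeon data.

```python
def is_significant(p_value, alpha=0.05):
    """Return True if the p-value falls below the chosen threshold.

    alpha=0.05 is only the conventional default, assumed for illustration.
    """
    return p_value < alpha


# Hypothetical p-values, made up for this example.
p_values = [0.001, 0.049, 0.051, 0.90]
labels = [is_significant(p) for p in p_values]

for p, label in zip(p_values, labels):
    print(f"p = {p:<5}  significant: {label}")
```

Two of these values differ by a hair (0.049 and 0.051) yet get opposite labels, while two that are wildly different (0.051 and 0.90) get the same one. That is all the information "significant" carries.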
So, the problem is not whether the differences are significant or non-significant; it is that they are taken to be clinically important (actually, they might be too small to worry anyone) and informative for patient and commissioner choice (they’re not). They are helpful for the professionals and their peers to learn from one another and shame the dullards into catching up, but only when combined with a heap of local insights into what’s going on. It’s important that we are transparent and publish them, but they need strong caveats, because at present they are sold to the public (and doctors) as the true objective measure of ‘quality’. The final warning from this gloomy post is that we try to extend the indicators of procedures and outcomes that are available to us into some vague global concept of quality. I leave it to the reader as an exercise (not of my own invention) to write down a definition of quality.
* – there may yet be some surviving ultra-frequentists out there who take it even further and believe you can only do inference if you can carry out infinite repetitions of the same data collection; presumably they would not permit any inference with data like these.
Your author is a member of the National Advisory Group on Clinical Audit and Confidential Enquiries. This piece is his own personal view as a practising statistician with an interest in healthcare quality indicators and the philosophy of science. It is not the view of NAGCAE, NHS England or Her Majesty’s Government (obviously, I would have thought).