The subject that all statisticians are bored by, yet all their colleagues are obsessed by.
Why are we bored? Because it is nonsense. You want to make a guess, go ahead. Dress it up with clever-looking formulas, put in some references to papers by well-respected people in the field, and it will still be a guess.
Why are you obsessed? Because your project will not get funded or ethically approved or you won’t get your PhD unless you do one of these calculations.
So I approached this paper in the BMJ, which my colleague Andy Jewell brought to my attention, with some trepidation. The paper is behind a paywall / institutional login, but the thrust is that the calculations are guesses anyway, so why not just use a number that seems to work for most people? (It is round about here that the number 30 starts to appear on experienced people’s lips.) If using science to make sample size calculations look trustworthy is polishing the proverbial turd, then this seems to be the opposite: pointing out the steaming excreta and then going out of your way to stand in it. Nevertheless, it is honest if nothing else. And as Andy said to me, who are we to quibble with a student who cites this paper from an eminent journal and simply writes “30” in that section of their dissertation? But it is quite a stimulating read, and it pointed me to this paper by Bacchetti, which discusses the philosophical and logical basis for sample size calculations, and different ways of measuring the cost and utility of an experiment. Insightful, sober and inter-disciplinary, this paper is a really good one and I will be firing it at anyone who strays into my airspace with a question about sample size on board.
There are also some good responses to the BMJ paper, including one from Bacchetti. And here, finally, we must give credit to Hans-Hermann Dubben from Hamburg-Eppendorf University Medical Centre who pointed out that the paper asked totally the wrong question about the effect size:
How much do you think your treatment will affect systolic blood pressure?
This, my friends, is indeed the wrong question. I went so far as to take the dreaded yellow highlighter to it when I first read it. This is the sort of misunderstanding that costs my students dearly, because it leads to overpowered studies that waste time and money, and harm and kill participants. The right question is something like:
How much of a change in systolic blood pressure do you want to be able to detect?
which, of course, is quite different and is sometimes called the Minimum [Clinically] Important Difference.
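To make the distinction concrete, here is a minimal sketch (the function name and the blood-pressure numbers are my own illustration, not from either paper) of the standard normal-approximation sample size formula for comparing two group means. The point is that the Minimum Important Difference, not a guess at the true treatment effect, is what goes into the denominator:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mid, sd, alpha=0.05, power=0.8):
    """Sample size per group for a two-group comparison of means,
    using the usual normal approximation.

    mid : minimum important difference you want to be able to detect
    sd  : assumed standard deviation of the outcome
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / mid) ** 2)

# e.g. to detect a 5 mmHg difference in systolic blood pressure,
# assuming an SD of 10 mmHg:
print(n_per_group(mid=5, sd=10))  # 63 per group
```

Halve the MID and the required n roughly quadruples, which is why answering the wrong question (predicting the effect, rather than choosing the smallest effect worth detecting) can be so costly.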
And in considering this question, you might feel uncomfortable at imposing a threshold below which your findings are of no consequence whatsoever, and above which they are amazing and deserve that editorial in Nature followed by the inevitable Nobel prize. Well, this is how I feel all the time. Because sample size calculations are all about hypothesis tests, and hypothesis tests are either significant or not, sometimes correctly and sometimes not. They are sometimes a useful tool, but usually they tell only a little bit of a story. How they came to be regarded as the pinnacle of all medical science is beyond me. I recently read (and chuckled to myself), while reviewing Andy Field’s new book “Discovering Statistics Using R”, something along the lines of:
the p-value is not informative or very interesting [in correlation] but if you are a real stats nerd then you can obtain it with such-and-such an option, and then p yourself at how clever you are.