On cryptic multiplicity (for the last time, surely!)

Since some psychologists with no statistical qualifications decided to clean up psychology by banning statistics, other journals and assorted rags are seeing an opportunity to publish something kinda scientific, kinda popular, kinda easy. Here are three that spectacularly miss the point. I don’t mind people being wrong on the internet, but when peer-reviewed journals tell you how to fix bad stats in peer-reviewed journals by using bad stats, it probably ought to be pointed out. Maybe it’s how Tom Lehrer felt on hearing of Kissinger’s Nobel Prize.

Exhibit one: “The extent and consequences of P-hacking in science”, written by biologists in PLoS Biology. They are examining overt multiplicity – running loads and loads of tests and reporting only the super funky ones – which is quite rare. Most people are agreed on this. The real problem is cryptic multiplicity, or “the garden of forking paths”. Get rid of the overt abusers and you would still have Ioannidis’s problem of everything ever published being absolute nonsense (I paraphrase).
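For anyone who prefers to see the forking paths in numbers, here is a minimal sketch of my own (not taken from any of the papers mentioned). It assumes a toy trial with two pure-noise outcomes, unit variance, and an analyst who only ever runs one test per study, but chooses which outcome to test after looking at the data. The function names, sample sizes and seed are all made up for illustration.

```python
import random
from statistics import NormalDist


def avg(xs):
    return sum(xs) / len(xs)


def z_test_p(a, b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 (a simplification).
    """
    n = len(a)
    z = (avg(a) - avg(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=4000, n=30, alpha=0.05, seed=1):
    """Return (prespecified, forked) false positive rates under the null."""
    rng = random.Random(seed)
    prespecified_hits = 0
    forked_hits = 0
    for _ in range(n_studies):
        # Two outcomes per arm, all of it noise: there is nothing to find.
        drug = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
        placebo = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]

        # Path fixed in advance: always test outcome 0.
        if z_test_p(drug[0], placebo[0]) < alpha:
            prespecified_hits += 1

        # Forking path: eyeball the data, pick the outcome with the bigger
        # gap, then run ONE test. No pile of tests is ever reported.
        gaps = [abs(avg(drug[k]) - avg(placebo[k])) for k in range(2)]
        k = gaps.index(max(gaps))
        if z_test_p(drug[k], placebo[k]) < alpha:
            forked_hits += 1
    return prespecified_hits / n_studies, forked_hits / n_studies


if __name__ == "__main__":
    prespecified, forked = simulate()
    print(f"prespecified outcome: {prespecified:.3f}")
    print(f"outcome chosen after a look: {forked:.3f}")
```

Under these assumptions the prespecified analysis should hover around the nominal 5%, while the forked one should land near 1 − 0.95² ≈ 9.75%, even though each simulated paper contains exactly one p-value. Real forks (subgroups, exclusions, transformations) are correlated and far more numerous than two, so treat the number as a cartoon rather than an estimate.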

Exhibit two: “P-values are just the tip of the iceberg”, written by biostatisticians in Nature. This is pretty good, but falls into that medical habit of envisioning all hypotheses as terribly clear-cut (mean systolic blood pressure at 6 weeks, per protocol, differs between the drug and placebo groups, &c &c), and they really are not. If we could only sort out study designs and data cleaning, they argue, then we could resume using p-values with impunity, safe in the knowledge they would lead us to the Universal Truth with alpha type 1 errors and beta type 2 errors. They name their own online course in statistics as the solution (I suppose there’s nothing wrong with that in a comment article, but we Brits feel a tad, you know, uncomfortable with the old self-promotion sort of thing (for a moment I imagined there would be a toll-free number to enrol in the footnotes, but there wasn’t; anyway, it’s $470 and you can pay up front or as-you-go with Coursera, they take all the usual credit cards)). Tellingly, there’s a diagram which shows how you collect your data, then clean it, then do some analyses and find what looks sexy, and then make a hypothesis, and then test it. Cool!

Spot the problem?

  • Yes? Well done. You don’t really need to read this. Get on with your work.
  • No? Scroll back up and read that garden o’ forking paths paper. Read it good.

Now, let’s be generous. I criticise Leek and Peng with some trepidation, because they know their onions. It could be that the comment piece was slashed about the editing room by Nature’s staffers, and maybe the diagram was also the product of fevered interns. Heaven knows it wouldn’t be the first time.

Exhibit three: “What is medicine’s five sigma?”, in which the editor of the Lancet (a doctor, that most humble and self-effacing of professions) writes that p-values are awful, so the thing to do is to accept only papers with p<0.0000003. This kind of misses the point so profoundly it's like treating an ingrown toenail by amputation. Of the other leg.
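Where that 0.0000003 comes from, and why it doesn't touch multiplicity, can be sketched in a few lines. This is my own back-of-envelope arithmetic, assuming a one-sided standard normal tail for "five sigma" and, purely for illustration, k independent forks each tested at that level. The function names are hypothetical.

```python
import math


def upper_tail(z):
    """One-sided tail area of a standard normal beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def familywise_rate(alpha, k):
    """Chance of at least one false positive over k independent looks.

    Computes 1 - (1 - alpha)**k using log1p/expm1 to stay accurate
    when alpha is tiny.
    """
    return -math.expm1(k * math.log1p(-alpha))


five_sigma_p = upper_tail(5.0)

if __name__ == "__main__":
    print(f"one-sided p at five sigma: {five_sigma_p:.2e}")
    for k in (1, 10, 100):
        rate = familywise_rate(five_sigma_p, k)
        print(f"{k:>3} forks: {rate:.2e} ({rate / five_sigma_p:.1f}x nominal)")
```

The threshold moves, but the real error rate is still roughly k times whatever the paper claims, and nobody knows k. A stricter alpha makes each fork harder to pass without telling you how many forks there were.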

I'm not even going to bother you with the psychology magazines.

Conclusion: plenty of experience of critiquing papers and running analyses does not seem to prevent a lack of understanding of what a hypothesis test really does and how it goes wrong familywise, sciencewise, or whatever. The recommended solution is to read some philosophy of science. The late, great Peter Lipton is my homeboy on this one. We should pre-specify everything we can, and not allow those around us to get away with vague scientific hypotheses that map to many possible statistical hypotheses (forks in the path). But we must also remember that sometimes you can't pre-specify everything, and that ultimately you are doing a sort-of-deductive process after an inductive guess at an interesting topic and before an inductive set of practical conclusions, so we should acknowledge this creativity and not brush it under the carpet.

