On multiplicity

Last week, the New Scientist ran a most enjoyable editorial and accompanying article on wrong study findings in neuroscience. They start with the old chestnut of the dead salmon and move on to John Ioannidis and his claim that most research is wrong, backed up by his more recent critique, written with Katherine Button and other colleagues, of underpowered and publication-biased studies. If there’s little chance of detecting a genuine effect, goes their argument, then whenever you find something that looks interesting, it’s more likely to be a false alarm than it would be in a well-powered study.
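To put rough numbers on that argument, here is a minimal sketch using the standard positive predictive value formula. The function name and the inputs (a 10% prior chance that a tested effect is real, power of 80% versus 20%, a 0.05 significance threshold) are my own illustrative assumptions, not figures from the article or the critique, and the sketch ignores publication bias altogether.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Share of 'significant' findings that reflect a genuine effect.

    prior: assumed proportion of tested hypotheses that are really true
    power: chance of detecting a genuine effect when there is one
    alpha: significance threshold, i.e. the false positive rate under the null
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative inputs only: same prior, different power.
well_powered = positive_predictive_value(prior=0.1, power=0.8)
underpowered = positive_predictive_value(prior=0.1, power=0.2)

print(f"well powered: {well_powered:.2f}")
print(f"underpowered: {underpowered:.2f}")
```

Under these assumed inputs, most significant findings from the well-powered study are genuine, while most of those from the underpowered one are false alarms, even though both use the same threshold.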

In the New Scientist article, another critic of dodgy neuroscience is worried about a media witch hunt against neuroscience results, producing “this kind of global nihilism that all of neuroscience is bullshit”. I recognise this in my own students. I have to teach them to be critical – very critical – of published research, and I show them some corkers published in great journals which are fundamentally flawed. After a talk and a group exercise in critiquing, there is usually a fairly large proportion of the room saying that everything is b.s. and that you can’t trust stats. I’d like to think I bring them back to the middle ground at that point, but I fear that sometimes they go back into the wild with this strange pessimistic notion.

The more interesting angle is on multiplicity. Genome-wide association studies had to crack this in a principled way, because when you compare millions or billions of potential risk factors, many of them are going to come out as significant by chance alone. Thousands or millions are going to give the fabled p<0.001, even if nothing is going on. Brain imaging types tend to be more inclined to machine learning ideas (in their crudest form, divide the sample, look for patterns in the first half, then see if they are confirmed in the other) than p-adjustment and Bayes factors, from what I see, but the problem is the same.
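The arithmetic is easy to check with a small simulation. This is only a sketch under simplifying assumptions: every test is independent, nothing is going on in any of them, and so each p-value is uniform between 0 and 1. Real genetic markers and brain voxels are correlated, so this is not how an actual study would be analysed. The variable names are mine, and Bonferroni is just the simplest p-adjustment to show, not necessarily the one a given field uses.

```python
import random

rng = random.Random(2013)  # fixed seed so the run is repeatable

n_tests = 1_000_000   # assumed number of risk factors, all truly null
naive_alpha = 0.001   # the "fabled" threshold
family_alpha = 0.05   # overall error rate we would like to keep

# Under the null hypothesis a p-value is uniformly distributed on [0, 1].
p_values = [rng.random() for _ in range(n_tests)]

# Count false alarms at the naive threshold.
naive_hits = sum(p < naive_alpha for p in p_values)

# Bonferroni: divide the overall error rate by the number of tests.
bonferroni_alpha = family_alpha / n_tests
bonferroni_hits = sum(p < bonferroni_alpha for p in p_values)

print(f"expected false alarms at p<{naive_alpha}: {n_tests * naive_alpha:.0f}")
print(f"simulated false alarms at p<{naive_alpha}: {naive_hits}")
print(f"simulated false alarms after Bonferroni: {bonferroni_hits}")
```

With a million null tests the naive threshold should produce about a thousand false alarms, while the Bonferroni threshold should almost always produce none, at the cost of demanding very small p-values from any genuine effect.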

But I think the most intriguing angle in this whole disconcerting mess is that we deal with multiplicitous studies all the time. As Andrew Gelman described recently, there are many ways you could pick your hypothesis, define your variables and analyse your data, and you don’t account for that multiplicity other than in an unspoken gut-feeling kind of way. Ioannidis commented on this too, in “Why Most Published Research Findings Are False”, with a nice turn of phrase:

We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections, usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding.

That is not cause for throwing up our hands and giving up (like the students), but just a fact of every scientific endeavour. Openness is the answer, making the data, the analysis code and all the protocols and analysis plans available for others to pick over. And it’s the journal publishers who will drive this, because even if we researchers do fervently believe in openness, we are stopped by copyright, fear of criticism, and the burning need to get on with the next piece of work.

