xkcd revisited: Frequentist vs Bayesian

Andrew Gelman blogged about this cartoon, coming gallantly to the aid of the poor frequentist who might feel victimised. Then all hell broke loose in his comments which you can read here, including the appearance of Randall “xkcd” Munroe himself. I think, optimistically, that everyone agreed to get along in the end.

I couldn’t resist one more word on this for the newcomer. It can all seem terribly esoteric and philosophical. That’s because it is! The traditional frequentist statistician will say that they make inferences about unknown parameters (like the mean blood pressure in stats lecturers) only in the sense that if you repeated the experiment many times, you would get a certain spread of different estimates. The true mean of the population blood pressure is, like the X-Files, out there, and you can never know it unless you get all the stats lecturers in the world and sphyg them. Well, that’s nice but it rather limits our ability to make inferences in complicated situations. Ronald Fisher took the probability equation that you use to model your data given the parameters, and turned it round, so that the data are known but the parameter is not. So some values of the parameter (mean population blood pressure) are more likely than others, given the data. This is called the likelihood, and mathematically it is not quite the same as a probability, but we can set a computer running to find the parameter value(s) that get the highest likelihood. Hence “maximum likelihood estimation”, without which we would still be stumbling around doing t-tests and Mann-Whitneys; there would certainly be no survival analysis, logistic regressions or multilevel models. But you’ll note that this treats the parameter as coming from a distribution, which a strict frequentist (and they are rare beasts) would object to. Most people are happy to say that likelihood is not the same as probability, and leave it there. Why trouble yourself asking what likelihood is if it works?
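For the newcomer who wants to see “set a computer running” in action, here is a minimal sketch in Python. The blood pressure readings are invented for illustration, and I assume a normal model with a known standard deviation of 10 mmHg, purely to keep the example short; neither comes from any real study. The names `readings`, `ASSUMED_SD` and `log_likelihood` are my own.

```python
import math

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118.0, 125.0, 131.0, 122.0, 140.0, 127.0, 119.0, 134.0]
ASSUMED_SD = 10.0  # assumption: population standard deviation treated as known


def log_likelihood(mu, data, sd=ASSUMED_SD):
    """Normal log-density of the fixed data, viewed as a function of the mean mu."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)
        for x in data
    )


# Crude search: try candidate means from 100.0 to 160.0 in steps of 0.1
# and keep the one with the highest log-likelihood.
candidates = [100 + i / 10 for i in range(601)]
mle = max(candidates, key=lambda mu: log_likelihood(mu, readings))

sample_mean = sum(readings) / len(readings)
print(f"maximum likelihood estimate: {mle:.1f}, sample mean: {sample_mean:.1f}")
```

In this simple model the search lands on the sample mean, which you could have worked out by hand. The point of handing the search to a computer is that the same idea carries over to models where no neat formula exists, though real software uses cleverer optimisers than a grid.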

The term Bayesian is very confused. It can mean someone who treats parameters as coming from distributions, or someone who thinks that probability is not a long-run proportion of repeated results but rather a subjective measure of belief or personal conviction. It also gets used to describe analyses that use computational methods to find the shape of the likelihood when calculus fails us (Markov Chain Monte Carlo, usually). However you take it, there is no evidence that Rev Thomas Bayes held well-developed views compatible with any of these meanings. In fact, Bayes’ theorem is just a fact of probability theory that nobody disputes.

There is also another elephant in the room. If an event is going to happen only once (e.g. Obama vs Romney) then there is no such thing as long-run proportions of repeated elections. How can we make inferences about the percentage of votes for Obama? In what sense can Nate Silver say Obama had an 86% chance of winning? This is tricky… one solution is to say that probabilities arise from some causal propensity to generate certain kinds of results, an approach usually attributed to Charles Sanders Peirce that most find a little too slippery because it appeals to causality which, as Judea Pearl has shown us, is a force conceived by mortal man as something beyond mathematical formulae.

Then you have the matter of subjective Bayesian methods, distinguished by what they call “informative priors”. This says that you have a certain belief before the experiment, which is updated by the data (and their likelihood) to form your belief after the experiment: the posterior distribution. As you will have a different prior to me, you will also get a different posterior, but if we collect lots of lovely data, the likelihood will swamp the prior and we will end up believing the same thing. Neat, huh? These subjective Bayesians are the people my MSc supervisor called “mad Bayesians” and a Bayes-friendly colleague of mine calls “staunch Bayesians”. (Note the moral tone that both have adopted. Calm down, guys.) I don’t mind subjective Bayes, it’s a consistent and principled approach, but I don’t use it for one practical reason that I happen to think is the trump card in this particular argument. When you give the client their posterior distribution, it is their new belief. They are not supposed to look at it and think about it, they are supposed to adopt it without thought, because it is their thought. If the posterior says 88% certainty the drug works, well then Dr Subjective, you’d better start selecting 88% of your patients at random to give it to. And nobody behaves like that, so I see little point in mathematically modelling their cerebral processes only to be ignored.
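The claim that the likelihood swamps the prior is easy to check for yourself. Below is a minimal sketch, assuming the textbook beta-binomial set-up (a Beta prior on a success rate, updated by counts of successes and failures), which is my choice of illustration and not the only way to do it. The two priors and the 70% success rate in the data are invented, and `posterior_mean`, `sceptic` and `optimist` are hypothetical names.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after updating a Beta(prior_a, prior_b) prior
    with binomial data (standard conjugate result)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two analysts with opposite informative priors about the success rate.
sceptic = (2.0, 8.0)   # prior mean 0.2
optimist = (8.0, 2.0)  # prior mean 0.8

gaps = {}
for trials in (10, 100, 10_000):
    successes = 7 * trials // 10  # invented data: 70% successes at every sample size
    low = posterior_mean(*sceptic, successes, trials)
    high = posterior_mean(*optimist, successes, trials)
    gaps[trials] = high - low
    print(f"n={trials:>6}: sceptic {low:.4f}, optimist {high:.4f}, gap {gaps[trials]:.4f}")
```

With ten observations the two analysts still disagree by 0.3; with ten thousand the gap is well under 0.001. How quickly this happens in practice depends on how strong the priors are and how informative the data, so treat the numbers as a demonstration of the principle only.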

Sadly, we are surrounded by people who “conclude that the sun has exploded”. It’s a basic logical error that afflicts all sorts, regardless of whether they think model parameters can be treated like random variates or that probability is a measure of personal certainty / belief. It is the error that translates some evidence of higher cancer rates in carrot-eaters into a headline that carrots cause cancer. It is the error of making a mathematical model which we call statistics, with a load of assumptions about reality, and then refusing ever to question the assumptions. It is the notion that science brings answers, which I’m afraid it rarely does (see Thomas Kuhn &c &c). If we all understood just a little philosophy of science, we would make better data analysts.

