It’s Christmas Eve in old London Town, and a cold wind is whistling down Fleet Street. 250 years ago today, Richard Price, a philosopher and preacher from the nearby village of Islington, made his way down the street and turned under the arch into the little passageway called Crane Court. He was not going to the Temple Bar Indian restaurant, as some last-minute office workers will be right now, he was going to the Royal Society. Price was carrying with him the edited work of his friend and fellow nonconformist preacher Thomas Bayes, who had died two years earlier, leaving behind among his unpublished writings “An Essay towards solving a Problem in the Doctrine of Chances”. That evening, Price read the paper before the Society, thus committing it to the annals of history. Today, much is said about Bayesian statistics, and very little of it is in any way approachable to anyone who isn’t a total stats nerd. Let me try to explain what this is all about, without mathematical detail, to mark 250 years of Bayes.
There are two ideas of far-reaching importance in Bayes’ Essay. The first is his eponymous Theorem, which is an uncontroversial, and very useful, bit of simple probability theory. It allows you to flip conditional probabilities round. For example, suppose you are a doctor and your patient has received a positive HIV test result. If you know the probability of getting a positive test result when a patient really has HIV, and you know the prevalence of HIV, then you can flip it round to get the much more useful probability of really having HIV given a positive test result. So far so good. The second idea is set out at the very beginning; Bayes writes that he wants to find:
“the chance that the probability of [a certain event happening] lies somewhere between any two degrees of probability that can be named”
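To see the Theorem’s flip in actual numbers, here is a minimal sketch of the diagnostic-test example above. The prevalence, sensitivity and false-positive rate are invented for illustration, not real figures for any HIV test:

```python
# Bayes' Theorem on the diagnostic-test example. All three inputs are
# invented numbers, purely for illustration.
prevalence = 0.001       # P(HIV): assumed population prevalence
sensitivity = 0.99       # P(positive test | HIV)
false_positive = 0.02    # P(positive test | no HIV)

# P(positive test), by the law of total probability
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# The flip: P(HIV | positive test)
p_hiv_given_positive = sensitivity * prevalence / p_positive
print(f"P(HIV | positive test) = {p_hiv_given_positive:.3f}")  # about 0.047
```

Even with an accurate test, a rare condition means most positives are false alarms, which is exactly why the flipped probability is the one the doctor needs.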
Bayes’ stated aim would not have been controversial at the time, because nobody really knew what they meant by probability. Once the early 20th century arrived, though, debate was hotting up among philosophers of science, many of them just a mile to the north in the neighbourhood of Bloomsbury: John Maynard Keynes and Bertrand Russell in particular (and don’t forget that Ronald Fisher worked at University College, across the square from Keynes’s house). Russell viewed Keynes’s recasting of probability into a form of logic as the best of all the options, and warned that a simple probability determined solely by extant data was not a concept to be muddled up with human (or avian) decision-making:
“The importance of probability in practice is due to its connection with credibility, but if we imagine this connection to be closer than it is, we bring confusion into the theory of probability”
“The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.”
Wise words. He was certainly something of a wag, if his History of Western Philosophy is anything to go by. Despite the unprepossessing title, this tome is studded with little jokes at the expense of his predecessors. Keynes’s neighbour Virginia Stephen (later Mrs Woolf) wrote in her diary simply that “the Bertrand Russells came for tea – very droll” and added that he was “a fervid egoist”. As a statistician interested in philosophy, a dedicated Woolf reader, and one predisposed towards a certain degree of drollery, I am sorry to have missed out on the invitation to tea that day.
Some of these philosophers said that the probability of an event happening is simply the proportion of times it will happen if you try it again and again forever. This is certainly true, but not very practical. These frequentists went on to insist that a probability is a fixed but unknown quantity, which we can estimate but not make any definite assertions about. For them, Bayes’ terminology is a foolish slip from the unsophisticated 1700s: there is no such thing as the chance of a probability lying between X and Y; it either lies between them or it doesn’t. The fact that we don’t know it does not mean you can treat it as random and start talking about the chance of it being between X and Y. Bayes’ “chance” is a hyper-probability, but frequentist probabilities are only meaningful for things that can be repeated indefinitely under identical circumstances. The problem here is that you then can’t talk about the probability of rain tomorrow, or of New York electing a Democrat mayor next time round, because these events only happen once, and that seems to throw out the baby with the bath-water.
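The “again and again forever” definition is at least easy to sketch by simulation; the “true” probability of 0.3 and the trial counts below are arbitrary choices for illustration:

```python
import random

random.seed(1)

# Frequentist probability as a long-run proportion: simulate an event with
# "true" probability 0.3 (an arbitrary choice) and watch the proportion of
# successes settle down as the number of trials grows.
true_p = 0.3
proportions = {}
for n in (100, 10_000, 1_000_000):
    successes = sum(random.random() < true_p for _ in range(n))
    proportions[n] = successes / n
    print(f"after {n:>9,} trials: proportion = {proportions[n]:.4f}")
```

Of course, nobody can actually run the infinite sequence, which is precisely the frequentist’s practical embarrassment.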
Others said that something being random simply means that you don’t know it. Data you have yet to collect are random, coming from a distribution that you can see when you draw a histogram of the data so far. Then they went beyond that to say that unknown parameters that govern the universe, like probabilities, are also random, even though it’s harder to learn about their distribution over repeated samples. These people alighted on Bayes’ turn of phrase and took him as their figurehead, calling themselves Bayesians. Some also asserted that probability could only mean a level of personal belief in unknowns, and so it is subjective, but Bayes never suggested this and so I shall mention it no further. (I think it is true and elegant, but not at all useful. Many very wise and learned people would disagree with me there, but their counter-arguments don’t convince me. Let’s not get into it now.) Did he really believe in the chance of a probability, or was that just sloppy terminology? In fact, Bayes wrote, when setting out his definitions at the start of the Essay, “By chance, I mean the same as probability”. So he clearly meant there to be probabilities about probabilities, or in other words that an unknown parameter is just as random as an unknown (future) measurement. He definitely was a Bayesian, although probably (!) not subjective.
We need to bear in mind when reading Bayes that his Essay was edited by Price, who seems to have found it irresistible to employ mathematics, whenever he could, in an attempt to prove the existence of God, and his presentation of Bayes’ Essay is no exception. Price was no great logician, and like Bayes, he was most likely in the Royal Society because at the time it was a thinkers’ club dominated by liberal nonconformists, who elected their own sort to join their ranks. Price certainly wouldn’t have liked subjective Bayesianism; by his peculiar logic it would have permitted a dangerous relativism pointing us all down the highway to hell.
So, what impact does Bayesianism have on us today? There are two ways in which it makes its presence felt, the first rather unfortunate, the second beyond Bayes’ wildest imagination. By the Second World War, the philosophers had pretty much settled the issue. Frequentism crumbles under scrutiny: it is impossible to apply it without contradicting oneself, in large part because the distinctions between data and parameter, known and unknown, population and sample, can differ from individual to individual. Sadly, nobody told the statisticians. Some polemical textbooks warned readers of the dangers of Bayes, and that word of mouth continues to this day. The statisticians are not fools; they just haven’t spared the time to think it through enough.
On the other hand, if you are willing to treat unknown parameters as random variates (note the slight semantic distinction from a random variable, one of many such concessions to the statistical frequentists), then you can set your computer to try out different values and let them meander around through the parameter’s own distribution. We have had a really practical way of doing this since 1984, called the Gibbs sampler (there are others…). A strict frequentist could not permit such nonsense; the parameter is either here or it is there, it cannot move about, and to set it going, the analyst must supply a prior distribution. This smacks of subjectivity, even if it is diffuse enough to allow the parameter to roam free, constrained only by the data. Indeed, there is no judgement-free prior, but that is also a fact of life when you come to interpret the results of any data analysis.
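To make the meandering concrete, here is a minimal Gibbs sampler sketch for about the simplest toy target there is: a standard bivariate normal with correlation 0.8. The correlation, chain length, burn-in and starting point are all arbitrary illustrative choices, and this target is one where each conditional distribution happens to be known exactly:

```python
import random

random.seed(42)

# Gibbs sampling a standard bivariate normal with correlation rho:
# alternately draw each coordinate from its conditional given the other.
rho = 0.8
cond_sd = (1 - rho**2) ** 0.5   # sd of x given y (and of y given x)
x, y = 0.0, 0.0                 # arbitrary starting point

xs, ys = [], []
for i in range(20_000):
    x = random.gauss(rho * y, cond_sd)   # draw x | y
    y = random.gauss(rho * x, cond_sd)   # draw y | x
    if i >= 1_000:                       # discard burn-in, keep the rest
        xs.append(x)
        ys.append(y)

n = len(xs)
mean_x = sum(xs) / n
corr = sum(a * b for a, b in zip(xs, ys)) / n   # ~rho: both margins are N(0, 1)
print(f"sample mean of x: {mean_x:.2f}, sample correlation: {corr:.2f}")
```

In practice one reaches for general-purpose software (BUGS, Stan, PyMC and friends) rather than hand-rolling the conditionals, but the principle is just this: each unknown takes its turn to be drawn given the current values of all the others, and the sequence of draws wanders through the joint distribution.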
The trouble is that frequentist methods (they often like to call their stuff classical, which is amusing because it just prompts Bayesians to call their work modern) run out in some scenarios, or become horrendously complex, while the Bayesian methods march on. Personally, I think the distinguishing feature is that methods like the Gibbs sampler get closer to the data-generating process, and break it into steps, which helps us understand the model that we are fitting to the data.
Bayes’ original goal, of finding the chance that a probability is between X and Y, also lends itself to clear, intuitive understanding. You can be told that there is a 78% chance that average temperatures in England have increased since 1960, while a frequentist would have to couch it in terms of going back in time and taking repeated measurements under identical conditions an infinite number of times.
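A statement like that 78% drops straight out of posterior draws: the chance of the event is simply the proportion of draws in which it holds. A sketch, with an invented posterior for a warming trend (Normal with mean 0.08 and standard deviation 0.10 degrees per decade; these numbers are made up, not a real climate result):

```python
import random

random.seed(7)

# Suppose a Bayesian analysis left us with posterior draws of a warming
# trend, here faked as draws from Normal(0.08, 0.10) -- invented numbers.
draws = [random.gauss(0.08, 0.10) for _ in range(100_000)]

# The chance the trend is positive is just the proportion of positive draws.
p_increase = sum(d > 0 for d in draws) / len(draws)
print(f"chance that average temperatures have increased: {p_increase:.0%}")
```

No time machines or infinite replications required; the probability statement reads exactly as a layperson would expect it to.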
Of course, there are also wise voices in the world of stats warning us not to rely too much on methods like the Gibbs sampler. For one thing, it is not easy to spot when things go wrong, or what is influencing the results. It is also not at all clear how best to compare alternative models for your data and choose the best one. Also, the debate about subjective probability has not gone away. Finally, although the computational tools are impressive, they require much more time and power than their frequentist counterparts, and in the era of big data, you can easily get into a situation where they just won’t work any more. There’s clearly lots of fun to be had refining and developing them for many years to come.
So we take our leave of Crane Court, the former home of the Royal Society, just as Price did, stepping out into the cold night. I dare say Price – and Bayes – would have wished you, reader, a very happy Christmas, but I think I’ll steer their ghosts round the corner and onto the 341 bus as quick as I can, in case they encounter an office party or see a decorated tree. I don’t think they would have approved.