# Frequentist accuracy of Bayesian estimates

Today I sat in on the Royal Statistical Society’s webinar discussing Brad Efron’s recent paper “Frequentist Accuracy of Bayesian Estimates”. Prof Efron introduced the paper, Andrew Gelman gave a response with three questions, and there were questions from the floor. It all worked rather well in terms of logistics, and I hope the RSS does more online broadcasts of events – or broadcasts of online events. It is quite easy to do this sort of activity now, and you don’t have to do a training course beforehand: it was Prof Efron’s first ever webinar. I think the trick is to make use of existing simple technology and not to attempt to reinvent the wheel, or feel so anxious about it that one hires an expensive agency to take care of the event. I base this on the very positive experience of Harvard Med School’s GCSRT and associated blended learning courses, which built a strong curriculum out of mostly free tools like Facebook, Dropbox and Google Hangouts. Not only do the students get an online learning environment, they learn how to create their own environment for collaboration.

I wanted to write this and post it quickly because the paper is available to all for free until 4 November 2015. The recording of the webinar is on the RSS journals webinar page, and will move to YouTube soon.

The point of this paper is that when we make some estimate and an interval around it by Bayesian methods, we can interpret the interval in a frequentist framework provided the prior is informative and informed by prior evidence. That is to say, given that prior, we could reasonably imagine the study and its analysis taking place repeatedly forever in identical circumstances. Our interval, whether we call it confidence or credible, would be calculated afresh each time and 95% of them would contain the true value.
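To make that repeated-sampling reading concrete, here is a minimal simulation sketch. It is my own toy setup, not anything from the paper: a single observation x ~ N(mu, noise_sd²), a conjugate N(0, prior_sd²) prior, and the assumption that the "true value" in each repetition really is drawn from that prior. The function name and all parameters are hypothetical.

```python
import random


def coverage_of_credible_interval(n_reps=20000, prior_sd=1.0, noise_sd=1.0, seed=1):
    """Fraction of repetitions in which the 95% posterior interval contains mu.

    Assumed toy model (not from the paper): mu ~ N(0, prior_sd^2) and
    x | mu ~ N(mu, noise_sd^2), one observation per repetition.
    """
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% normal quantile
    # Conjugate normal posterior: precision is the sum of the two precisions.
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
    post_sd = post_var**0.5
    hits = 0
    for _ in range(n_reps):
        mu = rng.gauss(0.0, prior_sd)       # "true value" for this repetition
        x = rng.gauss(mu, noise_sd)         # the data we get to see
        post_mean = post_var * x / noise_sd**2
        if abs(mu - post_mean) <= z * post_sd:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(coverage_of_credible_interval())
```

Under these assumptions the long-run coverage should sit close to 95%, because the prior is exactly the distribution generating the true values. If the true value were instead held fixed across repetitions, or the prior were not the generating distribution, there would be no such guarantee.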

Here comes a philosophical aside; you can omit it without loss of understanding, as they say. Now, here I confess some confusion regarding what that value really is, because it seems to contain both a sample of data and a prior, but for a population the prior would cease to influence it. Is it a statistic summarising the posterior distribution, or the population? Does it matter or am I splitting hairs? Furthermore, is it any more reasonable to imagine indefinite collection of data, influenced by an empirical prior, without the prior being updated? Perhaps they have to be collected simultaneously without communication, but even if we admit the plausibility of this (remember that we cannot make inferences about one-off events like elections without bending the rules), we are faced with the problem of subjectivity lurking within. If information is available but is just not known, does that make events ‘random’ in the frequentist sense that they can be described by probability distributions? Most old-school frequentists would say no, because to say yes means that when researcher A learns of researcher B’s data, researcher A’s inferences collapse like a Schrödinger wave function. What if A knows about B but nobody knows that A knows? It could be random for B but not for A: subjectivity, goddam it! Everybody for the lifeboats! This is not to mention the fact that the (human) subject of the research knows even before you ask them. On the other hand, if they say no, then they deny their own frequentist framework, because repeated experiments mean that they cannot be truly repeated, because the very existence of information means there is no such thing as probability (and they might be onto something there…but that’s another story). This, friends, is why frequentism is a steaming pile of poop. But we will leave that aside, because nothing in statistics stands up to philosophical scrutiny except (I think!) personally subjective Bayes, and probably (boom boom!) Jaynes’s fuzzy logic. It doesn’t make sense but it is useful nonetheless.

So, on to the paper. If you don’t have an empirically informed prior (and not many of us use them), what would the meaning be of the frequentist properties of intervals? In such circumstances, we have to some extent made a subjective choice of our prior. This immediately made me think of Leamer’s fragility analysis, in his paper “Let’s Take The Con Out Of Econometrics” (I’m grateful to Andrew Chesher for pointing me to this very entertaining old paper, something of a classic of econometrics but unknown to statistical me). Efron explores this question via the multivariate gradient of the log-probability of the data over values of its summary statistic(s). This gives standard errors via the delta method, or you can get a bootstrap too. There is a very nice motivating example where a summary of the CDF of a particular participant in a lasso regression predicting diabetes progression gets both a Bayesian and a frequentist CI, and the ratio of their widths is explored. It turns out that there is a formula for this ratio, yielding a matrix whose eigenvalues give a range of possible values, and this sets bounds on the frequentist-Bayesian disagreement for any given parameter – potentially a useful diagnostic.
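Here is a rough sketch of how I understand the delta-method recipe, on a toy problem of my own rather than the paper's diabetes example. As I read it, the frequentist standard deviation of a posterior expectation E{t(mu) | x} is approximated by sqrt(c' V c), where c is the posterior covariance between t(mu) and the gradient of log f(x | mu) with respect to x, and V is the sampling covariance of x. Please treat that reading, and every name below, as my assumption. The toy model is x ~ N(mu, 1) with a N(0, prior_var) prior and t(mu) = mu, so the gradient is simply mu - x and V = 1.

```python
import random


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def frequentist_and_bayesian_sd(x, prior_var=1.0, n_draws=50000, seed=2):
    """Return (frequentist sd of the posterior mean, posterior sd).

    Assumed toy model: x | mu ~ N(mu, 1), mu ~ N(0, prior_var), t(mu) = mu.
    Posterior draws stand in for MCMC output.
    """
    rng = random.Random(seed)
    post_var = prior_var / (prior_var + 1.0)
    post_mean = post_var * x
    draws = [rng.gauss(post_mean, post_var**0.5) for _ in range(n_draws)]

    t_vals = draws                        # t(mu) = mu
    alpha = [mu - x for mu in draws]      # d/dx log N(x; mu, 1) = mu - x
    c = _cov(t_vals, alpha)               # posterior cov of t and the gradient
    sampling_var = 1.0                    # V: variance of x given mu
    freq_sd = (c * sampling_var * c) ** 0.5
    bayes_sd = _cov(draws, draws) ** 0.5  # posterior sd of mu
    return freq_sd, bayes_sd


if __name__ == "__main__":
    print(frequentist_and_bayesian_sd(x=1.3))
```

With prior_var = 1 the posterior mean is x/2, so its frequentist standard deviation is 0.5 in closed form, while the posterior standard deviation is sqrt(0.5), about 0.71. The simulated values should land near those, which gives a one-parameter feel for the kind of width ratio the paper looks at. The same posterior draws serve both calculations, which is the attraction of the approach.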

Prof Gelman liked the non-ideological nature of the paper (I do too, despite the rant above), but wondered what the practical conclusion was: to trust Bayes because it is close to frequentist under some specific conditions, or to do the comparison with theoretical simulation each time one does a Bayesian analysis (and to use empirical priors… well, that might not work very often). What if one used Weakly Informative Priors (WIPs), falling somewhere between informative and non-informative? Could inference be done on some measure of the width of the prior? Efron liked this idea.

Gelman also liked the “creative incoherence” between the method employed for the bootstrap, and the estimator it aims at. This is a bit like a confidence interval for difference of medians alongside a rank-based test, as discussed in these very pages a few months back. Why is coherence good? Why not be incoherent and see more than the two sides of the coin familiar from asymptotics: test and interval? Why not employ different methods to understand complementary answers about a Bayesian model?