The world’s favourite stats papers

People often say that Bland and Altman’s paper, in which they set out the eponymous plot for comparing two methods of measurement, is the most-cited stats paper ever. I thought I would poke around on Google Scholar and see what the citations looked like there.

Martin Bland (left) & Doug Altman, Cambridge 1981. Photo courtesy of Martin Bland’s homepage at York.

In terms of total citations, and given all the shortcomings of this as a measure of anything, there are two ahead of B&A, and they needn’t feel cheated, as we’re talking about titans of statistics here. Here are the rankings for the seven papers I could think of testing:

  1. Cox (1972) Regression models and life-tables: 35,512 citations
  2. Dempster, Laird & Rubin (DLR, 1977) Maximum likelihood from incomplete data via the EM algorithm: 34,988
  3. Bland & Altman (1986) Statistical methods for assessing agreement between two methods of clinical measurement: 27,181
  4. Geman & Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images: 15,106
  5. Efron (1979) Bootstrap methods: another look at the jackknife: 9,686
  6. Tibshirani (1996) Regression shrinkage and selection via the lasso: 8,744
  7. Nelder & Wedderburn (1972) Generalized linear models: 3,818

Can anyone think of any other landmark papers to look up?

Nelder & Wedderburn invented GLMs, so you’d think they should be pretty darn near the top, but for two things, I suppose. Firstly, the most popular of these models, logistic regression and Poisson regression, are so commonplace that people no longer cite them, and secondly, the book by McCullagh and Nelder (following Robert Wedderburn’s tragic death at the age of 28) attracts most of the citations. Adding all the variants on it in Google Scholar, you get 24,297 citations, which would take GLMs up to third place, overtaking B&A, but then that is rather unfair on others with much-cited books like Little & Rubin or David Cox.

When you consider citations per year since publication, you have to remember that Google Scholar is not measuring the same thing each year. Since it got going, Google has put effort into going back into the archives and getting more books, reports and grey literature on the system. Recent years are going to produce more citations simply because of an inclusion bias, not to mention the fact that a lot more gets written and published each year now (most of it rubbish). But, given all that, B&A come out on top with 1007 citations per year, DLR second with 972, and Cox third with 866.
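For anyone who wants to check the per-year arithmetic, here is a minimal sketch in Python. The citation totals and publication years are the ones listed above. The census year is an assumption: the post does not state it, but 2013 is the value that reproduces the three quoted rates. The names `PAPERS`, `CENSUS_YEAR`, `citations_per_year` and `ranked_by_rate` are made up for this illustration.

```python
# Total Google Scholar citations as quoted in the post: name -> (year published, citations).
PAPERS = {
    "Cox (1972)": (1972, 35512),
    "DLR (1977)": (1977, 34988),
    "Bland & Altman (1986)": (1986, 27181),
    "Geman & Geman (1984)": (1984, 15106),
    "Efron (1979)": (1979, 9686),
    "Tibshirani (1996)": (1996, 8744),
    "Nelder & Wedderburn (1972)": (1972, 3818),
}

# ASSUMPTION: the year the counts were taken is not given in the post;
# 2013 is chosen because it reproduces the quoted 1007, 972 and 866.
CENSUS_YEAR = 2013


def citations_per_year(pub_year, citations, census_year=CENSUS_YEAR):
    """Crude rate: total citations divided by years since publication."""
    return citations / (census_year - pub_year)


def ranked_by_rate(papers=PAPERS, census_year=CENSUS_YEAR):
    """Return (name, rate) pairs sorted from highest to lowest rate."""
    rates = {
        name: citations_per_year(year, cites, census_year)
        for name, (year, cites) in papers.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, rate in ranked_by_rate():
        print(f"{name}: {rate:.0f} citations per year")
```

This is a plain average over the whole period, so it inherits the inclusion bias described above: it does nothing to adjust for Google Scholar's coverage changing over time.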



  1. Thought I should also look up structural equation models. Jöreskog (of LISREL fame) notched up a lot of citations, but spread over many publications. Oddly, “Evaluating structural equation models with unobservable variables and measurement error” by Fornell and Larcker (1981) has gathered 17,738 citations, which would put it in 4th place. Perhaps because it is a decent all-round SEM paper (inflexible by today’s standards of course) and, crucially, appeared in a marketing journal.

  2. Brennan Kahan wrote to ask:

    “Rob, can you explain why Bland/Altman plots are cited almost as often as the Cox model, and way more often than the bootstrap for example? I just don’t get it (not that I don’t love Bland/Altman plots, I just don’t get why they’re so highly cited…)”

    To which I replied:

    “I don’t know why but it is the done thing to cite B-A at every opportunity. It makes one look clever without actually having to grapple with algebra or calculus. (No offence, their papers were and remain beacons of clear writing) Also, their citations are not too spread across lots of papers. Cox’s book mostly gets cited rather than the 2 partial likelihood papers. And the bootstrap just doesn’t get done anywhere near as often as it should. If all the duffers out there started bootstrapping, they’d probably be citing something else, because Efron is not exactly light bedtime reading!”

    And Brennan wrote back with this point, crucial to understanding these citations:

    “Good points – also, BA seems to have just the right amount of popularity, where it’s used quite often, but it’s not so standard that it doesn’t really require a citation (like the Cox model, Chi sq test, logistic regression, etc).”
