People often say that Bland and Altman’s paper, in which they set out their eponymous plot for comparing two methods of measurement, is the most-cited stats paper ever. I thought I would poke around on Google Scholar and see what the citations looked like there.

Martin Bland (left) & Doug Altman, Cambridge 1981. Photo courtesy of Martin Bland’s homepage at York.

In terms of total citations, and given all the shortcomings of this as a measure of anything, there are two ahead of B&A, and they needn’t feel cheated, as we’re talking about titans of statistics here. Here are the rankings for the seven papers I could think of testing:

- Cox (1972) Regression models and life-tables: 35,512 citations
- Dempster, Laird & Rubin, or DLR (1977) Maximum likelihood from incomplete data via the EM algorithm: 34,988
- Bland & Altman (1986) Statistical methods for assessing agreement between two methods of clinical measurement: 27,181
- Geman & Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images: 15,106
- Efron (1979) Bootstrap methods: another look at the jackknife: 9,686
- Tibshirani (1996) Regression shrinkage and selection via the lasso: 8,744
- Nelder & Wedderburn (1972) Generalized linear models: 3,818

Can anyone think of any other landmark papers to look up?

Nelder & Wedderburn invented GLMs, so you’d think they should be pretty darn near the top, but for two things, I suppose. Firstly, the most popular of these models, logistic regression and Poisson regression, are so commonplace that people no longer cite them, and secondly, the book by McCullagh and Nelder (following Robert Wedderburn’s tragic death at the age of 28) attracts most of the citations. Adding all the variants on it in Google Scholar, you get 24,297 citations, which would take GLMs up to third place, overtaking B&A, but then that is rather unfair on others with much-cited books like Little & Rubin or David Cox.

When you look at citations *per year since publication*, you have to remember that Google Scholar is not measuring the same thing each year. Since it got going, Google have put effort into going back into the archives and getting more books, reports and grey lit on the system. Recent years are going to produce more citations simply because of an inclusion bias, not to mention the fact that a lot more gets written and published each year now (most of it rubbish). But, given all that, B&A come out on top with 1007 citations per year, DLR second with 972, and Cox third with 866.
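For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-year calculation. The reference year of 2013 is an assumption: it is not stated in the post, but it is the value that reproduces the three per-year figures quoted above. The names `REFERENCE_YEAR`, `papers` and `citations_per_year` are purely illustrative.

```python
# Citation totals as listed in the post (Google Scholar counts at the time).
# ASSUMPTION: 2013 is taken as the reference year because it reproduces the
# quoted figures of 1007, 972 and 866 citations per year.
REFERENCE_YEAR = 2013

papers = [
    ("Cox", 1972, 35512),
    ("DLR", 1977, 34988),
    ("Bland & Altman", 1986, 27181),
    ("Geman & Geman", 1984, 15106),
    ("Efron", 1979, 9686),
    ("Tibshirani", 1996, 8744),
    ("Nelder & Wedderburn", 1972, 3818),
]


def citations_per_year(year, total, reference_year=REFERENCE_YEAR):
    """Average citations per year between publication and the reference year."""
    return total / (reference_year - year)


# Rank the papers by citations per year, highest first.
ranked = sorted(
    ((name, citations_per_year(year, total)) for name, year, total in papers),
    key=lambda item: item[1],
    reverse=True,
)

for name, rate in ranked:
    print(f"{name}: {rate:.0f} citations per year")
```

This is a crude average and does nothing to correct for the inclusion bias described above, so it should be read as a rough comparison only.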


Thought I should also look up structural equation models. Jöreskog (of LISREL fame) notched up a lot of citations, but spread over many publications. Oddly, “Evaluating structural equation models with unobservable variables and measurement error” by Fornell and Larcker (1981) has gathered 17,738 citations, which would put it in 4th place. Perhaps because it is a decent all-round SEM (inflexible by today’s standards of course) and, crucially, in a marketing journal.

Pingback: The most-cited statistics papers ever « Statistical Modeling, Causal Inference, and Social Science

If you haven’t just come here from Andrew Gelman’s blog, you should look at his post on this topic at http://andrewgelman.com/2014/03/31/cited-statistics-papers-ever/, where more critique of citations ensues and some glaring omissions on my part are revealed, like Kaplan & Meier (how on earth did I not think of that…).

Brennan Kahan wrote to ask:

To which I replied:

And Brennan wrote back with this point, crucial to understanding these citations: