Update 10 June 2013 – I replaced the two graph images. No idea why, but they weren’t displaying properly in some browsers.
Many of you may already know this occasionally updated analysis by Bob Muenchen – if not, go and check it out. The results, and the discussion in the comments, are fascinating. The strange pattern I see is that the total number of Google Scholar citations mentioning any stats software peaked in 2007 and has been declining rapidly ever since.
So, I wondered whether we used to have a fashion for naming the software in learned papers, and have more recently given up on this. You know the sort of thing: “All analyses were conducted in Stata/SE version 11.2 (StataCorp, 2010)”. Textbooks on writing for publication say that is good practice, but how many of us still bother? If that is the explanation, we have given up on it very fast.
As a percentage of this whole, SPSS has been declining since 2005, SAS has stayed pretty much in the same ballpark, while R and Stata have increased in very similar fashion. R might be slightly ahead, but I wouldn’t want to call it. It looks like the switch is from SPSS to Stata or R. Either people are learning new packages, or the SPSS generation is retiring, though my students’ unquenchable appetite for all things point-and-click and Andy Field suggests it is not the latter. I suspect that IBM’s hilarious re-re-re-branding exercises haven’t exactly helped. I mean, how can you cite the software in the paper when you don’t even know what it’s called?
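For anyone wanting to play with the numbers themselves, the “percentage of this whole” normalisation is simple enough to sketch. The counts below are made-up illustrative figures, not Bob Muenchen’s actual data:

```python
# Hypothetical yearly Google Scholar mention counts for one year.
# These numbers are invented for illustration only.
counts_2012 = {"SPSS": 7000, "SAS": 5000, "R": 2000, "Stata": 1800}

def shares(counts):
    """Convert raw mention counts to percentages of the yearly total."""
    total = sum(counts.values())
    return {pkg: 100 * n / total for pkg, n in counts.items()}

# Print packages from largest to smallest share of the year's total.
for pkg, pct in sorted(shares(counts_2012).items(), key=lambda kv: -kv[1]):
    print(f"{pkg}: {pct:.1f}%")
```

Dividing by the yearly total is what separates “mentions of package X are falling” from “mentions of any package are falling” – the overall decline since 2007 affects every package’s raw count.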
There is probably a hidden side to SAS in pharma company reports that never make it into the public eye, which would account for its continuing dominance in the job market. That, in turn, is down to the perceived preference of the FDA – a perception that is not actually accurate; it’s just the preference of the boss. And as one of the commenters on Bob Muenchen’s blog pointed out, how can you say SAS is validated when you can’t see the code to check what it is doing?