To R or not to R

Recently I attended a meeting at the Royal Statistical Society with this title. The attendees I spoke to all seemed to be in the same position: they had taken up R, yet were the only person in their department or organisation using it. The recurring themes over coffee were how the boss was obsessed with Stata, or how the university won’t let them teach anything but SPSS. At least one obstacle has been removed: the notion that the FDA requires SAS for drug trials.

My own view is that R is pretty hard to beat for analysis, but difficult to teach with, given that some people learn visually and benefit massively from a good GUI (which R Commander is not yet, imho). However, the fact that it is free is a big selling point to students – or anyone, for that matter.

When you are providing outputs to non-statisticians on a consultancy basis, R itself will be pretty opaque to them, but it does give you a lot of tools for constructing interactive reports or websites. With the consultancy angle in mind, I particularly enjoyed talks by Wayne Jones from Shell on his experience of embedding R within, or interfacing it with, other software so as to provide the client with interactive outputs. He showed us some very impressive interactive reports for scientifically-minded but non-statistical colleagues, made using rcom, R2HTML, Sweave (and presumably soon to include knitr as well), and rpanel. Give them this sort of glossy product and you will soon be far more popular than your colleague who just exports the SPSS output to .rtf format (eurrrgh!).

Andy Field from Sussex Uni, the author of the ever-popular books on SPSS, SAS and now R as well, also gave a talk. He is a psychology lecturer and his students are probably not very different to mine, so I was quite keen to find out which side of the fence he was going to come down on in terms of stats software. And yet he too had no simple answer to the dilemma between SPSS and R, although he made two very good points. First, if you are going to teach with R you’d better have some backup among your colleagues when it comes to assignment / dissertation time, or you will have 200 students queueing at your door (not to mention the academic politics of tampering with inter-dependent modules). Second, don’t assume SPSS is clearer or easier to learn: over the years it has accumulated a huge number of menu options, some inconsistent, some duplicating others, and it uses a bewildering range of exotic-sounding tests without any apparent unifying framework.

This struck a chord with me; it’s easy for those of us who have already learnt it to overlook how confusing it is to have to choose an option from the SPSS “Analyze” menu when you are also struggling to relate it to your research question and remember how to use the software. Add to that the creeping malaise from version 20 onward, where SPSS offers to choose the test for you, and the race is definitely not won yet. All R needs to take over is a really slick GUI that does the basic stuff. Some people will be happy with R Commander, JGR or Deducer, but nobody wants to teach three or more ways of using the same software in the hope that the student will be happy with one of them.

So for my part, I am preparing online material for next academic year that will introduce both SPSS and R (the only packages available on our network for students) and let students choose for themselves. There are a few tasks like sample size calculations, making random allocation lists or calculating relative risks and their confidence intervals where they will have to leave SPSS anyway because it just won’t do them.
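
As a rough illustration, the three tasks mentioned above could be sketched in base R along these lines. The numbers (effect size, block size, event counts) and the variable names are invented for the example, and the confidence interval uses the usual log-scale normal approximation, which is one reasonable choice rather than the only one.

```r
# Illustrative sketch only: every number below is made up for the example.

# 1. Sample size for comparing two group means with a t-test:
#    detect a difference of 5 units, assuming SD = 10,
#    with 80% power and a two-sided 5% significance level.
ss <- power.t.test(delta = 5, sd = 10, power = 0.80, sig.level = 0.05)
n_per_group <- ceiling(ss$n)   # round up to whole participants

# 2. Random allocation list: 20 participants, two arms (A and B),
#    in 5 permuted blocks of 4 so the arms stay balanced.
set.seed(2012)                 # fixed seed so the list is reproducible
allocation <- unlist(lapply(1:5, function(block) {
  sample(rep(c("A", "B"), each = 2))
}))

# 3. Relative risk and 95% confidence interval from a 2x2 table.
events_trt <- 15; n_trt <- 100   # hypothetical treated group
events_ctl <- 30; n_ctl <- 100   # hypothetical control group

rr <- (events_trt / n_trt) / (events_ctl / n_ctl)

# Standard error of log(RR), then back-transform the interval.
se_log_rr <- sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

print(n_per_group)
print(allocation)
print(round(c(RR = rr, lower = ci[1], upper = ci[2]), 3))
```

None of this needs an add-on package, which is part of the appeal for students working on a locked-down network.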


