# The delights of re-reading the GAISE College Report

I am revising an introductory stats lecture for some Masters courses starting this term and thought I would re-read the GAISE College Report. This excellent document is not nearly as well known, or as widely followed, among stats lecturers as it should be. It is a set of recommendations for effective statistics teaching in higher education, published in 2005 and revised in 2010 by the American Statistical Association.

Before giving you some choice quotes that I think sum up the problem of teaching applied stats outside a “mathematics” or “statistics” course, I shall just pause to say that the report opens with a uniquely lucid history of university statistics courses through the 20th century. It runs from those based around Fisher’s and Snedecor’s famous textbooks of the 20s and 30s, to the exploratory data analysis emphasised by Tukey in the 70s, to the double-edged sword of computing power and accessible software, which has been the major force changing stats education since about 1990. The authors note that:

> In the early years, statistics had to lean heavily on probability for its legitimacy.

Yes indeed, and there are still many books and courses around where we bore and confuse students in the first part of their study by endless talk about flipping coins and opening doors to reveal goats. No wonder they don’t see stats as relevant to their lives! Probability is a wonderful weapon in our armoury, but we generally don’t start to use it in earnest until we are quite advanced in our practice as data analysts and are looking to do something a bit bespoke. If you are training nurses or physiotherapists to be researchers, you have to face the fact that most of them have no intention of going that far down the road.

As a little aside, I too have got this wrong in the past. I thought it would be useful for my students to understand how a probability or risk can be a long-run frequency based on lots of data (ischaemic heart disease caused 17.4% of all deaths in England and Wales), or the chance of a one-off event based on some data and a lot of assumptions (Obama was given a 90.9% chance of winning in 2012 by Nate Silver), or a subjective belief (I think there is a 3% chance my train home today will be delayed by more than 10 minutes). Actually, I don’t think any of them recalled that distinction by the end of the course. It had been pushed out (if it ever went in) by what the GAISE College Report calls “recipes”: simple algorithms that show you what test to pick and therefore how to pass the exam. These recipes have a powerful lure for students, and it’s no good blaming them for being attracted to the recipe when there are books and websites and YouTube videos full of them. My attempt at deeper understanding failed because its relevance was not emphasised.

OK, time for some of those great quotes.

> [Some courses teach] students to become statistically literate and wise consumers of data; this is somewhat similar to an art appreciation course. Some… teach students to become producers of statistical analyses; this is analogous to the studio [fine] art course. Most… are a blend of consumer and producer.

That is a great analogy, and it runs deeper than it might appear at first. To be a good data analyst, you need to learn some classic skills, and have the vision to understand what you are trying to communicate and the creativity to break the rules to better effect. I recently found an essay by the composer and conductor Pierre Boulez that draws this parallel between the practice of music and of math; I have quoted it in a forthcoming article in Significance on visualization (A life in stats – Nathan Yau).

> In week 1 of the carpentry (statistics) course, we learned how to use various kinds of planes (summary statistics). In week 2, we learned about using hammers (confidence intervals). Later, we learned about the characteristics of different types of wood (tests). By the end of the course, we had covered many aspects of carpentry (statistics). But I wanted to learn how to build a table… and I never learned how to do that.

Teaching all the tools in the box is not the same as teaching how to think like a statistician. The latter is much harder and takes longer! But that is what we must aspire to. The satisfaction the student has at being able to do t-tests and report 95% CIs for the mean difference (and so pass their exam) will soon fade when they do the same with some real-life data on a scale with a ceiling effect and get told by peer reviewers that they should have bootstrapped. They would be justified in complaining that they were taught such a simplified set of tools as to be useless in the real world. But if we taught them how to think about the problem and investigate it quantitatively, and how to find help or learn new tricks, they would be much better equipped for using their new skills in earnest.
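To make that peer reviewer's suggestion concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a difference in means, using only the Python standard library. The scores are made up for illustration (a 0 to 10 scale with many values at the ceiling), the function name `bootstrap_mean_diff_ci` is my own, and the percentile method is just one of several bootstrap intervals a reviewer might have in mind.

```python
import random
import statistics

def bootstrap_mean_diff_ci(a, b, n_resamples=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for mean(a) - mean(b) (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    alpha = 1 - level
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled differences.
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical scores on a 0-10 scale; many treatment scores sit at the ceiling.
treatment = [10, 10, 10, 9, 10, 8, 10, 10, 7, 10, 10, 9]
control = [10, 8, 6, 9, 10, 7, 5, 10, 8, 6, 9, 7]

observed = statistics.fmean(treatment) - statistics.fmean(control)
low, high = bootstrap_mean_diff_ci(treatment, control)
print(f"observed difference in means: {observed:.2f}")
print(f"95% percentile bootstrap CI: ({low:.2f}, {high:.2f})")
```

The point for students is not the code itself but the idea: the interval comes from resampling the data they actually have, rather than from assuming the scores are roughly normal, which a ceiling effect makes hard to defend.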

> While demands for dealing with data in an information age continue to grow, advances in technology and software make tools and procedures easier to use and more accessible to more people, thus decreasing the need to teach the mechanics of procedures, but increasing the importance of giving people a sounder grasp of the fundamental concepts needed to use and interpret those tools intelligently.

And this leads to a point the GAISE College Report doesn’t make, but I would like to promote: stats graduates now need to understand what is going on inside their computers to some extent. I don’t mean getting them to do a Wilcoxon signed-rank test by hand; I mean talking them through some of the key issues about storing data, precision and rounding errors, efficient parameterisation, and achieving global optima with iterative algorithms. To do this you don’t need to teach a load of algebra and calculus (though it would help, of course…) and you can start to introduce other computer-intensive concepts like bootstrapping or MCMC. This, surely, is the direction in which we will have to expand our courses in coming years. How many papers have you seen recently that just had classic descriptive stats, tests and nothing more advanced than a log-transform or linear regression? We are doing our students a disservice if we send them out into the world equipped to deal with data analysis in the 1980s.
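As a small illustration of the precision and rounding issue, here is the kind of sketch I have in mind, in plain Python. The data values and the function names `naive_variance` and `two_pass_variance` are my own choices for the example; the one-pass formula is the familiar textbook shortcut, not anything a particular package is known to use.

```python
import math

# Rounding: 0.1 and 0.2 have no exact binary representation,
# so their floating-point sum is not exactly 0.3.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

def naive_variance(xs):
    """One-pass shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # Subtracts two huge, nearly equal numbers: the small true
    # difference is swamped by rounding error in the large terms.
    return (total_sq - total * total / n) / (n - 1)

def two_pass_variance(xs):
    """Two passes: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Illustrative data: small spread (4, 7, 13, 16) sitting on a large offset.
# The sample variance of 4, 7, 13, 16 is 30, and adding a constant
# should not change it.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("two-pass:", two_pass_variance(data))
print("naive:   ", naive_variance(data))
```

The two functions are algebraically identical, yet on these numbers only the two-pass version returns the right answer. No calculus is needed to explain why, only the idea that the computer stores each number to a limited number of significant digits.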