Last summer I was fortunate to attend three stats conferences in the USA. It was a mixture of exciting travel and hard slog, with some great learning and some unexpected surprises; among them, attempting to explain the Gloucestershire cheese-rolling race to a Navajo family, and wondering whether I could bring a dried buffalo scrotum back through Heathrow (it would’ve made a great pen-holder).

However, the academic highlight for me was ICOTS, the ninth International Conference On Teaching Statistics, in Flagstaff, Arizona. All I knew about Flagstaff I learnt from the Rolling Stones, so it was reassuring to find that my hotel room did indeed look over Route 66.

So, I want to leave aside the rest of my trip and share the biggest themes from ICOTS. I’m a university lecturer, so that’s what I’m focussing on, though there’s plenty to say about schools too. But first, a little background from my point of view.

When statistics evolved as a distinct academic discipline in the mid-20th century, it was invariably in the shadow of the mathematics department. To be taken seriously as an academic, one had (and still has) to display the trappings of expertise and rigour. Yet this could be done either by lots of abstraction and mathematics, or by lots of aspplication and real-life problems. Some of greatest names of that era, like Frank Wilcoxon and George Box, learned their skills from application (as did Fisher fifty years earlier), but mostly the maths won; it was the path of least resistance in a larger university setting, and that informed the teaching.

However, as years went by everybody wanted to learn some stats: economists, ecologists, archaeologists, doctors, you name it. But they typically weren’t so good at maths, at least in Europe and North America. Personally, I like to glibly attribute this to hippie school teachers, but that’s a little unfair. So, to accommodate these students, many introductory statistics courses for non-statisticians dumbed down. The mathematical foundations were largely dumped and replaced with recipes. You all know the sort:

- Do a Kolmogorov-Smirnov test.
- If p<0.05, do a Mann-Whitney.
- If not, do a t-test.
- Either way, if p<0.05, you can say the difference is ‘significant’.

This has the effect of getting people to pass exams, but then have no idea what to do in real life, or worse, have inflated notions of their own competence. It is surface learning, not deep understanding. The choice between mathemania and cookbooks is the reason why people feel comfortable telling you that statistics was the only course they failed at university (cf http://www.marketmenot.com/mcdonalds-on-that-grind-commercial/), or – even more worrying – that they got an A but never understood what was going on.

The movement to revive introductory statistics courses is really focussed around the American Statistical Association’s Guidelines for Assessment and Instruction in Statistics Education (GAISE). This is the only set of guidelines on how to teach statistics, yet if you are a British statistics teacher you will probably never have heard of them. They are fairly widely used in the USA, Australia and Canada, though not universally by any means, but are wholeheartedly adopted in New Zealand, where they inform the national policy on teaching statistics. The principles are:

- use real data, warts and all
- introduce inference (the hardest bit) with simulation, not asymptotic formulas
- emphasise computing skills (not a vocational course in one software package)
- emphasise flexible problem-solving in context (not abstracted recipes)
- use more active learning (call this “flipping”, if you really must)

The guidelines include a paragraph called “The Carpentry Analogy”, which I like so much I shall reproduce it here:

In week 1 of the carpentry (statistics) course, we learned to use various kinds of planes (summary statistics). In week 2, we learned to use different kinds of saws (graphs). Then, we learned about using hammers (conﬁdence intervals). Later, we learned about the characteristics of different types of wood (tests). By the end of the course, we had covered many aspects of carpentry (statistics). But I wanted to learn how to build a table (collect and analyze data to answer a question) and I never learned how to do that.

The ICOTS crowd are preoccupied with how to achieve this in real life, and I will group the ideas into 3 broad topics:

- reversing the traditional syllabus
- inference by simulation
- natural frequencies

, and then describe 2 other ideas which are interesting but less clearly defined how they could be implemented.

**Reversing the traditional syllabus**

Most introductory statistics courses follow an order unchanged since Snedecor’s 1937 textbook: the first to be aimed at people studying statistics (rather than learning how to analyse their own research data). It may begin with probability theory, though sometimes this is removed along with other mathematical content. At any rate, a problem here is that, without the mathematics that appears later for likelihood and the properties of distributions, the role of probability is unclear to the student. It is at best a fun introduction, full of flipping coins, rolling dice and goats hiding behind doors. But the contemporary, vocationally-focussed student (or customer) has less patience for goats and dice than their parents and grandparents did.

Next, we deal with distributions and their parameters, which also introduces descriptive statistics, although the distribution is an abstract and subtle concept, and there are many statistics which are not parameters. Again, the argument goes, once the properties of estimators was removed so as not to scare the students, it was no longer obvious why they should learn about parameters and distributions.

Then we move to tests and confidence intervals, though we may not talk much about the meaning or purpose of inference in case it is discouraging to the students. This is where they are at danger of acquiring the usual confusions: that the sampling distribution and data distribution are the same, that p-values are the chance of being wrong, and that inference can be done without consideration for the relationship between the sample and the population. Students can easily commit to memory magic spells such as “…in the population from which the sample was drawn…” and deploy them liberally to pass exams, without really understanding. Evidence from large classes suggests this is the point where marks and attendance drop.

Then we introduce comparison of multiple groups and perhaps some experimental design. There may be some mention of fixed and random effects (but necessarily vague) before we move to the final, advanced topic: regression. The appearance of regression at the end is Snedecor’s choice; if presented mathematically, that’s probably (!) the right order, because it depends on other concepts already introduced, but if we drop the maths, we can adopt a different order, one that follows the gradual building of students’ intuition and deeper understanding.

Andy Zieffler and colleagues at Minnesota have a programme called CATALST (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_8B1_ZIEFFLER.pdf). This first introduces simulation from a model (marginal then conditional), then permutation tests, then bootstrapping. This equates to distributions, then regression, then hypothesis tests, then confidence intervals. This flips around Snedecor’s curriculum, and was echoed in a different talk by David Spiegelhalter. CATALST emphasises model+data throughout as an overarching framework. However, Zieffler noted that after 5 weeks the students do not yet have a deep concept of quantitative uncertainty (so don’t expect too much too quickly). Spiegelhalter’s version is focussed on dichotomous variables: start with a problem, represent it physically, do experiments, represent the results as trees or two-way tables or Venn diagrams to get conditional proportions, talk about expectation in future experiments, and finally get to probability. Probability manipulations like Bayes or P(a,b)=P(a|b)P(b) arrive naturally at the end and then lead to abstract notions of probability rather than the other way round. Visual aids are used throughout. One growth area that wasn’t represented much at ICOTS was interactive didactic graphics in the web browser (e.g. http://www2.le.ac.uk/Members/pl4/interactive-graphs). Some groups have developed Java applets and compiled software, but this suffers from translation onto different platforms and particularly onto mobile devices. The one group that have a product that is flexible and modern is the Lock family; more on them later.

**Inference by simulation**

The GAISE recommendation on introducing inference is a particularly hot topic. The notion is that students can get an intuitive grasp of what is going on with bootstrapping and randomisation tests far more easily than if you ask them to envisage a sampling distribution, arising from an infinite number of identical studies, drawing from a population, where the null hypothesis is true. This makes perfect sense to us teachers who have had years to think about it (and we are the survivors, not representative of the students.) When you pause to reflect that I have just described something that doesn’t exist, arising from a situation that can never happen, drawn from something you can never know, under circumstances that you know are not true, you see how this might not be the simplest mental somersault to ask of your students.

A common counter-argument is that simulation is an advanced topic. But this is an accident of history: non-parametrics, randomisation tests and bootstrapping were harder to do before computers, so we had to rely on relatively simple asymptotic formulas. That just isn’t true any more, and it hasn’t been since the advent of the personal computer, which brings home for me the extent of inertia in statistics teaching. Another argument is that the asymptotics are programmed in the software, so all students have to do is choose the right test and they get an answer. But you could also see this as a weakness; for many years statisticians have worried about software making things “too easy”, and this is exactly what that worry is about, that novices can get all manner of results out, pick an exciting p-value, write it up with some technical-sounding words and get it published. Simulation is a little like a QWERTY keyboard in that it slows you down just enough so you don’t jam the keys (younger readers may have to look this up). As for bootstrapping, most of us recall thinking it was too good to be true when we first heard about it, and we may fear the same reaction from our students, but that reaction is largely a result of being trained in getting confidence intervals the hard way, by second derivatives of the log-likelihood function. I’ve been telling them about bootstrapping (which is now super-easy in SPSS) since this academic year started, without so much as a flicker of surprise on their faces. A few days after ICOTS, I was having a cappuccino and a macaroon with Brad Efron in Palo Alto (my colleague Gill Mein says I am a terrible name-dropper but I’m just telling it like it is) and I asked him about this reaction. He said that when his 1979 paper came out, everybody said “it shouldn’t have been published because it’s obviously wrong” for a week and then “it shouldn’t have been published because it’s obvious” after that. I think that’s a pretty good sign of its acceptability. I just tell the students we’re doing the next best thing to re-running the experiment many times.

After the end of the main conference, I fought off a Conference Cold and went to a workshop on teaching inference by simulation down the road on the Northern Arizona University campus

This was split into two sessions, one with Beth Chance and Allan Rossman from Cal Poly (https://www.causeweb.org/ – which contains some information on CATALST too), another with some of the Locks (http://lock5stat.com/). Here a classroom full of stats lecturers worked through some of the exercises these simulation evangelists have tested and refined on their own students. One in particular I took away and have used several times since, with my own MRes students, other Kingston Uni Masters students, doctors in the UAE, clinical audit people in Leicester, and probably some others I have unfairly forgotten. It seems to work quite well, and its purpose is to introduce p-values and null hypothesis significance testing.

I take a bag of ‘pedagogical pennies’ and hand them out. Of course the students have their own coins but this makes it a little more memorable and discourages reserved people from sitting it out. A univariate one-group scenario is given to them that naturally has H0: pi=50%. You might say that ten of your patients have tried both ice packs and hot packs for their knee osteoarthritis, and 8 say they find the ice better. Could that be convincing enough for you to start recommending ice to everyone? (Or, depending on the audience, 8 out of 10 cats prefer Whiskas: https://youtu.be/jC1D_a1S2xs) I point out that the coin is a patient who has no preference (or a cat) and they all toss the coin 10 times. Then on the flipchart, I ask how many got no heads, 1 head, 2 heads… and draw a dot plot. We count how many got 0, 1, 2, 8, 9 or 10 and this proportion of the whole class is the approximate p-value. They also get to see a normal-ish sampling distribution emerge on the chart, and the students with weird results (I recently got my first 10/10; she thought it was a trick) can see that this is real life; when they get odd results in research, they just can’t see the other potential outcomes. Hopefully that shows them what the null hypothesis is, and the logic behind all p-values. It’s close enough to 0.05 to provoke some discussion.

The fact that simulation gives slightly different answers each time is also quite useful, because you can emphasise that p-values should be a continuous scale of evidence, not dichotomised, and that little tweaks to the analysis can tip over into significance a result that should really have been left well alone. (Of course, this is a different issue to sampling error, but as an aside it seems to work quite well.) One problem I haven’t worked a way around yet is that, at the end, I tell the students that of course they would really do this in the computer, which would allow them to run it 1000 times, not limited by their class size, and I fear that is a signal for them to forget everything that just happened. The best I can offer right now is to keep reminding them about the coin exercise, ten minutes later, half an hour later, at the end of the day and a week later if possible. I also worry that too much fun means the message is lost. Give them a snap question to tell you what a p-value is, with some multiple choices on a flipchart, offering common misunderstandings, and then go through each answer in turn to correct it. This is a tricky subject so it won’t be instant.

The Locks have a book out, the first to give a comprehensive course in stats with inference-by-simulation at its heart. It’s a great book and I recommend you check it out on their website. They also have some interactive analyses and graphics which allow the student to take one of their datasets (or enter their own!) and run permutation tests and bootstrap confidence intervals. It all runs in the browser so will work anywhere.

**Natural frequencies**

David Spiegelhalter spoke on the subject of natural frequencies with some passion. He has been involved in revising the content of the GCSE (16 year old) mathematics curriculum in the UK. Not every aspect in the final version was to his taste, but he made some inroads with this one, and was clearly delighted (http://understandinguncertainty.org/using-expected-frequencies-when-teaching-probability).

Some of the classic errors of probability can be addressed this way, without having to introduce algebraic notation. The important feature is that you are always dealing with a number of hypothetical people (or other units of analysis), with various things happening to some of them. It relates directly to a tree layout for probabilities, but without the annoying little fractions. I am also a fan of waffle plots for visualising proportions of a whole with a wide range of values (https://eagereyes.org/blog/2008/engaging-readers-with-square-pie-waffle-charts) and it would be nice to do something with these – perhaps bringing the interactive element in! One downside is that you often have to contrive the numbers to work out ‘nicely’ , which prevents you quickly responding to students’ “what if” questions.

**Now for the couple of extras.**

Statistical consulting is being used as a learning experience, in much the same way that you can get a cheap haircut from trainee hairdressers, at Pretoria (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_C213_FLETCHER.pdf) & Truman State University (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_C195_KIM.pdf). It sounds scary and hard work, but is a very innovative and bold idea, and we know that many students who are serious about using statistics in their career will have to do this to some extent, so why not give them some experience?

I was really impressed by Esther Isabelle Wilder of CUNY’s project NICHE (http://serc.carleton.edu/NICHE/index.html and http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_7D3_WILDER.pdf), which aims to boost statistical literacy in further and higher education, cutting across specialisms and silos in an institution. It acknowledges that many educators outside stats have to teach some, that they may be rusty and feel uncomfortable about it, and provides a safe environment for them to boost their stats skills and share good ideas. This is a very big and real problem and it would be great to see a UK version! Pre- and post-test among the faculty shows improvement in their comprehension, and they have to turn people away each summer because it has become so popular.

Finally, here’s **a couple of exercises** I liked the sound of:

Open three packets of M&Ms, and arrange them by colour. Get students to talk about what they can conclude about the contents of the next pack. (Fundamentalist frequentists might not like this.) This came from Markus Vogel & Andreas Eichler.

Ask students to design and conduct a study with an MP3 player, to try to determine whether the shuffling is random; this was devised by Andy Zieffler and reported by Katie Makar. We know that iPods in fact are not random, because customers initially complained that they were playing two or three songs from the same album together! I can’t vouch for other brands but Android’s built in player seems to do truly random things (as of v 4.3).