Tag Archives: teaching

Discomfiting jumps

I have been writing a book review of Efron & Hastie’s Computer Age Statistical Inference (CASI) for Significance magazine. Here’s a tangential half page I wrote but didn’t include.

Students of statistics or data science will typically encounter some discomfiting jumps in attitude as their course progresses. First, they may get a lot of probability theory and some likelihood-based inference for rather contrived problems, which will remind them of their maths classes at school. Ah, they think, I know how to do this: I learn the tricks to manipulate the symbols and get to the QED. Then they suddenly find themselves in a course that hands them tables of data and asks them to analyse and interpret. It has become a practical course that connects to the real world and, for the most part, leaves the maths behind. Now there’s no QED and there are no tricks. The assessments are more like those in the humanities: there’s no right or wrong, and it’s the coherence of their argument that matters. Now they have to remember which options to tick in their preferred stats software. They might think: why did we do the mathematical parts of this course at all if we’re not going to use them? Next, for some, come machine learning methods. Now the inference and asymptotic assurances are not just hidden in the cogs of the computer but actually absent. How do I know the random forest isn’t giving me the wrong answer? You don’t. It seems at first that when the problem gets really hard, like 21st-century-hard, land-a-job-at-Google-hard, we give up on stats, that interesting mental exercise from the 1930s, in favour of “unreasonably effective” heuristics and greedy algorithms.

One really nice thing they do in CASI is to emphasise that all estimators, from the standard deviation of a sample to GAMs, are algorithms. The inference (I prefer to say “uncertainty”) for those algorithms comes later in the history of the subject. The 1930s methods have had enough time to get their inference worked out by now, but other methods are still developing their inferential procedures. This unifies things rather better, but most teaching has yet to catch up. One problem is that almost all the effort of reformers following George Cobb, Joan Garfield and others has gone into the very earliest introduction to the subject. That’s probably the right place to fix first, but we now need to broaden out and fix wider data science courses.
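To make the CASI point concrete, here is a minimal Python sketch (my own illustration, not code from the book). The estimate is simply an algorithm, the n-1 sample standard deviation, and the uncertainty is attached afterwards by a second algorithm, a nonparametric bootstrap. The data values, seed and function names are hypothetical.

```python
import random
import statistics


def sample_sd(xs):
    """The estimate: an algorithm (the usual n-1 sample standard deviation)."""
    return statistics.stdev(xs)


def bootstrap_se(xs, estimator, n_boot=2000, seed=1):
    """The uncertainty, added afterwards: resample the data with replacement,
    re-run the same algorithm, and take the spread of the results."""
    rng = random.Random(seed)
    n = len(xs)
    replicates = [estimator([rng.choice(xs) for _ in range(n)]) for _ in range(n_boot)]
    return statistics.stdev(replicates)


# Hypothetical measurements, for illustration only
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.9, 5.1, 4.4, 6.2, 3.9]
sd = sample_sd(data)
se = bootstrap_se(data, sample_sd)
print(f"SD = {sd:.3f}, bootstrap SE of the SD = {se:.3f}")
```

The same bootstrap_se function would work unchanged for a median, a trimmed mean or anything else you can write as a function of the data, which is rather the point.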


Filed under learning

Jasper tree ring fire scars – a teaching dataset

Today I’m sharing a nice little dataset that I think has some good features for teaching. Hope you like it.
I spotted this in the museum in Jasper, Alberta in 2012 and took a photo.

Photo: Jasper tree ring fire scars

Later, I e-mailed the museum to find out whom I should credit, and we eventually found that it originated some time ago from Parks Canada. So thanks to them, and I suggest you credit them as the source if you use it.

No, I don’t have it in a file. I think working from the typewritten page is quite helpful as it keeps people out of stats software for this. They have to think. If you want to click buttons, there are a gazillion other datasets out there. This is a different kind of exercise.

Here we have the number of scars in tree rings that indicate fires in various years. If you look back in time through a tree’s rings, you can plot when it got damaged by fire but recovered. This could give an idea of the number of fires through the years, but only with some biases. It would be an interesting exercise for students who are getting to grips with the idea of a data-generating process. You could prompt them to think up and justify proposed biases, and hopefully they will agree on stuff like:

  • there are a number of fires each year; we might be able to predict the number with things like El Niño/La Niña years, the arrival of European settlers and other data sources*
  • the most ancient years will have few surviving trees, so more and more fires will get missed as you go back in time
  • this might not be random, if the biggest (oldest) trees were more likely to be felled for wood
  • there will be a point (perhaps when Jasper became a national park) after which fires in the backwoods are actively prevented and fought, at which point the size of the fires, if not the number, should drop
  • the bigger the fire area, the more scars will be left behind; they have to decide whether to work with the number of fires, or their size (or both…)
  • the fire-size variables will be quite unreliable for the early years, but otherwise there should be a good link from number of fires to number of scars
  • can we really trust the area of burn in the older years? to 2 decimal places in 1665?
  • and other things that are very clever and I haven’t dreamt of

* – once they are done with the data-generating process, if they are confident enough with analysis, you could give them this dataset of Canada-wide forest fires, which I pulled together a few years ago. It’s not without its own quirks, as you’ll see, but they might enjoy using it to corroborate some of their ideas.

I would ask them to propose a joint Bayesian model for the number of fires and area burnt over the years, including (if they want) predictions for the future (bearing in mind the data ends at 1971). You could also ask for sketched dataviz in a poster presentation, for example.
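If students want to check their reasoning about survivorship bias, a toy simulation of the data-generating process can help. The Python sketch below is entirely hypothetical: it assumes a constant Poisson rate of 2 fires per year, ignores fire size, and invents a detection probability that rises linearly from 10% in 1665 to 100% in 1971. None of these numbers come from the Parks Canada data.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_scars(first=1665, last=1971, fire_rate=2.0, min_detect=0.1, seed=42):
    """Return (year, true_fires, recorded_fires) rows under a toy process in which
    the chance that a fire leaves a surviving scar rises linearly from
    min_detect in the first year to 1 in the last year (hypothetical)."""
    rng = random.Random(seed)
    rows = []
    for year in range(first, last + 1):
        true_fires = poisson(rng, fire_rate)
        p_detect = min_detect + (1 - min_detect) * (year - first) / (last - first)
        recorded = sum(rng.random() < p_detect for _ in range(true_fires))
        rows.append((year, true_fires, recorded))
    return rows


def recorded_share(rows, start, end):
    """Fraction of the true fires in [start, end] that show up as scars."""
    window = [r for r in rows if start <= r[0] <= end]
    return sum(r[2] for r in window) / sum(r[1] for r in window)


rows = simulate_scars()
print(f"1665-1769: {recorded_share(rows, 1665, 1769):.2f} of fires recorded")
print(f"1870-1971: {recorded_share(rows, 1870, 1971):.2f} of fires recorded")
```

Even with a constant true rate, the recorded counts trend upwards over time, which is exactly the kind of artefact students should be wary of reading as a real increase in fires.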

Finally, I highly recommend a trip to Jasper. What a beautiful part of the world!


Filed under learning, Visualization

Complex systems reading

Tomorrow I’ll be giving a seminar in our faculty on inference in complex systems (like the health service, or social services, or local government, or society more generally). It’s the latest talk on this subject that is really gelling now into something of a manifesto. Rick Hood and I intend to send off the paper version before Xmas, so I won’t say more about the substance of it here (and the slides are just a bunch of aide-memoire images), other than to list the references, which include some of my favourite sources on data+science:


I deliberately omit the methodologically detailed papers from this list, but in the main you should look into Bayesian modelling, generalised coarsening, generalised instrumental variable models, structural equation models, and their various intersections.


Filed under Bayesian, research

Everything you need to make R Commander locally (packages, dependencies, zip files)

I’ve been installing R Commander on laptops for our students to use in tutorials. It’s tedious to put each one online with my login, download it all, then disable the internet (so they don’t send lewd e-mails to the vice-chancellor from my account, although I could always plead that I had misunderstood the meaning of his job title). I eventually got every package it needed downloaded and I’ve done it all off a USB stick. But I didn’t find a single list of all the Rcmdr dependencies, recursively. Maybe it’s out there but I didn’t find it. So, here it is. You might find it useful.
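For anyone wanting to regenerate such a list, the job is a transitive closure over the dependency graph. Within R itself, `tools::package_dependencies("Rcmdr", recursive = TRUE)` should do this for you against the CRAN package database. The Python sketch below just shows the idea, a breadth-first walk over a dependency map; the package names in it are made up and are not the real Rcmdr dependencies.

```python
from collections import deque


def recursive_deps(root, direct_deps):
    """All packages reachable from `root` in a dependency map, found by a
    breadth-first walk. `direct_deps` maps a package name to the list of
    packages it needs directly."""
    seen = set()
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in direct_deps.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(root)  # guard against a cycle leading back to the root
    return sorted(seen)


# Hypothetical, made-up dependency map (NOT the real Rcmdr dependencies)
toy_deps = {
    "GUIapp": ["plotpkg", "modelpkg"],
    "plotpkg": ["colourpkg"],
    "modelpkg": ["linalgpkg", "colourpkg"],
    "linalgpkg": [],
}
print(recursive_deps("GUIapp", toy_deps))
# ['colourpkg', 'linalgpkg', 'modelpkg', 'plotpkg']
```

The `seen` set means a package shared by several branches (here `colourpkg`) is only downloaded once.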


I suppose this is one of my less engaging posts…


Filed under learning, R

Overpowered and underpowered chi-squared tests – or are they?

This is a very quick thought for teaching, in passing. People often talk about chi-squared tests being overpowered when n is large. It occurs to me that a good way to broach this concept intuitively is to point out that they are no different from t-tests and the like in this respect; the difference is that they do not provide a meaningful point estimate. When you see that the mean difference in blood pressure from drug X is 0.3 mmHg, with p<0.001, you know it is clinically meaningless. When you see χ²=3.89, nobody knows what to think. So perhaps the best thing to do is to mention this alongside non-parametric rank-based procedures, when you explain that they don’t give you an estimate or confidence interval.
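A quick way to show students the point is to put a large-sample chi-squared test next to the effect it is testing. In this hypothetical two-arm trial (the counts are invented), the difference in the proportion improving is 0.7 percentage points, yet the Pearson chi-squared test (no continuity correction) is comfortably “significant”. The p-value uses the fact that, for one degree of freedom, the upper tail of the chi-squared distribution at x equals erfc(sqrt(x/2)).

```python
import math


def chisq_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table
    [[a, b], [c, d]], with its 1-df p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p


# Hypothetical trial: 100,000 per arm, 50.7% vs 50.0% improved
a, b = 50_700, 49_300  # drug: improved, not improved
c, d = 50_000, 50_000  # control: improved, not improved
stat, p = chisq_2x2(a, b, c, d)
risk_diff = a / (a + b) - c / (c + d)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}, risk difference = {risk_diff:.3f}")
```

Reporting the risk difference (ideally with a confidence interval) alongside the test is what lets a reader judge whether 0.7 points matters clinically; the chi-squared statistic on its own gives them nothing to hold on to.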


Filed under learning

Trends in teaching statistics (reporting from ICOTS 2014)

Last summer I was fortunate to attend three stats conferences in the USA. It was a mixture of exciting travel and hard slog, with some great learning and some unexpected surprises; among them, attempting to explain the Gloucestershire cheese-rolling race to a Navajo family, and wondering whether I could bring a dried buffalo scrotum back through Heathrow (it would’ve made a great pen-holder).

However, the academic highlight for me was ICOTS, the ninth International Conference On Teaching Statistics, in Flagstaff, Arizona. All I knew about Flagstaff I learnt from the Rolling Stones, so it was reassuring to find that my hotel room did indeed look over Route 66.

Get your teaching tricks on Route 66

So, I want to leave aside the rest of my trip and share the biggest themes from ICOTS. I’m a university lecturer, so that’s what I’m focussing on, though there’s plenty to say about schools too. But first, a little background from my point of view.

When statistics evolved as a distinct academic discipline in the mid-20th century, it was invariably in the shadow of the mathematics department. To be taken seriously as an academic, one had (and still has) to display the trappings of expertise and rigour. Yet this could be done either by lots of abstraction and mathematics, or by lots of application and real-life problems. Some of the greatest names of that era, like Frank Wilcoxon and George Box, learned their skills from application (as did Fisher fifty years earlier), but mostly the maths won; it was the path of least resistance in a larger university setting, and that informed the teaching.

However, as the years went by, everybody wanted to learn some stats: economists, ecologists, archaeologists, doctors, you name it. But they typically weren’t so good at maths, at least in Europe and North America. Personally, I like to glibly attribute this to hippie school teachers, but that’s a little unfair. So, to accommodate these students, many introductory statistics courses for non-statisticians were dumbed down. The mathematical foundations were largely dumped and replaced with recipes. You all know the sort:

  1. Do a Kolmogorov-Smirnov test.
  2. If p<0.05, do a Mann-Whitney.
  3. If not, do a t-test.
  4. Either way, if p<0.05, you can say the difference is ‘significant’.

It’s nice to name statistical tests after the pioneers (like this here feller) but not so helpful for the students.

This gets people through exams but leaves them with no idea what to do in real life or, worse, with inflated notions of their own competence. It is surface learning, not deep understanding. The choice between mathemania and cookbooks is the reason why people feel comfortable telling you that statistics was the only course they failed at university (cf http://www.marketmenot.com/mcdonalds-on-that-grind-commercial/), or – even more worrying – that they got an A but never understood what was going on.

The movement to revive introductory statistics courses is really focussed around the American Statistical Association’s Guidelines for Assessment and Instruction in Statistics Education (GAISE). This is the only set of guidelines on how to teach statistics, yet if you are a British statistics teacher you will probably never have heard of them. They are fairly widely used in the USA, Australia and Canada, though not universally by any means, but are wholeheartedly adopted in New Zealand, where they inform the national policy on teaching statistics. The principles are:

  • use real data, warts and all
  • introduce inference (the hardest bit) with simulation, not asymptotic formulas
  • emphasise computing skills (not a vocational course in one software package)
  • emphasise flexible problem-solving in context (not abstracted recipes)
  • use more active learning (call this “flipping”, if you really must)

The guidelines include a paragraph called “The Carpentry Analogy”, which I like so much I shall reproduce it here:

In week 1 of the carpentry (statistics) course, we learned to use various kinds of planes (summary statistics). In week 2, we learned to use different kinds of saws (graphs). Then, we learned about using hammers (confidence intervals). Later, we learned about the characteristics of different types of wood (tests). By the end of the course, we had covered many aspects of carpentry (statistics). But I wanted to learn how to build a table (collect and analyze data to answer a question) and I never learned how to do that.

The ICOTS crowd are preoccupied with how to achieve this in real life, and I will group the ideas into 3 broad topics:

  • reversing the traditional syllabus
  • inference by simulation
  • natural frequencies

I will then describe two other ideas which are interesting, though it is less clear how they could be implemented.

Reversing the traditional syllabus

Most introductory statistics courses follow an order unchanged since Snedecor’s 1937 textbook: the first to be aimed at people studying statistics (rather than learning how to analyse their own research data). It may begin with probability theory, though sometimes this is removed along with other mathematical content. At any rate, a problem here is that, without the mathematics that appears later for likelihood and the properties of distributions, the role of probability is unclear to the student. It is at best a fun introduction, full of flipping coins, rolling dice and goats hiding behind doors. But the contemporary, vocationally-focussed student (or customer) has less patience for goats and dice than their parents and grandparents did.

Next, we deal with distributions and their parameters, which also introduces descriptive statistics, although the distribution is an abstract and subtle concept, and there are many statistics which are not parameters. Again, the argument goes, once the properties of estimators were removed so as not to scare the students, it was no longer obvious why they should learn about parameters and distributions.

Then we move to tests and confidence intervals, though we may not talk much about the meaning or purpose of inference in case it is discouraging to the students. This is where they are in danger of acquiring the usual confusions: that the sampling distribution and data distribution are the same, that p-values are the chance of being wrong, and that inference can be done without consideration for the relationship between the sample and the population. Students can easily commit to memory magic spells such as “…in the population from which the sample was drawn…” and deploy them liberally to pass exams, without really understanding. Evidence from large classes suggests this is the point where marks and attendance drop.

Then we introduce comparison of multiple groups and perhaps some experimental design. There may be some mention of fixed and random effects (necessarily vague) before we move to the final, advanced topic: regression. The appearance of regression at the end is Snedecor’s choice; if presented mathematically, that’s probably (!) the right order, because it depends on other concepts already introduced, but if we drop the maths, we can adopt a different order, one that follows the gradual building of students’ intuition and deeper understanding.

Andy Zieffler and colleagues at Minnesota have a programme called CATALST (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_8B1_ZIEFFLER.pdf). This first introduces simulation from a model (marginal then conditional), then permutation tests, then bootstrapping. This equates to distributions, then regression, then hypothesis tests, then confidence intervals. This turns Snedecor’s curriculum around, and the idea was echoed in a different talk by David Spiegelhalter. CATALST emphasises model+data throughout as an overarching framework. However, Zieffler noted that after 5 weeks the students do not yet have a deep concept of quantitative uncertainty (so don’t expect too much too quickly). Spiegelhalter’s version is focussed on dichotomous variables: start with a problem, represent it physically, do experiments, represent the results as trees or two-way tables or Venn diagrams to get conditional proportions, talk about expectation in future experiments, and finally get to probability. Probability manipulations like Bayes’ theorem or P(a, b) = P(a | b)P(b) arrive naturally at the end and then lead to abstract notions of probability, rather than the other way round. Visual aids are used throughout.

One growth area that wasn’t represented much at ICOTS was interactive didactic graphics in the web browser (e.g. http://www2.le.ac.uk/Members/pl4/interactive-graphs). Some groups have developed Java applets and compiled software, but these do not translate easily onto different platforms, particularly mobile devices. The one group with a product that is flexible and modern is the Lock family; more on them later.

Inference by simulation

The GAISE recommendation on introducing inference is a particularly hot topic. The notion is that students can get an intuitive grasp of what is going on with bootstrapping and randomisation tests far more easily than if you ask them to envisage a sampling distribution, arising from an infinite number of identical studies, drawing from a population, where the null hypothesis is true. This makes perfect sense to us teachers, who have had years to think about it (and we are the survivors, not representative of the students). When you pause to reflect that I have just described something that doesn’t exist, arising from a situation that can never happen, drawn from something you can never know, under circumstances that you know are not true, you see how this might not be the simplest mental somersault to ask of your students.
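For teachers who want to see how little machinery a randomisation test needs, here is a minimal one for a difference between two means, in Python. The data are hypothetical, and adding 1 to the numerator and denominator of the p-value is one common convention (it stops a simulated p-value from being exactly zero), not the only choice.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_perm=10_000, seed=2014):
    """Two-sided randomisation test for a difference in means. Under H0 the
    group labels are arbitrary, so shuffle them and count how often the
    shuffled difference is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


treatment = [12, 15, 14, 10, 13, 16, 11, 14]  # hypothetical scores
control = [8, 10, 9, 7, 11, 9, 8, 10]
observed, p = permutation_test(treatment, control)
print(f"difference in means {observed:.3f}, randomisation p = {p:.4f}")
```

Each shuffle is literally “what might have happened if the treatment made no difference”, which is the whole idea of the null hypothesis without any appeal to infinite imaginary studies.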

A common counter-argument is that simulation is an advanced topic. But this is an accident of history: non-parametrics, randomisation tests and bootstrapping were harder to do before computers, so we had to rely on relatively simple asymptotic formulas. That just isn’t true any more, and it hasn’t been since the advent of the personal computer, which brings home for me the extent of inertia in statistics teaching. Another argument is that the asymptotics are programmed into the software, so all students have to do is choose the right test and they get an answer. But you could also see this as a weakness; for many years statisticians have worried about software making things “too easy”, and this is exactly what that worry is about: novices can get all manner of results out, pick an exciting p-value, write it up with some technical-sounding words and get it published. Simulation is a little like a QWERTY keyboard in that it slows you down just enough so you don’t jam the keys (younger readers may have to look this up). As for bootstrapping, most of us recall thinking it was too good to be true when we first heard about it, and we may fear the same reaction from our students, but that reaction is largely a result of being trained in getting confidence intervals the hard way, by second derivatives of the log-likelihood function. I’ve been telling my students about bootstrapping (which is now super-easy in SPSS) since this academic year started, without so much as a flicker of surprise on their faces. A few days after ICOTS, I was having a cappuccino and a macaroon with Brad Efron in Palo Alto (my colleague Gill Mein says I am a terrible name-dropper but I’m just telling it like it is) and I asked him about this reaction. He said that when his 1979 paper came out, everybody said “it shouldn’t have been published because it’s obviously wrong” for a week and then “it shouldn’t have been published because it’s obvious” after that. I think that’s a pretty good sign of its acceptability. I just tell the students we’re doing the next best thing to re-running the experiment many times.
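Here is that “next best thing to re-running the experiment” as a minimal Python sketch: a percentile bootstrap confidence interval for a mean, assuming the observations are independent. The data are hypothetical, and the percentile method is only the simplest of several bootstrap intervals.

```python
import random
import statistics


def bootstrap_ci(xs, estimator=statistics.mean, n_boot=5000, level=0.95, seed=66):
    """Percentile bootstrap interval: resample with replacement, recompute the
    estimate each time, and read off the middle `level` share of the results."""
    rng = random.Random(seed)
    n = len(xs)
    replicates = sorted(estimator(rng.choices(xs, k=n)) for _ in range(n_boot))
    lower = replicates[int((1 - level) / 2 * n_boot)]
    upper = replicates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical reductions in blood pressure (mmHg) for 12 patients
reductions = [5.2, 3.1, 7.4, 4.8, 6.0, 2.9, 5.5, 4.1, 6.8, 3.7, 5.0, 4.4]
low, high = bootstrap_ci(reductions)
print(f"mean {statistics.mean(reductions):.2f}, 95% bootstrap CI {low:.2f} to {high:.2f}")
```

Swapping `estimator=statistics.median` gives an interval for the median with no new theory, which is hard to match with second derivatives of a log-likelihood.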

After the end of the main conference, I fought off a Conference Cold and went to a workshop on teaching inference by simulation down the road on the Northern Arizona University campus.

Yes, that’s a rack for students’ skateboards. Cool, huh?

This was split into two sessions, one with Beth Chance and Allan Rossman from Cal Poly (https://www.causeweb.org/ – which contains some information on CATALST too), another with some of the Locks (http://lock5stat.com/). Here a classroom full of stats lecturers worked through some of the exercises these simulation evangelists have tested and refined on their own students. One in particular I took away and have used several times since, with my own MRes students, other Kingston Uni Masters students, doctors in the UAE, clinical audit people in Leicester, and probably some others I have unfairly forgotten. It seems to work quite well, and its purpose is to introduce p-values and null hypothesis significance testing.

I take a bag of ‘pedagogical pennies’ and hand them out. Of course the students have their own coins, but this makes it a little more memorable and discourages reserved people from sitting it out. I give them a univariate one-group scenario that naturally has H0: π = 50%. You might say that ten of your patients have tried both ice packs and hot packs for their knee osteoarthritis, and 8 say they find the ice better. Could that be convincing enough for you to start recommending ice to everyone? (Or, depending on the audience, 8 out of 10 cats prefer Whiskas: https://youtu.be/jC1D_a1S2xs) I point out that the coin is a patient (or a cat) with no preference, and they all toss the coin 10 times. Then on the flipchart, I ask how many got no heads, 1 head, 2 heads… and draw a dot plot. We count how many got 0, 1, 2, 8, 9 or 10 heads, and that proportion of the whole class is the approximate p-value. They also get to see a normal-ish sampling distribution emerge on the chart, and the students with weird results (I recently got my first 10/10; she thought it was a trick) can see that this is real life; when they get odd results in research, they just can’t see the other potential outcomes. Hopefully that shows them what the null hypothesis is, and the logic behind all p-values. The exact two-sided p-value is 112/1024, about 0.11, which is close enough to 0.05 to provoke some discussion.
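For the record, the coin exercise is easy to check. The Python sketch below computes the exact two-sided p-value for 8 heads out of 10 under H0: π = 0.5, and mimics one class of (hypothetically) 30 students each tossing a coin ten times. The class size and seed are arbitrary choices.

```python
import math
import random


def exact_two_sided_p(n=10, k=8):
    """P(a head count at least as far from n/2 as k) for a fair coin."""
    dist = abs(k - n / 2)
    return sum(math.comb(n, x) for x in range(n + 1) if abs(x - n / 2) >= dist) / 2 ** n


def class_simulation(class_size=30, n=10, k=8, seed=None):
    """Each 'student' tosses a fair coin n times; the simulated p-value is the
    share of the class whose head count is as extreme as k or more
    (0, 1, 2, 8, 9 or 10 heads for the defaults)."""
    rng = random.Random(seed)
    dist = abs(k - n / 2)
    heads = [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(class_size)]
    return sum(abs(h - n / 2) >= dist for h in heads) / class_size


print(exact_two_sided_p())       # 0.109375, i.e. 112/1024
print(class_simulation(seed=1))  # a class-sized approximation; varies with the seed
```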

The fact that simulation gives slightly different answers each time is also quite useful, because you can emphasise that p-values should be a continuous scale of evidence, not dichotomised, and that little tweaks to the analysis can tip over into significance a result that should really have been left well alone. (Of course, this is a different issue from sampling error, but as an aside it seems to work quite well.) One problem I haven’t found a way around yet is that, at the end, I tell the students that of course they would really do this on the computer, which would allow them to run it 1000 times, unconstrained by their class size, and I fear that is a signal for them to forget everything that just happened. The best I can offer right now is to keep reminding them about the coin exercise, ten minutes later, half an hour later, at the end of the day and a week later if possible. I also worry that too much fun means the message is lost. Give them a snap question to tell you what a p-value is, with some multiple choices on a flipchart, offering common misunderstandings, and then go through each answer in turn to correct it. This is a tricky subject so it won’t be instant.
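To show students that the wobble between runs is just Monte Carlo error, and that it shrinks as the number of simulated runs grows, you could use something like this Python sketch. It repeats the coin exercise for many imaginary classes of 30 and for 1000 computer runs, and compares the spread of the simulated p-values with the binomial standard error sqrt(p(1 - p)/runs). All settings are arbitrary choices for illustration.

```python
import math
import random
import statistics


def simulated_p(reps, rng, n=10, k=8):
    """Share of `reps` sets of n fair-coin tosses at least as extreme as k heads."""
    dist = abs(k - n / 2)
    hits = 0
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        hits += abs(heads - n / 2) >= dist
    return hits / reps


def spread_of_p(reps, n_classes=50, seed=7):
    """Standard deviation of the simulated p-value across repeated 'classes'."""
    rng = random.Random(seed)
    return statistics.stdev([simulated_p(reps, rng) for _ in range(n_classes)])


def monte_carlo_se(p, reps):
    """Binomial standard error of a simulated p-value based on `reps` runs."""
    return math.sqrt(p * (1 - p) / reps)


p_exact = 112 / 1024  # exact two-sided p for 8/10 heads
for reps in (30, 1000):
    print(f"{reps:>4} runs: observed SD {spread_of_p(reps):.3f}, "
          f"theory {monte_carlo_se(p_exact, reps):.3f}")
```

With a class of 30 the simulated p-value typically moves by about 0.06 either way, so a single class can easily land on either side of 0.05; with 1000 runs it settles to within about 0.01.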

The Locks have a book out, the first to give a comprehensive course in stats with inference-by-simulation at its heart. It’s a great book and I recommend you check it out on their website. They also have some interactive analyses and graphics which allow the student to take one of their datasets (or enter their own!) and run permutation tests and bootstrap confidence intervals. It all runs in the browser so will work anywhere.


Natural frequencies

David Spiegelhalter spoke on the subject of natural frequencies with some passion. He has been involved in revising the content of the GCSE (age 16) mathematics curriculum in the UK. Not every aspect of the final version was to his taste, but he made some inroads with this one, and was clearly delighted (http://understandinguncertainty.org/using-expected-frequencies-when-teaching-probability).

Some of the classic errors of probability can be addressed this way, without having to introduce algebraic notation. The important feature is that you are always dealing with a number of hypothetical people (or other units of analysis), with various things happening to some of them. It relates directly to a tree layout for probabilities, but without the annoying little fractions. I am also a fan of waffle plots for visualising proportions of a whole with a wide range of values (https://eagereyes.org/blog/2008/engaging-readers-with-square-pie-waffle-charts) and it would be nice to do something with these – perhaps bringing the interactive element in! One downside is that you often have to contrive the numbers to work out ‘nicely’, which prevents you responding quickly to students’ “what if” questions.
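A small function that turns rates into counts of hypothetical people may help with the “what if” problem: change an input and the whole tree is recounted, rounding to whole people instead of relying on contrived numbers. This Python sketch assumes a screening test with 1% prevalence, 90% sensitivity and a 10% false positive rate; these figures are illustrative and are not taken from Spiegelhalter’s materials.

```python
def natural_frequencies(population=10_000, prevalence=0.01,
                        sensitivity=0.90, false_positive_rate=0.10):
    """Turn rates into counts of hypothetical people, as in a frequency tree."""
    have = round(population * prevalence)
    do_not_have = population - have
    true_positives = round(have * sensitivity)
    false_positives = round(do_not_have * false_positive_rate)
    return {
        "have condition": have,
        "true positives": true_positives,
        "false positives": false_positives,
        "positives who have it": true_positives / (true_positives + false_positives),
    }


print(natural_frequencies())                 # of 10,000 people: 100 have it, 90 + 990 test positive
print(natural_frequencies(prevalence=0.05))  # a student's "what if?"
```

Rounding to whole people introduces small errors when the inputs are not ‘nice’, which is itself worth pointing out to students.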

Now for the couple of extras.

At Pretoria (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_C213_FLETCHER.pdf) and Truman State University (http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_C195_KIM.pdf), statistical consulting is being used as a learning experience, in much the same way that you can get a cheap haircut from trainee hairdressers. It sounds scary and hard work, but it is a very innovative and bold idea, and we know that many students who are serious about using statistics in their careers will have to do this to some extent, so why not give them some experience?

I was really impressed by Esther Isabelle Wilder’s project NICHE at CUNY (http://serc.carleton.edu/NICHE/index.html and http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_7D3_WILDER.pdf), which aims to boost statistical literacy in further and higher education, cutting across specialisms and silos in an institution. It acknowledges that many educators outside stats have to teach some, that they may be rusty and feel uncomfortable about it, and provides a safe environment for them to boost their stats skills and share good ideas. This is a very big and real problem and it would be great to see a UK version! Pre- and post-tests among the faculty show improvement in their comprehension, and they have to turn people away each summer because it has become so popular.

Finally, here are a couple of exercises I liked the sound of:

Open three packets of M&Ms, and arrange them by colour. Get students to talk about what they can conclude about the contents of the next pack. (Fundamentalist frequentists might not like this.) This came from Markus Vogel & Andreas Eichler.
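One way students might formalise their conclusions (one the fundamentalist frequentists will like even less) is a Bayesian Dirichlet-multinomial model. This Python sketch is my own assumption, not part of Vogel and Eichler’s exercise: it takes hypothetical colour tallies pooled over three packs, a flat Dirichlet(1, …, 1) prior, and returns the posterior expected count of each colour in a next pack of a given size.

```python
def next_pack_expectation(counts, pack_size, prior=1.0):
    """Posterior-predictive expected count of each colour in the next pack,
    under a multinomial model with a symmetric Dirichlet(prior) prior."""
    total = sum(counts.values())
    k = len(counts)
    return {colour: pack_size * (c + prior) / (total + k * prior)
            for colour, c in counts.items()}


# Hypothetical tallies pooled over three packs (made-up numbers)
tallies = {"blue": 24, "orange": 20, "green": 16, "yellow": 14, "red": 12, "brown": 10}
for colour, expected in next_pack_expectation(tallies, pack_size=32).items():
    print(f"{colour:>6}: {expected:.1f}")
```

Students can then argue about the prior, about whether packs are really exchangeable, and about a colour that never appeared (it would need an explicit zero in the tallies to get any predicted share at all).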

Ask students to design and conduct a study with an MP3 player, to try to determine whether the shuffling is random; this was devised by Andy Zieffler and reported by Katie Makar. We know that iPod shuffling is in fact not random: customers initially complained that it was playing two or three songs from the same album together, so it was made less random! I can’t vouch for other brands, but Android’s built-in player seems to do truly random things (as of v4.3).
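The iPod story makes a nice starting point, because genuinely random shuffles cluster more than people expect. This Python sketch shuffles a hypothetical library of 10 albums with 10 songs each, uniformly at random, and counts adjacent songs from the same album. For a uniform shuffle of N songs with S per album, each of the N - 1 adjacent pairs matches with probability (S - 1)/(N - 1), so you should expect about S - 1 = 9 same-album neighbours per shuffle. The library size is an arbitrary assumption; a real study would of course record the player’s actual output.

```python
import random


def same_album_neighbours(order):
    """Number of adjacent pairs in a play order that come from the same album."""
    return sum(a == b for a, b in zip(order, order[1:]))


def shuffle_study(n_albums=10, songs_per_album=10, trials=5000, seed=3):
    """Shuffle a hypothetical library uniformly at random many times and return
    (share of shuffles with any same-album neighbours, mean number of them)."""
    rng = random.Random(seed)
    playlist = [album for album in range(n_albums) for _ in range(songs_per_album)]
    counts = []
    for _ in range(trials):
        rng.shuffle(playlist)
        counts.append(same_album_neighbours(playlist))
    share_any = sum(c > 0 for c in counts) / trials
    return share_any, sum(counts) / trials


share_any, mean_pairs = shuffle_study()
print(f"{share_any:.1%} of random shuffles had same-album neighbours; "
      f"on average {mean_pairs:.1f} such pairs")
```

Students who expect a random player never to repeat an album have, in effect, a wrong null hypothesis, which is a useful thing to discover before collecting data.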


Filed under learning

More active learning in statistics classes – and hypothesis testing too

Most statistics teachers would agree that our face-to-face time with students needs to get more ‘active’. The concepts and the critical thinking so essential to what we do only sink in when you try them out. That applies as much to reading and critiquing others’ statistics as it does to working out your own. One area of particular interest to me is communicating statistical findings, something for which evidence of effective strategies is sorely lacking, so it remains most valuable to learn by doing.

It’s so easy to stand there and talk about what you do, but there’s no guarantee they get it or retain that information a week later. I always enjoy reading Andrew Gelman’s blog and a couple of interesting discussions about active learning came up there recently, which I’ll signpost and briefly summarise.

Firstly, a post thinking aloud about making a survey class more active (and a graphics / comms one, but most of the responses are about the familiar survey topics). The consensus seems to be to let the students discover – painfully if necessary – for themselves. That means letting them collect and grapple with messy data, not contrived examples. There are some nice pointers in there about stage-managing the student group experience (obviously we don’t really let them grapple unaided).

The statistical communication course came back next, with a refreshing theme that we don’t know how to do this (me neither, but we’re getting closer, I’d like to think). Check out O’Rourke’s suggested documents if nothing else!

Then, the problem of hypothesis testing. The dialogue between Vasishth and Gelman particularly crystallises the issue for practising analysts. It came back a couple of weeks later; I particularly like the section about a third of the way down after Deborah Mayo appears, like an avenging superhero, to demolish the widely used, over-simplified interpretation of hypothesis testing in a single sentence, after which Anonymous and Gelman cover a situation where two researchers look at the same data. Dr Good has a pre-specified hypothesis, tests it and finds a significant result, stops there and reports it. Dr Evil intends to keep fishing until he or she finds something sexy they can publish, but happens by chance to start with the same test as Dr Good. Satisfied with the magical p<0.05, they too stop and write it up. Is Evil’s work equivalent to Good’s? Is the issue with motivation or selection? Food for thought, but we have strayed from teaching into some kind of Socratic gunfight (doubly American!). However, I think there is no harm in exposing students (especially those already steeped in some professional practice like the healthcare professionals I teach) to these problems, because they already recognise them from published literature, although they might not formulate them quite so clearly. Along the way, someone linked to this rather nice post by Simine Vazire.
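If students want to see why the Good/Evil question is subtle, a simulation of the two procedures helps. In the scenario above both researchers end up with the same data and the same p-value; what differs is the long-run false positive rate of the procedure each was following. This Python sketch assumes that the null hypothesis is true for every test, that the tests are independent, and that each p-value is therefore Uniform(0, 1). Dr Evil is modelled, hypothetically, as running up to 20 tests and stopping at the first p < 0.05.

```python
import random


def false_positive_rate(max_tests, n_studies=20_000, alpha=0.05, seed=5):
    """Share of null-true studies that end up 'significant' when the analyst
    keeps running independent tests, stopping at the first p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under H0 (and a continuous test) each p-value is Uniform(0, 1);
        # any() stops at the first significant result, like Dr Evil
        if any(rng.random() < alpha for _ in range(max_tests)):
            hits += 1
    return hits / n_studies


print("Dr Good (1 pre-specified test):", false_positive_rate(1))
print("Dr Evil (up to 20 tests):", false_positive_rate(20))
print("Analytic value 1 - 0.95**20:", 1 - 0.95 ** 20)  # about 0.64
```

Real analyses fished from one dataset are correlated, so the inflation is usually smaller than this independent-tests figure suggests, but it is still there.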

(I don’t want you to think I’ve wimped out, so here’s my view, although that’s really not what this post is about: Rahul wrote “The reasonable course might be for [Dr Evil] to treat this analysis as exploratory in the light of what he observed. Then collect another data set with the express goal of only testing for that specific hypothesis. And if he again gets p<0.01 then publish.” – which I agree with, but for me all statistical results are exploratory. They might be hypothesis testing as well, but they are never proving or disproving stuff, always stacking evidence quantitatively in the service of a fluffier mental process called abduction or Inference to the Best Explanation. They are merely a feeble attempt to make a quantitative, systematic, less biased representation of our own thoughts.)

Now, if you like a good hypothesis testing debate, consider the journal that banned tests, and keep watching StatsLife for some forthcoming opinions on the matter.


Filed under learning