More active learning in statistics classes – and hypothesis testing too

Most statistics teachers would agree that our face-to-face time with students needs to get more ‘active’. The concepts and the critical thinking so essential to what we do only sink in when you try them out. That applies as much to reading and critiquing others’ statistics as it does to working out your own. One area of particular interest to me is communicating statistical findings, something for which evidence of effective strategies is sorely lacking, so it remains most valuable to learn by doing.

It’s so easy to stand there and talk about what you do, but there’s no guarantee the students get it, or retain it a week later. I always enjoy reading Andrew Gelman’s blog, and a couple of interesting discussions about active learning came up there recently, which I’ll signpost and briefly summarise.

Firstly, thinking aloud about making a survey class more active (and a graphics / comms one, but most of the responses are about the familiar survey topics). The consensus seems to be to let the students discover things – painfully if necessary – for themselves. That means letting them collect and grapple with messy data, not contrived examples. There are some nice pointers in there about stage-managing the student group experience (obviously we don’t really let them grapple unaided).

The statistical communication course came back next, with a refreshing theme that we don’t know how to do this (me neither, but we’re getting closer, I’d like to think). Check out O’Rourke’s suggested documents if nothing else!

Then, the problem of hypothesis testing. The dialogue between Vasishth and Gelman particularly crystallises the issue for practising analysts. It came back a couple of weeks later; I particularly like the section about a third of the way down, after Deborah Mayo appears, like an avenging superhero, to demolish the widely used, over-simplified interpretation of hypothesis testing in a single sentence. After that, Anonymous and Gelman cover a situation where two researchers look at the same data. Dr Good has a pre-specified hypothesis, tests it, finds a significant result, stops there and reports it. Dr Evil intends to keep fishing until something sexy and publishable turns up, but happens by chance to start with the same test as Dr Good. Satisfied with the magical p<0.05, Dr Evil too stops and writes it up. Is Evil’s work equivalent to Good’s? Is the issue with motivation or with selection? Food for thought, but we have strayed from teaching into some kind of Socratic gunfight (doubly American!). However, I think there is no harm in exposing students (especially those already steeped in professional practice, like the healthcare professionals I teach) to these problems, because they already recognise them from the published literature, even if they might not formulate them quite so clearly. Along the way, someone linked to this rather nice post by Simine Vazire.

(I don’t want you to think I’ve wimped out, so here’s my view, although that’s really not what this post is about: Rahul wrote “The reasonable course might be for [Dr Evil] to treat this analysis as exploratory in the light of what he observed. Then collect another data set with the express goal of only testing for that specific hypothesis. And if he again gets p<0.01 then publish.” – which I agree with, but for me all statistical results are exploratory. They might be hypothesis testing as well, but they are never proving or disproving stuff, always stacking evidence quantitatively in the service of a fluffier mental process called abduction or Inference to the Best Explanation. They are merely a feeble attempt to make a quantitative, systematic, less biased representation of our own thoughts.)
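There is at least a teachable selection effect lurking in that question. As a toy illustration of my own (not something from the Gelman thread), here are a few lines of R contrasting Dr Good’s single pre-specified test with a Dr Evil who is prepared to work through twenty independent null outcomes and publish whichever comes out ‘significant’: under a true null, Good is fooled about 5% of the time, Evil about 64% of the time.

# Toy illustration (my own, hypothetical): false-positive rates under a true null
# for a single pre-specified test versus fishing through up to k outcomes.
set.seed(42)
nsim <- 5000   # number of simulated studies
n <- 30        # observations per outcome
k <- 20        # outcomes Dr Evil is willing to try
# Dr Good: one pre-specified test per study
good <- replicate(nsim, t.test(rnorm(n))$p.value < 0.05)
# Dr Evil: tries up to k independent outcomes, publishes if any is 'significant'
evil <- replicate(nsim, {
  p <- replicate(k, t.test(rnorm(n))$p.value)
  any(p < 0.05)
})
mean(good)  # close to 0.05
mean(evil)  # close to 1 - 0.95^20, about 0.64

Of course, this only speaks to the selection side of the question; the subtler point in the thread, that the two doctors’ realised analyses are identical, is exactly what makes it such good classroom material.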

Now, if you like a good hypothesis testing debate, consider the journal that banned tests, and keep watching StatsLife for some forthcoming opinions on the matter.


Filed under learning

Adjust brightness in LXDE Linux

This is a little diversion from the usual stats.

I’ve been running LXDE Debian Linux on my small laptop for a while, and I’m really pleased with it. It handles all sorts of stuff, and the fact that the L stands for Lightweight hardly ever holds me back. But it doesn’t have any screen brightness controls, and it seems lots of people have asked about this on forums. Usually the question gets mixed up with assigning brightness control to a combination of keys, but that’s a bigger problem which depends on exactly what hardware you have. I just fixed it with a simple, crude hack, and as it seems to bother plenty of people, I thought I’d share it here.

Take a look in your /sys/class/backlight folder. I’ve got a samsung folder inside that; you might have something different, but whatever you have, look around until you find a file called brightness and another called max_brightness. Open them in your text editor of choice. In my case max_brightness simply contains the number 8, and brightness contains 1. To change the brightness of the screen, you change the number inside the brightness file. Make a new text file called go-dim, which contains this:

echo 1 > /sys/class/backlight/samsung/brightness

Then, one called go-bright, which contains this:

echo 4 > /sys/class/backlight/samsung/brightness

You don’t have to use 4 as the bright value; you can choose something else (less than or equal to the value inside max_brightness). Then save them somewhere easily accessible, like the Desktop, open the terminal and type:

chmod a+x Desktop/go-dim

and

chmod a+x Desktop/go-bright

Now, you can double click those files on your desktop, choose “execute” and they will do their thing for you. Obviously, if you save them somewhere else, you need to type the correct path in the chmod command.
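If you would rather keep one script than two, here is a minimal sketch along the same lines: it takes the level as an argument and reads max_brightness so it can refuse out-of-range values. The go-brightness name and the argument checking are my own additions, and it assumes the same samsung folder as above; swap in whatever folder you found on your machine.

#!/bin/sh
# go-brightness: set the backlight to the level given as the first argument.
# Assumes the same samsung folder as the go-dim/go-bright files above.
DIR=/sys/class/backlight/samsung
MAX=$(cat "$DIR/max_brightness")
LEVEL="$1"
# expects a whole number between 0 and max_brightness
if [ -z "$LEVEL" ] || [ "$LEVEL" -lt 0 ] || [ "$LEVEL" -gt "$MAX" ]; then
  echo "Usage: go-brightness <0 to $MAX>" >&2
  exit 1
fi
echo "$LEVEL" > "$DIR/brightness"

Make it executable with chmod a+x as before, and then ./go-brightness 4 from the terminal does the same job as the two separate files.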


Filed under computing

Kingston & St George’s Stats surgeries 2015

This year’s stats surgeries for our postgraduate students have just gone up on my website at http://www.robertgrantstats.co.uk/students.html


Filed under learning

My talk for Calculating and Communicating Uncertainty 2015 – and an experimental D3 page

Tomorrow and Wednesday I’ll be at the CCU2015 conference in Westminster. I’m talking in a workshop session on Wednesday on interactive graphics – and in particular, how they can be used to communicate uncertainty – so I thought I would make an experimental page showing a couple of different ways of trying this. You can read my slides here and the experiment is here. While I’m sitting in the conference, I’ll probably tidy up a couple of rough edges, and in due course I’ll make some more experiments. Please let me know what you think, and watch out for tweets with #ccu2015, although it means different things to different folks.

[Image: screenshot of the experimental interactive-uncertainty page]


Filed under animation, JavaScript, Visualization

Best dataviz of 2014

I expect everyone in the dataviz world would tell you this year was better than ever. It certainly seemed that way to me. I’m going to separate excellent visualisation for the purpose of communicating data from that for communicating methods.

In the first category, the minute I saw “How the Recession Reshaped the Economy, in 255 Charts”, it was so clearly head and shoulders above everything else that I could have started writing this post right then. It’s beautiful, intriguing and profoundly rich in information. And it’s quite unlike anything I’d seen in D3 before, or rather, it brings together a few hot trends, like scrolling to go through a deck, in exemplary style.

[Image: screenshot from “How the Recession Reshaped the Economy, in 255 Charts”]

Next, the use of JavaScript as a powerful programming language to do all manner of clever things in your web browser. Last year I was impressed by Rasmus Bååth’s MCMC in JavaScript, allowing me to do Bayesian analyses on my cellphone. This year I went off to ICOTS in Flagstaff, AZ and learnt about StatKey, a pedagogical collection of simulation / randomisation / bootstrap methods, but you can put your own data in, so why not use them in earnest? It is entirely written in JavaScript, and you know what that means – it’s open source, so take it and adapt it, making sure to acknowledge the work of this remarkable stats dynasty!

[Image: screenshot of StatKey]

So, happy holidays. If the good Lord spares me, I expect to enjoy even more amazing viz in 2015.


Filed under JavaScript, Visualization

An audit of audits

In England, and to some extent other parts of the UK (it’s confusing over here), clinical audits with a national scope are funded by HM Government via the Healthcare Quality Improvement Partnership (HQIP). Today, they have released a report from ongoing work to find out how these different audits operate. You can download it here. I am co-opted onto one of the sub-groups of the NHS England committee that decides which projects to fund, and as a statistician I always look for methodological rigour in these applications. The sort of thing that catches my eye, or more often worries me by its absence: plans for sampling, plans for data linkage, plans for imputing missing data, plans for risk adjustment and how these will be updated as the project accumulates data. Also, it’s important that the data collected is available to researchers, in a responsible way, and that requires good record-keeping, archiving and planning ahead.

I’ve just looked through the audit-of-audits report for statistical topics (which are not its main focus) and want to pick up a couple of points. In Table 3, we see that the statistical analysis plan is the part most likely to be missing from an audit’s protocol. That’s amazing really, considering how central it is to their function. 24 of the 28 work streams provide a user manual, including a data dictionary, to the poor devils who have to type in their patients’ details late at night when they were supposed to have been at their anniversary party long ago (that’s how I picture it, anyway); this really matters because the results are only as good as what got typed in at 1 am. Four of them take a sample of patients rather than aiming for everyone, and although they can all say how many they are aiming for, only one could explain how they check for external validity and none could say what potential biases existed in their process. 20 of the 28 use risk adjustment, 16 of whom had done some form of validation.

Clearly there is some way to go, although a few audits achieve excellent standards. The problem is in getting those good practices passed along. Hopefully this piece of work will continue to get support and to feed into steady improvements in the audits.


Filed under healthcare

Slice bivariate densities, or the Joy Division “waterfall plot”

This has been on my to-do list for a long old time. Lining up slices through a bivariate density seems a much more intuitive way of depicting it than contour plots or some ghastly rotating 3-D thing (urgh). Of course, there is the danger of features being hidden, but you know I’m a semi-transparency nut, so it’s no surprise I think that’s the answer to this too.

[Image: example sliced-density (“waterfall”) plot]

Here’s an R function for you:

# x, y: data
# slices: number of horizontal slices through the data
# lboost: coefficient to increase the height of the lines
# gboost: coefficient to increase the height of the graph (ylim)
# xinc: horizontal offset for each successive slice
#       (typically something like 1/80)
# yinc: vertical offset for each successive slice
# bcol: background color
# fcol: fill color for each slice (polygon)
# lcol: line color for each slice
# lwidth: line width
# extend: Boolean to extend lines to the edge of the plot area
# densopt: list of extra arguments passed on to density()
# NB if you want to cycle slice colors through vectors, you
#    need to change the function code; it sounds like a
#    pretty bad idea to me, but each to their own.
slicedens <- function(x, y, slices=50, lboost=1, gboost=1, xinc=0, yinc=0.01,
                      bcol="black", fcol="black", lcol="white", lwidth=1,
                      extend=FALSE, densopt=NULL) {
  # cut points dividing the range of y into equal-width slices
  ycut <- min(y) + ((0:slices) * (max(y) - min(y)) / slices)
  # plot height: total vertical offset plus the height of the overall density of x
  height <- gboost * ((slices * yinc) + max(density(x)$y))
  plot(c(min(x), max(x) + ((max(x) - min(x)) / 4)),
       c(0, height),
       xaxt="n", yaxt="n", ylab="", xlab="")
  rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col=bcol)
  # draw the furthest-back (highest-y) slice first so nearer slices overlap it
  for (i in slices:1) {
    miny <- ycut[i]
    maxy <- ycut[i + 1]
    gx <- (i - 1) * (max(x) - min(x)) * xinc
    gy <- (i - 1) * height * yinc
    dd <- do.call(density, append(list(x=x[y >= miny & y < maxy]), densopt))
    polygon(dd$x + gx, lboost * dd$y + gy, col=fcol)
    lines(dd$x + gx, lboost * dd$y + gy, col=lcol, lwd=lwidth)
    if (extend) {
      # continue each slice's baseline out to the edges of the plot area
      lines(c(par("usr")[1], dd$x[1] + gx),
            rep(lboost * dd$y[1] + gy, 2), col=lcol)
      lines(c(dd$x[length(dd$x)] + gx, par("usr")[2]),
            rep(lboost * dd$y[length(dd$y)] + gy, 2), col=lcol)
    }
  }
}
# Example 1: simulated data
y <- runif(5000, min=-1, max=1)
x <- runif(5000, min=-1, max=1) + rnorm(5000, mean=1/(y+1.1), sd=0.8-(y*0.5))
slicedens(x, y, lboost=0.2, fcol=rgb(0, 0, 0, 200, maxColorValue=255))
# Example 2: the built-in iris data (in the datasets package, no library() call needed)
data(iris)
slicedens(x=iris$Sepal.Width, y=iris$Sepal.Length, slices=12,
          lboost=0.02, fcol=rgb(0, 0, 0, 200, maxColorValue=255),
          extend=TRUE, densopt=list(kernel="cosine", adjust=0.5))

Some places call this a waterfall plot. Anyway, the white-on-black color scheme is clearly inspired by the Joy Division album cover. Enjoy.

Edit 9 October 2014: added the “extend” and “densopt” arguments.


Filed under R, Visualization