Monthly Archives: April 2014

What is the significance of surgeons?

This is just a brief post to elaborate on my response to this tweet:

Everybody appeals to significance, but few have stopped to think about what it really means. You have to get a little philosophical (only a very little). I’m not sure Spiegelhalter actually used the word in this context, except as a rough shorthand when quickly answering questions, because I’m pretty sure he knows what it means!

Significance, along with confidence intervals and p-values, is one of the trappings of inference. In fact, significance just divides the p-values into whether they are above or below some threshold, making it the least informative of the three. Anyway, the important point is that you have a sample and you are trying to say something about the population from which it was drawn*. If you have data on every patient the surgeons operated on last year (as we increasingly do), then inference to a population is meaningless. Your sample is the population. On the other hand, if you have a sample of last year’s patients, then you can make inference (if you believe you truly know the sampling mechanism) about the population of last year’s patients. But that almost certainly is not what you want to know. You want to know what next year will be like, whether you should go to Mr X or Miss A to have some odd growth chopped off. And, as Leckie and Goldstein showed us with school league tables, the accumulation of changes in the system makes comparisons on past data almost completely uninformative. The effect is as strong in healthcare, or even stronger, certainly in the UK, where the National Health Service has been “liberated” by a “no top-down re-organisation” re-organisation, and in the USA, where Obamacare has come into being (and the murmurs suggest waking up to cost-effectiveness next, if we’ve all got over Sarah Palin’s death panels).
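To make that concrete, here is a toy example in R (my own invention, nothing to do with the surgeons’ actual data): one comparison yields all three trappings, and you can see how each step throws away information, ending with significance as a single bit.

```r
# toy example: two 'surgeons' with simulated complication outcomes
set.seed(42)
a <- rbinom(200, 1, 0.10)  # surgeon A: 10% true complication rate
b <- rbinom(200, 1, 0.14)  # surgeon B: 14% true complication rate

test <- prop.test(c(sum(a), sum(b)), c(length(a), length(b)))
test$conf.int           # confidence interval: magnitude and precision
test$p.value            # p-value: a continuous summary of evidence
test$p.value < 0.05     # 'significance': the p-value crushed to one bit
```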

So, the problem is not that the differences are significant or non-significant; it is that they are taken to be clinically important (actually, they might be too small to worry anyone) and informative for patient and commissioner choice (they’re not). They are helpful for the professionals and their peers to learn from one another and shame the dullards into catching up, but only when combined with a heap of local insights into what’s going on. It’s important that we are transparent and publish them, but they need strong caveats, because at present they are sold to the public (and doctors) as the true objective measure of ‘quality’. The final warning from this gloomy post is against trying to extend the indicators of procedures and outcomes that are available to us into some vague global concept of quality. I leave it to the reader as an exercise (not of my own invention) to write down a definition of quality.

* – there may yet be some surviving ultra-frequentists out there who take it even further and believe you can only do inference if you can carry out infinite repetitions of the same data collection; presumably they would not permit any inference with data like these.

Your author is a member of the National Advisory group on Clinical Audit and Confidential Enquiries. This piece is his own personal view as a practising statistician with an interest in healthcare quality indicators, and philosophy of science. It is not the view of NAGCAE, NHS England or Her Majesty’s Government (obviously, I would have thought).


Filed under Uncategorized

Stats in bed, part 1: Ubuntu Touch

Round at the RSS Statistical Computing committee, we were having a chuckle at the prospect of a meeting about Stats In Bed. By which I mean analysis on mobile devices, phones and tablets (henceforth phablets), not some sort of raunchy performance indicator. This is something that has been nagging at the back of my mind for a while. Why, in this day and age, can’t we just run any software anywhere? Well, it’s because the major manufacturers have narrowed the scope for tinkering on phablets.

Although it seems silly, I know there is some time each day when I could do little useful tasks with a tablet that I can’t do with a laptop, even a very small lightweight one. One of the issues is time to start up or come out of hibernation in Windows, so I turned my attention to Linux, and in particular Ubuntu Touch.

I had just acquired a Nexus 10 tablet (comes with Android installed) for this purpose, and was also a Linux noob, so I was stumbling about as I experienced the first rays of dawn, like Bertie Wooster after a particularly libatious evening at the Drones. In this post, I’ll describe my first line of attack, which didn’t prove successful, but could be at some point in the future when the software develops further. My goal was to be able to run R, Julia and ideally Stan (therefore C++ compilation).

Ubuntu Touch is a work in progress, a new version of Ubuntu Linux designed for touch screens. Converting an operating system to work with touch screens is not a trivial job, although Ubuntu seem to have set themselves a dangerously optimistic timescale for this collective effort. A Linux smartphone has been waved around, and the USP is supposedly the goal of using the phablet on the move, then docking it in the office, where it carries on but scales the apps to work on your monitor*, keyboard and mouse. This is kind of cool, but I don’t feel a great need for it. I need a desktop that handles unpleasant computation quickly and without distracting itself, and frankly, although I groan every time my various versions of Windows grind to a halt to do… what? something obscure and, I suspect, unnecessary in the so-called background, I can’t live in the real world with other people and ditch Windows any time soon. If I was a hackathon hipster then I could, but I am somewhere in a hospital in the London suburbs, with employers and colleagues who require Windows.

* – my father-in-law actually used the term VDU the other day. I laughed, rather unfairly.

As Ubuntu Touch stands, it can be installed on a desktop for you to try apps out and contribute to the development, but installing on the phablet is rather more limited. Still, it’s not hard to do. The first step is to invalidate your warranty and root the phablet. Like all other first-timers, I had a vision of trying to sell it on for a couple of bucks after I had totally killed it, but it turns out to be much safer than that. If anything goes wrong, you can just replace the Android exactly as it came from the factory. You hold down all the buttons on the phablet together until the alarming sight appears of the little green robot on its back, with its guts opened up for some Android surgery. (I felt guilty at taking risks with the little guy.) You need a computer with Linux, you connect the phablet, and follow the (now deprecated) instructions from the link above. It’s amazingly simple. Then you have a Linux phablet, like this:


The username and password are phablet, which serves to remind you that this is a kind of preview version. And yet it works pretty well, despite some odd shadows lingering after windows have been swiped off to one side. The home screen looks like this:


and there are some other apps, none of which interested me:


But you can’t install new ones at this stage, even though it looks like you can. There is a dash search, and indeed a terminal app appears:


It has a tiny font, but whatever…


At this stage, I was getting pretty excited. From the terminal I could install R… maybe. It didn’t come with the Touch installation. It behaves like the desktop should when you issue an apt-get command to update your software, prior to installing the new R and Julia and so on.
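For the record, the commands in question were nothing exotic; something like this (the package names are my best guess at what I wanted at the time):

```shell
# refresh the package lists, then try to install R
sudo apt-get update
sudo apt-get install r-base r-base-dev
```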


But then it fails. You see, a large part of the file system is not writable, because this is an evaluation version. I just hope that is a temporary measure, and not yet another OS for phablets that blocks the user from poking around under the bonnet. Time will tell when (if?) the real Ubuntu Touch comes out.


So, that was an interesting diversion, but it was back to the drawing board and back to Android. One day Ubuntu Touch might be the choice if you want to run R and friends on a phablet, but the small space given to the terminal wasn’t enough, and the keyboard was too chunky. Hopefully someone will write a full-screen terminal app and a better keyboard, and hopefully the OS will allow it. Until then, you can read how I made friends with the little green Android dude in the second installment.


Filed under Julia, R

Data visualization online lecture, and deck.js markdown coming soon

I have a new set of dataviz slides online here, aimed at a clinical audit audience but of broad interest, I hope. It is substantially updated from last year’s slides, and shares a lot of examples and points with the RSS talk from March. It seems a cliché to say new things are appearing every day, but in the case of this particular day, it’s true; my slides are already out of date a few hours after the talk.

You’ll note that, having played around with Slidify and other online slide libraries, I’ve come back to deck.js, which is a pleasure to work with. In fact, I made a little markdown program in R to take plain text and turn it into a ‘deck’. I’ve got a couple more bits to add to that, then it’s going up here, on the software page, and at GitHub. I’m thinking of compiling it too. I’m also rather pleased with my rlg.css which you can save into your /themes/style folder for a cool black background.
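The program itself isn’t up yet, but the core idea is simple enough to sketch. This is a hypothetical miniature, not the actual code: split plain text into slides on a delimiter and wrap each in the section.slide markup that deck.js expects.

```r
# hypothetical miniature of a text-to-deck.js converter;
# slides are separated by lines consisting of "---"
make_deck <- function(lines) {
  slide_id <- cumsum(lines == "---")          # which slide each line belongs to
  keep <- lines != "---"                      # drop the delimiter lines themselves
  slides <- split(lines[keep], slide_id[keep])
  body <- vapply(slides, function(s) {
    paste0('<section class="slide">\n',
           paste0("<p>", s, "</p>", collapse = "\n"),
           "\n</section>")
  }, character(1))
  paste(body, collapse = "\n")
}

cat(make_deck(c("Title slide", "---", "Second slide", "with two lines")))
```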


Filed under Visualization

Animated graphs hit the Stata blog

Chuck Huber of StataCorp (the voice behind those great YouTube videos) has just been blogging about animated graphs. He looks into using Camtasia software as well as my ffmpeg approach. And even if you’re not interested in making any such graph, go and look at some of his wonderful GIFs which would make great teaching tools, for example around power and sample size.

The more I use ffmpeg, the more I appreciate it. Working with video files is a real pain nowadays. There used to be more compatibility across software and operating systems and browsers, but now they all seem to be closing ranks. This is a good overview; although the terror of 2011 turned out to be a little overstated, the direction of travel is there and the HTML5 video tag remains flawed through lack of support from the software houses. Just today I’d been messing about moving video files from one computer to another in the vague hope that somewhere I would find the right combination of permissions that could open them, edit them and save them again. It was a struggle. The closest I got was the oldest OS I had on a laptop: XP (no, I’m not going to update it because the support ended yesterday! It was the last good one!). Then in the end I realised I could just do it all from the command line with ffmpeg. Plus you get to look like a badass hacker if anyone looks over your shoulder!
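For the curious, the sort of one-liner I mean looks like this (the filenames are hypothetical): re-encoding a file to H.264 in an MP4 container, which most current browsers and the HTML5 video tag will accept.

```shell
# re-encode an old AVI into an MP4 that browsers can play:
# H.264 video, yuv420p pixel format for broad player support, AAC audio
ffmpeg -i input.avi -c:v libx264 -pix_fmt yuv420p -c:a aac output.mp4
```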

ffmpeg in action (compare and contrast with your favourite proprietary video software NOT in action). Borrowed from


Filed under animation, Stata

Look down the datascope

Maarten Lambrechts has a great post over at his blog. It’s all about interactive dataviz, regarding it as a datascope that, like a telescope, lets you look deep into the data and see stuff you couldn’t otherwise. You must read it! But just to give you the punchline:

A good datascope

  1. unlocks a big amount of data
  2. for everyone
  3. with intuitive controls
  4. of which changes are immediately represented by changes in the visual output
  5. that respects the basic rules of good data visualization design
  6. and goes beyond what can be done with static images.

Maybe I should add a 7th rule: a facet or view of the datascope should be saveable and shareable.

Thanks to Diego Kuonen for sharing on Twitter


Filed under Visualization

Including trials reporting medians in meta-analysis

I’ve been thinking a lot about how best to include trials that report incomplete stats (or just not the stats you want) in a meta-analysis. This led me to a 2005 paper by Hozo, Djulbegovic & Hozo. It’s a worthwhile read for all meta-analysts. They set out estimators for the mean & variance given the median, range & sample size. The process by which they got these estimators was a cunning use of inequalities.
However, I was left wondering about uncertainty around the estimates. Because I’ve been taking a Bayesian approach, I really want a conditional distribution for the unknown stats given what we do know. There is one point where the authors try a little sensitivity analysis by varying the mean and standard deviation that came from their estimators, and they found a change in the pooled estimate from their exemplar meta-analysis that is too big to ignore. They do give upper and lower bounds, but that’s not the same thing.
Another interesting problem is that the exemplar meta-analysis seems to have some substantial reporting bias; the studies reporting medians get converted to smaller means than those that reported means. A fully Bayesian approach would allow you to incorporate some prior information about that.
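For reference, the point estimators from the paper are easy to code up. Here is a sketch in R of my own transcription of their formulas (including the sample-size thresholds for switching estimators), so check it against the paper before relying on it:

```r
# estimate mean and SD from the median (m), minimum (a), maximum (b)
# and sample size (n), following the Hozo, Djulbegovic & Hozo (2005)
# point estimators as I read them
hozo <- function(a, m, b, n) {
  mean_est <- if (n <= 25) (a + 2 * m + b) / 4 else m
  sd_est <- if (n <= 15) {
    sqrt(((a - 2 * m + b)^2 / 4 + (b - a)^2) / 12)
  } else if (n <= 70) {
    (b - a) / 4   # the familiar range/4 rule
  } else {
    (b - a) / 6   # range/6 for large samples
  }
  c(mean = mean_est, sd = sd_est)
}

hozo(a = 10, m = 24, b = 50, n = 20)  # mean 27, sd 10
```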


Filed under Bayesian

Data detective work: work out the numerator or denominator given a percentage

Here’s some fun I had today. If you are looking at some published stats and they tell you a percentage but not the numerator & denominator, you can still work them out. That’s to say, you can get your computer to grind through a lot of possible combinations and find which are compatible with the percentage. Usually you have some information about the range in which the numerator or denominator could lie. For example, I was looking at a paper which followed 63 people who had seen a nurse practitioner when they attended hospital, and the paper told me that 18.3% of those who responded had sought further healthcare. But not everyone had answered the question; we weren’t told how many but obviously it was less than or equal to 63. It didn’t take long to knock an R function together to find the compatible numerators given a range of possible denominators and the percentage, and later I did the opposite. Here they are:

# deducing numerators from a percentage and a range of possible denominators
whatnum <- function(denoms, target, dp) {
	nums <- rep(NA, length(denoms))
	for (i in 1:length(denoms)) {
		d <- denoms[i]
		hits <- which(round((0:d) / d, digits = dp) == target) - 1
		if (length(hits) > 1) {
			warning(paste("More than one numerator is compatible with denominator ", d, "; minima are returned", sep = ""))
		}
		if (length(hits) > 0) nums[i] <- min(hits)
	}
	return(nums)
}
# and the opposite: deducing denominators, searching up to maxdenom
whatdenom <- function(nums, target, dp, maxdenom) {
	denoms <- rep(NA, length(nums))
	for (i in 1:length(nums)) {
		n <- nums[i]
		hits <- which(round(n / (n:maxdenom), digits = dp) == target) + n - 1
		if (length(hits) > 1) {
			warning(paste("More than one denominator is compatible with numerator ", n, "; minima are returned", sep = ""))
		}
		if (length(hits) > 0) denoms[i] <- min(hits)
	}
	return(denoms)
}

By typing something like whatnum(40:63, 0.183, 3), I could find straight away that the only possibility was 11/60.
That particular paper also had a typo in table 4 ("995.3%") which meant it could be 99.5% or 99.3% or 95.3%. I could run each of those through and establish that it could only possibly have been 95.3%. Handy for those pesky papers that you want to stick in a meta-analysis but are missing the raw numbers!


Filed under R