Monthly Archives: October 2013

New simulation study of algorithms for propensity score matching

Peter Austin of the University of Toronto has written a useful paper, just out in Stats in Med. Using a simulation study, he looks at 12 different ways of matching exposed and unexposed cases on their propensity scores and covariates. The algorithms range from computationally complex operational-research-type approaches, where every potential pair is evaluated and an overall optimal matching is found, to simpler ‘greedy’ ones where each exposed observation grabs the nearest unexposed observation without regard for overall optimality. Then you have the choice of putting matched unexposed observations back into the pot to be paired again with another exposed observation (matching with replacement). Finally, you could define the closest match in several different ways, which were investigated back in 1985 in a very readable paper by Rosenbaum and Rubin.


The solid black circle is the exposed case. You might choose the closest unexposed observation on the basis of the covariates alone (Mahalanobis distance – the triangle), the propensity score alone (the square), both (Mahalanobis again – the star), or both after restricting to a subset within a certain distance on the propensity score, which Rosenbaum and Rubin memorably call “calipers” (the hollow circle).
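The covariate-based options rest on the Mahalanobis distance, which standardises differences by the covariance of the covariates. A minimal sketch (the function name is mine, not from either paper):

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between covariate vectors x and y,
    given the inverse of the covariance matrix of the covariates."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ cov_inv @ d))
```

With the identity matrix as the inverse covariance, this reduces to ordinary Euclidean distance; with the true covariance it down-weights directions in which the covariates naturally vary a lot.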

The trouble with not using the calipers is two-fold. First, there is the computing challenge: restricting to a caliper reduces the number of matrix manipulations by a few orders of magnitude. Second, the propensity score gets swamped by the many dimensions of covariates in the Mahalanobis distance; Rosenbaum & Rubin showed that the effect was to achieve balance in the covariates but quite bad imbalance in the propensity score.
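The greedy caliper approach is simple enough to sketch in a few lines. This is a toy illustration under my own assumptions (raw caliper width on the score; function and variable names are mine), not code from the paper:

```python
def greedy_caliper_match(ps_treated, ps_control, caliper):
    """Greedy 1:1 matching without replacement on the propensity score.
    Each treated unit, in turn, grabs the nearest still-unmatched control;
    if the nearest one is further away than `caliper`, the treated unit
    goes unmatched. Returns a list of (treated_index, control_index) pairs."""
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        # nearest control that has not already been used
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs
```

Note the greediness: the order of the treated units matters, because an early unit can take a control that would have been a better match for a later one — exactly the kind of local decision the optimal-matching algorithms avoid at greater computational cost.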

Austin’s final conclusion is that in most situations greedy matching within calipers, without replacement, is best. There are some more subtle points made too, so if you are interested in propensity score matching, get hold of this paper!

Leave a comment

Filed under Uncategorized

On multiplicity

Last week, the New Scientist ran a most enjoyable editorial and accompanying article on wrong study findings in neuroscience. They start with the old chestnut of the dead salmon and move on to John Ioannidis and his claim that most research is wrong, backed up with a more recent critique, with Katherine Button and other colleagues, of underpowered and publication-biased studies. If there’s little chance of detecting a genuine effect, goes their argument, then whenever you find something that looks interesting, it’s more likely to be a false alarm.

In the New Scientist article, another critic of dodgy neuroscience is worried about a media witch hunt against neuroscience results, producing “this kind of global nihilism that all of neuroscience is bullshit”. I recognise this in my own students. I have to teach them to be critical – very critical – of published research, and I show them some corkers published in great journals which are fundamentally flawed. After a talk and a group exercise in critiquing, there is usually a fairly large proportion of the room saying that everything is b.s. and that you can’t trust stats. I’d like to think I bring them back to the middle ground at that point, but I fear that sometimes they go back into the wild with this strange pessimistic notion.

The more interesting angle is on multiplicity. Genome wide association studies had to crack this in a principled way, because when you compare millions or billions of potential risk factors, many of them are going to come out as significant. Thousands or millions are going to give the fabled p<0.001, even if nothing is going on. Brain imaging types tend to be more inclined to machine learning ideas (in its crudest form, divide the sample, look for patterns in the first half, then see if they are confirmed in the other) than p-adjustment and Bayes factors, from what I see, but the problem is the same.
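The arithmetic of false alarms is easy to simulate: under the null hypothesis a p-value is uniform on (0, 1), so a fixed fraction of null tests will clear any threshold you pick. A toy sketch (the function name and numbers are mine):

```python
import random

def count_false_alarms(n_tests, alpha, seed=42):
    """Simulate n_tests true-null hypotheses. Each p-value is uniform
    on (0, 1) under the null, so roughly n_tests * alpha of them come
    out 'significant' even though nothing is going on."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))
```

With a million tests and a threshold of 0.001 you expect about a thousand “significant” findings from pure noise — which is why genome-wide studies had to adopt much more stringent, principled corrections.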

But I think the most intriguing angle in this whole disconcerting mess is that we deal with multiplicitous studies all the time. As Andrew Gelman described recently, there are many ways you could pick your hypothesis, define your variables and analyse your data, and you don’t account for that multiplicity other than in an unspoken gut-feeling kind of way. Ioannidis commented on this too, in “Why most published research findings are false”, with a nice turn of phrase:

We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections, usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding.

That is not cause for throwing up our hands and giving up (like the students), but just a fact of every scientific endeavour. Openness is the answer, making the data, the analysis code and all the protocols and analysis plans available for others to pick over. And it’s the journal publishers who will drive this, because even if we researchers do fervently believe in openness, we are stopped by copyright, fear of criticism, and the burning need to get on with the next piece of work.


Maths on trial at the Conway Hall

This should be interesting:

Sunday Nov 3rd, 11am-1pm, Conway Hall Ethical Society, Red Lion Sq, Holborn.

Coralie Colmez will talk on ‘Maths on Trial. How numbers get used and abused in the courtroom’

Thanks to Jay Ginn for posting on Radstats.


How open are your data?

I’ve just been looking at this interesting piece of work by the Open Knowledge Foundation, which examines the data made available by various states. Essentially, they looked at topics such as government budget, election results and national maps in each country and checked whether these subjects had data available that was up-to-date, free of charge, online, available in bulk, machine-readable, publicly available, openly licensed…

As you may know, I get twitchy about taking rich data like this and boiling them down to an index, let alone a league table, but I guess that’s the way to get the headlines. The UK is the most open country, which is nice. Those at the bottom are characterised not by their secretive dictatorships (they didn’t bother with North Korea), but by their incompetence. In these cases, the data either don’t exist, or nobody knows whether they are available or not. That sounds to me like an index that measures more than just openness. There are also some countries that get marked down for such data not existing in the first place, for example election results in China. Is this an index of democracy or open data? Switzerland looks bad because the government doesn’t provide timetables for the famously reliable public transport; I wonder if anyone cares.

I’m not convinced that all the topics carry equal weight anyway. Transport timetables get marked down if the government doesn’t provide them, but the notion that the government must do this job is a very social democratic one. I’m there, but I don’t see why everyone should be. National maps might come as a surprise to those unfamiliar with less relaxed milieux; when I was a schoolboy in Apartheid era South Africa we (white kids) were taught how to read topographical maps, navigate by the sun etc. At the end of each lesson the maps got counted back in and locked in the school vaults. You couldn’t buy one or look at them in the library, lest they fall into the “wrong hands”. Now, looking back, I realise our lessons were more about war than geography. Blocking access to maps and the internet is way more serious than who prints the bus timetable.

I’d have gone to gaol for this

And another question that springs to mind is who accesses the open data? Nerds steeped in N-triples and JSON parsing? Because that ain’t open in my book.

The presentation is a nice colour-coded chart with popup info on clicking, although irritatingly there is no way of closing the popup. But it’s thought-provoking stuff.


Social network analysis of physicians and their attitudes to evidence-based medicine

Here’s an interesting paper just out in BMC HSR. The authors sent a questionnaire to all (presumably, although they are a bit vague about that) physicians in an Italian healthcare organisation. They were asked for some demographics, to name the peers with whom they discuss cases or seek or give advice, and then about their attitudes to evidence-based medicine*.


Figure 1 from the paper: the social network of 297 physicians. CC-BY Mascia et al 2013

Then they found that having positive EBM views predicted being at the core and well-connected in the network. Strange, really, because I would have seen the causality as running the other way round, but never mind, the association is the same, although the effect size becomes harder to interpret. This is based on a core vs periphery classification, but I wonder whether a spatial model (after multidimensional scaling perhaps?) would be better at utilising all the information, rather than splitting the network into two areas. Also, a structural equation model would make better use of the attitude data than just adding them together to get a score. Some parts of the network may score high in certain attitudes and low in others, thus cancelling out in an overall score.
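That cancelling-out worry is easy to illustrate with made-up numbers: two physicians with quite different attitude profiles can end up with identical summed scores.

```python
# Hypothetical item scores on a 1-5 scale (invented for illustration,
# not taken from the paper).
physician_a = [5, 1, 5, 1]  # strongly endorses items 1 and 3, rejects 2 and 4
physician_b = [3, 3, 3, 3]  # middling on everything

# Summing masks the difference entirely: both totals are 12...
print(sum(physician_a), sum(physician_b))

# ...whereas the item-level profiles are clearly distinct.
print(physician_a == physician_b)
```

A structural equation model (or simply keeping the items separate) preserves exactly the structure that the total throws away.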

Interesting stuff though, and worth doing in many other topics around clinician attitudes and preferences, which are such a mystery in a lot of what we do (“confounding by indication”).

* – if you haven’t encountered this term before, evidence-based medicine means treating your patients based on scientific research, not just what your professor told you when you were at med school (because he got it from his prof, and so on back to Ibn Sina, Galen and Asclepius). People outside the healthcare world are generally rather shocked to find out there is anything other than EBM being practised!


The world’s favourite stats papers

People often say that Bland and Altman’s paper where they set out the eponymous plot for comparing two measures in medical statistics is the most-cited stats paper ever. I thought I would poke around on Google Scholar and see what the citations looked like there.


Martin Bland (left) & Doug Altman, Cambridge 1981. Photo courtesy of Martin Bland’s homepage at York.

In terms of total citations, and given all the shortcomings of this as a measure of anything, there are two ahead of B&A, and they needn’t feel cheated, as we’re talking about titans of statistics here. Here are the rankings for the seven papers I could think of testing:

  1. Cox (1972) Regression models and life-tables: 35,512 citations.
  2. DLR (1977) Maximum likelihood from incomplete data via the EM algorithm: 34,988
  3. Bland & Altman (1986) Statistical methods for assessing agreement between two methods of clinical measurement: 27,181
  4. Geman & Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images: 15,106
  5. Efron (1979) Bootstrap methods: another look at the jackknife: 9,686
  6. Tibshirani (1996) Regression shrinkage and selection via the lasso: 8,744
  7. Nelder & Wedderburn (1972) Generalized linear models: 3,818

Can anyone think of any other landmark papers to look up?

Nelder & Wedderburn invented GLMs, so you’d think they should be pretty darn near the top, but for two things, I suppose. Firstly, the most popular of these models, logistic regression and Poisson regression, are so commonplace that people no longer cite them, and secondly, the book by McCullagh and Nelder (following Robert Wedderburn’s tragic death at the age of 28) attracts most of the citations. Adding all the variants on it in Google Scholar, you get 24,297 citations, which would take GLMs up to third place, overtaking B&A, but then that is rather unfair on others with much-cited books like Little & Rubin or David Cox.

When considered per year since publication, you have to remember Google Scholar is not measuring the same thing each year. Since it got going, Google have put effort into going back into the archives and getting more books, reports and grey lit on the system. Recent years are going to produce more citations simply because of an inclusion bias, not to mention the fact that a lot more gets written and published each year now (most of it rubbish). But, given all that, B&A come out on top with 1007 citations per year, DLR second with 972, and Cox third with 866.
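Those per-year figures are just total citations divided by years since publication, taking 2013 as the reference year (my reconstruction of the arithmetic, with the totals quoted above):

```python
def citations_per_year(total, pub_year, this_year=2013):
    """Crude citation rate: total citations over years since publication."""
    return round(total / (this_year - pub_year))

for name, (total, year) in {
    "Bland & Altman": (27181, 1986),
    "DLR": (34988, 1977),
    "Cox": (35512, 1972),
}.items():
    print(name, citations_per_year(total, year))  # 1007, 972, 866
```

Crude indeed: it ignores the usual lag before a paper starts being cited, and the inclusion bias in recent Google Scholar years mentioned above.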



Healthcare quality is trendy again

Yesterday I received a promotional email from a publishing company telling me that next week is “Healthcare Quality Week” (but perhaps only in the USA). Who’da thought it? Apparently a great way to celebrate this would be to subscribe to one of their journals. Well, I managed to say no to a danish with my coffee at East Croydon station this morning, so I think I can resist this lesser temptation too.

But the quality of care in the UK is definitely getting a lot of attention. It was sort of trendy for a while in the noughties, was discredited through Star Ratings and Sunday newspaper league tables, went away, and is now coming back into vogue. It’s worth doing, but it’s jolly hard to look across indicators, specialisms, care settings and disease topics and find an underlying pattern identifying the “bad apples”. (When I say jolly hard, I actually mean impossible. It’s British understatement.)


Yet finding the wrong ’uns is what people generally expect will happen. People like our Secretary of State for Health, Jeremy Hunt, who said:

As an MP I know how well each school in my constituency is doing thanks to independent and thorough Ofsted inspections. But because the Care Quality Commission only measures whether minimum standards have been reached, I do not know the same about hospitals and care homes. I am not advocating a return to the old ‘star ratings’ but the principle that there should be an easy-to-understand, independent and expert assessment of how well somewhere is doing relative to its peers must be right.

“Right” as in good, I suppose, not “right” as in correct. Well, as the song goes, we are where we are, let’s all get on with it.

The Nuffield Trust and Health Foundation are prominent in this, launching this week something called QualityWatch, joining the field as another publisher of league tables and interactive graphics using existing official data. There is no government endorsement, although that may come. Nuffield Trust ran Mr Hunt’s consultation on rating healthcare providers earlier this year, giving it the thumbs-up, so you can imagine they are in favour.

There is a long history of statisticians versus league tables, which is why I am now writing a retrospective with this title, perhaps for Significance magazine if they like it, perhaps for a medical / health services research audience. I’m going to focus on health because it’s my field but you should look up Harvey Goldstein’s equally critical tone in education, particularly given that “I know how well each school is doing” comment.

I am a member of NHS England’s National Advisory Group on Clinical Audit and Confidential Enquiries. This is entirely my own opinion and not that of NAGCAE, NHS England or HM Government.
