False hope (of methodological improvement)

I had a paper out in January with my social worker colleague Rick Hood, called “Complex systems, explanation and policy: implications of the crisis of replication for public health research”. The journal page is here or you can grab the post-print here. It’s a bit of a manifesto for our research standpoint, and starts to realise a long-held ambition of mine to make statistical thinking (or lack thereof) and philosophy of science talk to one another more.

We start from two problems: the crisis of replication and complex systems. Here’s a fine history of the crisis of replication from Stan chief of staff Andrew Gelman. By complex systems, we mean some dynamic system which is adaptive (responds to inputs) and non-linear (a small input might mean a big change in output, but hey! because it’s adaptive, you can’t guarantee it will keep doing that). In fact, we are in particular interested in systems that contain intelligent agents, because this really messes up research. They know they are being observed and can play games, take short-term hits for long-term goals, etc. Health services, society, large organisations, all fit into this mould.

There have been some excellent writers who have tackled these problems before, and we bring in Platt, Gigerenzer, Leamer, Pawson and Manski. I am tempted to give nutshells of their lives’ work but you can get it all in the paper. Sadly, although they devoted a lot of energy and great ideas to making science work better, they are almost unknown among scientists of all stripes. Reviewers said they enjoyed reading the paper and found it fresh, but felt that scientists knew about these problems already and knew how to tackle them. You have to play along and be respectful to reviewers, but we thought this was wishful thinking. Bad science is everywhere, and a lot of it involves that deadly combination of our two problems; public health (the focus of our paper) is more susceptible than most fields because of the complex system it sits in (government, health providers, society…) and its often multi-faceted interventions, which require social traction to be effective. At the same time it draws on a medical research model that originated in randomised controlled trials, rats and Petri dishes. The reviewers and we disagree on just how far most of that research has evolved from those origins.

My experience of health and social care research in a realist-minded faculty is that the more realist and mixed-method the research gets, and the more nuanced and insightful the conclusions are, the less it is attended to by the very people who should learn from it. Simple statistics and what Manski called “incredible certitude” are much more beguiling. If you do this, that will follow. Believe me.

Then, we bring in a new influence that we think will help a lot with this situation: Peter Lipton. He was a philosopher at Cambridge and his principal contribution to his field was the concept of “inference to the best explanation” (also the title of his excellent book, which I picked up somewhat by mistake in Senate House Library one day in 2009, kick-starting all of this), which takes Peirce’s abductive reasoning and firms it up into something almost concrete enough to actually guide science and reasoning. The fundamental problem with all these incredible-certitude studies is that they achieve statistical inference and take that to be the same thing as explanatory inference. As you know, we are primed to detect signals, and evolved to favour false positives, and we have to quantify and use statistics to help us get past that. The same problem arises in explanation, but without the quantification to help.

Cheese consumption vs death entangled in bedsheets: you believe it more than the famous Nicolas Cage films vs swimming pool deaths correlation because it has a more (almost) plausible explanation. From Tyler Vigen’s Spurious Correlations.
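As a toy illustration of how far statistical inference can drift from explanatory inference, here is a minimal sketch (my own, not from the paper): two series generated with no connection to each other whatsoever will still correlate strongly much of the time, simply because both happen to wander.

```python
import numpy as np

rng = np.random.default_rng(42)

def corr_of_pair(trended: bool, n: int = 300) -> float:
    """Absolute correlation between two independently generated series."""
    a, b = rng.normal(size=(2, n))
    if trended:
        # Random walks: each series trends, but independently of the other.
        a, b = np.cumsum(a), np.cumsum(b)
    return abs(np.corrcoef(a, b)[0, 1])

# Independent white noise almost never correlates strongly...
noise = [corr_of_pair(trended=False) for _ in range(200)]
# ...but independent random walks do so routinely.
walks = [corr_of_pair(trended=True) for _ in range(200)]

print(f"share with |r| > 0.5: noise {sum(v > 0.5 for v in noise) / 200:.0%}, "
      f"walks {sum(v > 0.5 for v in walks) / 200:.0%}")
```

The trends are the whole trick: cheese consumption and bedsheet deaths both drift over the years, and drift alone is enough to manufacture an association that no explanation underwrites.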

A good explanation makes a statistical inference more credible. It’s what you’re hoping to achieve at the end of Platt’s “strong inference” process of repeated inductive–deductive loops. This is Lipton’s point: as humans, when we try to learn about the world around us, we don’t just accept the likeliest explanation, as statistics provides; we want it to be “lovely” too. As I used to enjoy having on my university profile of relevant skills (on a few occasions a keen new person in comms would ask me to take it down, but I just ignored them until they left for a job elsewhere that didn’t involve pain-in-the-backside academics):

Screen Shot 2017-03-08 at 17.39.09

By loveliness, Lipton meant that it gives you explanatory bang for buck: it should be simple and it should ideally explain other things beyond the original data. So, Newton’s gravitation is lovely because it has one simple formula that works for both apples and planets. Relativity seems too hard to comprehend to be lovely, but as the phenomena explained by it stack up, it becomes a winner. Wave-particle duality likewise. In each case, they are accepted not for their success in statistical inference but in explanatory inference. It’s not just laws of physics but more sublunar cause and effect too: if you impose a sugar tax, will people be healthier in years to come? That’s the extended worked example we use in the paper.

Now, there are problems with explanations:

  • we don’t know how to search systematically for them
  • we don’t know where they come from; they generally just “come to mind”
  • we don’t know how to evaluate alternatives and choose among them
  • we usually stop thinking about explanation as soon as we hit a reasonably good candidate, but the more you think, the more refinements you come up with
  • we seem to give too much weight to loveliness compared to likelihood

and with loveliness itself too. Firstly, it’s somewhat subjective; consider JFK’s assassination. If you are already interested in conspiracy theories and think that government spooks do all sorts of skullduggery, then the candidate explanation that the CIA did it is lovely – it fits with other explanations you’ve accepted, and perhaps explains other things – they did it because he was about to reveal the aliens at Roswell. If you don’t go for that stuff then it won’t be lovely because there are no such prior beliefs to be co-explained. In neither the CIA candidate nor the Oswald candidate explanation are there enough data to allow likelihood to get in there and help. It would be great if we could meaningfully quantify loveliness and build it into the whole statistical process which was supposed to help us get over our leopard-detecting bias for false positives, but that seems very hard. Lipton, in fact, wrote about this and suggested that it might be possible via Bayesian priors. I’ll come back to this.

So, here’s a couple of examples of loosely formed explanations that got shot from the hip after careful and exacting statistical work.

Long ago, when I was a youngster paying my dues by doing data entry for the UK clinical audit of lung cancer services, we put out a press release about differences by sex in the incidence of different types of tumour. I’m not really sure why, because that’s an epidemiological question and not one for audit, but there ya go. It got picked up in various places and our boss was going to be interviewed on Channel 5 breakfast news. We excitedly tuned in. “Why is there a difference?” asked the interviewer. They had heard the statistical inference and now they wanted an explanation.

Of course, we didn’t know. We just knew it was there, p=whatever. But it is human nature to seek out explanation and to speculate on it. The boss had clearly thought about it already: “It may be the feminine way in which women hold their cigarettes and take small puffs”. Whaaat? Where did that come from? I’d like to say that, before dawn in my shared apartment in Harringay, I buried my face in my hands, but that requires some understanding of these problems which I didn’t acquire until much later, so I probably just frowned slightly at the stereotype. I would have thought, as earnest young scientists do, that any speculation on why was not our business. Now, I realise two things: the scientist should propose explanations, lest someone less informed do so, and the scientist should talk about them, so that the daft ones can get polished up, not stored until they are released pristine and unconstrained by consensus onto national television. It would be nice if, as our paper suggests, these explanations were pre-specified like statistical hypotheses should be, so that the study is protected against the explanatory form of multiple testing.

Then here’s a clip from the International New York Times (erstwhile weekly paper edition in the UK) dated on my 40th birthday (I never stop looking for good stuff to share with you, readers).

It’s all going well until the researcher strays from ‘we found an association’ into ‘this is why it happens’. “There are more than a thousand compounds in coffee. There are a few candidates, but I don’t know which is responsible.” The opposite problem happens here: by presupposing that there must be a chemical to which effects can be attributed (because that’s what he was shown in med school), we can attribute them to an unknown one, and thus, by begging the question, back up the statistical inference with a spurious explanatory one. Here, there is a lack of explanation, and that should make us rightly suspicious of the conclusion.

On these foundations, we tentatively propose some steps researchers could take to improve things:

  • mixed-methods research, because the qual informs the explanation empirically
  • Leamer’s fragility analysis
  • pre-specify a mapping of statistical inference to explanation
  • have an analysis monitoring committee, like trials have a data monitoring committee
  • more use of microsimulation / agent-based modelling
  • more use of realist evaluation

Further in the future, we need:

  • methodological work on Bayesian hyperpriors for loveliness
  • better education, specifically dropping the statistical cookbook and following the ASA GAISE guidelines

This is strong medicine; funders and consumers of research will not give a damn about this time-consuming expense, bosses and collaborators will tell concerned researchers not to bother, and some of it could be so hard as to be practically impossible. In particular, Bayesian hyperpriors for loveliness are in the realm of methodological fancy, although some aspects exist, notably bet-on-sparsity, and I’ll return to that in a future post. But setting that to one side, if researchers do things like our recommendations, then over time we will all learn how to do this sort of thing well, and science will get better.


Wrong. None of this will happen any time soon. And this is, ironically, for the same reason that the problems arise in the first place: science happens in a complex system, and an intervention like ours can have an adverse effect, or no effect at all. Researchers respond to several conflicting forces, and the psychosocial drivers of behaviour, stronger than any appeal to their good nature, remain unchanged. They still scoff at navel-gazing philosophical writers and lump us into that category, they still get told to publish or perish, and they still get rewarded for publication and impact, regardless of the durability of their work. So if I were to talk about the future in the same form that Gelman wrote about the past, it would be a more pessimistic vision. Deep breath:


A storm hits the city and the lights go out before I can prepare

This crisis is known to scientists in only a few areas, where the problem is particularly egregious (not to say it won’t one day be revealed to have been bigger elsewhere, like public health, but that in these areas it is both quite bad and quite obvious): social and behavioural psychology most notably, although brain imaging and genetics have their own problems and believe they have fixed them by looking for really small p-values (this, lest you be mistaken, will not help). For most other fields of quantitative scientific endeavour, they don’t even realise they are about to get hit. I recall being introduced to a doctor by a former student when we bumped into each other in a cafe:
“Robert’s a statistician.”
“Oh, good, we need people like you to get the p-values going in the right direction.”
Now, I know that was a light-hearted remark, but it shows the first thing that comes to mind with statistics. They have no idea what’s coming.

The whole of downtown looks dark like no one lives there

Statistical practice is so often one of mechanistic calculation. You can use recipes to crank the handle and then classify as significant (go to Stockholm, collect Nobel prize) or non-significant (go to Jobcentre, collect welfare). There is no sign of explanation up front; it is grubbed up after the fact. It’s as though all human thought were abandoned the minute they turned on the computer. I just can’t understand why you would do that. Have more pride!
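To make the handle-cranking concrete, here is a hedged little simulation of my own (not from any study discussed here): run enough recipes on pure noise and the machine duly stamps some of them “significant”, each one arriving ready to be grubbed up into a post-hoc explanation.

```python
import numpy as np

rng = np.random.default_rng(2017)

def null_experiment(n: int = 30) -> float:
    """Welch-style t statistic for two groups drawn from the SAME population."""
    a, b = rng.normal(size=(2, n))
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    return abs(a.mean() - b.mean()) / se

# Crank the handle 100 times; classify |t| > 2 as "significant".
t_stats = [null_experiment() for _ in range(100)]
discoveries = sum(t > 2 for t in t_stats)
print(f"{discoveries} ‘significant’ findings out of 100 true nulls")
```

Roughly one comparison in twenty clears the bar even though every null here is true by construction, which is the whole case for pre-specifying both the hypotheses and the explanations.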

Why does this happen? These are at least some of the psychosocial forces I mentioned earlier:

  • The risk is carried by early career people: the junior academic or the budding data scientist. The older mentor is not expected to control every detail, and doesn’t take personal responsibility (it was Fox’s fault!)
  • Only a few such analyses are used (maybe one) to evaluate the junior person’s ability
  • Impact is valued; a reasonable idea for a whole organisation or programme of work, but not for projects, because there will always be a certain failure rate; paradoxically, this is also why academics play it safe with uninspiring incremental advances
  • Discovery and novelty are valued – as above
  • This sort of work is badly paid. You have to succeed quickly before you have to drop out and earn some cash.
  • The successful ones get the habit and can carry on misbehaving.

There’s a party uptown but I just don’t feel like I belong at all (do I?)

So what happens when people operating under these psychosocial forces get confronted? We’ve seen some of this already in the recent past. Call the critics bullies; just ignore them; pretend that what they say is hilariously obscure left-bank tosh; say that the conclusion didn’t change anyway; suddenly decide it was only ever intended to be exploratory; find a junior or sub-contractor scapegoat; say you absolutely agree and make only a superficial change while grandstanding about how noble you are to do so; and of course there are more strategic ways for the bad guys to fight back, which I listed previously. Medicine will prove to be much worse than psychology, and resistant to (or oblivious of any need for) reform. There are reasons for this:

  • it’s full of cliques
  • they live and breathe hierarchy from school to retirement
  • whatever they tell you, their research is uni-disciplinary
  • there’s a DIY ethic that comes from that unfettered confidence in one’s own ability to do whatever anyone else does
  • they venerate busyness (no time for learning the niceties) and discovery (just get to the p-value)

I ran politicians past the same list and concluded that we don’t have to worry about them: reform will come from statistics up, and post-truth, if such a thing exists, is transient. (This might not apply to Trump types, of course, because they are not politicians.) Their cliques are open to people coming and going; there is an expectation of advancing and taking turns in the hierarchy; their research is done by others, so they can blame the experts if it goes wrong; they do nothing themselves; and they did humanities courses and venerate turning an idea over and over.

Let me leave you with a quote from the late, great Hans Rosling: “You can’t understand the world without statistics. You can’t understand the world with statistics alone.”

“False Hope” by Laura Marling is (c) Warner / Chappell Music Inc



Dataviz of the week: 1/3/17

“Scribbly states” is not done with felt-tip pens but with some sweet use of D3 JavaScript by Noah Veltman. I admire his attention to the little details, making it more human-like and commenting on the situations where it doesn’t work. It turns out, if you follow the links, that the method came out of Apple, who patented it way back. Didn’t someone like @inconvergent have a script to make coffee rings? You could chuck that on top for extra authenticity.



I’m going freelance

At the end of April 2017, I will leave my university job and start freelancing. I will be offering training and analysis, focusing on three areas:

  • Health research & quality indicators: this has been the main applied field for my work with data over the last nineteen years, including academic research, audit, service evaluation and clinical guidelines
  • Data visualisation: interest in this has exploded in recent years, and although there are many providers coming from a design or front-end development background, there are not many statisticians to back up interactive viz with solid analysis
  • Bayesian modelling: predictive models and machine learning techniques are big business, but in many cases more is needed to achieve their potential and avoid a bursting Data Science bubble; this is where Bayes helps, capturing expert knowledge, acknowledging uncertainty and giving intuitive outputs for truly data-driven decisions

Looking at the many “Data Science Venn Diagrams”, you’ll see that I’m aiming squarely at the overlaps from stats to domain knowledge, communication and computing. That’s because there’s a gap in the market in each of these places. I’m a statistician by training and always will be, but having read the rule book and found it eighty years out of date, I have no qualms about rewriting it for 21st-century problems. If that sounds useful to you, get in touch at

This blog will continue, but maybe less frequently, although I’ll still be posting a dataviz of the week. I’ll still be developing StataStan and in particular writing some ‘statastanarm’ commands to fit specific models. I’ll still be tinkering with fun analyses and dataviz like the London Café Laptop Map or Birdfeeders Live, and you’re actually more likely to see me around at conferences. I’ll keep you posted on such movements here.


Stats and data science, easy jobs and easy mistakes

I have been writing some JavaScript, and I was thinking about how web dev / front-end people are obliged to use the very latest tools, not so much for utility as for kudos. This seems mysterious to me but then I realised: it’s because the basic job — make a website — is so easy. The only way to tell who’s really seriously in the game is by how up to date they are. Then, this is the parallel that occurred to me: statistics is hard to get right, and a beginner is found out over and over again on the simplest tasks. On the other hand, if you do a lot of big data or machine learning or both, then you might screw stuff up left, right, and centre, but you are less likely to get caught. Because…

  • nobody has the time and energy to re-run your humungous analysis
  • it’s a black box anyway*
  • you got headhunted by Uber last week

And maybe that’s one reason why there is more emphasis on having the latest shizzle in a data science job that’s more of a mixture of stats and computer science influences. I’m not taking a view that old ways are the best here, because I’m equally baffled by statisticians who refuse to learn anything new, but the lack of transparency and accountability (oh what British words!) is concerning.

* – this is not actually true, but it is the prevailing attitude


Holiday reading

Enough work, here’s some recommended reading for Christmas and New Year. All of it is free, online, and I enjoyed each one. Many of these come via the New York Times’ “What We’re Reading” email, which is a thing of joy.

The town where everyone got free money

A brief history of buildings that spin

Parenting by the Books: ‘On The Banks of Plum Creek’

Get them on the blower

My Saga: Karl Ove Knausgaard travels through North America


Sound check: the quietest place in the US

The most exclusive restaurant in America

Why do we work so hard?

The coddling of the American mind

The untold story of Silk Road

The town without wi-fi

The strange and curious tale of the last true hermit

A wrenching choice for Alaska towns in the path of climate change

Meet the man who flies around the world for free

The eagle huntress story

Firestorm: poor planning and tactical errors fueled a wildfire catastrophe

Today we are his family

On tickling the dragon’s tail


Things I discovered in 2016

Edit: I remembered! It was Jeffrey Rosenthal who also advocated improv for scientists, and I read it in PPF – review appearing in Significance soon.

Being a list, sometimes with minimal explanation, and not to be taken entirely seriously. These might be influenced by living in Croydon, working in London and hanging around with people younger than myself who work in tech.

For many of these, I kept saying to myself “how did I not know about this before”. You might find them useful too. Others are true, but less didactic, and they are scattered like the proverbial Marvellous Aphorisms.

bootstrap.js – because every template I have used has ended up causing more trouble than it saved to begin with. I’m not a full-time web dev and I need something quick. It’s really easy. Do it.

Git + Atom with packages such as merge-conflict. Damn, this is good, but the obscurity of Git to most people is not going to go away any time soon. I had shoved some stuff cack-handedly onto GitHub but it was working with stan-devs last year and this year that really pushed me into using Git for version control in everyday work. You should really consider Atom if you use Git.

node.js (yeah, I had been dodging this too and feeling inadequate)

react but it’s December and nobody really uses react any more

Why do academics not put some time, energy and budget into acquiring presentation skills? People keep telling me I am awesome etc, and I have really not done much to get awesome, so I rather doubt it, and it must be by association with others who are really bad. On a related note:

hipster slides. I mean, ditch beamer and powerpoint and all that crap and just put one massive black and white picture of a kitten on the screen. Preferably with one word across it in humungous letters, possibly in a lurid colour. If you don’t have your own typeface designed for you (what a loser!) then use Helvetica (and by implication, do not stand up in front of anyone to talk with an obviously Windows machine unless it is done ironically). Try to have as few slides as possible. Like Van Morrison, I’m working towards having no slides at all. On another related note:

a lot of people are talking about improv classes as the key scientific skill of 2017. OK, nobody is, but there’s Alan Alda and @alice_data and Jeffrey Rosenthal, and the classic books by Keith Johnstone which are sitting on my shelf calling to me. I read them like a million years ago and I feel they might be even more relevant now.

finally started teaching myself Python. Last year I decided to reduce the number of languages, whether scripting or programming, that I have to carry around in my head, so I could be less awful at all of them. I dropped Julia, although I think it will be brilliant one day soon. That meant that I needed to boost my C++ (I learnt some long, long ago) to get R speedy when required, and that had a few spin-offs. I expect the Python skillz will be important too in 2017, if my brain can accommodate it all. The eagle-eyed among you will notice I’m back to the same number I started with (groan).

The secret Stata 14 command graph export myfilename.svg. Yes, SVG. God’s own graphics format. Just imagine what you could do… thanks to Tim Morris for spotting this. Goodness only knows why he was trying out file extensions for a laugh, or what else he tried that didn’t work. .123 anyone? But seriously, thanks StataCorp for taking this step, I know I have been droning on about it for years and now I’m really pleased with it.

Deep Work. You should seriously read this book. I now spend the start of each working day in a cafe of undisclosed location doing some deep work.

Ingrid Burrington’s work on internet infrastructure and what it tells us about secretive practices. Really eye-opening; you should get the book Networks of New York. I nearly lost my copy in the cafe of undisclosed location, but phew, they saved it for me.

Pinker’s Sense Of Style. Likewise.

Laura Marling, who I then listened to almost non-stop this year. I’m not exaggerating. Perhaps responsible for the pessimistic tone creeping into recent writing on whether scientific practice will get better at replication, explanation and all that. More on that in the new year.

Rebecca Solnit’s “Field Guide To Getting Lost”. You’ll either get this or you won’t. If you do, you’ll be thanking me before long.

Mike Monteiro’s keynote talk at interaction ’15. Mentally find-and-replace designer with statistician and you have some important messages right there, plus a lot of swearing.

The Dear Data book, obvs

Cole Nussbaumer Knaflic’s book, which is the one I recommend now to viz noobs. It’s nurturing, if a little slow, and has the best coverage of perception issues that I’ve seen.

I read dataviz “classics” by Bertin and Wilkinson. Now I realise people talk about them a lot but haven’t actually read them, like Ulysses. The difference is I quite like Ulysses but these are just weird and not useful. Not good-weird, like EDA. You have to forgive Bertin a little for being a paid-up French semiologist of the 1950s, I mean it was his job not to say anything clear, but old Wilko seems to have written Grammar of Graphics while on a mind-expanding retreat.

Did a stack of reading around neural networks. They’re cool, and of course massively hyped. Feature selection and measuring uncertainty are the things to think about really hard before doing them. I’m doing NVIDIA’s two-day deep learning course in January ’17.

I decided that any complex set of predictor variables (without a clearly pre-defined subset based on contextual information) should be analysed in a number of ways, combining those from a traditional statistics training with those from a machine learning background: some kind of penalised linear model, some kind of tree, and some kind of non-linear feature combination. Maybe lasso, random forest and neural network. Consider boosting.
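For what it’s worth, here is how that habit might look in scikit-learn; this is a sketch under my own assumptions (simulated data, default settings), not a recipe.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated regression data with a non-linear signal and nuisance features.
X, y = make_friedman1(n_samples=400, n_features=10, random_state=0)

models = {
    "penalised linear (lasso)": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "tree ensemble (random forest)": RandomForestRegressor(random_state=0),
    "non-linear features (neural net)": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)),
    "boosting": GradientBoostingRegressor(random_state=0),
}

# Cross-validated R^2 for each model family on the same data.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.2f}")
```

The point is not to crown a winner: agreement across very different model families lends the finding some credibility, while disagreement is itself something that needs explaining before anyone reaches for a press release.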

Did a stack of reading around AI. Interesting. A lot of compsci ML people seem to fly into a rage at the merest suggestion of killer robots (I can see where they’re coming from), and extend that to any ethical discussion (bad move, I think). You should read Nick Bostrom’s book (the ML guys hate it of course). Why does everyone assume it’s a bad thing to have humans wiped out by robots? We’re not really up to the job of running the planet. One thing I should write right now is that ML is not AI and statistical models like logistic regression do not really constitute ML either. You can relax for a few decades.

Every time I thought up some USSR – New Public Management – University life connection, I thought I was pretty damn clever, but of course Craig Brandist did it all before. What a guy. I bet they have a file on him.

I don’t like bananas, nor, come to that, cucumbers. If I have got to 42 and am still not sure whether I like a fruit, it’s time to stop trying. Likewise I expect to stop doing a lot of things in 2017, very much in the manner of Bilbo Baggins.


Dataviz of the week, 15 November 2016

I used to have an office door until this week, when we moved to open-plan space elsewhere in the medical school. I used to stick a chart of the week on that door, a student’s suggestion that proved to be a bottomless mine of goodies. So, I thought I would carry on visualising here.

We begin with some physical dataviz courtesy of Steven Gnagni and spotted by Helen Drury. Spoor of human activity etc etc. More like this next week.


Which pencil has been used most?

