Tag Archives: statistics

The peer-review log

As an academic, I started a page on this blog site that documented each peer review I did for a journal. I never quite got round to going back in time to the start, but there isn't much of interest there that you won't get from the stuff I did capture. Now that I am hanging up my mortarboard, it doesn't make sense for it to be a page any more, so I am moving it here. Enjoy the schadenfreude if nothing else.


Statisticians are in short supply, so scientific journals find it hard to get one of us to review the papers that have been submitted to them. And yet the vast majority of these papers rely heavily on stats for their conclusions. As a reviewer, I see the same problems appearing over and over, but I know how hard it is for most scientists to find a friendly statistician to help them put it right. So, I present this log of all the papers I have reviewed, anonymised, giving the month of review, the study design and a broad outline of what was good or bad from a stats point of view. I hope this helps some authors improve the presentation of their work and avoid the most common problems.

I started this in November 2013, and am working backwards as well as recording new reviews, although the retrospective information might be patchy.

  • November 2012, randomised controlled trial, recommended rejection. The sample size was based on an unrealistic Minimum Clinically Important Difference, taken from prior research that was uncharacteristic of the primary outcome, so the study was never able to demonstrate benefit. It was also unethical: the primary outcome was about efficiency of the health system, benefit to patients had already been demonstrated, and yet the intervention was withheld from the control group. Power to detect adverse events was even lower as a result, yet bold statements about safety were made. A flawed piece of work that put hospital patients at risk with no chance of ever demonstrating anything; this study should never have been approved in the first place. Of interest to scholars of evidence-based medicine, it has since been printed by Elsevier in a lesser journal, unchanged from the version I reviewed. Such is life; I only hope the authors learnt something from the review to outweigh the reward they felt at finally getting it published.
  • November 2013, cross-sectional survey, recommended rejection. Estimates were adjusted for covariates (not confounders) when it was not relevant to do so; the grammar was poor and confusing in places; odds ratios were used where relative risks would have been clearer; and t-tests and chi-squared tests were carried out and reported without any hypothesis being clearly stated or justified.
  • November 2013, exploratory / correlation study, recommended major revision then rejection when the authors declined to revise the analysis. Ordinal data were analysed as nominal, causing an error that crossed p=0.05.
  • March 2014, randomised controlled trial, recommended rejection. Estimates were adjusted for covariates when it was not relevant to do so, and bold conclusions were made without justification.
  • April 2014, mixed methods systematic review, recommended minor changes around clarity of writing and details of one calculation.
  • May 2014, meta-analysis, recommended acceptance – conducted to current best practice, clearly written and on a useful topic.
  • July 2014, ecological analysis, recommended major revision. Pretty ropy on several fronts, but perhaps most importantly that any variables the authors could find had been thrown into an “adjusted” analysis with clearly no concept of what that meant or was supposed to do. Wildly optimistic conclusions too. Came back for re-review in September 2014 with toned-down conclusions and clarity about what had been included as covariates but the same issue of throwing the kitchen sink in. More “major revisions”; and don’t even think about sending it voetstoots to a lesser journal because I’ll be watching for it! (As of September 2015, I find no sign of it online)
  • July 2014, some other study I can’t find right now…
  • September 2014, cohort study. Clear, appropriate, important. Just a couple of minor additions to the discussion requested.
  • February 2015, secondary analysis of routine data, no clear question, no clear methods, no justification of adjustment, doesn’t contribute anything that we haven’t already known for 20 years and more. Reject.
  • February 2015, revision of a previously rejected paper in which the authors try to wriggle out of any work by disputing basic statistical facts. Straight to the 5th circle of hell.
  • March 2015, statistical methods paper. Helpful, practical, clearly written. Only the very merest of amendments.
  • April 2015, secondary analysis of public-domain data. Inappropriate analysis, leading to meaningless conclusions. Reject.
  • April 2015, retrospective cohort study, can’t find the comments any more… but I think I recommended some level of revisions
  • September 2015, survey of a specific health service in a hard-to-reach population. Appropriate to the question, novel and important. Some amendments to graphics and tables were suggested. Minor revisions.
  • March 2016, case series developing a prognostic score. Nice analysis, written very well, and a really important topic. My only quibbles were about assuming linear effects. Accept subject to discretionary changes.
  • October 2016, cohort study. Adjusted for things that probably aren't confounders, and adjusted (in a Cox regression) for competing risks when they should have been recognised as such. Various facts about the participants were not declared. Major revisions.
  • October 2016 diagnostic study meta-analysis. Well done, clearly explained. A few things could be spelled out more. Minor revisions.
  • November 2016, kind of a diagnostic study…, well done, well written, but very limited in scope and hard to tell what the implications for practice might be. Left in the lap of the gods (the editors).
  • December 2016, observational study of risk factors, using binary outcomes but it would be more powerful with time-to-event analysis if possible. Competing-risks methods would have to be used in that case (see the sketch after this list). Otherwise, nice.
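Since competing risks come up more than once in this log, here is a minimal sketch of the kind of analysis I mean, using the survival package in R. The data, variable names and effects are all invented for illustration; the point is simply that the competing event gets modelled, rather than quietly treated as censoring.

```r
library(survival)

# Invented example data: time to an event of interest, plus a competing event
set.seed(1)
n  <- 300
df <- data.frame(
  time  = rexp(n, rate = 0.1),
  event = sample(0:2, n, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
  group = factor(sample(c("exposed", "unexposed"), n, replace = TRUE)),
  age   = rnorm(n, 60, 10)
)
# The status factor must have censoring as its first level
df$event <- factor(df$event, levels = 0:2,
                   labels = c("censored", "event", "competing"))

# Cumulative incidence (Aalen-Johansen), not 1 minus Kaplan-Meier
cif <- survfit(Surv(time, event) ~ group, data = df)

# Fine-Gray subdistribution hazard model for the event of interest
fg  <- finegray(Surv(time, event) ~ ., data = df, etype = "event")
mod <- coxph(Surv(fgstart, fgstop, fgstatus) ~ group + age,
             data = fg, weights = fgwt)
summary(mod)
```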


Filed under research

Performance indicators and routine data on child protection services

The parts of social services that do child protection in England get inspected by Ofsted on behalf of the Department for Education (DfE). The process is analogous to the Care Quality Commission inspections of healthcare and adult social care providers, and they both give out ratings of ‘Inadequate’, ‘Requires Improvement’, ‘Good’ or ‘Outstanding’. In the health setting, there’s many years’ experience of quantitative quality (or performance) indicators, often through a local process called clinical audit and sometimes nationally. I’ve been involved with clinical audit for many years. One general trend over that time has been away from de novo data collection and towards recycling routinely collected data. Especially in the era of big data, lots of organisations are very excited about Leveraging Big Data Analytics to discover who’s outstanding, who sucks, and how to save lives all over the place. Now, it may not be that simple, but there is definitely merit in using existing data.

This trend is only just appearing on the horizon for social care, though, because records are less organised and less often electronic, and because there just hasn't been the same culture of profession-led audit. Into this scene came my colleagues Rick Hood (complex systems thinker) and Ray Jones (now retired professor and general Colossus of UK social care). They wanted to investigate recently released open data on child protection services and asked if I would be interested in joining in. I was – and I wanted to consider this question: could routine data replace Ofsted inspections? I suspected not! But I also suspected that the question would soon be asked in the cash-strapped corridors of the DfE, and I wanted to head it off with some facts and some proper analysis.

We hired master data wrangler Allie Goldacre, who combed through, tested, verified and combined the various sources:

  • Children in Need census, and its predecessor the Child Protection and Referrals returns
  • Children and Family Court Advisory and Support Service records of care proceedings
  • DfE’s Children’s Social Work Workforce statistics
  • SSDA903 records of looked-after children
  • Spending statements from local authorities
  • Local authority statistics on child population, deprivation and urban/rural locations.

Just because the data were 'open' didn't mean they were usable. Each dataset had its own quirks, and in some cases individual local authorities had their own problems and definitions. The data wrangling was painstaking and painful! As it's all in the public domain, I'm going to add the data and code to my website here, very soon.

Then, we wrote this paper investigating the system and this paper trying to predict ‘Inadequate’ ratings. The second of these took all the predictors in 2012 (the most complete year for data) and tried to predict Inadequates in 2012 or 2013. We used the marvellous glmnet package in R and got down to three predictors:

  • Initial assessments within the target of 10 days
  • Re-referrals to the service
  • The use of agency workers

Together they get 68% of teams right, and that could not be improved on. We concluded that 68% was not good enough to replace inspection, and called it a day.
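For anyone curious what that elastic-net step looks like, here is a minimal sketch with invented stand-in data. The indicator names, the alpha value and everything else here are illustrative only; the real data and code will go up with the rest, as promised above. The attraction of cv.glmnet is that it picks the penalty by cross-validation and shrinks most coefficients to exactly zero, which is how a long list of candidate indicators ends up as a handful.

```r
library(glmnet)
set.seed(2012)

# Invented stand-in data: one row per local authority, a few candidate
# indicators (hypothetical names) and a binary Inadequate rating
n <- 150
x <- matrix(rnorm(n * 5), n, 5,
            dimnames = list(NULL, c("initial_assess_10d", "re_referrals",
                                    "agency_workers", "spend_per_child",
                                    "caseload")))
y <- rbinom(n, 1, plogis(0.8 * x[, "re_referrals"] - 0.6 * x[, "agency_workers"]))

# Elastic-net logistic regression, penalty chosen by cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

coef(cvfit, s = "lambda.1se")                         # sparse: most coefficients are zero
pred <- predict(cvfit, newx = x, s = "lambda.1se", type = "class")
mean(pred == y)                                       # crude in-sample accuracy, illustration only
```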

But lo! Soon afterwards, the DfE announced that they had devised a new Big Data approach to predict Inadequate Ofsted scores, and that (what a coincidence!) it used the same three indicators. Well I never. We were not credited for this, nor indeed had our conclusion (that it’s a stupid idea) sunk in. Could they have just followed a parallel route to ours? Highly unlikely, unless they had an Allie at work on it, and I get no impression of the nuanced understanding of the data that would result from that.

Ray noticed that the magazine Children and Young People Now were running an article on the DfE prediction, and I got in touch. They asked for a comment and we stuck it in here.

A salutary lesson that cash-strapped Gradgrinds, starry eyed with the promises of big data after reading some half-cocked article in Forbes, will clutch at any positive message that suits them and ignore the rest. This is why careful curation of predictive models matters. The consumer is generally not equipped to make the judgements about using them.

A closing aside: Thomas Dinsmore wrote a while back that a fitted model is intellectual property. I think it would be hard to argue that coefficients from an elastic-net regression are mine and mine only, although the distinction may well be in how they are used, and this will appear in courts around the world now that they are viewed as commercially advantageous.


Filed under research

The sad ossification of Cochrane reviews

Cochrane reviews made a huge difference to evidence-based medicine by forcing consistent analysis and writing in systematic reviews, but now I find them losing the plot in a rather sad way. I wanted to write a longer critique while still indemnified by being a university employee, and after the publication of a review I have nearly completed with colleagues (all of whom say "never again"). But those two things will not overlap. So, I'll just point you to some advice on writing a Summary Of Findings table (the only bit most people read) from the Musculo-skeletal Group:

  • “Fill in the NNT, Absolute risk difference and relative percent change values for each outcome as well as the summary statistics for continuous outcomes in the comments column.”

"Summary", you say? Well, I'm all for relative + absolute measures, but the NNT is a little controversial nowadays (cf. Stephen Senn, everywhere), and are all those stats going to have appropriate measures of uncertainty, or will they be presented as gospel truth? With continuous outcomes, we were required to state means, SDs, % difference and % change in each arm, which seems a bit over the top to me and, crucially, relies on some pretty bold assumptions about distributions: assumptions that are not necessary elsewhere in the review.
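To be concrete about where the uncertainty should come from, here is the arithmetic with made-up numbers: the NNT is just the reciprocal of the absolute risk difference, so any honest interval for it has to be carried over from the interval for that difference (and it behaves badly when that interval crosses zero).

```r
# Made-up trial arm results: 40/100 events in control, 25/100 in intervention
p_ctrl <- 40 / 100
p_int  <- 25 / 100

ard <- p_ctrl - p_int                              # absolute risk difference = 0.15
se  <- sqrt(p_ctrl * (1 - p_ctrl) / 100 + p_int * (1 - p_int) / 100)
ci  <- ard + c(-1, 1) * qnorm(0.975) * se          # Wald 95% CI for the ARD

nnt    <- 1 / ard                                  # point estimate: about 7
nnt_ci <- rev(1 / ci)                              # roughly 4 to 46; only sensible because the ARD CI excludes 0

rel_change <- 100 * ard / p_ctrl                   # relative risk reduction, in %
```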

  • “When different scales are used, standardized values are calculated and the absolute and relative changes are put in terms of the most often used and/or recognized scale.”

I can see the point of this, but it requires a big old assumption about the population mean and standard deviation of the most often used scale, as well as an assumption of normality. Usually, these scales have floor/ceiling effects.
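The back-conversion itself is trivial, which is rather the problem: every input except the SMD is an assumption about the 'recognised' scale. A sketch with made-up numbers:

```r
# Re-expressing a pooled standardised mean difference on a 'recognised' scale.
# Every input here except the SMD is an assumption about that scale.
smd           <- -0.45   # pooled SMD from the meta-analysis (made up)
sd_familiar   <- 18      # assumed population SD of the familiar 0-100 scale
mean_familiar <- 55      # assumed typical control-group mean on that scale

abs_change <- smd * sd_familiar                 # about -8 points on the scale
rel_change <- 100 * abs_change / mean_familiar  # about -15%, relative to the assumed mean
```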

  • “there are two options for filling in the baseline mean of the control group: of the included trials for a particular outcome, choose the study that is a combination of the most representative study population and has a large weighting in the overall result in RevMan. Enter the baseline mean in the control group of this study. […or…] Use the generic inverse variance method in RevMan to determine the pooled baseline mean. Enter the baseline mean and standard error (SE) of the control group for each trial”

This is an invitation to plug in your favourite trial and make the effect look bigger or smaller than it came out. Who says there is going to be one trial that is most representative and has a precise baseline estimate? There will be fudges and trade-offs aplenty here.
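The generic inverse variance option is at least mechanical, for what that's worth. With made-up baselines it looks like this, and note how the answer is dominated by whichever control group happens to have the smallest standard error:

```r
# Made-up control-group baseline means and standard errors from three trials
m  <- c(23.1, 25.4, 24.0)
se <- c(1.2, 0.4, 1.5)

w           <- 1 / se^2                 # inverse-variance weights
pooled_mean <- sum(w * m) / sum(w)      # dominated by the trial with SE = 0.4
pooled_se   <- sqrt(1 / sum(w))

round(c(pooled_mean, pooled_se), 2)
```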

  • “Please note that a SoF table should have a maximum of seven most important outcomes.”

Clearly, eight would be completely wrong.

  • “Note that NNTs should only be calculated for those outcomes where a statistically significant difference has been demonstrated”

Jesus wept. I honestly can't believe I have to write this in 2017. Reporting only significant findings lets both genuine effects and noise through, and the proportion that is noise can be huge, certainly not just 5% of results (cf. John Ioannidis on why most published findings are false, and Andrew Gelman on type M and type S errors).
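A two-minute simulation shows the problem: filter a noisy literature at p < 0.05 and the estimates that survive are exaggerated, and a few point the wrong way entirely. The numbers here are invented; the phenomenon is not.

```r
# Toy version of the significance filter: a small true effect, studied with noise
set.seed(2017)
true_effect <- 0.1
se          <- 0.2
est         <- rnorm(1e4, true_effect, se)   # 10,000 imaginary study estimates

sig <- abs(est / se) > 1.96                  # the 'statistically significant' ones
mean(sig)                                    # only ~8% get through the filter...
mean(est[sig]) / true_effect                 # ...their estimates average ~4x the truth
mean(est[sig] < 0)                           # ...and a fair few have the wrong sign
```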

When we calculated some absolute changes in % terms (all under 10%), reviewers came back and told us that they should all be described as "slight improvement", the term "slight" being reserved for absolute changes under a certain size. They also recommend using Cohen's small-medium-large classification quite strictly, in a handy spreadsheet for authors called Codfish. I thought Cohen's d and his classification had been thrown out long ago in favour of, you know, thinking. This is rather sad, as we see the systematic approach being ossified into a rigid set of rules. I suspect that the really clever methodologists involved in Cochrane are not aware of this, nor would they approve, but it is happening little by little in the specialist groups.


Archaeopteryx lithographica (Eichstätter specimen). Photo: H. Raab, CC BY-SA 3.0

This advice for reviewers is not on their website, but it needs proper statistical review and revision. We shouldn't be going backwards in this era of the Crisis Of Replication.


Filed under healthcare

Dataviz of the week, 19/4/17

 

This is just the greatest thing I've seen in a while, and definitely in the running for dataviz o' the year already. Emoji scatterplot:

And another:


There’s also a randomisation test which I’ll leave you to discover for yourself.

 


Filed under Visualization

Dataviz of the week, 29/3/17

Here’s a graphic of a really deep oil well by Fuel Fighter via Visual Capitalist. This is rather reminiscent (ahem) of the long, tall graphics by the Washington Post (and the eerily similar one from the Guardian a few days later which they had to admit they had nicked) about flight MH370 at the bottom of the ocean. The WP graphic works because you have to scroll down, and down, and down, and down, and down (wow, that’s deep!), and down, and down (no way), and down before you get to the sea bed. Yes, all the usual references are there, hot air balloons and Burj Khalifas and Barad-Dûrs and what have you, but they don’t matter because it’s the scrolling that does it, giving you GU2 (“Conveying the sense of the scale and complexity of a dataset”) and GU6 (“Attracting attention and stimulating interest.”) The references don’t mean anything to me (or probably you); I may have seen the Burj Khalifa and thought it was amazingly tall, but I have no grasp of how tall and that is what matters: I’d have to have an intuitive feel for what 3 BKs are compared to the height of a jet aircraft, and I don’t have that, so why should I care about the references?


My problem with the Fuel Fighter graphic is that it doesn’t have that same sense of depth. The image file is 796 x 4554 pixels, which is an aspect ratio of 1:17. The WP image (SVG FTW) is 539 x 16030 or 1:30, which is pretty extreme! It feels to me like you’d have to get past 1:20 before it started to have enough impact.

 


Filed under Visualization

Dataviz of the week, 22/3/17

The Washington Post have an article about the US budget out by Kim Soffen and Denise Lu. It’s not long, but brings in four different graphical formats to tell different aspects of the data story. A bar showing parts of the whole (see, you don’t need a pie for this!)


then a line/dot/whatever-you-want-to-call-it chart of the change in relative terms


then a waffle of that change in absolute terms, plus a sparkline of the past.


There's also a link to full department-specific stories under each graphic. I think this is really good stuff, though I can imagine some design-heads wanting to reduce it further. It shows how you can make a good data-driven story out of not many numbers.


Filed under Visualization

False hope (of methodological improvement)

I had a paper out in January with my social worker colleague Rick Hood, called “Complex systems, explanation and policy: implications of the crisis of replication for public health research”. The journal page is here or you can grab the post-print here. It’s a bit of a manifesto for our research standpoint, and starts to realise a long-held ambition of mine to make statistical thinking (or lack thereof) and philosophy of science talk to one another more.

We start from two problems: the crisis of replication and complex systems. Here’s a fine history of the crisis of replication from Stan chief of staff Andrew Gelman. By complex systems, we mean some dynamic system which is adaptive (responds to inputs) and non-linear (a small input might mean a big change in output, but hey! because it’s adaptive, you can’t guarantee it will keep doing that). In fact, we are in particular interested in systems that contain intelligent agents, because this really messes up research. They know they are being observed and can play games, take short-term hits for long-term goals, etc. Health services, society, large organisations, all fit into this mould.

There have been some excellent writers who have tackled these problems before, and we bring in Platt, Gigerenzer, Leamer, Pawson and Manski. I am tempted to give nutshells of their lives' work but you can get it all in the paper. Sadly, although they devoted a lot of energy and great ideas to making science work better, they are almost unknown among scientists of all stripes. Reviewers said they enjoyed reading the paper and found it fresh, but felt that scientists knew about these problems already and knew how to tackle them. You have to play along and be respectful to reviewers, but we thought this was wishful thinking. Bad science is everywhere, and a lot of it involves that deadly combination of our two problems; public health (the focus of our paper) is more susceptible than most fields because of the complex system (government, health providers, society…) and its often multi-faceted interventions requiring social traction to be effective. At the same time it draws on a medical research model that originated in randomised controlled trials, rats and Petri dishes. The reviewers and we disagree on just how far most of that research has evolved from its origins.

My experience of health and social care research in a realist-minded faculty is that the more realist and mixed-method the research gets, and the more nuanced and insightful the conclusions are, the less it is attended to by the very people who should learn from it. Simple statistics and what Manski called “incredible certitude” are much more beguiling. If you do this, that will follow. Believe me.

Then, we bring in a new influence that we think will help a lot with this situation: Peter Lipton. He was a philosopher at Cambridge and his principal contribution to his field was the concept of “inference to the best explanation” (also the title of his excellent book which I picked up somewhat by mistake in Senate House Library one day in 2009, kick starting all of this), which takes Peirce’s abductive reasoning and firms it up into something almost concrete enough to actually guide science and reasoning. The fundamental problem with all these incredible certitude studies is that they achieve statistical inference and take that to be the same thing as explanatory inference. As you know, we are primed to detect signals, and evolved to favour false positives, and have to quantify and use statistics to help us get past that. The same problem arises in explanation, but without the quantification to help.


Cheese consumption vs death entangled in bedsheets: you believe it more than the famous Nicolas Cage films vs swimming pool deaths correlation because it has a more (almost) plausible explanation. From Tyler Vigen’s Spurious Correlations.

A good explanation makes a statistical inference more credible. It's what you're hoping to achieve at the end of Platt's "strong inference" process of repeated inductive-deductive loops. This is Lipton's point: as humans, when we try to learn about the world around us, we don't just accept the likeliest explanation, as statistics provides, but we want it to be "lovely" too. As I used to enjoy having on my university profile of relevant skills (on a few occasions a keen new person in comms would ask me to take it down, but I just ignored them until they left for a job elsewhere that didn't involve pain-in-the-backside academics):

[screenshot of the relevant entry from my university profile]

By loveliness, Lipton meant that it gives you explanatory bang for your buck: it should be simple and it should ideally explain other things beyond the original data. So, Newton's gravitation is lovely because it has one simple formula that works for both apples and planets. Relativity seems too hard to comprehend to be lovely, but as the phenomena explained by it stack up, it becomes a winner. Wave-particle duality likewise. In each case, they are accepted not for their success in statistical inference but in explanatory inference. It's not just laws of physics but more sublunar cause and effect too: if you impose a sugar tax, will people be healthier in years to come? That's the extended worked example we use in the paper.

Now, there are problems with explanations:

  • we don’t know how to search systematically for them
  • we don’t know where they come from; they generally just “come to mind”
  • we don’t know how to evaluate and choose from alternatives among them
  • we usually stop thinking about explanation as soon as we hit a reasonably good candidate, but the more you think, the more refinements you come up with
  • we seem to give too much weight to loveliness compared to likelihood

and with loveliness itself too. Firstly, it’s somewhat subjective; consider JFK’s assassination. If you are already interested in conspiracy theories and think that government spooks do all sorts of skullduggery, then the candidate explanation that the CIA did it is lovely – it fits with other explanations you’ve accepted, and perhaps explains other things – they did it because he was about to reveal the aliens at Roswell. If you don’t go for that stuff then it won’t be lovely because there are no such prior beliefs to be co-explained. In neither the CIA candidate nor the Oswald candidate explanation are there enough data to allow likelihood to get in there and help. It would be great if we could meaningfully quantify loveliness and build it into the whole statistical process which was supposed to help us get over our leopard-detecting bias for false positives, but that seems very hard. Lipton, in fact, wrote about this and suggested that it might be possible via Bayesian priors. I’ll come back to this.

So, here are a couple of examples of loosely formed explanations that got shot from the hip after careful and exacting statistical work.

Long ago, when I was a youngster paying my dues by doing data entry for the UK clinical audit of lung cancer services, we put out a press release about differences by sex in the incidence of different types of tumour. I’m not really sure why, because that’s an epidemiological question and not one for audit, but there ya go. It got picked up in various places and our boss was going to be interviewed on Channel 5 breakfast news. We excitedly tuned in. “Why is there a difference?” asked the interviewer. They had heard the statistical inference and now they wanted an explanation.

Of course, we didn’t know. We just knew it was there, p=whatever. But it is human nature to seek out explanation and to speculate on it. The boss had clearly thought about it already: “It may be the feminine way in which women hold their cigarettes and take small puffs”. Whaaat? Where did that come from? I’d like to say that, before dawn in my shared apartment in Harringay, I buried my face in my hands, but that requires some understanding of these problems which I didn’t acquire until much later, so I probably just frowned slightly at the stereotype. I would have thought, as earnest young scientists do, that any speculation on why was not our business. Now, I realise two things: the scientist should propose explanations lest someone less informed does, and they should talk about them, so that the daft ones can get polished up, not stored until they are released pristine and unconstrained by consensus onto national television. It would be nice if, as our paper suggests, these explanations got pre-specified like statistical hypotheses should be, and thus the study can be protected against the explanatory form of multiple testing.

Then here’s a clip from the International New York Times (erstwhile weekly paper edition in the UK) dated on my 40th birthday (I never stop looking for good stuff to share with you, readers).

It's all going well until the researcher starts straying from 'we found an association' into 'this is why it happens'. "There are more than a thousand compounds in coffee. There are a few candidates, but I don't know which is responsible." The opposite problem happens here: by presupposing that there must be a chemical to which you can attribute effects (because that's what he was shown in med school), we can attribute them to an unknown one, and thus, by begging the question, back up the statistical inference with a spurious explanatory one. Here, there is a lack of explanation, and that should make us rightly suspicious of the conclusion.
[clipping from the New York Times, 22 October 2014: coffee protects the liver]

On these foundations, we tentatively propose some steps researchers could take to improve things:

  • mixed-methods research, because the qual informs the explanation empirically
  • Leamer's fragility analysis (a rough sketch follows after this list)
  • pre-specify a mapping of statistical inference to explanation
  • have an analysis monitoring committee, like trials have a data monitoring committee
  • more use of microsimulation / agent-based modelling
  • more use of realist evaluation
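Since "Leamer's fragility analysis" will mean little to most readers, here is a rough flavour of the idea (my own toy version, not Leamer's exact procedure): refit the model under every defensible choice of doubtful covariates and see how far the coefficient you actually care about moves. The data and variable names are invented.

```r
# Toy flavour of Leamer-style fragility / extreme-bounds thinking: how much does
# the coefficient on x move across defensible specifications? (Invented data.)
set.seed(3)
n <- 500
d <- data.frame(x = rnorm(n), z1 = rnorm(n), z2 = rnorm(n), z3 = rnorm(n))
d$y <- 0.3 * d$x + 0.5 * d$z1 + rnorm(n)

doubtful <- list(character(0), "z1", "z2", "z3", c("z1", "z2"),
                 c("z1", "z3"), c("z2", "z3"), c("z1", "z2", "z3"))

betas <- sapply(doubtful, function(s) {
  coef(lm(reformulate(c("x", s), response = "y"), data = d))["x"]
})
range(betas)   # the 'bounds' on the x effect across specifications
```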

Further in the future, we need:

  • methodological work on Bayesian hyperpriors for loveliness
  • better education, specifically dropping the statistical cookbook and following the ASA GAISE guidelines

This is strong medicine; funders and consumers of research will not give a damn for this time-consuming expense, bosses and collaborators will tell concerned researchers not to bother, and some of it could be so hard as to be practically impossible. In particular, Bayesian hyperpriors for loveliness are in the realm of methodological fancy, although some aspects exist, notably bet-on-sparsity, and I'll return to that in a future post. But setting that to one side, if researchers do things like our recommendations, then over time we will all learn how to do this sort of thing well, and science will get better.

Right?

Wrong. None of this will happen any time soon. And this is, ironically, for the same reason that the problems arise in the first place: science happens in a complex system, and an intervention like ours can have an adverse effect, or no effect at all. Researchers respond to several conflicting forces, and the psychosocial drivers of behaviour, which are stronger than any appeal to their good nature, remain unchanged. They still scoff at navel-gazing philosophical writers and lump us into that category, they still get told to publish or perish, and they still get rewarded for publication and impact, regardless of the durability of their work. So if I were to talk about the future in the same form that Gelman wrote about the past, it would be a more pessimistic vision. Deep breath:


A storm hits the city and the lights go out before I can prepare

This crisis is known to scientists in only a few areas, where the problem is particularly egregious (not to say it won't one day be revealed to have been bigger elsewhere, like public health, but that in these areas it is both quite bad and quite obvious): social and behavioural psychology most notably, although brain imaging and genetics have their own problems and believe they have fixed them by looking for really small p-values (this, lest you be mistaken, will not help). Most other fields of quantitative scientific endeavour don't even realise they are about to get hit. I recall being introduced to a doctor by a former student when we bumped into each other in a cafe:
“Robert’s a statistician.”
“Oh, good, we need people like you to get the p-values going in the right direction”
Now, I know that was a light-hearted remark, but it shows the first thing that comes to mind with statistics. They have no idea what’s coming.

The whole of downtown looks dark like no one lives there

Statistical practice is so often one of mechanistic calculation. You can use recipes to crank the handle and then classify the result as significant (go to Stockholm, collect Nobel prize) or non-significant (go to Jobcentre, collect welfare). There is no sign of explanation up front; it is grubbed up after the fact. It's as though all human thought were abandoned the minute they turned on the computer. I just can't understand why you would do that. Have more pride!

Why does this happen? These are at least some of the psychosocial forces I mentioned earlier:

  • The risk is carried by early career people: the junior academic or the budding data scientist. The older mentor is not expected to control every detail, and doesn’t take personal responsibility (it was Fox’s fault!)
  • Only a few such analyses are used (maybe one) to evaluate the junior person’s ability
  • Impact is valued; a reasonable idea for a whole organisation or programme of work, but not for projects, because there will always be a certain failure rate; paradoxically, this is also why academics play it safe with uninspiring incremental advances
  • Discovery and novelty are valued – as above
  • This sort of work is badly paid. You have to succeed quickly before you have to drop out and earn some cash.
  • The successful ones get the habit and can carry on misbehaving.

There’s a party uptown but I just don’t feel like I belong at all (do I?)

So what would happen when people operating in these psychosocial forces get confronted? We’ve seen some of this already in the recent past. Call the critics bullies, just ignore them, pretend that what they say is hilariously obscure left-bank tosh, say that the conclusion didn’t change anyway, suddenly decide it was only intended to be exploratory, find a junior or sub-contractor scapegoat, say you absolutely agree and make only a superficial change while grandstanding about how noble you are to do so, and of course there are more strategic ways for the bad guys to fight back that I listed previously. Medicine will prove to be much worse than psychology and resistant to (or oblivious of any need to) reform. There are reasons for this:

  • it’s full of cliques
  • they live and breathe hierarchy from school to retirement
  • whatever they tell you, their research is uni-disciplinary
  • there’s a DIY ethic that comes from that unfettered confidence in one’s own ability to do whatever anyone else does
  • they venerate busyness (no time for learning the niceties) and discovery (just get to the p-value)

I considered politicians against the same list and concluded that we don't have to worry about them: reform will come from the statistics up, and post-truth, if such a thing exists, is transient. This might not apply to Trump types, of course, because they are not politicians. For politicians, cliques are open to coming and going, there is an expectation of advancing and taking turns in the hierarchy, their research is done by others (so they can blame the experts if it goes wrong), they do nothing themselves, and they did humanities courses and venerate turning an idea over and over.

Let me leave you with a quote from the late, great Hans Rosling: “You can’t understand the world without statistics. You can’t understand the world with statistics alone.”

“False Hope” by Laura Marling is (c) Warner / Chappell Music Inc

 


Filed under Uncategorized