Performance indicators and routine data on child protection services

The parts of social services that do child protection in England get inspected by Ofsted on behalf of the Department for Education (DfE). The process is analogous to the Care Quality Commission inspections of healthcare and adult social care providers, and they both give out ratings of ‘Inadequate’, ‘Requires Improvement’, ‘Good’ or ‘Outstanding’. In the health setting, there are many years of experience with quantitative quality (or performance) indicators, often through a local process called clinical audit and sometimes nationally. I’ve been involved with clinical audit for many years. One general trend over that time has been away from de novo data collection and towards recycling routinely collected data. Especially in the era of big data, lots of organisations are very excited about Leveraging Big Data Analytics to discover who’s outstanding, who sucks, and how to save lives all over the place. Now, it may not be that simple, but there is definitely merit in using existing data.

This trend is only just appearing on the horizon for social care though, because records are less organised and less often electronic, and because there just hasn’t been that culture of profession-led audit. Into this scene came my colleagues Rick Hood (complex systems thinker) and Ray Jones (now retired professor and general Colossus of UK social care). They wanted to investigate recently open-sourced data on child protection services and asked if I would be interested in joining in. I was, and I wanted to consider this question: could routine data replace Ofsted inspections? I suspected not! But I also suspected that question would soon be asked in the cash-strapped corridors of the DfE, and I wanted to head it off with some facts and some proper analysis.

We hired master data wrangler Allie Goldacre, who combed through, tested, verified and combined the various sources:

  • Children in Need census, and its predecessor the Child Protection and Referrals returns
  • Children and Family Court Advisory and Support Service records of care proceedings
  • DfE’s Children’s Social Work Workforce statistics
  • SSDA903 records of looked-after children
  • Spending statements from local authorities
  • Local authority statistics on child population, deprivation and urban/rural locations.

Just because the data were ‘open’ didn’t mean they were useable. Each set had its own quirks and, in some cases, each local authority had its own problems and definitions. The data wrangling was painstaking and painful! As it’s all in the public domain, I’m going to add the data and code to my website here, very soon.

Then, we wrote this paper investigating the system and this paper trying to predict ‘Inadequate’ ratings. The second of these took all the predictors in 2012 (the most complete year for data) and tried to predict Inadequates in 2012 or 2013. We used the marvellous glmnet package in R and got down to three predictors:

  • Initial assessments within the target of 10 days
  • Re-referrals to the service
  • The use of agency workers

Together they classify 68% of teams correctly, and we could not improve on that. We concluded that 68% was not good enough to replace inspection, and called it a day.
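
For anyone wondering how a penalised regression "gets down to" a handful of predictors, here is a minimal sketch of the idea. It is emphatically not our analysis: we used glmnet in R on the real data, whereas this is plain Python, uses an L1 (lasso) penalty only rather than the full elastic net, fits by proximal gradient rather than glmnet's coordinate descent, and runs on invented data in which only the first two of five predictors matter. All the function names are made up for illustration.

```python
import math
import random


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; anything within amount of zero becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(rows, labels, penalty, step=0.5, n_iter=400):
    """Proximal-gradient fit of logistic regression with an L1 penalty on the slopes."""
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_intercept, grad = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            err = sigmoid(intercept + sum(c * v for c, v in zip(coefs, x))) - y
            grad_intercept += err
            for j in range(p):
                grad[j] += err * x[j]
        # The intercept is left unpenalised; slopes take a gradient step and are then shrunk.
        intercept -= step * grad_intercept / n
        coefs = [soft_threshold(c - step * g / n, step * penalty)
                 for c, g in zip(coefs, grad)]
    return intercept, coefs


def make_synthetic_data(n=150, seed=1):
    """INVENTED data: five standard-normal predictors, only the first two drive the outcome."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(5)]
        prob = sigmoid(2.0 * x[0] - 1.5 * x[1])
        rows.append(x)
        labels.append(1 if rng.random() < prob else 0)
    return rows, labels


rows, labels = make_synthetic_data()
fits = {}
for penalty in (0.05, 0.15, 10.0):
    _, coefs = fit_l1_logistic(rows, labels, penalty)
    fits[penalty] = coefs
    kept = [j for j, c in enumerate(coefs) if c != 0.0]
    print(f"penalty={penalty}: predictors kept = {kept}")
```

The soft-thresholding step is what sets coefficients to exactly zero, so turning up the penalty drops predictors from the model altogether. In a real analysis the penalty would be chosen by cross-validation rather than picked by hand as it is here.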

But lo! Soon afterwards, the DfE announced that they had devised a new Big Data approach to predict Inadequate Ofsted scores, and that (what a coincidence!) it used the same three indicators. Well I never. We were not credited for this, nor indeed had our conclusion (that it’s a stupid idea) sunk in. Could they have just followed a parallel route to ours? Highly unlikely, unless they had an Allie at work on it, and I get no impression of the nuanced understanding of the data that would result from that.

Ray noticed that the magazine Children and Young People Now were running an article on the DfE prediction, and I got in touch. They asked for a comment and we stuck it in here.

It is a salutary lesson that cash-strapped Gradgrinds, starry-eyed with the promises of big data after reading some half-cocked article in Forbes, will clutch at any positive message that suits them and ignore the rest. This is why careful curation of predictive models matters. The consumer is generally not equipped to judge how they should be used.

A closing aside: Thomas Dinsmore wrote a while back that a fitted model is intellectual property. I think it would be hard to argue that coefficients from an elastic-net regression are mine and mine only, although the distinction may well be in how they are used, and this question will turn up in courts around the world now that fitted models are viewed as commercially advantageous.

How the REF hurts isolated statisticians

In the UK, universities are rated by expert panels on the basis of their research activities, in a process called the REF (Research Excellence Framework). The resulting league table not only influences prospective students’ choices of where to study, but also the government’s allocation of funding. More money goes to research-active institutions in a ‘back the winner’ approach that aims explicitly to produce a small number of excellent institutions out of the dense (and possibly over-supplied) field that exists at present. The recent publication of the Stern Review into this process has been widely welcomed. I have been involved with institutional rankings, albeit hospitals rather than universities, for a long time, and of all the scoring systems and league tables that could be produced, the REF’s 2014 iteration is as close to a perfectly bad system as could be conceived. It might have been written by a room full of demons pounding at infernal typewriters until a sufficient level of distortion and perversity was achieved. Universities are incentivised to neglect junior researchers and save the money until a last-minute frenzied auction to headhunt established academics nearing retirement. The only thing that counts is a few peer-reviewed papers by a few academics, and despite assurances of holistic, touchy-feely assessment, everybody knows it comes down to some kind of summary statistic of the journal impact factors.

Stern tries to tackle some of that, and I won’t rehash the basics as you can read that elsewhere. I want to focus on the situation that isolated statisticians, in the ASA’s sense of the term, find themselves in. Many statisticians in academia end up ‘isolated’, in that they are the only statistician in another department. Whatever their colleagues’ background, and whatever the job description may say, the isolated statistician exists to some extent as a helpdesk for the colleagues who are lacking in stats skills. I am one such, the only statistician in a faculty of 282 academic staff. Most of my publications are the result of colleagues’ projects, and only occasionally the result of my own methodological interests. Every university department has to submit its best (as defined by REF) outputs into one particular “unit of assessment”, which in our case is “Allied Health Professions, Dentistry, Nursing and Pharmacy”.

This mapping of departments into units goes largely uncriticised (because it largely doesn’t matter) but it excludes those people like isolated statisticians who don’t belong to the same profession as the rest of the unit. All my applied work with clinical / social worker colleagues, which is the bulk of the day job, can count (and of course, I chip into so many applied projects that I actually look like a superhero in the metric of the REF), but any methodological spin-offs do not, yet they are the bit that really is Statistics, the bit that I would want to be acknowledged if I were looking for a job in a statistics department. I’m not looking for that job, but a lot of young applied jobbing statisticians are. Why is it necessary to have that crude categorisation of whole departments to a unit of assessment? It doesn’t strike me as making the assessment any easier for the REF staff, because they rate the individual submissions and then aggregate them across units. The work-around would be to have joint appointments into different university departments, so applied work counts here and methodological there, except that REF would not allow that. You must belong to one unit. This may not matter so much to statisticians, who have the most under-supplied and sexiest job of the new century, because we can always up sticks and head for Silicon Valley or the City, but is it really the intention of the REF to promote professional ghettos free from methodologists throughout academia? We have seen from the replication crisis in psychology what happens when people get A Little Knowledge and only ever talk to others like themselves.

They each wanted to improve education; together, they ruined it

If you have any interest in using data to improve public services like education or healthcare, whether enthusiastic or sceptical, read this article. The story is a familiar one to me but rarely sees the public eye in such careful detail as it does here. The road to hell, as you know, is paved with dashboards, performance indicators and league tables.

Righton Johnson, a lawyer with Balch & Bingham who sat in on interviews, told me that it became clear that most teachers thought they were committing a victimless crime. “They didn’t see the value in the test, so they didn’t see that they were devaluing the kids by cheating,” she said. Unlike recent cheating scandals at Harvard and at Stuyvesant High School, where privileged students were concerned with their own advancement, those who cheated at Parks were never convinced of the importance of the tests; they viewed the cheating as a door they had to pass through in order to focus on issues that seemed more relevant to their students’ lives.

Lewis said, “I know that sometimes when you’re in the fight, and you’re swinging, you want to win so badly that you don’t recognize where your blows land.”

There have been similar stories in the UK news recently, but you can get it all from Parks, so I suggest just reading this and carrying those cautionary ideas around with you.

