The paper published two days ago in the New England Journal of Medicine, which details the latest epidemiological information from this outbreak, has some really thoughtfully produced graphics. It also provokes some in-depth statistical pondering. It’s worth a look. I can’t reproduce the figures here without waiting for copyright permissions first, so I’ll just link you straight to the paper, and you can see them and the accompanying text for yourself.
Figure 1 seems to suggest that the first three provinces to have more than an isolated case (Shanghai, Zhejiang and Jiangsu) saw a similar rise then fall in the numbers. See those colored bars rise and then fall again? Maybe there is a localised outbreak, transmission for a few days, and then it dies out. Well, no, I don’t think so, although it’s tempting to infer a common history like that. Two reasons argue against it for me. One, the cases are surprisingly widespread geographically (see Figure 2). The distance from eastern Henan to Shanghai is 800 km, which is about the same as Land’s End to Dumfries, or New York to Quebec City. Two, the stacking of the bars makes the ones on top look at a glance like they are rising even if it is just the bars underneath that are moving.
It seemed to me that there were a lot of locations away from the coast with small numbers of cases in which the patient is still alive. Now, this is very flawed because I should include the days since symptoms appeared, and I don’t know that, but I made a Poisson Q-Q plot using the data from Figure 2. Shanghai looks quite different to the other locations:
In fact, if you base the quantiles on the mean death risk from all the sites except Shanghai, the others all lie along the line. That suggests they are Poisson-distributed but that something else is going on in Shanghai: either a higher death rate, or a lower proportion of the cases that survive and recover being captured. I don’t think it is that Shanghai started having cases first, so its patients have had longer in which to die (sorry to be morbid, folks, it’s what I do for a living). The median time from onset to death is 11 days (IQR 7-20) and we have cases going back to March in three provinces, while the bulk of Shanghai’s cases only really got going at the same time as everywhere else, 4 weeks ago.
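For readers who want to try this kind of check themselves, here is a minimal sketch in Python. It is not the code behind the Q-Q plot described above. It is a simpler numerical stand-in that pools the death risk over every site except one, works out how many deaths each site would expect if it shared that risk, and reports the Poisson upper-tail probability of seeing at least the observed number. The function names, the site labels and the counts are all hypothetical placeholders, not the numbers from Figure 2.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed directly (fine for small counts)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def poisson_check(cases, deaths, exclude=None):
    """Compare each site's observed deaths with a Poisson expectation.

    The pooled death risk comes from all sites except `exclude`; each site's
    expected deaths is then cases * pooled risk.
    Returns {site: (expected, observed, P(X >= observed))}.
    """
    pool = [site for site in cases if site != exclude]
    risk = sum(deaths[s] for s in pool) / sum(cases[s] for s in pool)
    result = {}
    for site in cases:
        mu = cases[site] * risk
        obs = deaths[site]
        # Upper tail: 1 - P(X <= obs - 1); equals 1 when obs is 0
        upper = 1.0 - (poisson_cdf(obs - 1, mu) if obs > 0 else 0.0)
        result[site] = (mu, obs, upper)
    return result


# Hypothetical placeholder counts, NOT the numbers from Figure 2
cases = {"A": 30, "B": 25, "C": 20, "D": 5}
deaths = {"A": 12, "B": 3, "C": 2, "D": 1}

for site, (mu, obs, p) in poisson_check(cases, deaths, exclude="A").items():
    print(f"{site}: expected {mu:.2f}, observed {obs}, P(X >= obs) = {p:.3f}")
```

A site whose tail probability is very small is one whose death count sits well above what the pooled risk would predict, which is the numerical counterpart of a point falling off the line. Like the plot, this ignores the time since symptom onset, so it carries the same flaw.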
One more thing struck me: how much information we are given about the patients. We would never write all that potentially identifying information here. Is it all right if (a) the data come from a country where they are not so keen on anonymity in research, (b) the future of humanity is at stake and a snippet of information in there could be the clue that saves us (at this stage, I can’t honestly tell you that my choice of words is entirely flippant), or (c) they said it was all right? Discuss.