Tag Archives: epidemiology

Dataviz of the week, 17/5/2017

nextstrain.org is a website that offers real-time tracking of pathogens as they evolve (flu, ebola, dengue, all your favourites are here). Data gets pulled in from various monitoring systems worldwide and represented with interactive content in several pretty ways:

[Five screenshots of nextstrain.org visualizations, taken 16 May 2017]

They have their own libraries, called fauna, augur and auspice. The last of these does the dataviz and, as far as I could tell, is built on D3. I don’t pretend to understand the genetic and genomic work needed to process the raw data, but it is clearly substantial.


Filed under Visualization

H7N9 bird flu in China

The paper published two days ago in the New England Journal of Medicine, detailing the latest epidemiological information from this outbreak, has some really thoughtfully produced graphics. It also provokes some in-depth statistical pondering, and it’s worth a look. I can’t reproduce the figures here without first waiting for copyright permission, so I’ll just link you straight to the paper, where you can see them and the accompanying text for yourself.

Figure 1 seems to suggest that the first three provinces to have more than an isolated case (Shanghai, Zhejiang and Jiangsu) each saw a similar rise and then fall in case numbers. See those coloured bars rise and then fall again? Maybe there was a localised outbreak, transmission for a few days, and then it died out. Well, no, I don’t think so, although it’s tempting to infer a common history like that. Two things argue against it for me. One, the cases are surprisingly widespread geographically (see Figure 2): the distance from eastern Henan to Shanghai is 800 km, roughly the same as Land’s End to Dumfries, or New York to Quebec City. Two, stacking the bars makes the ones on top look, at a glance, as if they are rising even when it is just the bars underneath that are moving.

It seemed to me that there were a lot of places away from the coast with small numbers of cases where the patients were still alive. Now, this is very flawed, because I should take into account the days since symptoms appeared and I don’t know that, but I made a Poisson Q-Q plot using the data from Figure 2. Shanghai looks quite different to the other locations:


In fact, if you base the quantiles on the mean death risk from all the sites except Shanghai, the other sites all lie along the line. That suggests their deaths are Poisson-distributed, but something else is going on in Shanghai: either a higher death rate, or a lower proportion of the cases that survive and recover are being captured. I don’t think it is simply that Shanghai started having cases first, so they have had longer in which to die (sorry to be morbid, folks, it’s what I do for a living), because the median time from onset to death is 11 days (IQR 7-20) and we have cases going back to March in three provinces, while the bulk of Shanghai’s cases only really got going at the same time as everywhere else, 4 weeks ago.
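When every site has a different number of cases, each site has a different expected number of deaths, so the counts can't be compared to a single Poisson distribution directly. One way to get a similar picture (a minimal sketch under my own assumptions, not necessarily how the plot above was made) is to estimate a pooled death risk (total deaths / total cases) from a set of reference sites, give each site an expected count of cases × risk, convert each observed death count to its Poisson mid-p value, and plot the sorted mid-p values against uniform plotting positions. Strictly, that is a probability (P-P) plot rather than a literal Q-Q plot. The function names and the site counts below are hypothetical, not the NEJM data.

```python
import math

# Sketch only: mid-p probability plot for Poisson death counts across sites.
# Site names and counts used below are hypothetical, not the published data.

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term (fine for small k)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


def mid_p(deaths: int, lam: float) -> float:
    """Mid-p position of an observed count: P(X < d) + 0.5 * P(X = d)."""
    below = poisson_cdf(deaths - 1, lam)
    at = poisson_cdf(deaths, lam) - below
    return below + 0.5 * at


def poisson_qq_points(sites: dict, reference: list) -> list:
    """
    sites maps a site name to (cases, deaths). The pooled death risk comes
    from the `reference` sites only; each site's expected deaths are
    cases * risk. Returns (plotting_position, mid_p, name) tuples sorted by
    mid_p; points near the diagonal are consistent with the Poisson model.
    """
    ref_cases = sum(sites[s][0] for s in reference)
    ref_deaths = sum(sites[s][1] for s in reference)
    risk = ref_deaths / ref_cases
    observed = sorted(
        (mid_p(deaths, risk * cases), name) for name, (cases, deaths) in sites.items()
    )
    k = len(observed)
    return [((i + 0.5) / k, u, name) for i, (u, name) in enumerate(observed)]


# Hypothetical counts: A and B set the reference risk; C has excess deaths.
toy = {"A": (10, 2), "B": (10, 2), "C": (10, 8)}
for position, u, name in poisson_qq_points(toy, ["A", "B"]):
    print(f"{name}: expected {position:.2f}, observed {u:.3f}")
```

A site with far more deaths than its expected count ends up with a mid-p value close to 1 and sits well off the diagonal, which is the kind of departure described for Shanghai. The mid-p correction is used because death counts are discrete, so raw CDF values would pile up on steps.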



One more thing struck me: how much information we are given about the patients. We would never write all that potentially identifying information here. Is it all right (a) if the data come from a country where they are not so keen on anonymity in research, (b) if the future of humanity is at stake and a snippet of information in there could be the clue that saves us (at this stage, I can’t honestly tell you that my choice of words is entirely flippant), or (c) if they said it was all right? Discuss.


Filed under Uncategorized

Seminar at St George’s: new weapons against residual confounding

I will be giving a seminar within the Faculty of Health and Social Care Sciences on 11 October 2012, 13:00 to 14:00, introducing for the first time some methodological work that has taken up a lot of my energy over the last two years. All are welcome; it would be helpful if you e-mailed N.Greenwood@sgul.kingston.ac.uk so we can get the numbers right. I will be submitting the paper soon, so it could be a while before this work appears in public again! I am not one to exaggerate my own work, but I think this is an important new method for epidemiology and for observational studies generally.

Title: A new method for dealing with residual confounding: a practical introduction for researchers


In this seminar I will outline, in practical and non-mathematical terms, recent work I have carried out to develop a new statistical method. Confounding is an almost universal problem in observational (non-randomised) studies, where the predictor of interest is correlated with other factors that also influence the outcome, causing one to over- or under-estimate the effect the predictor truly has. There are many tools for separating the predictor of interest from the confounder, but these fail if the confounder has been imperfectly measured, for example recording smoking simply as current / ex / never. This is the situation called residual confounding, and the received wisdom is that nothing can be done about it.

It is, however, possible to adapt modern methods for missing data (multiple imputation) and use them to correct the imperfections and remove residual confounding. Some assumptions have to be made, and these are safer in some situations than others. I will explain what is required in terms of information and expertise, and show some examples.
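The residual confounding problem in the abstract can be seen in a toy simulation. This is only an illustration of the problem, not the new method from the seminar (which isn't described here). All of it is my own assumption: a standard normal confounder C, an exposure X = C + noise, an outcome Y = C + noise (so X truly has no effect), and a crude three-level recording of C with hypothetical cut-points at -0.5 and 0.5, standing in for current / ex / never. Adjustment is done by linear regression, written out by hand via stratum-mean centring and residualisation so no extra libraries are needed.

```python
import random
from collections import defaultdict

# Toy illustration of residual confounding (hypothetical set-up, not the
# seminar's method). X has NO effect on Y; any non-zero slope is confounding.

def slope_within_strata(x, y, strata):
    """OLS slope of y on x after subtracting stratum means; equivalent to the
    coefficient on x in a linear regression with one dummy per stratum."""
    sum_x, sum_y, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for xi, yi, s in zip(x, y, strata):
        sum_x[s] += xi
        sum_y[s] += yi
        count[s] += 1
    num = den = 0.0
    for xi, yi, s in zip(x, y, strata):
        xr = xi - sum_x[s] / count[s]
        yr = yi - sum_y[s] / count[s]
        num += xr * yr
        den += xr * xr
    return num / den


def residualize(v, z):
    """Residuals of v after a simple linear regression on z (with intercept)."""
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    b = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]  # true, continuous confounder
    x = [ci + rng.gauss(0, 1) for ci in c]   # exposure driven partly by C
    y = [ci + rng.gauss(0, 1) for ci in c]   # outcome driven by C only
    # Crude three-level recording of C (hypothetical cut-points).
    coarse = ["low" if ci < -0.5 else "mid" if ci < 0.5 else "high" for ci in c]
    one_stratum = [0] * n
    return {
        "unadjusted": slope_within_strata(x, y, one_stratum),
        "adjusted_for_coarse_C": slope_within_strata(x, y, coarse),
        # Frisch-Waugh: slope of residualised y on residualised x equals
        # the coefficient on x in the regression y ~ x + C.
        "adjusted_for_true_C": slope_within_strata(
            residualize(x, c), residualize(y, c), one_stratum
        ),
    }


for label, estimate in simulate().items():
    print(f"{label}: {estimate:.3f}")
```

Under these assumptions the unadjusted slope should come out near 0.5, adjustment for the true C near zero, and adjustment for the three-level version somewhere in between (roughly 0.16 with these cut-points). That leftover association is the residual confounding that ordinary adjustment cannot remove.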


Filed under noticeboard, research