Monthly Archives: December 2012

This ‘videographic’ is snow good

I just gave in and looked up the weather forecast for Christmas Day. The Met Office (UK) had a “videographic” on their web page. Well, I could hardly contain my excitement!

Sadly, it fell a long way short of expectations (probably like the presents I got for Mrs Grant this year). What is going on with the first part? It goes on far too long. The years are in random order – why would you go to the effort of doing that? As a result, nobody can follow what’s going on. I spent a while trying to work out what the x-axis indicated before finally deciding it meant nothing at all! Then we have a line chart joining up the snow depths in the years when snow fell on Christmas Day. It has the advantage of familiarity, but there are no gaps for the years with no snow. Aaargh!
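For what it’s worth, here is a minimal R sketch of the fix, using made-up snow depths (not Met Office data): the input deliberately lists the years out of order, merging against the full run of years sorts them chronologically, and leaving NA for the snowless years makes R break the line there instead of joining across them.

```r
# Made-up snow depths (cm) on Christmas Day, listed in a scrambled order.
# Years not listed had no snow. Illustrative numbers only.
snow <- data.frame(
  year  = c(2004, 2009, 2005, 2010, 2007),
  depth = c(2, 7, 1, 12, 3)
)

# Every year in the period, so the x-axis really is time
all_years <- data.frame(year = 2000:2011)

# Left join: result is sorted by year, with NA depth where there was no snow
full <- merge(all_years, snow, by = "year", all.x = TRUE)

# type = "o" draws points joined by lines; NA values leave gaps in the line
plot(full$year, full$depth, type = "o", pch = 19,
     xlab = "Year", ylab = "Snow depth on Christmas Day (cm)")
```

Whether a snowless year should be a gap or a zero is a design choice; the point is that it should not silently vanish from the axis.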

I don’t want to be too harsh – they tried, and that is really, really important. Keep trying, and check out some of the amazing work at flowingdata.com for ideas.

Ho ho ho!

Filed under Uncategorized

High Court judgment on whether anonymised data are still “personal data”

In an email bulletin I received today from the Administrative Data Liaison Service, there is a very important bit of news. In the UK we have what I think I am justified in calling the strongest data protection laws anywhere in the world. In particular, there are restrictions on the use of personal data: data that identify an individual, or that contain enough information for someone to identify an individual with reasonable effort. “Sensitive personal data”, such as healthcare data, are guarded even more closely. For a long time it has been unclear what the status is of detailed healthcare data that have been anonymised or released only in aggregate form. Over several years a modus operandi has developed in which data are released only as counts of people in various categories, and any category containing fewer than 5 people is omitted to protect anonymity. But what if an additional dataset were also provided which, taken together with the counts, would identify people?
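A toy illustration of why that question matters, with hypothetical counts: cells under 5 are suppressed, but if a second release gives the overall total, a single suppressed cell can be recovered by simple subtraction. The function name suppress_small() and all the numbers are invented for illustration.

```r
# Hypothetical counts of patients by category (illustrative numbers only)
counts <- c(A = 42, B = 3, C = 17, D = 8)

# Small-cell suppression: blank out any cell with fewer than `threshold` people
suppress_small <- function(x, threshold = 5) {
  x[x < threshold] <- NA  # NA marks a suppressed cell
  x
}

published <- suppress_small(counts)
print(published)

# A second, separate release (hypothetical) giving only the overall total
total <- sum(counts)

# Taken together, the two releases reveal the suppressed cell
recovered_B <- total - sum(published, na.rm = TRUE)
print(recovered_B)
```

With two or more suppressed cells in the same margin, subtraction alone no longer pins each one down, which is why statistical disclosure control often suppresses extra “secondary” cells as well.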

In a recent High Court judgment, clear guidance is given on how to interpret this. You can read the details here, but basically: if dataset A is released (for example, in response to a Freedom of Information request) and individuals cannot be identified from A alone, but can be identified once dataset B is attached to it, then dataset A is still truly anonymous. You do not need to (and indeed, under the FOI Act, public sector agencies may not) withhold A out of fear of theoretical future abuse. And the fact that you, the data controller, can identify patients by attaching other datasets is irrelevant.

This is very timely for the issue of NHS data being available for research. If services are operated by private or voluntary sector organisations, your data will belong to them, and you can expect them to want to keep it because data = $$$. Once commercial access (à la GPRD) becomes part of their business plan, managers will find any excuse for keeping data in-house difficult to resist. As I blogged previously, the draft beefed-up NHS Constitution could set the basic standard requiring all NHS-branded services to make data available for research. This clarification of the grey area between the Freedom of Information Act and the Data Protection Act removes one of the barriers to sharing in a partly privatised service.

Filed under Uncategorized

Data Linkage course

UCL are running a one-day course on this hot topic on 1 February 2013 at the Institute of Child Health in London. This is a rare opportunity to learn the subject, so get in there quick if you are interested! Details are here.

Filed under Uncategorized

Frank Duckworth on University Challenge

I see that Frank Duckworth, editor of the RSS Newsletter and co-inventor of the Duckworth-Lewis method for setting targets in rain-interrupted one-day cricket matches, will appear on University Challenge on 20 December – and hopefully will survive to the semi-finals and final. I shall be cheering and waving my copy of Neave’s elementary tables, with a big ‘6’ drawn on it, while I watch.

Filed under Uncategorized

Survival of the sweetest

On receiving an advent calendar from one of our course directors, I suggested we could track each other’s chocolate consumption in a survival analysis and establish who was eating significantly more chocs. Strangely, everybody refused to take part, so I am n=1. Looking forward to being able to highlight some “shocks” on the graph.

So far, so smug

If you really want to know:

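library(jpeg)                          # provides readJPEG()
today <- 12                            # day of December of the latest update
chocs <- c(5,5,7,7,7,10,10,10,12)      # day on which each eaten chocolate was eaten
# number of chocolates eaten on each day from 1 to 25
choc.days <- tabulate(chocs, nbins = 25)
# chocolates remaining (out of 25) at the end of each day
choc.surv <- 25 - cumsum(choc.days)
holly <- readJPEG("holly.jpg")         # decorative image; must be in the working directory
chocplot <- function() {
  # step plot of the survival curve so far
  plot(1:today, choc.surv[1:today], type = "s", lty = 1, col = "chocolate4", lwd = 10,
       ylim = c(0, 25), xlim = c(0, 25),
       xlab = "Day in December", ylab = "Surviving chocolates",
       main = "Survival of chocolates in Robert's 2012 office advent calendar",
       bty = "n")
  # labels for the curve and for the regions of the plot
  text(x = 10, y = 23, labels = "Survival curve", col = "chocolate4")
  text(x = 6.7, y = 17, labels = "One a day...")
  text(x = 18, y = 20, labels = "Restraint", col = "red", font = 3)
  text(x = 22, y = 23, labels = "Scrooge", col = "green", font = 3)
  text(x = 7, y = 6, labels = "Abandon", col = "red", font = 3)
  text(x = 3, y = 2, labels = "Nausea", col = "green", font = 3)
  # dotted "one a day" reference line
  lines(x = c(1, 25), y = c(25, 1), lty = 3)
  # holly in the bottom-right corner
  rasterImage(holly, 22, 0, 25, 3)
}
dev.new()                              # open an on-screen graphics device on any platform
chocplot()
jpeg("Advent_calendar.jpg")            # draw the same plot again into a JPEG file
chocplot()
dev.off()                              # close the JPEG device, which writes the file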
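The curve above is simply 25 minus the running total of chocolates eaten. A more formal survival analysis, sketched here on the assumption that the survival package (a recommended package shipped with R) is available, treats each chocolate as a subject: eaten ones have an event on the day they went, and uneaten ones are right-censored at today’s date. Because every uneaten chocolate is censored on the same day, the Kaplan-Meier estimate multiplied by 25 reproduces the count in choc.surv.

```r
library(survival)  # recommended package shipped with R

today   <- 12
chocs   <- c(5, 5, 7, 7, 7, 10, 10, 10, 12)  # day each eaten chocolate went
n_total <- 25                                # windows in the calendar, as in the plot above

# One row per chocolate: eaten ones have an event (status 1) on their day;
# uneaten ones are right-censored (status 0) at today's date
time   <- c(chocs, rep(today, n_total - length(chocs)))
status <- c(rep(1, length(chocs)), rep(0, n_total - length(chocs)))

# Kaplan-Meier estimate of the chocolate survival function
km <- survfit(Surv(time, status) ~ 1)
print(summary(km))

# Convert the survival probability at today back into a number of chocolates
surviving <- round(summary(km, times = today)$surv * n_total)
print(surviving)  # 16, matching choc.surv[12]
```

With more than one participant, adding a person variable to the formula (Surv(time, status) ~ person) and passing it to survdiff() would give the log-rank test needed to establish who was eating significantly more chocs.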

Filed under R

Multidimensional scaling of REM album covers: FlagSpace revisited

By way of following up on an old in-joke, and doing something constructive because I couldn’t get to sleep, I thought I would revisit the FlagSpace plot that I blogged about a while back and learn how to do it. The details are here at R-bloggers and the code is here on GitHub. It is surprisingly simple. I only had 15 images (REM album covers, don’t ask), so I just saved them from Amazon. They were JPEGs rather than PNGs, but the readJPEG() function works in exactly the same way as shown on GitHub, and from there the implementation is very simple indeed.

REM-PCA1

Of course, it means nothing. Maybe Dimension 1 is dark-light and Dimension 2 is colour-monochrome? The stress is 0.17, which I think is a pretty good fit in 2 dimensions. There are probably more meaningful ways of getting distance (dissimilarity) matrices for raster images, and I’ll consider them next. After that, I would like to get to grips with the tuneR package, which might lead to the same plot based on analysis of the actual music. I really mustn’t drink coffee in the evenings.
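The recipe described above (read each cover, build a dissimilarity matrix, squash it into 2 dimensions) can be sketched roughly as follows. This is not the GitHub code: random arrays stand in for readJPEG() output so that it runs without any files, and summarising each image as a coarse colour histogram compared by Euclidean distance is an assumption for illustration only. MASS::isoMDS() performs Kruskal’s non-metric MDS and reports stress in percent, hence the division by 100; base R’s cmdscale() is the classical-MDS alternative.

```r
library(MASS)  # for isoMDS(); recommended package shipped with R

set.seed(1)
# Stand-ins for readJPEG() output: each "cover" is a height x width x 3 array
# of RGB values in [0, 1], with a random overall brightness
covers <- lapply(1:15, function(i) {
  array(runif(50 * 50 * 3, min = 0, max = runif(1, 0.3, 1)), dim = c(50, 50, 3))
})

# Summarise an image as a coarse colour histogram (bins per channel), so that
# images of different sizes give feature vectors of the same length
colour_hist <- function(img, bins = 4) {
  breaks <- seq(0, 1, length.out = bins + 1)
  unlist(lapply(1:3, function(ch) {
    h <- hist(img[, , ch], breaks = breaks, plot = FALSE)$counts
    h / sum(h)  # proportions, so image size does not matter
  }))
}

features <- t(sapply(covers, colour_hist))  # 15 rows x 12 columns
d <- dist(features)                          # Euclidean dissimilarities

# Non-metric MDS in 2 dimensions; isoMDS() reports stress in percent
fit <- isoMDS(d, k = 2, trace = FALSE)
stress <- fit$stress / 100
print(stress)

plot(fit$points, xlab = "Dimension 1", ylab = "Dimension 2", pch = 19)
text(fit$points, labels = seq_along(covers), pos = 3)
```

For real covers, replacing the covers list with lapply(files, jpeg::readJPEG) over a hypothetical vector of file names would be the only change; a different choice of colour_hist() is exactly where a more meaningful dissimilarity could be plugged in.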

Filed under R

Best visualization of 2012

As it is the time of year when bloggers are supposed to do this kind of thing, here is the visualization that, for me, is head and shoulders above all others this year.

Yes, it’s a video. Why not? From some very clever people doing good at Carbon Visuals, this video achieves several things: it’s novel, it tells an engaging story (you know the shock is coming but it still gets you), it conveys huge numbers that are hard to grasp on paper, and it relates the data to the real world in a way that makes you think about it afresh.

***** “I was so impressed I forgot where I was and said a rude word out loud in the office”

Filed under Uncategorized