Monthly Archives: May 2014

Why the government selling your data is not the same thing as the government creating open data for the public good

Get along to this, it’s sure to be good.
The April 2014 issue of Significance, which has just dropped into my pigeonhole, has an excellent editorial summary of what’s been going on with the National Health Service’s care.data proposals to bundle up health data and share it, perhaps with researchers, perhaps with commercial enterprises. If you can only read one thing about care.data, make it this one. All the different aspects are brought together in one place.
Earlier in the week I was blogging about extreme time scales and various uses of spirals in data visualisation. This morning I thought about it a little more and realised that the attraction of extreme scales, like the entire lifetime of our planet or the size of the solar system, is in large part just that it’s fun. I start my own dataviz talks with Gelman & Unwin’s six objectives, which I think are helpful in framing the many uses of images (for a statistician, anyway – we were trained that there is only one use of a graph, and that is to check briefly for outliers / normality before it is deleted!), although I get the impression (and I would be happy to be corrected by better-informed dataviz hipsters; I use the term with only the very mildest form of offence) that those objectives are generally looked upon with some disdain as Johnnies-come-lately in a design community that has had its own goals for much longer. In this application, we are appealing to GU2, “conveying the sense of the scale and complexity of a dataset”. In the original paper, G&U give network graphs as an example, because they convey an overall impression but little or no concrete information, so people like me tend not to approve: I like the data to be retrievable by the viewer. But why not, if it effectively sets the scene?
A couple of unorthodox examples spring to mind: scale reconstructions of the solar system and Stamen+Nasdaq on high-frequency trading. If you wipe out the extremities with a super-log scale then you lose the fun too. (OK, it’s a sitting duck of an ugly example, but still!) Another good one is the Washington Post on Flight MH370.
And then consider two popular visualisations, US Gun Deaths and CarbonVisuals NYC. In each case they rely on the emotional impact of the sudden acceleration or amplification of values, and they achieve it in very different ways. As we learnt from Haydn, the impact of the Surprise only really works the first time, but it stays fun for years afterwards.
Last week Andrew Gelman picked up on a couple of graphs covering extremely long time periods. Here they are again for your convenience. (When one mentions a subject such as climate change, it’s like a magnet for time-wasters, so I’ll spare you from reading through the explosion of comments at Gelman’s blog.)
What’s going on in that x-axis?!
Gelman liked the spirals within spirals; not everyone did. It put me in mind of two examples I saw recently when reading Isabel Meirelles’s book “Design For Information” (which is excellent!). The first is not good, in my humble opinion:
“10 years of Wikipedia” is a series of line graphs that are bent round into a spiral. You are supposed to compare the position of the line to the ideal spiral in grey. What this adds above and beyond the area chart on the left is questionable. I find it impossible to see the patterns, and I imagine that is something to do with how our brains perceive position radiating out from a central point.
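To see concretely what the bending involves, here is a minimal numpy sketch (my own construction, not the code behind the Wikipedia graphic) of turning a monthly series into a spiral: the reader must judge the gap between the data radius and a baseline spiral while the angle keeps rotating, which is exactly the comparison I find so hard.

```python
import numpy as np

# A toy monthly series over 10 years (120 points), loosely in the spirit
# of the Wikipedia growth chart: an upward trend plus a seasonal wobble.
t = np.arange(120)
values = np.linspace(10, 50, 120) + 5 * np.sin(2 * np.pi * t / 12)

# Bend the series round into a spiral: the angle advances one full turn
# per year, and the radius is a baseline spiral with the data riding on top.
theta = 2 * np.pi * t / 12
base = 10 + 0.5 * t          # the "ideal spiral in grey"
r = base + values            # the data line the reader compares against it

# Cartesian coordinates for plotting the two spirals
x, y = r * np.cos(theta), r * np.sin(theta)
bx, by = base * np.cos(theta), base * np.sin(theta)
```

Plotting (x, y) over (bx, by) reproduces the effect: the same information as the area chart, but encoded as a radial offset from a moving baseline.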
The better use is when the spiralling is metaphorical. In this image from National Geographic, the number of space-exploration missions that have flown by or visited different planets and moons is shown as concentric rings. One gets an immediate feel for the number of rings.
Felix Schönbrodt has blogged recently about how a statistic (correlation, in his case) wiggles around and gradually stabilises as a sample accumulates, and he draws what I call a cumulative funnel plot. Schönbrodt has basically reinvented elementary statistical inference, so if you read his blog, I would suggest you not get excited and start referring to POSs and COSs (points and corridors of stability), lest a statistician take a dim view of your new clothes. However, I think the cumulative funnel plot is a great way of conveying the notion of uncertainty from sampling error. A couple of years ago I commended it here, and although Spiegelhalter and colleagues made a valid rebuttal about false alarms, we’re aiming for different goals. I’m thinking about the longer-term goal of improving public understanding of uncertainty. A few false alarms are part of life, and I think people can handle them and don’t have to be shielded by well-meaning statisticians. As soon as the line wiggles outside the funnel, you get interested, but you don’t swing into action and close the hospital down. I would have thought that obvious… ah well, maybe not. You wait until the pattern keeps happening. Best of all, you have some prespecified endpoint and you do one significance test then; but sometimes, when lives are at stake, we have to compromise on statistical purity in order to get early warnings.

But to come back to communicating inferential principles, the other thing I like about the plot is the transparent, superimposed bootstrapped trajectories. I have no doubt it is easier for the newcomer to understand this sort of depiction of uncertainty than the theoretical stuff (like the funnel). At ICOTS this summer I’ll be attending the workshop on simulation in introductory stats teaching, and I hope to report back soon afterwards with some new ideas.
Around that time, I was making a Stata command for drawing these plots, and it somehow ended up on my pile of dormant projects. Maybe I’ll get it back up and running some day.
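For readers who want to try the idea themselves, here is a minimal numpy sketch of the ingredients (my own sketch, not Schönbrodt’s code nor my dormant Stata command): the cumulative correlation path, a handful of bootstrapped trajectories to superimpose, and a rough 95% funnel under the null of zero correlation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a bivariate sample with a true correlation of 0.3
n, rho = 500, 0.3
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

def cumulative_corr(data, start=10):
    """Correlation computed on the first k rows, for k = start .. n."""
    return np.array([np.corrcoef(data[:k, 0], data[:k, 1])[0, 1]
                     for k in range(start, len(data) + 1)])

# The main trajectory: watch r wiggle and stabilise as the sample grows
path = cumulative_corr(xy)

# Bootstrapped trajectories: resample rows with replacement, recompute
# the cumulative path, and (in a plot) superimpose them transparently
boot_paths = [cumulative_corr(xy[rng.integers(0, n, n)])
              for _ in range(20)]

# A rough 95% funnel under the null (r = 0): +/- 1.96 / sqrt(k)
ks = np.arange(10, n + 1)
funnel = 1.96 / np.sqrt(ks)
```

Plot `path` against `ks`, the `boot_paths` behind it with low alpha, and `±funnel` as the reference envelope, and you have the cumulative funnel plot.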
Why not? Scott Kildall has developed this idea of data crystals, formed from spatial points by a clustering algorithm. Then he can print out the result on a 3-D printer.
It takes some explaining, but I think he does that on his web page. There is something of the puzzle-solving fun about it, and by the time you’ve worked it out, you’ve also learned something about the world’s population, or clustering algorithms, or animation. Job done! The trick is not to make the puzzle too hard or too easy, and this is about right for me, although it already ticked a few of my boxes (multivariate, animation, data art, mmmm) and others might find it just too weird.
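Kildall’s actual pipeline is explained on his own page, not here; as a stand-in, the sketch below uses plain k-means (my choice of algorithm, purely illustrative) to show the core move: a cloud of spatial points collapses into a handful of cluster centres, which could then be given volume and sent to a printer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "spatial points", e.g. scaled map coordinates
points = rng.uniform(0, 100, size=(300, 2))

def kmeans(pts, k, iters=50):
    """Plain k-means: merge spatial points into k blocky aggregates,
    the kind of reduction a data-crystal pipeline might start from."""
    centres = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre
        d = np.linalg.norm(pts[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centres[j] = pts[labels == j].mean(axis=0)
    return centres, labels

centres, labels = kmeans(points, k=8)
```

From there, each centre (weighted by its cluster size) becomes a cube in a 3-D mesh, and repeated merging gives the crystal its stacked look.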
Personally, I like the idea of a data installation that is more dataviz than data art. I mean arranging stuff in a 3-D space and letting people interact with it physically. Maybe it could get even more interactive, with lights controlled by visitors’ selections or by real-time data. If anyone out there has inroads to a nice chunk of gallery space and is interested, get in touch y’all. You know curators love a science–art crossover project.
(Spotted via flowingdata.com)