Monthly Archives: September 2013

Simpson’s paradox made easy

What a pleasure it was to find this web page on Simpson’s Paradox by the Visualizing Urban Data Lab at Berkeley. They use a clean, well-designed layout, some attractive clickable (d3.js) graphics and few words to get the concept across really well. I shall be recommending it to all students. More please!
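For anyone who would like the paradox in numbers as well as pictures, here is a minimal Python sketch. The counts are illustrative only (they are not taken from the Berkeley page), and the subgroup and treatment labels are made up for the example.

```python
# Illustrative counts only: (successes, trials) for each treatment within each subgroup.
counts = {
    "mild cases":   {"A": (81, 87),   "B": (234, 270)},
    "severe cases": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def pooled(treatment):
    """Add up successes and trials across subgroups before taking the rate."""
    successes = sum(group[treatment][0] for group in counts.values())
    trials = sum(group[treatment][1] for group in counts.values())
    return successes, trials

for name, group in counts.items():
    print(f"{name}: A {rate(*group['A']):.1%}, B {rate(*group['B']):.1%}")
print(f"pooled: A {rate(*pooled('A')):.1%}, B {rate(*pooled('B')):.1%}")
```

In these made-up numbers A does better within each subgroup, yet B does better once the subgroups are pooled, because A was given mostly to the severe cases.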


Filed under Uncategorized

Parallel computing survey

The University of Manchester are running a survey for all statisticians and data analysts about parallel computing, which you can contribute to here. It’s not just for geeks who already do parallel computing: even if you are merely interested and don’t yet know anything about it, your views would, I’m sure, help in getting a representative sample.

Impressive cluster

What our parents think our facilities look like. (Image CC-BY-SA Megware GmbH)


Subtracting streets from choropleths, and how it might help understand uncertainty

Two visualizations that have caught my eye recently both use the same idea of colouring in geographical blocks according to some aggregated statistic, but then removing the streets and uninhabited areas. This helps you navigate your way but also, suggests James Cheshire of Spatial.ly, might emphasise the fact that the data are aggregated and shouldn’t be assumed to be true at every sub-level of geography.

The first one is the rather beautiful LuminoCity.

This is the brainchild of Duncan Smith at City Geographics. It really brings home how clustered our workplaces are, which is a little silly in this day and age. James Cheshire showed off the same idea in a talk at the recent Royal Statistical Society conference, this time in a map by his colleague Oliver O’Brien, who has a blog called Suprageography. (Damn it, why are these geographers all so cool and connected while statisticians are, well…)


There’s an almost irresistible urge with these zoomable maps to zoom in on your home, or place of work, or whatever. And this is where Cheshire reckons the advantage comes: when you see your own neighbourhood coloured in entirely the same shade, you realise that what you are seeing is aggregated area stats and not the absolute truth. That might just stop you from drawing false conclusions about relationships through the ecological fallacy (associations that hold at aggregate level do not necessarily hold at individual level, viz. chocolate vs Nobel prizes). That idea is worth thinking about when drawing maps.
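To make the ecological fallacy concrete, here is a small Python sketch with entirely hypothetical numbers: three made-up areas, each with three residents measured on two variables. The area names, values and the `pearson` helper are all assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical areas, each holding (x, y) pairs for three residents.
areas = {
    "north": [(9, 11), (10, 10), (11, 9)],
    "centre": [(19, 21), (20, 20), (21, 19)],
    "south": [(29, 31), (30, 30), (31, 29)],
}

# What a choropleth shows: one averaged value per area.
area_means = [
    (sum(x for x, _ in people) / len(people), sum(y for _, y in people) / len(people))
    for people in areas.values()
]
aggregate_r = pearson(*zip(*area_means))

# What is going on among the individuals inside each area.
within_r = {name: pearson(*zip(*people)) for name, people in areas.items()}

print("correlation of area means:", aggregate_r)
print("correlation within each area:", within_r)
```

In this toy data the area averages rise together perfectly, while inside every area the two variables move in opposite directions, which is exactly the trap a uniformly coloured block should warn you about.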


R tips for moderately large data

Some useful tips recently featured on r-bloggers and originally posted at Mollie’s Research Blog are worth reading. I say moderately large because I don’t really believe there is such a thing as big data (and it looks like Mollie doesn’t either, judging by the judicious use of the word ‘large’), but there are special computational problems that appear as you go large. Maybe in ten years we’ll laugh at those problems but I suspect the data will have kept pace just ahead of our capabilities.

For example, did you know that by specifying the class of each variable (string, integer and so on) when opening a file in R, you can cut the time taken nearly in half? I certainly didn’t. What about not bothering to open it at all if it’s already in memory? That’s a good idea too. I’ll be keeping an eye on the blog for more top tips.

It would be interesting to see how many of these have parallels in other stats software.
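As one possible parallel, here is a minimal sketch in Python using only the standard library. It is not Mollie’s R code: the column names, the sample data and the `load_once` helper are all hypothetical. The two ideas it mirrors are declaring each column’s type up front (what `colClasses` does in R’s `read.csv`, and what the `dtype` argument does in pandas’ `read_csv`) and skipping the read entirely when the table is already in memory. I make no claim here about how much time it saves.

```python
import csv
import io

# Hypothetical schema: column name -> converter, declared up front rather than guessed.
COLUMN_TYPES = {"id": int, "age": int, "score": float, "town": str}

# Made-up sample standing in for a file on disk.
SAMPLE_CSV = "id,age,score,town\n1,34,7.5,Leeds\n2,51,6.0,York\n"

def read_typed(handle, column_types):
    """Read CSV rows, converting each column with its declared type."""
    reader = csv.DictReader(handle)
    return [
        {name: convert(row[name]) for name, convert in column_types.items()}
        for row in reader
    ]

_tables = {}  # tables already held in memory, keyed by name

def load_once(name, open_source, column_types):
    """Only open and parse the source if this table has not been loaded yet."""
    if name not in _tables:
        with open_source() as handle:
            _tables[name] = read_typed(handle, column_types)
    return _tables[name]

rows = load_once("people", lambda: io.StringIO(SAMPLE_CSV), COLUMN_TYPES)
again = load_once("people", lambda: io.StringIO(SAMPLE_CSV), COLUMN_TYPES)
print(rows[0], again is rows)
```

For a real file you would pass something like `lambda: open(path, newline="")` as `open_source`.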


Filed under R

Bayesian health economics course

This sort of course doesn’t come around very often. If you are interested in health economic evaluation and modelling you really need to make full use of Bayesian tools. The MRC Biostatistics Unit in Cambridge are running a course on WinBUGS for health economics on 28-29 October 2013. No prior BUGS experience is assumed.


Dataviz evening at the RSS

This sounds good…

“Data Visualisation – Storytelling by numbers”
Royal Statistical Society and Association of Survey Computing, 10th October 2013, 5pm 
Royal Statistical Society, 12 Errol Street, London, EC1Y 8LX

A joint meeting of the RSS Social Statistics Section and the Association of Survey Computing will take place on 10th October 2013 at 5pm with tea/coffee from 4.30pm. The speakers are:
Alan Smith, Office for National Statistics Data Visualisation Centre
Tobias Sturt & Emma Whitehead, the Guardian Digital Agency 

[…] Alan Smith of the ONS Data Visualisation Centre will speak on using visualisation to redefine the audience and outreach of official statistics. Tobias Sturt and Emma Whitehead from the Guardian’s Digital Agency will discuss some of the principles of effective data storytelling and explore best practice examples to inspire and inform your own approaches to data visualisation. 
[…]
Attendance is free and open to all, but pre-registration is recommended. You can register by email at meetings@rss.org.uk or by phone on (020) 7638 8998. For further information about the meeting contact Chris Kershaw (chris.kershaw@homeoffice.gsi.gov.uk). For directions see www.rss.org.uk/findus.


A couple of short stats animations

Conference season is over, and I have a lot of cool stuff to catch up on blogging. Here’s part 1.

Here are two animations of optimisation processes, one made with Yihui Xie’s R package and the other by an eerily Grantoid nuts-n-bolts approach, where a lot of PNG files are saved and then turned into a stop-frame animation using (apparently) ImageMagick (aka ffmpeg for wimps).

First, an OR kind of problem. Eight balls are scattered about in 2 dimensions. They then have to be moved to be equidistant. This is done with the R optim function and the old BFGS algorithm. I fondly recall my undergraduate studies, where I had to program it on a calculator which I still have rusting in a drawer somewhere. Looking back, I can’t believe that I managed to do that. I must have learnt a lot, but one thing that didn’t stay with me was all those strange BFGS names. I think there was a Goldfarb and a Shanno, but the others are lost to me.


Read the code here.
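The linked code uses R’s optim. For readers who would like to see the moving parts, below is a rough hand-rolled BFGS sketch in Python; it is not the code behind the animation. The objective `spacing_error` is my own assumption (squared gaps between every pairwise distance and a target spacing of 1), the gradient is estimated numerically, and the line search is simple backtracking. Eight points in a plane cannot all be the same distance from each other, so this objective never reaches zero; the optimiser just shrinks it.

```python
import math
import random

def spacing_error(flat, target=1.0):
    """Assumed objective: squared gap between each pairwise distance and the target."""
    points = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    return sum(
        (math.dist(p, q) - target) ** 2
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )

def gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = x[:], x[:]
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def bfgs(f, start, max_iter=200, tol=1e-6):
    n = len(start)
    x = list(start)
    inv_hess = identity(n)  # running estimate of the inverse Hessian
    g = gradient(f, x)
    frames = [x[:]]  # one entry per accepted step, e.g. to draw as animation frames
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        direction = [-dot(row, g) for row in inv_hess]
        slope = dot(direction, g)
        if slope >= 0:  # estimate misbehaved: fall back to steepest descent
            inv_hess = identity(n)
            direction = [-gi for gi in g]
            slope = -dot(g, g)
        # Backtracking line search: halve the step until f decreases enough.
        fx, step = f(x), 1.0
        while f([xi + step * di for xi, di in zip(x, direction)]) > fx + 1e-4 * step * slope:
            step *= 0.5
            if step < 1e-12:
                return x, frames
        new_x = [xi + step * di for xi, di in zip(x, direction)]
        new_g = gradient(f, new_x)
        s = [a - b for a, b in zip(new_x, x)]
        y = [a - b for a, b in zip(new_g, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # only update when the curvature condition holds
            rho = 1.0 / sy
            hy = [dot(row, y) for row in inv_hess]
            scale = (1.0 + rho * dot(y, hy)) * rho
            for i in range(n):
                for j in range(n):
                    inv_hess[i][j] += scale * s[i] * s[j] - rho * (hy[i] * s[j] + s[i] * hy[j])
        x, g = new_x, new_g
        frames.append(x[:])
    return x, frames

rng = random.Random(1)
start = [rng.uniform(0, 3) for _ in range(16)]  # 8 balls, an (x, y) pair each
final, frames = bfgs(spacing_error, start)
print(f"{spacing_error(start):.3f} -> {spacing_error(final):.3f} in {len(frames) - 1} steps")
```

Saving a plot of each entry in `frames` would give the raw material for a stop-frame animation of the kind described above.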

Next, one I think is more fun. A Metropolis algorithm with three chains seeks out the mean and variance parameters for some data. This is a 3-D-style wireframe plot, and the three chains appear like kittens moving around under a quilt, until they converge on the target distribution like a quivering mouse. Code here.
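For a feel of what the kittens are doing, here is a minimal random-walk Metropolis sketch in Python with three chains. It is not the linked code: the data are simulated, the flat priors on the mean and the log-variance are my assumption, and the step size, starting points and burn-in are arbitrary choices.

```python
import math
import random

def log_posterior(mu, log_var, data):
    """Normal log-likelihood; flat priors on mu and log-variance are assumed."""
    var = math.exp(log_var)
    return -0.5 * len(data) * log_var - sum((x - mu) ** 2 for x in data) / (2 * var)

def metropolis_chain(data, start, steps, rng, step_size=0.3):
    """Random-walk Metropolis over (mu, log_var); returns every visited state."""
    mu, log_var = start
    current = log_posterior(mu, log_var, data)
    draws = []
    for _ in range(steps):
        new_mu = mu + rng.gauss(0, step_size)
        new_log_var = log_var + rng.gauss(0, step_size)
        proposed = log_posterior(new_mu, new_log_var, data)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu, log_var, current = new_mu, new_log_var, proposed
        draws.append((mu, log_var))
    return draws

rng = random.Random(2013)
data = [rng.gauss(5.0, 2.0) for _ in range(50)]    # simulated data, not the post's
starts = [(-5.0, -2.0), (0.0, 3.0), (15.0, 0.0)]   # three deliberately scattered starts
chains = [metropolis_chain(data, start, 5000, rng) for start in starts]

burn_in = 2000
for start, chain in zip(starts, chains):
    kept = chain[burn_in:]
    mean_mu = sum(mu for mu, _ in kept) / len(kept)
    mean_var = sum(math.exp(lv) for _, lv in kept) / len(kept)
    print(f"start {start}: mean ~ {mean_mu:.2f}, variance ~ {mean_var:.2f}")
```

The proposals are made on the log-variance scale so that the variance can never go negative; plotting the early part of each chain over the posterior surface would show the three paths wandering in from their different starts.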


As testament to the fact that gifs are a really bad way of providing animations, one of them works on this blog, and one doesn’t. No idea why, nor is it worth any effort to find out! I think if you click you’ll see those crawling kittens of likelihood.


Filed under animation