Monthly Archives: November 2012

Gelman on infographics and communication

Infographic by Nigel Holmes (www.nigelholmes.com)

Andrew Gelman is giving a talk later today at MIT on graphics and infoviz. You can read about it in his blog here, including links to his slides and paper. It makes for a nice overview of the situation, the problems and the possibilities. I think he is onto something with the “graphic as puzzle” idea. That does seem to be a feature of the successful ones, along with telling a story, catching your eye, provoking an emotional response and revealing something which would be really hard to do in words or tables.

The example above is from a study from the University of Saskatchewan* that showed that including “chart junk” is not necessarily a barrier to understanding or recall – although as Gelman points out, their chart junk is really good junk. I like the subtle touch – it is only the 8% that ends up on the woman’s lips – which tells the underlying story. Anyone who objects to the curvy outline in the pie chart is a tedious pedant; have you ever seen anyone look at a pie chart and then ask “now, where’s my protractor?”

* – I guess the impetus to study eye-catching shapes comes from the province itself. Like the T-shirts say:

Easy to draw. Hard to spell.

Filed under Uncategorized

Exporting data from R into WinBUGS’s “R/S-plus list format”

Users of the world’s favourite Bayesian analysis software WinBUGS will recognise the strange format in which data is supplied inside the GUI, generally in a text file looking something like this:

list(x=c(1,2,3,4,5,6),y=c(4,8,2,6,9,9))

This is of course an R (or strictly speaking, S) language format, but it’s not how R views data, and although you can also supply the data in a fixed column width format, I have occasionally wondered how to export data from a bigger general analysis and data management package like R into that weird list format. If you start to think about it, you soon realise it isn’t simple. It hasn’t been an issue because you can just run WinBUGS from inside R, but this week I found I had to do it. I won’t bore you with details.

I found this webpage from Iowa State University which gives an R function called writeDatafileR. This works very nicely, producing a text file that you can then open in WinBUGS’s own GUI, and I am surprised it isn’t better known or on CRAN. It is apparently written by Terry Elrod but though I searched for Terry on Google and at Iowa State I couldn’t find out any more about him or her. So here’s a Pioneer-10-esque “thank you” into cyberspace that may one day find Terry.
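To give a flavour of why the export isn’t simple, here is a minimal sketch in R of what such an exporter has to do. It is not Terry Elrod’s writeDatafileR: the function name to_bugs_list is my own invention, it only copes with numeric scalars, vectors and two-dimensional matrices, and it assumes that WinBUGS reads the values of a matrix row by row (R stores them column by column, hence the transpose). It also makes no attempt to tidy up numbers that R prints in scientific notation, so treat it as an illustration rather than something to rely on.

```r
# Hypothetical helper for illustration only; NOT the writeDatafileR function.
# Formats a named list of numeric scalars, vectors and matrices as a
# WinBUGS-style "list(...)" string.
to_bugs_list <- function(data) {
  stopifnot(is.list(data), !is.null(names(data)), all(names(data) != ""))

  format_item <- function(name, x) {
    if (is.matrix(x)) {
      # Assumption: WinBUGS fills arrays row by row, whereas R stores them
      # column by column, so transpose before flattening.
      values <- paste(as.character(as.vector(t(x))), collapse = ",")
      dims <- paste(dim(x), collapse = ",")
      sprintf("%s=structure(.Data=c(%s),.Dim=c(%s))", name, values, dims)
    } else if (length(x) == 1) {
      # Scalars are written without the c() wrapper.
      sprintf("%s=%s", name, as.character(x))
    } else {
      sprintf("%s=c(%s)", name, paste(as.character(x), collapse = ","))
    }
  }

  items <- mapply(format_item, names(data), data, USE.NAMES = FALSE)
  paste0("list(", paste(items, collapse = ","), ")")
}

# The same data as in the example near the top of this post.
example <- to_bugs_list(list(x = c(1, 2, 3, 4, 5, 6), y = c(4, 8, 2, 6, 9, 9)))

# To save it for pasting into the WinBUGS GUI (file name is arbitrary):
# writeLines(example, "data.txt")
```

Missing values, higher-dimensional arrays and number formatting are exactly the fiddly parts that a properly tested function would need to handle, and this sketch skips them.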

Filed under R

Datasets galore

If you are looking for a classic dataset to teach from, or to test out some analytical method you are thinking of doing, there are some great assets out there like DASL or Teaching Datasets online. But Vincent Arel-Bundock has recently compiled an index of all datasets supplied with any R package, called Rdatasets. You don’t need R to be able to access them, because they all come as CSV files with a Word document explaining the variables and history (to varying degrees of completeness and clarity).

Filed under Uncategorized

xkcd revisited: Frequentist vs Bayesian

Andrew Gelman blogged about this cartoon, coming gallantly to the aid of the poor frequentist who might feel victimised. Then all hell broke loose in his comments which you can read here, including the appearance of Randall “xkcd” Munroe himself. I think, optimistically, that everyone agreed to get along in the end.

I couldn’t resist one more word on this for the newcomer. It can all seem terribly esoteric and philosophical. That’s because it is! The traditional frequentist statistician will say that they make inferences about unknown parameters (like the mean blood pressure in stats lecturers) only in the sense that if you repeated the experiment many times, you would get a certain spread of different estimates. The true mean of the population blood pressure is, like the X-Files, out there, and you can never know it unless you get all the stats lecturers in the world and sphyg them. Well, that’s nice but it rather limits our ability to make inferences in complicated situations. Ronald Fisher took the probability equation that you use to model your data given the parameters, and turned it round, so that the data are treated as known but the parameter is not. So some values of the parameter (mean population blood pressure) are more likely than others, given the data. This is called the likelihood, and mathematically it is not quite the same as a probability but we can set a computer running to find the parameter value(s) that get the highest likelihood. Hence “maximum likelihood estimation”, without which we would still be stumbling around doing t-tests and Mann-Whitneys; there would certainly be no survival analysis, logistic regressions or multilevel models. But you’ll note that this treats the parameter as coming from a distribution, which a strict frequentist (and they are rare beasts) would object to. Most people are happy to say that likelihood is not the same as probability, and leave it there. Why trouble yourself asking what likelihood is if it works?
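For anyone who would like to see “set a computer running” in action, here is a small sketch in R. The blood pressure readings are made up purely for illustration, and I have assumed a normal model with the standard deviation fixed at 8 mmHg so that there is only one parameter to search over. None of that comes from a real study.

```r
# Made-up systolic blood pressure readings (mmHg), purely for illustration.
bp <- c(118, 125, 131, 122, 140, 127, 135, 129)

# Log-likelihood of a normal model with unknown mean mu. To keep the sketch
# one-dimensional, the standard deviation is fixed at an assumed 8 mmHg.
log_lik <- function(mu) sum(dnorm(bp, mean = mu, sd = 8, log = TRUE))

# Let the computer search for the value of mu with the highest likelihood.
fit <- optimize(log_lik, interval = c(80, 200), maximum = TRUE)

# For this simple model the answer should sit very close to the sample mean.
mle <- fit$maximum
sample_mean <- mean(bp)
```

Here the search just rediscovers the sample mean, which is the point: in a model this simple you don’t need the computer, but the same recipe carries over to models where no neat formula exists.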

The term Bayesian is very confused. It can mean someone who treats parameters as coming from distributions, or someone who thinks that probability is not a long-run proportion of repeated results but rather a subjective measure of belief or personal conviction. It also gets used to describe analyses that use computational methods to find the shape of the likelihood when calculus fails us (Markov Chain Monte Carlo, usually). However you take it, there is no evidence that Rev Thomas Bayes held well-developed views compatible with them. In fact, Bayes’ theorem is just a fact of probability theory that nobody disputes.

There is also another elephant in the room. If an event is going to happen only once (e.g. Obama vs Romney) then there is no such thing as long-run proportions of repeated elections. How can we make inference about the % Obama votes? In what sense can Nate Silver say Obama had an 86% chance of winning? This is tricky… one solution is to say that probabilities arise from some causal propensity to generate certain kinds of results, an approach usually attributed to Charles Sanders Peirce that most find a little too slippery because it appeals to causality which, as Judea Pearl has shown us, is a force conceived by mortal man as something beyond mathematical formulae.

Then you have the matter of subjective Bayesian methods, distinguished by what they call “informative priors”. This says that you have a certain belief before the experiment, which is updated by the data (and their likelihood) to form your belief after the experiment – the posterior distribution. As you will have a different prior to me, you will also get a different posterior, but if we collect lots of lovely data, the likelihood will swamp the prior and we will end up believing the same thing. Neat, huh? These subjective Bayesians are the people my MSc supervisor called “mad Bayesians” and a Bayes-friendly colleague of mine calls “staunch Bayesians”. (Note the moral tone that both have adopted. Calm down, guys.) I don’t mind subjective Bayes, it’s a consistent and principled approach, but I don’t use it for one practical reason that I happen to think is the trump card in this particular argument. When you give the client their posterior distribution, it is their new belief. They are not supposed to look at it and think about it, they are supposed to adopt it without thought, because it is their thought. If the posterior says 88% certainty the drug works, well then Dr Subjective, you’d better start selecting 88% of your patients at random to give it to. And nobody behaves like that, so I see little point in mathematically modelling their cerebral processes only to be ignored.
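The bit about the likelihood swamping the prior is easy to check for yourself. Below is a sketch in R using the simplest case I can think of: two people with opposite Beta priors on a success probability, updated with binomial data. The priors and the counts are invented for illustration, and the Beta-binomial set-up is my choice of example rather than anything special.

```r
# Two analysts with different, invented Beta priors on a success probability.
prior_sceptic    <- c(shape1 = 2, shape2 = 8)  # prior mean 0.2
prior_enthusiast <- c(shape1 = 8, shape2 = 2)  # prior mean 0.8

# Conjugate update: a Beta(a, b) prior plus binomial data gives a
# Beta(a + successes, b + failures) posterior, whose mean is returned here.
posterior_mean <- function(prior, successes, failures) {
  unname((prior["shape1"] + successes) / (sum(prior) + successes + failures))
}

# Same observed proportion (60% successes) at two very different sample sizes.
gap_small <- posterior_mean(prior_enthusiast, 6, 4) -
  posterior_mean(prior_sceptic, 6, 4)
gap_large <- posterior_mean(prior_enthusiast, 6000, 4000) -
  posterior_mean(prior_sceptic, 6000, 4000)
```

With ten observations the two posterior means are still 0.3 apart; with ten thousand the gap is less than 0.001, so the two analysts end up in practically the same place.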

Sadly, we are surrounded by people who “conclude that the sun has exploded”. It’s a basic logical error that afflicts all sorts, regardless of whether they think model parameters can be treated like random variates or that probability is a measure of personal certainty / belief. It is the error that translates some evidence of higher cancer rates in carrot-eaters into a headline that carrots cause cancer. It is the error of making a mathematical model which we call statistics, with a load of assumptions about reality, and then refusing ever to question the assumptions. It is the notion that science brings answers, which I’m afraid it rarely does (see Thomas Kuhn &c &c). If we all understood just a little philosophy of science, we would make better data analysts.

Filed under Uncategorized

Reliability studies, missing data, and meta-analysis – lunchtime seminars

London School of Hygiene and Tropical Medicine are hosting some educational seminars on popular statistical topics. This is a great opportunity to learn some new tricks over lunch if you are near London!

I’m sorry but it won’t look like this in November…

Centre for Statistical Methodology: AUTUMN EDUCATIONAL SEMINAR SERIES

For three weeks in November, the Centre for Statistical Methodology at LSHTM will hold a series of lunchtime educational seminars

****************************************************************

Agreement, reliability and repeatability studies (Jonathan Bartlett)

Tuesday 13th November 12:45pm (basic) – LG9

Thursday 15th November 12:45pm (intermediate) – LG9

********************************************************************

Missing data (Mike Kenward and James Carpenter)

Tuesday 20th November 12:45pm (basic) – Lucas Room (LG81)

Thursday 22nd November 12:45pm (intermediate) – Bennett Room (LG80)

********************************************************************

Systematic Reviews and Meta-analysis (Phil Edwards, Alma Adler, Tim Collier and Bianca De Stavola)

Tuesday 27th November 12:45pm (Systematic Reviews) – Lucas Room (LG81)

Thursday 29th November 12:45pm (Meta-analysis) – LG9

*****************************************************************

ALL WELCOME, NO REGISTRATION REQUIRED

For further information, please contact Rhian.Daniel@LSHTM.ac.uk.

Filed under learning

Moneybombs: a really rich animation of US election donations

This is a very interesting piece of work by VisPolitics. They have maps, geolocated time series, and a density plot all in one, with names appearing on the right and moving vertically and in size over time. And all these aspects look good too. My favourite section is the Obama vs Romney one, which kicks off around 0:27. I liked it because it was clearer to me what was going on (as a Brit who has never set foot in Boston).

Of course, if it were interactive that would be even better, but it would be a huge amount of work.

Filed under animation

New paper: ethnicity of newly-qualified nurses and their job prospects

I have a paper just out in the International Journal of Nursing Studies where colleagues and I surveyed newly qualified nurses who studied in London, first on the last day of their course, and then six months later. We asked about confidence and feeling prepared for various aspects of job hunting, and what success they had experienced. The causality is complex but it appeared to be very consistent across all the ‘outcome’ measures that ethnic minority nurses had worse prospects. There are a lot of questions arising from this that justify new research, for example focussing on the work placement and the peer support environment.

This is part of a larger project run by NHS London which will have a press release and launch at the King’s Fund on 19 November.

Filed under research