Inspired by my recent re-posting on the Daily Express, here is the closest thing to a Giles cartoon on statistics for your enjoyment on a Friday afternoon.
Monthly Archives: August 2012
Thanks to the King’s Fund newsletter for bringing this rather nice website from NHS Midlands and East Quality Observatory to my attention. They provide summary PDFs giving a range of quality indicators for general practices and clinical commissioning groups. The layout interests me as it is clear, engaging and above all concise!
The devil is in the detail, as anyone who has worked with quality indicators will know (this particular surgery seems to have 90% of its patients attending A&E regularly; on average, everybody goes 1.4 times a year – what is there to do in Worksop that is so dangerous?), but this is a good output at the other end of the conveyor belt.
Two courses coming up at my alma mater, the London School of Hygiene and Tropical Medicine:
An interesting thread has been discussed in the last couple of days on the Allstat mailing list. Here is Andy Cooper’s original post:
I have the following question which I am hoping native statisticians (I am a physicist by training) can help me address. To set the background as to why I am asking this question: a manuscript recently submitted to a Physics journal got rejected because 1 of the 3 reviewers claims that a result presented in the given manuscript is wrong. I would therefore be very grateful to hear the opinion of statisticians.
The issue in question is as follows: Suppose we have 6 statistics (e.g z-statistics), each derived from an independent data set (i.e 6 independent data sets in total). We can assume that the number of degrees of freedom is the same in each data set, so that the corresponding P-values are also comparable. We can further assume that each independent data set is a sample from an underlying population. Under the null hypothesis (z=0), the P-values would be distributed uniformly between 0 and 1. Now, the observed P-values are in fact (3e-9, 0.04, 0.05, 0.03, 0.02, 0.005), i.e they are all less than 0.06. It is clear, at least to me, that the chance that these P-values are drawn from a uniform distribution is pretty small (<1e-8). Yet the reviewer in question claims that there is no overall significance. His/her argument is based on the Bonferroni correction: using a threshold of 0.05/6~0.008 only 2 P-values pass this threshold, which he/she then goes on to claim is not meaningful enough.
My response to the reviewer’s comment is that the use of a Bonferroni correction to establish the overall significance of the 6 P-values is wrong. The Bonferroni correction is ill-suited for this particular application since it is overly conservative, leading to a large fraction of false negatives. Remarkably, the editor of the Physics journal in question finds the reviewer’s arguments (i.e using the Bonferroni correction) “persuasive”.
I would be most grateful for your comments.
He then clarified:
I should have clarified of course that the directionality of the statistic is consistent across the 6 data sets.
This post kicked the proverbial hornet’s nest. It seems many scientists, whether statisticians or otherwise, have encountered this sort of response from a reviewer. They are just wrong, and so is the editor*. If I were commenting on a clinical trial of cardiac surgery, I wouldn’t shoot from the hip about where to stick the aorta or how many mils of Pumpmax to infuse. If I did, I would expect the surgeons to tell me to get lost and stick to the stats.
So why do people feel able to pronounce on statistics when they plainly have had a few hours’ training squeezed into their undergraduate degree? When reviewing papers, there is often a box that says “This requires review by a statistician: yes/no”. In my experience this gets ticked in a completely unpredictable way. It seems to me that teaching statistics like a flowchart or checklist is to blame. It is very tempting to do this when faced with another cohort of medical / psychology / whatever students. They memorise the basics, they pass the exam, what happens after that is someone else’s problem. In this case one could imagine the reviewer having remembered some sage advice like “when you do lots of tests, use Bonferroni and everything will be OK”. And having stashed that half-truth away, they leave with a false sense of confidence in their mastery of statistics.

Lecturers are also faced with students who feel very anxious about their mathematical abilities, and they are often repeatedly reassured, rewarded for doing the basics and packed in cotton wool until they are no longer anxious. But life is not like that. Life involves data that do not match any of the methods in your textbook; life involves going back to first principles to find the method, and even though a lot of the time some very clever person has programmed the computer to do it for you, you should never forget that the maths is lurking just below the surface. When you hit a snag, you should go down the corridor and knock on the door of someone who knows how to do the maths. If you’re reviewing a paper, tick the statistician box. What you should never do is to bumble on regardless, substituting eminence for evidence.
Professional statisticians are of course outnumbered by the dilettanti and may not be able to provide insight and review on every study that is done, but perhaps it is our duty to be more publicly critical, a bit nastier, dare I say it – a bit more like a cardiac surgeon. After all, if you stick the aorta in the wrong place, you can only kill a few people before the hospital hands you the contents of your desk in a cardboard box and escorts you off the premises. If you publish the wrong stats, you can kill millions, and get away with it.
* – because analysing 6 data sets gives you 6 answers, which is a lot of information. Analysing the same data 6 ways then picking the interesting stuff is not very much information, and to guard against reading too much into it, we have post-hoc adjustments, of which Bonferroni is the Fisher-Price “My First Post Hoc Adjustment” (with no offence to Fisher-Price and their fine products). The thing that really is beyond the grasp of bumblers like Andy Cooper’s reviewer is that the decision about whether or not to adjust depends on context and intention; it is as much a philosophical debate as it is methodological. Adjusted p-values seek to replace your human judgement of whether you can trust the (non-)significance of the results, and that makes them much closer in spirit to a subjective Bayesian posterior distribution than most analysts would care to admit.
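Andy Cooper’s intuition can be checked directly. One standard way of pooling independent p-values (not the only one, and not necessarily the one he had in mind) is Fisher’s combined probability test: under the global null, −2 Σ ln pᵢ follows a chi-squared distribution with 2k degrees of freedom, and for even degrees of freedom the tail probability has a simple closed form. Here is a sketch using only the Python standard library; the function name is mine, not from any package:

```python
import math

def fisher_combined_p(pvalues):
    """Fisher's method: under the global null, -2 * sum(ln p_i) is
    chi-squared with 2k degrees of freedom, where k = len(pvalues).
    For even df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no stats library
    is needed."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

# The six p-values from the post
pvals = [3e-9, 0.04, 0.05, 0.03, 0.02, 0.005]
print(fisher_combined_p(pvals))  # about 1.5e-11, comfortably below 1e-8
```

The combined p-value comes out far smaller than the 1e-8 figure quoted in the original post, which supports the author’s position: taken together, six independent results all pointing the same way carry far more evidence than a Bonferroni threshold applied to each one separately would suggest.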
Excellent and hilarious piece of work from Scott Bryan, demolishing sensational weather headlines in the Express. Since the departure of the Giles cartoons, it hasn’t had a redeeming feature.
There’s one thing the Daily Express likes to talk about.
That’s right… it is THE WEATHER.
For example, here was last Monday’s front page:
But how many times have stories about the weather appeared on the front page?
According to my own research, since September 2011:
- Stories about the WEATHER have appeared on the front page of the Daily Express 111 times.
- It has been the MAIN NEWS STORY OF THE DAY 52 times.
- It has predicted hurricanes 3 times in the last year. It also claims that a hurricane hit Britain on the 4th January.
- There have been 12 instances in the last year where it has predicted or claimed weather ‘chaos’.
- Here is a wordle of their top weather related headlines in the last year:
So, how many times has it accurately predicted the weather? I’m no meteorologist, but here are all of the…
This course is running 1-5 October at the University of Essex. There doesn’t seem to be a website, but you can register by writing to firstname.lastname@example.org.
Here’s what they say in their e-mail:
Dr Werner Adler (University of Erlangen-Nuremberg; Co-author of R-packages Daim and survAUC); Dr Benjamin Hofner (University of Erlangen-Nuremberg; Author of R-packages gamboostLSS and CoxFlexBoost; Co-author of R-package mboost)
Day 1: Introduction to R (9.30am – 1pm course, 2.30pm-5pm lab) • Concepts of R (graphical user interface (GUI), editors, work flow, help system) • Basic Programming (objects, functions, vectors, matrices, data sets) • Examples and Hands-on Training
Day 2: Introduction to Statistics & Graphics (9.30am – 1pm course, 2.30pm-5pm lab) • Data Management • Descriptive Statistics • Graphics • Examples and Hands-on Training
Day 3: Diagnostic and Statistical Tests (9.30am – 1pm course, 2.30pm-5pm lab) • Diagnostic Tests (quality of diagnostic tests, ROC analysis) • Statistical Tests (binomial test, one-sample t-test, one-sample Wilcoxon signed-rank test, independent two-sample t-test, Mann-Whitney U test, two-sample t-test for paired samples, Wilcoxon signed-rank test [for dependent samples], χ²-test, logrank test) • Examples and Hands-on Training
Day 4: Regression Analysis (9.30am – 1pm course, 2.30pm-5pm lab) • Linear Regression Models (incl. model diagnostics and variable selection) • ANOVA (incl. prognosis and model diagnostics) • Logistic Regression (short outlook) • Examples and Hands-on Training
Day 5: (optional 9.30am – 1pm lab)
• Optional discussion of statistical data analysis issues of participants • Examples and Hands-on Training
Course Prerequisites: Interest in statistical data analysis, basic statistical knowledge
Sounds pretty good! It will be a lot to take in, but learning all of that in the first week of October would be a great start to somebody’s academic year!
Timberlake are running a two-day course on programming with Stata for big data on 4-5 October 2012 in London. Click here for details…