A bird’s eye view of statistics in two hours

Next week I am giving a two-hour talk and discussion for Kingston University researchers and doctoral students, aimed at giving an update on statistics to those who are not active in the field. That’s an interesting and quite challenging mission, not least because it must fit into two hours, with the first hour being an overview for newcomers such as PhD students from health and social care disciplines, and the second hour looking at big current topics. I thought I would cover these points in the second half:

  • crisis of replication: what does it mean for researchers, and how is “good practice” likely to change?
  • GAISE, curriculum reform & simulation in teaching
  • data visualization
  • big data
  • machine learning

 
The first half warrants a revised version of this handout, with the talk then structuring the ideas around three traditions of teaching and learning stats:

  • classical, mathematically grounded stats, exemplified by Snedecor, Fisher, Neyman & Pearson, and by many textbooks with either a theoretical or applied focus. Likelihood, and/or combining it with a prior to get a posterior distribution, are the big concepts here.
  • cookbook, exemplified by many popular textbooks out there, especially those whose titles make light of statistics as a ‘hard’ subject (you could count Fisher here as the first evangelical writer, back in 1925, though it is harsh to put him in the same camp as some of these flimsy contemporary textbooks)
  • reformist, exemplified by Tukey in the 1970s but consolidated around George Cobb and Joan Garfield’s work for the American Statistical Association. The only books for this are “Statistics: Unlocking the Power of Data” by the Lock family and “Introduction to Statistical Investigations” by Tintle et al.; a minimal sketch of the core simulation move follows this list.
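
To give a flavour of what that reformist tradition actually asks students to do, here is a minimal sketch of the core move: a randomisation test, where you build the null distribution by shuffling group labels rather than looking anything up in a table. It is my own toy example, not one taken from the Lock or Tintle books.

```python
# A minimal sketch of simulation-based inference (a toy example of my own,
# not one from the Lock or Tintle books): a two-group randomisation test for
# a difference in means, built by re-dealing the group labels at random.
import numpy as np

rng = np.random.default_rng(2016)

# toy data: outcomes for a treatment group and a control group
treatment = np.array([12.1, 9.8, 11.4, 13.0, 10.7, 12.5])
control = np.array([9.9, 10.2, 8.7, 11.1, 9.4, 10.0])
observed = treatment.mean() - control.mean()

pooled = np.concatenate([treatment, control])
n_treat = len(treatment)

diffs = np.empty(10_000)
for i in range(diffs.size):
    shuffled = rng.permutation(pooled)  # shuffle the labels
    diffs[i] = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()

# two-sided p-value: how often does label-shuffling alone beat what we observed?
p = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed difference = {observed:.2f}, randomisation p = {p:.3f}")
```

The pedagogical point is that the null distribution is built in front of the student, by simulation, rather than handed down from a table at the back of the book.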

It’s worth remembering that there are other great thinkers who accept the role of computational thinking and yet insist that you can’t really do statistics without being skilled in mathematics; David Cox springs to mind.


Hiroshige’s Eagle over the 100,000 acre plain of statistics. Note the density plot of some big data in the background.

The topics to interweave with those three traditions are models, sampling distribution versus data distribution, likelihood, significance testing as a historic aid to hand calculation, and Bayesian principles. I’ll put slides on my website when they’re ready.
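
Since sampling distribution versus data distribution is the contrast that most reliably trips newcomers up, here is a minimal bootstrap sketch of my own (not something from the slides) showing the two side by side: the data have one spread, while the sample mean across resamples has a much narrower one.

```python
# A minimal sketch (mine, not from any slide): data distribution versus the
# sampling distribution of the mean, approximated by bootstrap resampling.
import numpy as np

rng = np.random.default_rng(7)
data = rng.gamma(shape=2.0, scale=3.0, size=200)  # one observed sample

# resample the data with replacement to approximate the sampling
# distribution of the sample mean
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5_000)
])

print(f"data:            mean {data.mean():.2f}, SD {data.std(ddof=1):.2f}")
print(f"bootstrap means: mean {boot_means.mean():.2f}, SD {boot_means.std(ddof=1):.2f}")
# the second SD is roughly the first divided by sqrt(200) -- that is the point
```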

While I’m on this subject, I’ll tell you about an afternoon meeting at the Royal Statistical Society on 13 October, which I have organised. The topic is making computational thinking part of learning statistics, and we have three great speakers: Helen Drury (Mathematics Mastery) representing the schools perspective, Kari Lock Morgan (Penn State University) representing the university perspective, and Jim Ridgway (University of Durham) considering what the profession should do about the changing face of teaching our subject.


Logic, stats and Brexit

Lots of stats are being bandied about as we prepare for the famous Brexit vote. Not all of them are good, and there are conflicts of interest everywhere, perceived or real. It is tedious to demolish bad stats over and over, so I will take a different tack, prompted by something that caught my eye today: the application of good, solid, old-fashioned logic. A few years ago, I recall being in a session at the RSS conference, in a room with about 50 people. Ian Hunt asked for a show of hands from anyone who had ever taken a logic course at school or college, and mine was the only one to go up. I really enjoyed that course, and the textbook was an old one by Wilfrid Hodges (“Logic”) which has been reprinted a zillion times since it first came out. It is pithy but engaging, a real exemplar of textbook writing at an introductory level. I commend it to all humans. Its benefits last a lifetime.

Let’s apply those skills, cobwebbed since the 1990s, to this webpage and this letter (paywalled) to The Times from the Institute for New Economic Thinking (INET) at Oxford. INET is in part funded by the European Commission (inet.ox.ac.uk/files/publications/INET%20Highlights%20Report%202012-14.pdf page 58) – let’s just put that fact out there and let you make of it what you will – personally, I don’t think it counts for all that much.

Rather surprisingly, they make three arguments, each of which is unsupported by the data they provide and logically fallacious to boot. A tour de force of blundering.*

So, arguments in three parts:

  • No 1: “History is clear: things have gone very well for Britain as a member of the EU.”
    For this, you can see a chart that shows GDP per capita relative to 1973 going up, and faster than those blasted French, Germans and Americans. Ha! That’ll teach them. Or perhaps we were just in a really bad place in 1973, and were subsequently buoyed up by sales of Pink Floyd’s Dark Side of the Moon.** More importantly though, the fact that we did well while in the EU does not imply we did well because of the EU. Our GDP per capita went up 12.3 times in the 40 years that followed, but China’s went up 43.9 times, which, by the same logic, is clear evidence that we lost out by not having the Cultural Revolution. Damn! This fallacy is called post hoc ergo propter hoc, and is a staple of politicians everywhere. You could succinctly describe it as conflation of subsequence and consequence. Furthermore, even if we did well because of the EU, that still gives only a weak level of confidence in future performance, which is the real decision to be made here.
  • No 2: “Secondly, growth in the UK was more equally shared than in the USA”
    I’m not sure what this has got to do with Brexit, other than the unspoken suggestion that if we left, our Government (more right-wing than most EU countries in economic terms, but still verging on Trotskyite from an American perspective) would gradually erode policies that promote equality. INET say this:
    “Britain has had the best of both worlds while a member of the EU — not just strong growth, but more equal growth”,
    which still has a dose of post hoc about it, but also Weak Analogy: the suggestion is that we’d better stay in because someone outside the EU is more unequal than us, and if we leave we are sure to become like them. If that’s not the implication, then it must be irrelevant. There may also be some selective quoting going on here. Why the USA? They have a World Bank Gini coefficient of 41.1 to our 32.6, which means we are not as unequal as them. Note here that South Africa is not quoted as the inequality example, despite being statistically more striking (63.4), because there are well-known historical reasons why we would not become like them. To quote South Africa would be pushing it too far. Quote the USA and you might just get away with it. Norway, famous for staying out of the EU, has 25.9. Just sayin’. (Forget about their gas, because inequality is not the same as GDP per capita.)
  • No 3: “At present 45% of the UK’s exports go to other EU member countries. In response to the concern that the EU might impose high tariffs or punitive measures if the UK leaves, some Brexiteers have said that we can ‘just trade with Australia and Canada’. These two countries, however, only account for a meagre 2.9% of British exports.”
    Well, they would, wouldn’t they. That’s because we’re in the EU, not some Commonwealth trading bloc. The real question is how things might change, not what they are now. So, like no 2, an unwritten implication is being made here about the future. In no 2, it was that everything would change, and here, strangely, it is that nothing will change. Why this difference? Perhaps because it fits the prior beliefs of the authors, or perhaps it is just carelessness (oops). So, if the assumption is true that nothing will change, then we will trade little tomorrow with the same people we traded little with yesterday, which proves (wait for it) that nothing will change! What mastery of the argument, what skill with the pen. This is in fact a nice example of begging the question.

I’ve nothing against them making a strong case for what they believe, and I am delighted to see an attempt to use data to support any such argument, but I think one should not do the public the disservice of misleading them through repeated abuse of both logic and statistics.

* – my wife has told me not to antagonise people online, so I do not say this without first considering whether I am truly justified.
** – this is grotesque and silly hyperbole. But it is at least not post hoc ergo propter hoc, which makes it an improvement on the INET letter’s explanation.


Borderline

Over the years I have gradually become more consistent in calling p-values from 0.02 to 0.08 borderline. There’s no reason for those cutoffs other than personal experience, also known as “making the same mistakes with increasing confidence [npi] over an impressive number of years”. When the null is true, p is uniformly distributed, so Type 1 errors (which never really happen, but you know what I mean) are just as likely to turn up in that borderline zone as anywhere else, while real effects tend to pile up at much smaller p-values. Just thought I’d say that.
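
If you want to see that uniformity for yourself, here is a minimal simulation sketch (my own illustration, using scipy’s two-sample t-test, nothing more official than that): when the null is true the p-values spread out evenly, and the borderline zone gets its proportionate share of them, whereas a real effect sends most of them towards zero.

```python
# A minimal sketch: where do p-values land when the null is true versus when
# there is a real effect? (My own illustration, not anything from the post.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n = 10_000, 30

def sim_pvals(effect):
    """Two-sample t-tests on normal data; return the p-values."""
    p = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(effect, 1.0, n)
        p[i] = stats.ttest_ind(x, y).pvalue
    return p

p_null = sim_pvals(effect=0.0)  # null true: p roughly uniform on [0, 1]
p_alt = sim_pvals(effect=0.8)   # real effect: p piles up near zero

for label, p in [("null true", p_null), ("real effect", p_alt)]:
    share = np.mean((p >= 0.02) & (p <= 0.08))
    print(f"{label}: share of p-values in [0.02, 0.08] = {share:.3f}")
```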


Visiting Big Bang Data

I finally got a chance to visit the exhibition Big Bang Data at the Embankment Galleries, Somerset House this week. I had heard good things about it, and of course I am a big fan of Dear Data so I couldn’t pass up the chance to see those postcards in real life.


My GPS trace over 4 years around Somerset House from Google location history, visualised using theopolis.me/location-history-visualizer
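
(Tangentially: if you would rather roll your own than use that site, a rough sketch along the same lines is below. It assumes the Google Takeout “Location History” JSON format with latitudeE7/longitudeE7 fields, which may well have changed by the time you read this; it is my own throwaway code, not the tool’s.)

```python
# A rough DIY sketch of the same idea as the visualiser linked above (my own
# code, not the tool's). Assumes the Google Takeout export format with
# latitudeE7 / longitudeE7 fields, which may have changed since.
import json
import matplotlib.pyplot as plt

with open("Location History.json") as f:
    locations = json.load(f)["locations"]

lats = [rec["latitudeE7"] / 1e7 for rec in locations]
lons = [rec["longitudeE7"] / 1e7 for rec in locations]

plt.figure(figsize=(6, 6))
plt.scatter(lons, lats, s=1, alpha=0.1)  # heavy overplotting, so use alpha
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.title("Google location history")
plt.show()
```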

Good stuff: it was really quite busy. The audience, almost all younger than me, were obviously enthusiastic and stimulated by it all. And I have to say it was Dear Data that held people’s attention the longest. There is no patronising text helping people bridge art to science or vice versa; I think that’s less necessary nowadays. A broad church, from activism to paranoia to fun. Of all the exhibits, I got the Biggest Bang from (awooo) Networks Of London by Ingrid Burrington and Dan Williams. They mapped out the secretive ways in which data moves around the physical world in this town. More than any theory, this brings home what a big deal it is (or is perceived to be by Dilbert’s boss), because of the colossal cost of creating and maintaining all of this infrastructure, often for reasons that seem flimsy to us everyday folk, like selling access to stock market data that might arrive a few milliseconds before it reaches your competitor.

Not so good: I think process matters more with this than with a bunch of paintings. Artists don’t like talking about How I Made Elastic Man, but in this setting it would be nice to have some videos with headphones that delved more into how it is done. However… there’s loads of stuff on the website, so go and look at that even if you’re not in London twiddling your thumbs and looking for intellectual fun this weekend. Personally, I knew about or had seen quite a lot of these projects before, but I guess that’s inevitable, and it’s nice to see them in the flesh.

If you want to go, you’d better hurry. It closes on Sunday.


Poor review

Firstly, let’s namecheck the wittiest punner in stats for that title, Stephen Senn.

This recent post on Andrew Gelman’s blog is essential reading. I suspect my readers are all over there too, but I’ll mention it here because of this point from the wrap-up:

Peer review can serve some useful purposes. But to the extent the reviewers are actually peers of the authors, they can easily have the same blind spots. I think outside review can serve a useful purpose as well.

I’ve seen this a lot in my life as a medical (read ‘health and social care more broadly, with a dash of education’) statistician. There are distinct tribes of healthcare professionals, and they do things, including research designs, analytical methods and communicating findings, in their own sweet way. There’s generally no reason; it’s just custom and ritual. If you don’t fit that mould to some extent, you get rejected. Often, I find myself consciously peppering the paper / slides with shibboleths that will ease my journey to REFland. (Of these, sample size calculations for anything that isn’t a randomised controlled trial are the most common, although I am no stranger to the Totally Unnecessary Reporting Diagram (TURD). I draw the line at Cohen’s D though; D stands for d’oh.)

‘Outside review’ reminds me of the idea of ‘strong inference’: having your worst enemy analyse your data too and see if they can destroy your conclusions. You don’t have to go that far, though; you could just make sure that reviewers extend beyond the specialism and profession of the authors, to break that parochialism and question the unquestionable.

People Against Goodness And Normalcy

Essentially, if they can’t understand it, then it’s not written well. I don’t accept any argument that the subject is just too complex for outsiders – the authors’ own interests were once upon a time confined to Lego or the Smurfs, so it must be possible – nor do I claim to have got it right myself: it’s a constant challenge to pitch Bayesian latent variable models just so for a subject-expert audience.

I don’t know about you, but I enjoy reading academic papers outside my own field (OK, no critical theory please, but I’ll consider pretty much anything else). Maybe I should start an occasional series of randomly selected academic papers here, or maybe I just don’t have time for that.


Notes after the death of Pierre Boulez

I’m going to take a diversion from the staple statistical fare and mark the passing of a man who has obliquely, and not without contradiction, been a long-running source of inspiration to me. The death of composer and conductor Pierre Boulez was announced in January. There is plenty you can read about him online, so I won’t attempt any kind of obituary; rather, I want to reflect on the exchange between art and science and the unexpected lessons in living and working.

For this post, it was quite hard to decide what to cover and how to structure it. Finally, I felt I should get on with it and follow his style and form. It would be tempting to keep deleting and rewriting it every ten years or so, but I don’t plan to do that. The text is not here but on my website (click here), because it uses a little JavaScript to play the role of a conductor interlocking the rings of Le Marteau Sans Maître.


Scaling Statistics at Google Teach

Not unrelated to the Data Science angle, I read the recent paper “Teaching Statistics at Google Scale” today. I don’t think it actually has anything to do with teaching, but it does have some genuinely interesting examples of the inventive juggling required to make inferences in big data situations. You should read it, especially if you are still not sure whether big data is really a thing.
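
To give a flavour of the kind of juggling I mean, here is a minimal sketch of one trick from that corner of the literature, the Poisson bootstrap: instead of resampling n records with replacement, which you cannot do in a single pass over a stream, you give each record an independent Poisson(1) weight in each bootstrap replicate and accumulate weighted sums as the records arrive. This is my own illustration of the general technique, not code from the paper.

```python
# A minimal sketch of the Poisson bootstrap for streaming data (my own
# illustration, not code from the paper). Each record gets an independent
# Poisson(1) weight per bootstrap replicate, so all replicates are maintained
# in a single pass over the stream.
import numpy as np

rng = np.random.default_rng(1)
B = 200  # number of bootstrap replicates

wsum = np.zeros(B)  # running weighted sums, one per replicate
wn = np.zeros(B)    # running weighted counts, one per replicate

def stream():
    """Stand-in for a data stream we only get to see once."""
    for _ in range(20_000):
        yield rng.exponential(scale=2.0)

for x in stream():
    w = rng.poisson(1.0, size=B)  # one Poisson(1) weight per replicate
    wsum += w * x
    wn += w

boot_means = wsum / wn
print("estimate of the mean:", round(float(boot_means.mean()), 3))
print("bootstrap standard error:", round(float(boot_means.std(ddof=1)), 4))
```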
