Monthly Archives: January 2017

Dataviz of the week, 25/1/17

We had guideline-bustin’, kiddie-stiflin’, grandparent-over-the-threshold-usherin’ pollution in London at the beginning of the week. This is fairly standard nowadays, sadly. It’s not quite so bad out where I live in the Cronx, but in town it’s the worst in Europe. At the same time, Cameron Beccario pointed out the Beijing effect in his wonderful globe of carbon monoxide levels – far worse than anywhere else in the world, though there are some petrochemical hot spots. I’ve praised this live viz before, but that was before I started having a pick of the week on my office door (then, when the door went, here on the blog), so I’ll mention it again. Nice.



Filed under Visualization

Is standing up good for your health?

Is sitting the new smoking? The excellent Tony “Dr K” Komaroff says yes, and I have great respect for his work on that website. I read the original paper (standing up and pacing about the office, of course) and felt that it was all just a bit too linear for my taste. What I’d really like to see on these data are additive / semi-parametric models of some form or other, graphically presented for us. I expect that the benefit is not a straight line function of time spent standing/stepping. Also, I suspect that activity at one time of the day is not the same as activity at another and would like to see that explored. And all in all, it’s such an interesting and important topic, it just seemed to me to be obscured unnecessarily by the stats, viz:

“Associations are described as regression coefficients (beta) or relative rates for log-transformed outcomes with 95% confidence intervals, and are plotted on a log scale, with beta rescaled as (beta+mean)/mean”

  1. mean? what mean?
  2. what log scale? the axis isn’t labelled
  3. log-transforms are cool because you get a multiplicative effect; why not use that to your advantage and describe a 10% reduction in triglycerides rather than RR=0.90?
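
The third point is easy to make concrete. Here is a minimal sketch, assuming the reported relative rate is the exponentiated coefficient from a model of the natural-log outcome (the paper's exact parameterisation is not clear from the quote, so treat that as my assumption). The function names are made up for illustration.

```python
import math

def rr_to_percent_change(rr):
    """Percent change in the outcome implied by a relative rate (RR)."""
    return 100.0 * (rr - 1.0)

def log_beta_to_percent_change(beta):
    """Percent change implied by a coefficient on a natural-log outcome.

    Assumes RR = exp(beta); this is an illustrative assumption, not
    something the quoted methods text confirms.
    """
    return rr_to_percent_change(math.exp(beta))

# RR = 0.90 reads more naturally as "a 10% reduction"
print(round(rr_to_percent_change(0.90), 1))  # -10.0
```

A negative value is a reduction, so RR=0.90 becomes "10% lower triglycerides", which is the sentence a reader actually wants.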

The effects are for a 2-hour change per day (every day!) from sitting to standing, or sitting to stepping. Is that a meaningful change for most people? It sounds pretty ambitious! So, if I do 1 hour, do I get a 5% reduction (or, let’s be more mathematically aware, a 100*(1-sqrt(0.9))=5.1% reduction)? And we come back to the semi-parametric model of some form or other. There are so many cool models you can use, Generalized Additive Models being the most obvious candidate that comes to mind. Why not do that sort of thing next time you face the Curse Of Linearity?
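
The one-hour arithmetic can be checked in a few lines. This is only a sketch, and it assumes the effect is linear on the log scale, so that a fraction of the 2-hour dose gives the relative rate raised to that fraction. That is exactly the assumption I doubt, so read it as "what the linear model would imply" rather than what really happens. The function name and arguments are hypothetical.

```python
def scaled_percent_reduction(rr_per_dose, dose_hours, hours):
    """Percent reduction for `hours` of change, given the relative rate
    reported for `dose_hours` of change.

    Assumes a log-linear dose-response: RR(hours) = RR ** (hours / dose_hours).
    """
    rr = rr_per_dose ** (hours / dose_hours)
    return 100.0 * (1.0 - rr)

# RR = 0.90 per 2 hours, so 1 hour gives 100 * (1 - sqrt(0.9))
print(round(scaled_percent_reduction(0.90, 2.0, 1.0), 1))  # 5.1
```

Note that it is 5.1% and not 5%: multiplicative effects compound, so halving the dose does not quite halve the percentage.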

Now, for me, the real problem here is the disconnection between understanding the context of the analysis and then actually doing it. The experts who conducted the research certainly know that health benefits and biochemistry changes do not carry on and on and on as you pile in more minutes of standing up. Of course they know that! So then why do they go off and do totally dumb-arse things like this? At what point between starting up the computer and submitting the paper did they disengage the brain and go into a sort of auto-pilot torpor? I find it incredible. You want to know about effective feature selection in statistical modelling? Try thinking!


Filed under healthcare

Dataviz of the week, 17/1/2017

These simple line charts are a lot of fun. Your task is to guess what happened to various stats during the Obama years. Then the truth is revealed. I got the first one amazingly close to the truth, felt pretty smug, then missed all the others by a mile. You might expect a rather partisan message from this left-wing (by American standards) source, but it is quite neutral.


Larry Buchanan, Haeyoun Park and Adam Pearce are the creators. Oh for the good old days when everyone was using d3 for online interactive graphics and the source code was easy to follow. These images don’t have to be interactive, just to have part of the line invisible and then appear. They seem to have made the whole thing in Illustrator and done some ai2html conversion from there. Each2theirown. It seems to me like it would actually take longer to do that than to just get on and code the damn thing from first principles. Drawing the line on top is actually pretty easy to achieve, even I can do that sort of thing, so, like Ken Hom’s hot wok, so can you.

This kind of interactive would be quite nice for teaching stats. And I like the way that the y-axis range changes slightly so as not to give you any clues.



Filed under Visualization

Dataviz of the week, 13/01/2017

This chart from the Upshot team at the New York Times was picked up by Alberto Cairo on Twitter. The large blank space is included, partly because it’s good to have the domain of the variable visible in the axis, but mostly as a kind of mute protest to the gap between experts and public. I shall refrain from getting sidetracked into a discussion of the nature of evidence, complex systems and such.


The interesting thing is how the empty space looks impressive on the page, and not so on the screen – or so I thought, anyway. Empty space on a newspaper page is so unusual and reminds me of the classic journalists’ protest against censorship.


Filed under Visualization

Stats in bed, part 2: Linux on Android

Never let it be said that Robert forgets to come back and finish off a job, although it might take a really long time. Last time (goodness, nearly 3 years ago; the antiquity of part one is shown by the long-dead term “phablet”), I was poking at Ubuntu Touch to see if it might offer a way of doing analysis on the go. Soon after that, I looked into more lightweight Linux implementations.

Firstly, your device will need to be rooted (no giggling at the back), as shown by the open padlock when it starts up. I discussed this last time; there’s plenty of advice online but basically it helps a lot to have a Linux computer (in this as in so many other ways).


Everything that follows happened a couple of years ago, so use your brains in checking details of apps etc if you want to try it out. I accept no responsibility for anything, ever.

So, the general idea here is to have a Linux virtual machine on your Android device. I started off using an app called GNUroot which was easy to use but had a limitation in getting files off the virtual space into the real world. When I restarted it, it made a new virtual drive, wiping out old files. Ok except for the fairly common crashes which lost all the work done in that session. It couldn’t work directly on the Android part of the machine (so to speak).

The next attempt was more stable but a little more complex. I installed Linux Deploy, which creates a 4GB virtual drive image and keeps that between uses (no more lost files). Instead of having one app that acts as a VM, I SSH’d into it using the app JuiceSSH (there are several like it).


The first step is to open Linux Deploy and press Start. An IP address appears at the top and we will use this to communicate with the VM using SSH.


Then, I went to JuiceSSH and chose (or typed in) the IP. Boom! You’re in Linux!


Awesome. Installing and updating programs tended to be a bit hit and miss, sometimes throwing up odd messages. So, to use R on the terminal, I relied on having the latest unstable build from Linux Deploy.


I could even write files out.


In Linux Deploy, you can have the Android memory appear like a mounted drive to the Linux VM at ../../mnt/0


Coming out of Linux, we find our new file is right there, like magic.


I still find it pleasing to look at the file in a text editor in Android and marvel at how it got there. Simple pleasures.


So that’s fun. But a little clunky. No compiling C++, no Stan, or those other new-ish R packages that rely on Rcpp to build faster machine-code bits and bobs, though you can use lots and lots of R stuff. The antiquity of these experiments means I didn’t try Python out but I’m sure it would work just fine. Also, as self-styled clinical research maverick @anupampom pointed out to me, a major advantage is that you can take that linux.img file, stick it on another device with Linux Deploy, and carry on where you left off. Nice. It’s like a Docker container (kind of).

For reasons that may start to come into focus now, I gave up on doing phablet data science about this time. Not that way anyway. But the question of remote, platform-independent analysis and programming remains. And in part three, I’m going to close down this discussion with the real solution to this, which is in platform-independent interpreted languages.


Filed under computing

Dataviz of the week, 06/01/2017

This is a spiral format of four months of time, with two colours (nice choices too) indicating sleep/awake patterns of a newborn baby. omg the first month is hard. I’m only 5½ months into being a dad, and I’ve already forgotten about it.


Made by Andrew Elliott, original on reddit here, brought to my attention by Randy Olson on Twitter here.

On Twitter you’ll see some people objecting to the spiral format, and it’s true that there is distortion with the early days taking up less space on the screen, but you trade that for eye-catching (GU6) and the continuity of time. No perfect mapping into visual parameters.


Filed under Visualization