Monthly Archives: February 2014

Big bad data

There was an article on the New York Times website yesterday on big data in business. In essence, there’s nothing new here for data analysts: it’s the old problem that once you have some fancy new method, some people stop thinking about what it means and how it could have gone wrong. But the case study of the A&E TV network sacking someone on the basis of real-time tweets, and then having to change their mind, is a nice one!

The writer explains big data as:

the term du jour for the collection of vast troves of information that can instantaneously be synthesized

which has a certain irony to it.

Filed under Uncategorized

Visualizing multivariate data – at the RSS

I’ll be talking (and videoing) at the RSS on 11 March. This is in part a re-run of the highly popular dataviz session at the conference last September, though not every speaker could make it, so I’m also giving an overview. You can expect an introduction to interactive online graphics for people more familiar with stats software, and advice on how to get started making your own.

Tuesday 11 March 2014, 02:00pm – 05:00pm
Location Royal Statistical Society, 12 Errol Street, London, EC1Y 8LX

2pm Urska Demsar (St Andrews) Bringing together geovisualisation, time geography and computational ecology: using space-time density of trajectories to visualise dynamics in animal space use over time

2:40pm Duncan Smith (LSE) An Urban Renaissance Achieved? Visualising Urban Form, Dynamics and Sustainability

3:20pm Tea/Coffee/Biscuits

3:50pm Robert Grant (St. George’s) Pretty persuasion: visualisation trends and tools from a statistician’s viewpoint

4:30pm Discussion: “The role of Statisticians in data visualisation research”

5pm Close

Booking with payment required – please book using the relevant booking form.

Registration fees:
£20 RSS Student & Retired Fellows
£22 RSS CStats & GradStats
£25 RSS Fellows
£35 RSS section & student members
£45 None of the above

Filed under Visualization

Mandelbrot journeys made audible

Aschinchon posted some R code on his blog recently to sonify / audiblize the journeys made by individual points in the Mandelbrot set. The world’s favourite squashed-bug fractal shape is made up from following the paths of each pixel on the screen. A simple formula dictates how the pixels jump around, and if they don’t leave the boundaries of the picture, then they are conventionally colored black, making the body and squishy bits of the bug. Yuk. If they fly off, then they get colored in depending on how many steps it took before they departed. So his idea is to take us on one of those journeys, and translate the successive steps into pitches, determined by their distance from the origin (L2 norm or length of the hypotenuse). Yes, it’s a boring old sine tone, but never mind, the content is worth it.
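For anyone who wants to try the idea without R, here is a minimal Python sketch of the journey-to-pitch step. It is not Aschinchon's code: the function names, the escape radius of 2, the linear mapping and the 220-440 Hz range are all assumptions made for illustration.

```python
def orbit_moduli(c, n_steps=50, escape_radius=2.0):
    """Follow z -> z*z + c starting from z = 0 and record |z| at each step.

    The journey stops early once the point has flown off, i.e. once its
    distance from the origin exceeds escape_radius (an assumed cut-off).
    """
    z = 0j
    moduli = []
    for _ in range(n_steps):
        z = z * z + c
        moduli.append(abs(z))  # abs() of a complex number is its L2 norm
        if abs(z) > escape_radius:
            break
    return moduli


def moduli_to_frequencies(moduli, low_hz=220.0, high_hz=440.0, escape_radius=2.0):
    """Map each distance linearly onto a pitch range (assumed, not the original tuning)."""
    span = high_hz - low_hz
    return [low_hz + span * min(m, escape_radius) / escape_radius for m in moduli]


if __name__ == "__main__":
    # The point (-1, 0) never escapes: it bounces between two positions.
    print(moduli_to_frequencies(orbit_moduli(-1 + 0j, n_steps=6)))
```

With this assumed mapping the point (-1, 0) alternates between just two pitches, 330 Hz and 220 Hz, so the interval will not match the tuning in the original WAV files.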

First, starting at the point (-1,0) produces a never-ending oscillation that reflects the point bouncing back and forth. Aschinchon calls it an ambulance siren sound; round here the classic ambulance and police nee-naw sound is a minor third, and the way it has been tuned, it comes out not far from that.

Ambulance siren (starting point -1 +0i)

Next, “slow divergence” is an example of a point just outside the bug, which bounces around for a long time, gradually settling down before zooming off never to return. This would be one of the colorful pixels in the swirly psychedelic zone.

Slow divergence (starting point -0.75 + 0.01i)

Finally, the highlight for me is “Feigenbaum point”, which is located at one of the self-similar zones on the Mandelbrot shape – check out the animation on Wikipedia – where zooming in or out will still produce the same pattern of influences. This is characteristic of transitions to chaos rather than stochasticity: you get a mixture of predictable and unpredictable elements. If you look at the graph below you’ll see it’s not exactly periodic, although there is a similar shape each time. The sonification really brings this to life! Click on any of these graphs to access Aschinchon’s WAV files.

Feigenbaum point (starting value -0.1528+1.0397i)

Ben Bolker added a comment linking to his work with duodecimal representations of pi mapped to a chromatic scale of 12 notes (you can probably guess my views on this), but also the logistic map (now I’m interested), which you can hear here; and hear hear!, I say, because it’s really interesting in two ways, one of which is (I suspect) really quite unexpected. The logistic map seen in the graph below has the same mix of predictable and unpredictable, because you get stable regions around 0.75 which just don’t happen in stochastic sequences. If you look at the maxima and minima you’ll see monotonic increases and decreases which are also too consistent to be random. What I really like here is that the sound brings out this detail in a way that you just don’t see in the graph; you can clearly hear the increasing pitches at the extremes of each oscillation, and the brief stable periods.

Logistic map (x[i+1] = 3.99 * x[i] * (1 - x[i]))
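The formula in the caption is easy to iterate yourself. The sketch below is a minimal Python version, not Ben Bolker's code; the function names and the starting value are assumptions. It also hints at why the sequence lingers near 0.75: the map has a fixed point at 1 - 1/r (about 0.749 for r = 3.99), and the slope there is 2 - r = -1.99, so a value landing close to it flips from side to side while drifting away by roughly a factor of two per step.

```python
def fixed_point(r=3.99):
    """Non-zero fixed point of x -> r * x * (1 - x)."""
    return 1.0 - 1.0 / r


def logistic_map(x0=0.3, r=3.99, n_steps=200):
    """Return [x0, x1, ..., x_n] with x[i+1] = r * x[i] * (1 - x[i])."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


if __name__ == "__main__":
    fp = fixed_point()
    # Start a hair above the fixed point and watch the deviations grow.
    for x in logistic_map(x0=fp + 1e-6, n_steps=8):
        print(f"{x:.8f}  deviation {x - fp:+.2e}")
```

The printed deviations alternate in sign and grow steadily in size, which is one plausible reading of the steadily rising maxima and falling minima that follow each stable stretch.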

The aspect that surprised me was the percussive rhythmic ticking. I’m not sure where that came from, as the sound is made entirely from sine waves in the audible range 220-440Hz (that’s A to A either side of middle C). I think it comes from discontinuities where successive tones butt up against one another. But serendipitously this suggests that having the pulse running through makes it easier to hear the patterns and give them our full attention. It makes it more musical, in short. I wonder whether a de-glitched (and hence arrhythmic) version would be as easy to understand – it would be a good experiment.
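The discontinuity guess can be checked with a few lines of Python. This is only a sketch under a loud assumption: that each note was rendered as a separate sine segment restarting at zero phase, which may not be how the original sound file was made. The function names, sample rate and test frequencies are all made up for illustration.

```python
import math


def render(frequencies, seg_seconds=0.1, sample_rate=8000, continuous_phase=False):
    """Render one sine segment per frequency and return the joined samples.

    continuous_phase=False restarts every segment at phase 0 (assumed source
    of the ticking); True carries the phase across segment boundaries.
    """
    samples = []
    phase = 0.0
    n = int(seg_seconds * sample_rate)
    for f in frequencies:
        if not continuous_phase:
            phase = 0.0
        step = 2.0 * math.pi * f / sample_rate
        for _ in range(n):
            samples.append(math.sin(phase))
            phase += step
    return samples


def largest_jump(samples):
    """Biggest difference between neighbouring samples; a large value means a click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    notes = [233.0, 415.0, 277.0]  # arbitrary pitches between 220 and 440 Hz
    print(largest_jump(render(notes, continuous_phase=False)))
    print(largest_jump(render(notes, continuous_phase=True)))
```

With restarted phases the waveform can jump by almost its full amplitude at a boundary, whereas with continuous phase the jump never exceeds the normal sample-to-sample change of the highest tone. Writing both versions to WAV files would be one way to run the de-glitched listening experiment.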

Filed under Uncategorized

Selfie City

I don’t really know what to say about this magnum opus in image processing, interactive graphics, geotagging, face recognition &c &c. It certainly is impressive, though after messing about with the controls I’m not sure I was any wiser about different cultural norms for expression and the semiotics of the selfie. But I was totally creeped out by all these people looking at me. Except for the guys aged around 40, with whom I felt a natural affinity. In this group there emerges a global bond of regressing to teenage goofing around, pretending to be a gangsta and, most universally of all, failing to use the camera properly.

That looks familiar

Looks to me from a quick squizz through the source code that it’s entirely built in D3. Impressive.

Thank you to Information Aesthetics for spotting it.

Filed under JavaScript, Visualization

My mother ate nothing but tattie scones through the Rationing Years

We were all in it together

… that’s why I was a premature baby. Or not.

I had to issue a passing swipe at the Guardian’s headline that what your mother ate as a little girl strongly affects your birth weight. All based on a retrospective records review of 84 mothers at one rural clinic in the Philippines. Hmmm.

Filed under Uncategorized

New data processing tip: dates and times

I told you I was going to work on these data processing tips! Just uploaded a new one on dealing with dates and times, particularly around importing and moving between data analysis software packages. There’s no consensus on how to encode dates and times, and every software maker has done it differently (or in the case of Microsoft, two different ways with another complication depending on the date itself). Understanding what the software is doing helps, but there’s also a belt-and-braces way of making absolutely sure nothing goes wrong.
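As a concrete illustration of why the encoding matters, here is a minimal Python sketch for one case. It assumes the Microsoft remark refers to Excel, which has a 1900 and a 1904 date system and, in the 1900 system, counts a 29 February 1900 that never existed. The function name is hypothetical, and this is not the method from the tip itself.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial, date_system=1900):
    """Convert a whole-number Excel day serial to a calendar date.

    date_system=1904: serial 0 is 1 January 1904.
    date_system=1900: serial 1 is 1 January 1900, and serial 60 is the
    fictitious 29 February 1900, so later serials are one day "ahead".
    """
    if date_system == 1904:
        return date(1904, 1, 1) + timedelta(days=serial)
    if date_system != 1900:
        raise ValueError("date_system must be 1900 or 1904")
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 29 February 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    print(excel_serial_to_date(41640))                     # 1900 system
    print(excel_serial_to_date(40178, date_system=1904))   # same day, 1904 system
```

The two calls describe the same day with serials that differ by 1462, which is exactly the kind of silent shift that appears when a file moves between packages that assume different origins.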

Filed under learning, SPSS

Dataviz: good and bad

I’ve made a promise to myself not to blog anything until I get some more data processing tips written up on my website. But I’ll break it just for a quick couple of links. One rocks, the other sucks.

First, an amazing visualization of current wind and weather conditions over the whole world, by Cameron Beccario. Source code here. This brings together a few different trendy tools: the data is automatically scraped, animated in a nice way with a planet that you can click and roll around. Very neat JavaScript, but also valuable as communication of quantitative information. Why is it better than just the old synoptic chart? Because it’s engaging, it gets people interested, and because you can see the whole story at a glance; you’re not limited to national boundaries. I think it’s potentially really useful for geography teachers everywhere. Arise, Sir Cameron. The next step would be to have it play the last week’s data as a video. I spotted it at Freakonometrics. The GIF below doesn’t really do it justice, by the way; go click on it.


Second, a graph spotted at Atlantic Cities which worried them because it looked like the whole world wants smaller households fast, and that’s going to cause environmental havoc. It worried me, on the other hand, because it just looked implausible. It’s amazing how complacent analysts* become as soon as they can switch on their stats software and do some fancy stuff. The common sense part of the brain powers down. Mmmm, breakpoints regression. Ooooh, bootstrapped starting values. Here’s a graph! What does it mean? Never mind that, let’s just publish the damn thing!

If you look at the slopes, the developed countries’ breakpoint is about 1893, which makes sense with industrialisation. The developing countries have 1987, which doesn’t make so much sense. It’s not clear from the paper, but it looks like the breakpoint regression was done at country level, without weighting them by population. I’m happy to be corrected on that, but that’s what it looks like. That gives China and Swaziland exactly the same weight in pushing and pulling the line. And, most importantly, look over at the far right of the developing countries – there aren’t many there with data since 1990 (they acknowledge this in the paper), and the ones that are there have smaller household sizes. Is it a trend or is it information bias? Smaller household <– healthier economy –> regular official statistics. This is not rocket science, it’s common sense. Think about what your data might mean! Aaargh.



* – by “analysts”, I mean the authors of the paper, not Emily Badger whose writing and keen eye for interesting stats I have admired for some time


Filed under animation, JavaScript, Visualization