Joy unconfined

There’s been much excitement online about ggjoy which is an R package making vertically stacked area charts with a common x-axis. It is reminiscent of the Joy Division album cover, so I guess that ticks some cultural boxes in the dataviz community. The original version by Henrik Lindberg had the areas extending into the chart above,

joy-plot-lindberg

which provoked some killjoys to complain and the height got standardised. Here’s Xan Gregg tackling it with a stacked (small multiple? see below) horizon plot.

joy-plot-gregg

Fair enough, though, as Robert Kosara pointed out, it’s more joyless. Bob Rudis did a heatmap – violin plot cross

joy-plot-rudis-heatmap-violin-thing

And Henrik did a classic heatmap

joy-plot-heatmap-lindberg

None of which beat the original in my estimation.

You might use this look if you’re visualising continuous x, continuous y and categorical z. z is the stacking variable. Now, z could be ordinal or you might want to order it in some sensible way. What way? Like in the example above, it works best when there is an approximately smooth curve through the chart from top to bottom. Lindberg and Gregg have used different orders. There isn’t a right or wrong, because you emphasise different messages either way. Such is the way of dataviz. I think this is an important point, but for now consider a random z order.

outrandom

Not so good huh? To play with the order, clone Henrik’s github repo and play with the line

 mutate(activity = reorder(activity, p_peak, FUN=which.max %>% # order by peak time

I did:

 mutate(activity = reorder(activity, p_peak, FUN=function(x) { runif(1) })) %>% # order randomly

I think the appeal of these charts is that they emulate 3-D objects. That’s why I’m laid back about the chart extending into the block above.

This is all straightforward stuff with the three variables. Back in the day, I made an R function called slicedens that does something similar with two continuous variables. I chop up one of them (y) into slices, get kernel density plots for x in each slice of y, and off you go. Note the semi-transparency please.

slicedens

This attacks the recurring problem of a scatterplot with Big Data where it just turns into one massive blob of ink. Hexbins, contour plots etc are alternatives. Which of these can you batch-process when the data exceed the RAM or are streaming? Hexbins yes, because you just add to the count (and for real-time data, subtract the oldest set of counts). Contour maps no! Slice density yes, for the same reasons, that a kernel density plot is a sum of a load of little curves. If you want to draw it for a trillion data points, just break it up and accumulate density heights. (“Just,” he says!) If you want a real-time slice density for the last ten seconds of stock market trades, batch it up in seconds (for example), add them together. Every second, add the new one and subtract the one from ten seconds ago. Simples.

So, the 3-D object is important because this leverages a load of cognitive power to see things in the real world and makes the perception of pattern in the dataviz easier for the reader. Choses visibles et invisibles.

There’s a related question I want to expand on another time: when does a stacked chart become small multiples? This is not just semantics because they get understood in different ways.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s