# Should every nonparametric test be accompanied by a bootstrap confidence interval?

Well, duh. Obviously. Because (a) every test should have a CI and (b) bootstrap CIs are just awesome. You can get a CI around almost any statistic, and they account for non-normality and boundaries.

But you might have to be a little careful in the interpretation, because they might not be measuring the same thing as the test.

Take a classic Wilcoxon rank-sum / [Wilcoxon-]Mann-Whitney independent-samples test (don’t you just love those consistent and memorable names?). This ranks all the data and compares them across two groups. Every bit of the distribution is contributing, and there isn’t an intuitive statistic; what you’re testing is the W statistic. Do you know what a W of 65000 looks like? No, neither do I. If there’s a difference somewhere in terms of location, it might come up.

It’s so much simpler for the jolly old t-test. You take means and compare them. You get CIs around those means with a simple formula. And everybody knows what a mean is, even if they don’t really want to grapple with a t-statistic and Satterthwaite’s degrees of freedom.
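For reference, here is one common version of that "simple formula", written for the difference between two means with unequal variances (the Welch form, which is where Satterthwaite's degrees of freedom come in). The symbols are my own labels: $\bar{x}_i$, $s_i^2$ and $n_i$ are the sample mean, sample variance and size of group $i$.

```latex
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\nu,\,1-\alpha/2}\,
  \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu \approx
  \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
       {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```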

So, in the Mann-Whitney case, the most sensible measure might be the difference between the medians. There is no formula for a CI for this, though undoubtedly we could get a pretty bad approximation by the usual techniques. So, we reach for the bootstrap. In fact, perhaps we should just be using it all the time…?
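To make "reach for the bootstrap" concrete, here is a minimal sketch of a percentile bootstrap CI for the difference in medians, using only the Python standard library. The function name `bootstrap_median_diff_ci`, its defaults and the simulated demo data are all my own assumptions for illustration; the percentile interval is just one of several bootstrap flavours and not necessarily the best one.

```python
import random
import statistics


def bootstrap_median_diff_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for median(x) - median(y).

    Hypothetical helper for illustration: each group is resampled
    independently, with replacement, at its original size.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # resample group 1
        yb = rng.choices(y, k=len(y))  # resample group 2
        diffs.append(statistics.median(xb) - statistics.median(yb))
    diffs.sort()
    # Approximate empirical quantiles of the bootstrap distribution.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    estimate = statistics.median(x) - statistics.median(y)
    return estimate, lo, hi


# Made-up demo data: two normal samples whose true medians differ by 1.
demo = random.Random(42)
x = [demo.gauss(0, 1) for _ in range(200)]
y = [demo.gauss(1, 1) for _ in range(200)]

est, lo, hi = bootstrap_median_diff_ci(x, y)
print(f"median difference = {est:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

With a shift this large, the interval should sit comfortably below zero. The interesting cases are the ones where it doesn't.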

So the problem here is that you could have a significant Mann-Whitney but a median difference CI that crosses zero. Interpreting that is not so easy, and I found one of my students in just that pickle recently. It was my fault really; I’d suggested the bootstrap CI. How could we deal with this situation? At the risk of cliché, it’s not a problem but an opportunity. Because the test and the CI look at the data in slightly different ways, you’re actually getting more insight into the distribution, not less. Consider this situation:

Here, the groups have the same median but should get a significant Mann-Whitney result if the sample size is not tiny. You can surely imagine the opposite too, with a bimodal distribution where the median flips from one clump to another through only a tiny movement in the distribution as a whole.
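If you'd like to see this happen with numbers, here is a rough simulation. It is my own made-up example, not a reconstruction of any particular data set: one group is right-skewed, the other is its mirror image, and both have a population median of zero. The `mann_whitney_u` helper is a hypothetical stand-in for a proper library routine; it uses the large-sample normal approximation and assumes there are no ties.

```python
import math
import random
import statistics


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test via the normal approximation.

    Hypothetical helper for illustration; assumes no ties and
    reasonably large samples. Returns (U for x, p-value).
    """
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks (1-based) belonging to the first group.
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - n * (n + 1) / 2
    z = (u - n * m / 2) / math.sqrt(n * m * (n + m + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return u, p


rng = random.Random(7)
shift = math.log(2)  # the median of an Exponential(1) variable

# Group a: right-skewed, population median 0.
a = [rng.expovariate(1.0) - shift for _ in range(1000)]
# Group b: the mirror image, left-skewed, population median 0.
b = [shift - rng.expovariate(1.0) for _ in range(1000)]

u, p = mann_whitney_u(a, b)
print(f"sample medians: {statistics.median(a):.3f} vs {statistics.median(b):.3f}")
print(f"U = {u:.0f}, p = {p:.2g}")
```

The sample medians should both land close to zero while the p-value comes out tiny, because the ranks pick up the difference in shape that the medians ignore.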

So, in conclusion:

• my enthusiasm for bootstrapping is undimmed
• there is still no substitute for drawing lots of graphs to explore your data (and for this, pencils are probably best avoided)