How to work out the standard deviation of differences for a sample size calculation

This is something a colleague asked me about the other day. They needed to work out a sample size for future research comparing pre and post intervention outcomes, based on previous findings which stated the mean and SD pre-intervention, the mean and SD post-intervention, and the pre-post correlation. But they needed the SD of the differences between pre and post in order to do the sample size calculation. Here’s how you do it.

Let’s call the correlation Cor(pre, post) (this only works for the Pearson correlation, by the way) and the mean of the pre-intervention outcomes Mean(pre). Similarly, we can talk about Mean(post), SD(pre) and SD(post). We will also use the variances, which are just the standard deviations squared: Var(pre) = SD(pre) * SD(pre) and Var(post) = SD(post) * SD(post).

Now, there is a statistic we will use in passing called the covariance, Cov(pre, post). You don’t have to know what it is, because we are just using it as a stepping stone.

Work out Cov(pre, post) = Cor(pre, post) * SD(pre) * SD(post)

Then find Var(post-pre) = Var(pre) + Var(post) - (2 * Cov(pre, post))

and finally SD(post-pre) is the square root of Var(post-pre)

The mean difference is simply the difference of the means: Mean(post-pre)=Mean(post)-Mean(pre)
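
If you would rather let a computer do the arithmetic, the steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the numbers in the usage example are made up for illustration, and it assumes you are plugging in a Pearson correlation.

```python
import math


def sd_of_differences(sd_pre, sd_post, cor):
    """SD of (post - pre) from the two SDs and their Pearson correlation."""
    if not -1.0 <= cor <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    # Stepping stone: Cov(pre, post) = Cor(pre, post) * SD(pre) * SD(post)
    cov = cor * sd_pre * sd_post
    # Var(post-pre) = Var(pre) + Var(post) - 2 * Cov(pre, post)
    var_diff = sd_pre**2 + sd_post**2 - 2 * cov
    # max() only guards against a tiny negative value from rounding error
    return math.sqrt(max(var_diff, 0.0))


def mean_of_differences(mean_pre, mean_post):
    """Mean of (post - pre) is just the difference of the means."""
    return mean_post - mean_pre


# Hypothetical summary statistics, not taken from any real study
print(sd_of_differences(sd_pre=10.0, sd_post=12.0, cor=0.6))
print(mean_of_differences(mean_pre=50.0, mean_post=45.0))
```

The resulting SD and mean difference are what you would then feed into whatever paired sample size formula or software you are using.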



    1. Not necessarily. If you draw your data in a scatterplot, then the correlation is just how drawn-out the points are into a straight line. 1 is a perfect straight line, 0.8 is like a baguette, 0.5 is much more oval, like some sort of posh bread (pain rustique?). With whatever correlation, the X and Y variables (pre and post) could be scaled whichever way you want. Imagine SD(pre)=1 and SD(post)=2 with a correlation of 0.5; then you get SD(post-pre)=sqrt(3).

  1. Hi, thanks for the info.
    I need to find the SD of the difference, given the mean and SD pre-intervention and the mean and SD post-intervention.
    I do not have correlation(pre,post) nor do I have detailed data (just the mean & SDs). What do we do in such a case?

    1. Hi Smriti, you can’t calculate it exactly but you could guess at the correlation from other, similar studies; that’s what the Cochrane Handbook would suggest. I would try a range of plausible correlations as a sensitivity analysis. If your ultimate analysis is Bayesian, just include a beta prior for the correlation!

  2. Hello Robert,
    if 2 * Cov(pre, post) is greater than Var(pre) + Var(post), Var(post-pre) will be negative. How do I calculate SD(post-pre) in this case?
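
The sensitivity analysis suggested in the reply above (trying a range of plausible correlations when none was reported) could look something like this rough sketch. The SDs and the list of correlations are hypothetical placeholders, and the function name is my own. Note that for any correlation between -1 and 1, Var(post-pre) is at least (SD(pre) - SD(post)) squared, so it cannot come out negative; the max() in the code is only there to absorb rounding error.

```python
import math


def sd_of_differences(sd_pre, sd_post, cor):
    """SD of (post - pre) from the two SDs and an assumed Pearson correlation."""
    cov = cor * sd_pre * sd_post
    var_diff = sd_pre**2 + sd_post**2 - 2 * cov
    # For -1 <= cor <= 1 this is never truly negative; guard against rounding
    return math.sqrt(max(var_diff, 0.0))


# Hypothetical values: replace with the SDs from your source study
sd_pre, sd_post = 10.0, 12.0
# Assumed range of plausible correlations, e.g. taken from similar studies
plausible_correlations = [0.3, 0.5, 0.7, 0.9]

sensitivity = {
    cor: sd_of_differences(sd_pre, sd_post, cor)
    for cor in plausible_correlations
}

for cor, sd_diff in sensitivity.items():
    print(f"Cor(pre, post) = {cor:.1f} -> SD(post-pre) = {sd_diff:.2f}")
```

The higher the assumed correlation, the smaller the SD of the differences, so the lowest plausible correlation gives the most conservative (largest) sample size.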
