This is something a colleague asked me about the other day. They needed to work out a sample size for future research comparing pre- and post-intervention outcomes, based on previous findings which stated the mean and SD pre-intervention, the mean and SD post-intervention, and the pre-post correlation. But they needed the SD of the *differences* between pre and post in order to do the sample size calculation. Here’s how you do it.

Let’s call the correlation (this only works for Pearson correlation by the way…) **Cor(pre, post)**, and the mean of the pre-intervention outcomes **Mean(pre)**. Similarly we can talk about **Mean(post)**, **SD(pre)**, **SD(post)**. And we will use the variances, which are just the standard deviations squared: **Var(pre) = SD(pre) * SD(pre)**; **Var(post) = SD(post) * SD(post)**.

Now, there is a statistic we will use in passing called the covariance, **Cov(pre, post)**. You don’t have to know what it is because we are just using it as a stepping stone.

Work out **Cov(pre, post) = Cor(pre, post) * SD(pre) * SD(post)**

Then find **Var(post-pre) = Var(pre) + Var(post) - (2 * Cov(pre, post))**

and finally **SD(post-pre)** is the square root of **Var(post-pre)**

The mean difference is simply the difference of the means: **Mean(post-pre)=Mean(post)-Mean(pre)**
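If you would rather not do this by hand, the steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function names `sd_of_difference` and `mean_of_difference` are my own invention for illustration, and the input checks are an assumption about what you would want, not part of the formula itself.

```python
import math

def sd_of_difference(sd_pre, sd_post, cor):
    """SD of (post - pre) from the two SDs and their Pearson correlation."""
    if not -1.0 <= cor <= 1.0:
        raise ValueError("correlation must be between -1 and +1")
    if sd_pre < 0 or sd_post < 0:
        raise ValueError("standard deviations must be non-negative")
    # Stepping stone: Cov(pre, post) = Cor(pre, post) * SD(pre) * SD(post)
    cov = cor * sd_pre * sd_post
    # Var(post-pre) = Var(pre) + Var(post) - 2 * Cov(pre, post)
    var_diff = sd_pre**2 + sd_post**2 - 2 * cov
    # Guard against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(var_diff, 0.0))

def mean_of_difference(mean_pre, mean_post):
    """Mean of (post - pre) is just the difference of the means."""
    return mean_post - mean_pre

# Illustrative inputs only: SD(pre)=1, SD(post)=2, Cor(pre, post)=0.5
print(sd_of_difference(1.0, 2.0, 0.5))  # approximately 1.732, i.e. sqrt(3)
```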


Thanks for this. If Cor(pre, post) = 0.5, then does the SD(post-pre) equal SD(pre)?

Not necessarily. If you draw your data in a scatterplot then the correlation is just how drawn-out they are into a straight line. 1 is a perfect straight line, 0.8 is like a baguette, 0.5 is much more oval like some sort of posh bread (pain rustique?). With whatever correlation, the X and Y variables (pre and post) could be scaled whatever way you want. Imagine SD(pre)=1 and SD(post)=2; then you get SD(post-pre)=sqrt(3). It only comes out equal to SD(pre) in the special case where SD(pre) and SD(post) are the same.
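If you want to convince yourself of that sqrt(3), a quick simulation is one way to check the formula. The sketch below assumes normally distributed outcomes (the formula itself does not need normality; it is just convenient for generating correlated data), and the function name, sample size and seed are arbitrary choices of mine.

```python
import math
import random
import statistics

def simulate_sd_of_difference(sd_pre, sd_post, cor, n=20000, seed=1):
    """Empirical SD of (post - pre) from simulated correlated normal data."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pre = sd_pre * z1
        # Mixing z1 and z2 like this gives post a correlation of `cor` with pre
        post = sd_post * (cor * z1 + math.sqrt(1.0 - cor**2) * z2)
        diffs.append(post - pre)
    return statistics.stdev(diffs)

# SD(pre)=1, SD(post)=2, Cor(pre, post)=0.5, as in the reply above
formula = math.sqrt(1.0**2 + 2.0**2 - 2 * 0.5 * 1.0 * 2.0)
simulated = simulate_sd_of_difference(1.0, 2.0, 0.5)
print(formula, simulated)  # the two numbers should be close
```

Setting both SDs to the same value with a correlation of 0.5 should give back roughly that same SD.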

Hi, thanks for the info.

I need to find the SD of the difference, given the mean and SD pre-intervention and the mean and SD post-intervention.

I do not have correlation(pre,post) nor do I have detailed data (just the mean & SDs). What do we do in such a case?

Hi Smriti, you can’t calculate it exactly but you could guess at the correlation from other, similar studies; that’s what the Cochrane Handbook would suggest. I would try a range of plausible correlations as a sensitivity analysis. If your ultimate analysis is Bayesian, just include a beta prior for the correlation!
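A sensitivity analysis along those lines could look something like this. It is just a sketch: the SDs of 10 and 12 and the list of correlations are made-up placeholders, so swap in your own SDs and whatever range looks plausible from similar studies.

```python
import math

def sd_diff_by_correlation(sd_pre, sd_post, correlations):
    """Map each candidate correlation to the implied SD(post-pre)."""
    return {
        r: math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
        for r in correlations
    }

# Hypothetical values for illustration only
table = sd_diff_by_correlation(10.0, 12.0, [0.3, 0.5, 0.7, 0.9])
for r, sd in table.items():
    print(f"Cor = {r:.1f} -> SD(post-pre) = {sd:.2f}")
```

The higher the assumed correlation, the smaller the SD of the differences, so a conservative sample size calculation would lean towards the lower end of the plausible range.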

Hello Robert,

If 2 * Cov(pre, post) is greater than Var(pre) + Var(post), Var(post-pre) will be negative. How do I calculate SD(post-pre) in this case?

Thanks

That’s not possible. Correlation is between -1 and +1, so Cov(pre, post) can never be bigger than SD(pre) * SD(post), and 2xy<=x^2+y^2 for all real x,y. There must be a mistake somewhere.
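To spell that out, here is the argument written as one line of algebra. The shorthand is mine: x stands for SD(pre), y for SD(post) and r for Cor(pre, post), with x and y non-negative and r no bigger than 1.

```latex
\mathrm{Var}(\text{post} - \text{pre})
  = x^2 + y^2 - 2rxy
  \;\ge\; x^2 + y^2 - 2xy
  = (x - y)^2
  \;\ge\; 0
```

So if you get a negative variance, check for a correlation outside -1 to +1, or a variance typed in where an SD should be.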