Yesterday I was reading Kontopantelis & Reeves’s 2010 paper “Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: A simulation study”, which compares fixed-effects and a variety of random-effects models under the (entirely realistic) situation where the study effects do not happen to be drawn from a normal distribution. In theory, they would be if each study were just perturbed from a global mean by sampling error, and that leads us to the fixed-effects model; the random-effects model says that there’s other stuff making the inter-study variation even bigger, and the trouble is that by definition you don’t know what that ‘stuff’ is (or you would have modelled it – wouldn’t you???)
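To make the fixed-effects half of that distinction concrete: “sampling error only” gives the familiar inverse-variance pooled estimate. Here is a minimal sketch in Python (my own illustration with made-up numbers; the paper and packages discussed here use Stata, Excel and R):

```python
# Fixed-effects (inverse-variance) pooling: each study estimate y_i is
# assumed to differ from a single true effect only by sampling error,
# with known within-study variance v_i. Illustrative values, not data
# from the paper.

def fixed_effect_pool(y, v):
    """Return the inverse-variance weighted mean and its variance."""
    w = [1.0 / vi for vi in v]          # weights = 1 / within-study variance
    sw = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return mean, 1.0 / sw               # pooled estimate and its variance

y = [0.30, 0.10, 0.25, 0.45]            # hypothetical study effects
v = [0.04, 0.09, 0.02, 0.16]            # hypothetical within-study variances
mean, var = fixed_effect_pool(y, v)
```

The pooled variance is smaller than any single study’s variance, which is the whole point of pooling; the random-effects question is whether those `v` values are the only source of spread in `y`.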
The random-effects options they consider (with my irreverent nutshell descriptions in brackets; don’t write and tell me they are over-simplifications, that’s the point) are DerSimonian-Laird (pretend we know the intra-study SDs), Biggerstaff-Tweedie (the intra-study SDs are themselves drawn from an inverse-chi-squared distribution), Sidik-Jonkman (account for the unknown global SD by drawing the means from a t-distribution), “Q-based” (test Cochran’s Q for heterogeneity; if significant, use D-L, if not, use FE), maximum likelihood for both mean and SD, profile likelihood for the mean under unknown SD, and a permutation version of D-L proposed by Follmann & Proschan – and seeing as everyone else has been immortalized in the meta-analysis Hall of Infamy, I’m going to do it to them too. All in all a pretty exhaustive list. They tested them out in 10,000 simulations, with data from a variety of skewed and leptokurtic population distributions and different numbers of studies.
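As a concrete anchor for those nutshell descriptions, here is what D-L and the “Q-based” switch amount to in a few lines of Python. This is my own sketch of the textbook moment-estimator formulas, not the authors’ code; the data are invented, and the hard-coded chi-squared cutoff assumes a 0.05 test on this example’s 3 degrees of freedom:

```python
# DerSimonian-Laird random-effects pooling, plus the "Q-based" switch:
# compute Cochran's Q under the fixed-effects fit; if Q is significant,
# add the D-L between-study variance tau^2 to each study's variance.
# Illustrative numbers only -- not data from the paper.

def dersimonian_laird(y, v):
    """Return (pooled mean, tau^2, Q) using the D-L moment estimator."""
    w = [1.0 / vi for vi in v]                      # fixed-effects weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # truncate at zero
    w_re = [1.0 / (vi + tau2) for vi in v]          # random-effects weights
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return mu_re, tau2, q

y = [0.1, 0.5, 0.9, 0.2]     # hypothetical, deliberately heterogeneous effects
v = [0.01, 0.02, 0.01, 0.03]  # hypothetical within-study variances

mu_re, tau2, q = dersimonian_laird(y, v)

# "Q-based": use D-L only if Q exceeds the 0.05 chi-squared cutoff on
# k - 1 = 3 degrees of freedom (7.815); otherwise fall back to FE.
CHI2_CRIT_3DF = 7.815
estimate = mu_re if q > CHI2_CRIT_3DF else sum(
    yi / vi for yi, vi in zip(y, v)) / sum(1.0 / vi for vi in v)
```

Note the “pretend we know the intra-study SD” jibe in action: the `v` values go in as known constants, and all the uncertainty about them is simply ignored.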
Conclusion one is that they are all pretty robust to all but the most bizarre deviations from normality. Having said that, the authors offer a graph dividing the world into D-L-optimal and profile-likelihood-optimal regions, which I found a bit odd because, firstly, DerSimonian-Laird is never true, it’s just an approximation; secondly, profile likelihood requires bespoke, painful programming each time; and thirdly, they just said it doesn’t matter. I rather like the look of Sidik-Jonkman in the tables of results, but that may be a cognitive bias in me that prefers old-skool solutions along the lines of “just log everything and do a t-test, it’ll be about right and then you can go home early” (a strange attitude in one who spends a lot of time doing Bayesian structural equation models). I also like Follmann-Proschan for their auto-correcting permutations, but if a non-permuting method can give me a decent answer, why bother?
Interestingly, the authors have provided all these methods in an Excel plug-in (I can’t recommend that, but on your head be it) and the Stata package metaan, which I shall be looking into next time I have to smash studies together. In R, you can get profile likelihood (and, I think, Biggerstaff-Tweedie) from the metaLik package, and maximum likelihood, Sidik-Jonkman and REML estimators from metafor. Simpler options are in rmeta, and some more esoteric ones in meta. However, it still seems to me that the most important thing to do is to look at the individual study effects and try to work out what shape they follow and what factors in the study design and execution could have put them there. This could provide the reader with much richer information than just one mega-result (oops sorry, inadvertently strayed a little too close to Eysenck there) that sweeps the causes of heterogeneity under the carpet.