My short paper on this came out on Friday in the British Medical Journal. The aim is to help both authors and readers of research make sense of this rather confusing but unavoidable statistic, the odds ratio (OR). The fundamental problem is that quoting the odds in group A, divided by the odds in group B, confuses most people because we just don’t think in terms of odds.

The home-made video abstract on the BMJ website shows you the difference between odds and risk, and how one odds ratio can mean several different relative risks (RRs), depending on the risk in one of the groups. Unfortunately, in some situations you have no choice but to work with an OR, notably in logistic regression and retrospective case-control studies.
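To make the difference between odds and risk concrete, here is a minimal Python sketch. The function names and the example risks are my own illustrative choices, not anything from the BMJ paper. It shows that odds and risk are nearly the same when the risk is small and drift far apart as the risk grows.

```python
def risk_to_odds(risk):
    """Convert a risk (a probability strictly between 0 and 1) to odds."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be in [0, 1)")
    return risk / (1 - risk)


def odds_to_risk(odds):
    """Convert odds (zero or positive) back to a risk."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


if __name__ == "__main__":
    # Illustrative risks only: rare, moderate and common outcomes.
    for risk in (0.01, 0.10, 0.50, 0.80):
        print(f"risk {risk:.2f} -> odds {risk_to_odds(risk):.3f}")
```

For a rare outcome the two numbers are almost interchangeable, which is why an OR is sometimes read as if it were an RR. For a common outcome that reading can be badly wrong.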

The bottom line is that authors should present RRs if they can, and with excellent software like margins and marginsplot in Stata, and effects in R, there’s really no excuse not to do this, even for complex models. In particular, I’m a huge fan of the plots of marginal probabilities from these packages, which help you to show the complex patterns in your data to an audience that will run scared from tables of ORs, interaction terms and confidence intervals. John Fox’s 2003 paper is still worth reading.

For readers, it can be harder, because you only have the information in the paper. Anyone who has done a systematic review will know what I mean – the baseline stats are given in Table 1 and the ORs in Table 2 or 3, and without any idea of the risk in one of the groups, you can proceed no further. However, I would suggest there is still hope. If you can get a range of *plausible* risks for the control group, you can work out a range of *plausible* relative risks. The formula is:

RR = OR / (1 – p + (p x OR))

where p is the risk in the control group. I’ve given a ready-reckoner table in the BMJ paper.
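The formula above is easy to put into code. The sketch below is only an illustration: the function name `or_to_rr`, the reported OR of 2.0 and the list of plausible control-group risks are all assumptions I have made up for the example.

```python
def or_to_rr(odds_ratio, control_risk):
    """Relative risk implied by an odds ratio and a control-group risk p.

    Implements RR = OR / (1 - p + p * OR).
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0 <= control_risk <= 1:
        raise ValueError("control_risk must be between 0 and 1")
    return odds_ratio / (1 - control_risk + control_risk * odds_ratio)


if __name__ == "__main__":
    reported_or = 2.0                           # hypothetical OR taken from a paper
    plausible_risks = [0.01, 0.05, 0.10, 0.30]  # hypothetical control-group risks
    for p in plausible_risks:
        rr = or_to_rr(reported_or, p)
        print(f"control risk {p:.2f}: RR = {rr:.2f}")
```

Running this over the lowest and highest control-group risk you consider believable gives the two ends of the plausible range of RRs.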

And one more subtlety, if I may. As we’ve seen, a statistical model with a single shared OR for everyone (take this pill and your odds of a heart attack go down by 10%) does not imply a shared RR for everyone. If the logistic regression included adjustment for confounders, or if the case-control study was matched, then those other factors will warp the RR for different subgroups of people. This is where the plausible range comes in handy as well.

It sounds a bit Bayesian, but is really just a sensitivity analysis. However, a Bayesian meta-analysis could look at trials reporting ORs with little or no other supporting information, as well as trials reporting RRs, and combine them to get a pooled RR, if it included a shared prior for control group risks. This is one of my next projects…

This post is rather meaningless without reference to the design of the study in question. There is a well-established convention that case-control studies use odds ratios whereas cohort studies and RCTs use risk ratios. There are good reasons for that. In a cohort study, you don’t need a conversion formula because you can calculate the RR directly. In a case-control study, we are measuring the odds of exposure among the cases divided by the odds of exposure among the controls. If you use the risk ratio, people will think you mean the risk of disease in the exposed divided by the risk of disease in the unexposed. The “risk” of acquiring the disease cannot be measured in a case-control study, thus we use the odds ratio to prevent confusion.

“In a cohort study … you can calculate the RR directly” : sure, unless you have to adjust for confounders, which is almost always the case. A binomial regression with a logarithmic link function, which would give you RRs directly, often fails to converge (see post “my odds ratios have gone weird”).

“… thus we use the OR to prevent confusion” : an OR has never prevented confusion! It is the reversibility of the OR (the odds ratio of exposure given disease equals the odds ratio of disease given exposure) that makes it useful in case-control settings, unless you do a Bayesian analysis with external prior information on the population prevalence.
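A quick way to see that reversibility is to compute the OR both ways from one 2x2 table. This is a minimal sketch with made-up counts. The cell labels a, b, c and d and the function names are my own, and nothing here comes from a real study.

```python
def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure in cases divided by odds of exposure in controls.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a / c) / (b / d)


def disease_odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds of disease in the unexposed."""
    return (a / b) / (c / d)


def proportion_ratios(a, b, c, d):
    """Ratios of simple proportions, computed in both directions for comparison."""
    exposure_ratio = (a / (a + c)) / (b / (b + d))  # P(exposed | case) / P(exposed | control)
    disease_ratio = (a / (a + b)) / (c / (c + d))   # P(case | exposed) / P(case | unexposed)
    return exposure_ratio, disease_ratio


if __name__ == "__main__":
    table = (40, 20, 60, 80)  # hypothetical counts, for illustration only
    print("OR of exposure:", round(exposure_odds_ratio(*table), 3))
    print("OR of disease: ", round(disease_odds_ratio(*table), 3))
    print("proportion ratios (both directions):", proportion_ratios(*table))
```

Both odds ratios reduce to ad/(bc), so they agree whichever way round you look at the table. The two ratios of proportions do not agree, and the second one depends on how many cases and controls were sampled, which is why it cannot be read as an RR in a case-control study.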

Thank you, dear Robert, for the example that is pushing us to use the RR instead of the OR. I would like to see more examples with one or two confounders included, showing how they affect the formula and how to deal with the baseline risk (I mean where to get it from and how it is calculated).

Sincerely.

Dear Robert,

Hello! I am a PhD student from Singapore in my first year. I had some problems comprehending the OR to RR conversion. Your article helped me clear up a few doubts 🙂

For my longitudinal cohort project, currently I am converting OR (obtained from binary logistic regression, SPSS) to RR using the formula:

RR = OR / (1 – p + (p x OR)) , as you have mentioned.

I would like to clarify a few doubts (apologies if they are too amateurish!):

Query-1: As per the definition, I am considering the reference group of each confounder as the non-exposed group, and thus, calculating the incidence of outcome in the non-exposed group for each confounder. Is this method accurate?

Query-2: If the above method is accurate, the calculation is easy for a dichotomous categorical variable. But how do I do it for a continuous variable, e.g. age?

Thank you so much for your help!

Regards,

Tosha

Sorry I missed your comment until now! The short answer is that with confounding, it quickly gets complicated. By far the best thing to do is to use a marginal effects function like margins in Stata. It will make your life so much easier!
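To give a flavour of what a marginal approach does, here is a rough Python sketch of the general idea (sometimes called marginal standardization). It is not Stata's internal code. The model coefficients and the ages are entirely hypothetical. The idea is to predict everyone's risk as if exposed, then as if unexposed, average each set of predictions over the sample, and take the ratio. A continuous confounder like age then needs no reference group or baseline risk of its own.

```python
import math

# Hypothetical fitted logistic model (illustrative numbers, not from any data):
# logit(risk) = intercept + b_exposed * exposed + b_age * age
COEF = {"intercept": -4.0, "exposed": 0.7, "age": 0.05}


def predicted_risk(exposed, age):
    """Predicted probability of the outcome from the hypothetical model."""
    linear_predictor = (
        COEF["intercept"] + COEF["exposed"] * exposed + COEF["age"] * age
    )
    return 1 / (1 + math.exp(-linear_predictor))


def standardized_risks(ages):
    """Average predicted risk with everyone set to exposed, then to unexposed."""
    risk_exposed = sum(predicted_risk(1, age) for age in ages) / len(ages)
    risk_unexposed = sum(predicted_risk(0, age) for age in ages) / len(ages)
    return risk_exposed, risk_unexposed


if __name__ == "__main__":
    ages = [30, 45, 60, 75]  # hypothetical sample of ages
    risk_exposed, risk_unexposed = standardized_risks(ages)
    print(f"conditional OR from the model: {math.exp(COEF['exposed']):.2f}")
    print(f"marginal RR: {risk_exposed / risk_unexposed:.2f}")
    for age in ages:
        rr = predicted_risk(1, age) / predicted_risk(0, age)
        print(f"age {age}: RR = {rr:.2f}")
```

The single OR in the model turns into a different RR at each age, because the baseline risk changes with age, while the marginal RR summarises them over the people actually in the sample.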