# Football and domestic violence

An interesting article on the BBC website today claims that the BBC’s own research has shown a link between England winning or losing football matches and the incidence of domestic violence. A closer look reveals the argument is a lot shakier than that. The data are the numbers of incidents reported to police on the day of each match, compared with the same date a year earlier, and alas, because England tend not to stay in these tournaments very long, there are only four matches to draw on. That gives four pairs of counts. The numbers are not given directly, but one can work them out:

2510 (draw), 1890 (draw), 2427 (win), 3221 (lose) for the four match days and 2556, 1875, 1911, 2497 respectively for the previous year.

We could complain about the same date being a weekday one year and a weekend the other, but the real problem is that there’s not enough data. We need to know how much these incident counts fluctuate naturally from one day to another, and we can’t really tell from just four non-match days. You can see that the range of counts on match days overlaps the range on non-match days, so common sense says this is not going to prove anything. Indeed, if you wanted to prove that win/lose results were different to draws, then you’d only have two pairs in each camp.
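To make the overlap concrete, here is a minimal sketch in R that lines up the four pairs and computes the year-on-year change. The variable names (`match_day`, `prev_year`, `change`, `pct_change`, `ranges_overlap`) are mine, chosen for illustration; the counts are the ones quoted above.

```r
# Counts quoted above: four match days and the same dates a year earlier
match_day <- c(2510, 1890, 2427, 3221)
prev_year <- c(2556, 1875, 1911, 2497)
outcome   <- c("draw", "draw", "win", "lose")

# Year-on-year change, absolute and as a percentage of the earlier count
change     <- match_day - prev_year
pct_change <- round(100 * change / prev_year, 1)

print(data.frame(outcome, match_day, prev_year, change, pct_change))

# Do the two sets of counts overlap? TRUE if neither range lies wholly above the other
ranges_overlap <- max(prev_year) >= min(match_day) &&
  max(match_day) >= min(prev_year)
print(ranges_overlap)
```

The two draws barely move while the win and the loss both rise by several hundred incidents, but with two pairs in each camp that is a pattern to investigate, not a result.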

Nevertheless, you could go ahead and fit a generalized linear model (Poisson regression) to this, with year and result as predictors. (If you try the interaction between year and result you will have run out of data; that’s how close to the edge this exercise is.) This says p<0.0001 for an increased incidence with a win or lose result, which presumably is what Prof Brimicombe means when he says it is a “definitive and significant increase”. But Poisson regression assumes a very specific form for the natural fluctuation from one day to another, namely that the variance equals the mean, and we can’t really test that with just four numbers. An alternative analysis is negative binomial regression, which relaxes that assumption by estimating the extra variance from the data, but it needs more data to be able to do so. Which of these alternatives best fits the data? We can’t tell, because the negative binomial has too few data to converge to an answer for us.
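A rough way to see why the Poisson assumption matters is to compare the day-to-day spread it implies with the spread among the four non-match days. This is only a sketch under a loud assumption: it treats the four previous-year counts as if they were repeated draws from one distribution, which they are not (they fall on different dates), so the comparison is indicative at best. The names `poisson_sd`, `observed_sd` and `variance_ratio` are mine.

```r
# Previous-year (non-match) counts quoted above
prev_year <- c(2556, 1875, 1911, 2497)

# Under a Poisson model the variance equals the mean,
# so the implied standard deviation is the square root of the mean
poisson_sd <- sqrt(mean(prev_year))

# Sample standard deviation of the same four counts
# (assumption: treated as comparable days, which is a simplification)
observed_sd <- sd(prev_year)

# Ratio of sample variance to mean; a value near 1 would be consistent with Poisson
variance_ratio <- var(prev_year) / mean(prev_year)

print(c(poisson_sd = poisson_sd,
        observed_sd = observed_sd,
        variance_ratio = variance_ratio))
```

On these numbers the Poisson standard deviation is in the region of 47 incidents, while the four non-match days differ from one another by hundreds. If even part of that spread is ordinary day-to-day noise, the Poisson model is understating the uncertainty, which is how a tiny p-value can come out of four pairs of counts.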

Let common sense prevail. This is an important topic and an interesting analysis, but looking at four days is not going to reveal any causal link. It should be a springboard to applying for funding to do the research properly, not straight to the headlines.

R code:

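```r
library(MASS)    # glm.nb() for negative binomial regression
library(lmtest)  # lrtest() for the likelihood ratio test

# Four calendar dates, each seen once in the match year and once the year before
date <- factor(rep(1:4, 2))
# Incident counts: the four match days first, then the same dates a year earlier
ipv <- c(2510, 1890, 2427, 3221, 2556, 1875, 1911, 2497)
# Indicators: match day ending in a draw / match day ending in a win or loss
draw   <- c(1, 1, 0, 0, 0, 0, 0, 0)
result <- c(0, 0, 1, 1, 0, 0, 0, 0)

# Poisson regression, then the negative binomial alternative
# (expect convergence warnings from glm.nb with this little data)
preg  <- glm(ipv ~ date + draw + result, family = poisson)
nbreg <- glm.nb(ipv ~ date + draw + result)
summary(preg)
summary(nbreg)
# Likelihood ratio test comparing the two fits
lrtest(preg, nbreg)
```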