model0 <- glm(Goals ~ 1,data=StartData,family='poisson')

model1 <- glm(Goals ~OffenseClub,data=StartData,family='poisson')

model2 <- glm(Goals ~OffenseClub + DefenseClub,data=StartData,family='poisson')

model3 <- glm(Goals ~OffenseClub + DefenseClub + OffThuis,data=StartData,family='poisson')

anova (model0,model1,model2,model3,test='Chisq')

Analysis of Deviance Table

Model 1: Goals ~ 1

Model 2: Goals ~ OffenseClub

Model 3: Goals ~ OffenseClub + DefenseClub

Model 4: Goals ~ OffenseClub + DefenseClub + OffThuis

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 611 865.23

2 594 754.17 17 111.064 7.610e-16 ***

3 577 699.13 17 55.043 6.743e-06 ***

4 576 668.96 1 30.172 3.955e-08 ***

It appears that every modelling step which makes the model more complex is significant, so we must reject the hypothesis that any of these terms is irrelevant. Hence the number of goals depends on the attacking team and the defending team, plus a home team effect.
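The p-values in the table can be reproduced by hand: each drop in deviance between nested models is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. A minimal sketch using only the numbers printed above (the variable names are illustrative):

```r
# Deviance drops and numbers of added parameters, copied from the table above
dev_drop <- c(OffenseClub = 111.064, DefenseClub = 55.043, OffThuis = 30.172)
df_drop  <- c(17, 17, 1)

# Upper-tail chi-squared probability for each modelling step
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
signif(p_value, 4)
```

Up to rounding of the inputs, these should agree with the `Pr(>Chi)` column.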

### The twelfth man

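It does make a difference who is playing at home. Because the model works on a log scale, the size of this advantage is difficult to read directly from the coefficients. In general, when two clubs of equal strength play each other, they each score about 1.3 goals (strictly, the intercept below is the expected count for the reference club, Ado Den Haag, against an opponent of the same strength).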

exp(coef(model2)[1])

(Intercept)

1.346538

When one of these equally strong teams plays away and the other at home, the numbers change. The team playing at home scores 1.6 goals on average, while the team playing away scores only 1.1.

exp(coef(model3)[length(coef(model3))] + coef(model3)[1])

OffThuis

1.58019

exp(coef(model3)[1])

(Intercept)

1.112886

This would make playing away or at home both statistically and practically significant. Note that the size of this effect cannot be transferred to other circumstances.
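Because the model uses a log link, the home effect acts as a multiplier on the expected number of goals. Dividing the two expected counts printed above gives that multiplier; it should match `exp(coef(model3)['OffThuis'])` up to rounding, although that call is not run here. A small sketch on the printed numbers:

```r
home_goals <- 1.58019    # exp(intercept + OffThuis), from the output above
away_goals <- 1.112886   # exp(intercept), from the output above

# Multiplicative home advantage on the expected goal count
home_factor <- home_goals / away_goals
round(home_factor, 2)    # about 1.42, i.e. roughly 42% more goals at home
```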

### The teams

Each of the teams has two parameters in the model. These can most easily be interpreted as offensive and defensive power. The following code plots these powers.

co <- coef(model3)

coO <- co[grep('Offense',names(co))]

coD <- co[grep('Defense',names(co))]

names(coO) <- gsub('OffenseClub','',names(coO))

names(coD) <- gsub('DefenseClub','',names(coD))

# Ado Den Haag is the reference level, so it has no coefficient; add it as a row of zeros.

coB <- rbind(cbind(coO,coD),
             matrix(c(0,0),nrow=1,dimnames=list('Ado Den Haag',c('coO','coD'))))

# centred (not rescaled) so that strengths are relative to the league average

coB <- as.data.frame(scale(coB,scale=FALSE))

# -coD to make more defensive power visually larger

plot(-coD ~coO, type='n', data=coB,xlab='Offensive power',ylab='Defensive power',axes=FALSE)

text(-coD ~coO,data=coB,labels=rownames(coB))

abline(a=0,b=1)

abline(v=0)

abline(h=0)
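The zero row and the centring step in the code above may be easier to follow on a toy example. The sketch below uses three hypothetical clubs and made-up coefficients (none of these numbers come from the model): the reference club gets a coefficient of zero, and `scale(..., scale=FALSE)` subtracts the column means so that every power is expressed relative to the average club.

```r
# Hypothetical coefficients relative to a reference club (not model output)
toy <- cbind(coO = c(ClubB = 0.30, ClubC = -0.10),
             coD = c(ClubB = -0.20, ClubC = 0.15))

# The reference club has no coefficient of its own, so add it as zeros
toy <- rbind(toy, matrix(c(0, 0), nrow = 1,
                         dimnames = list('ClubA', c('coO', 'coD'))))

# Centre each column; differences between clubs are unchanged
centred <- as.data.frame(scale(toy, scale = FALSE))
centred
colMeans(centred)   # zero up to floating-point error
```

Centring only shifts the origin of the plot, so the ordering of clubs and the distances between them stay the same.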

The plot shows the axes; a team close to the centre (NAC Breda, FC Utrecht) was average in both offensive and defensive strength. The diagonal line marks the region where defensive and offensive strength are equal. Hence Feyenoord is equally strong in offense and defense, and the same holds for De Graafschap. The line does not run exactly along the diagonal of the plot, because the range in offensive strength is larger than the range in defensive strength. The best team is top right: Ajax. The worst teams are bottom left: De Graafschap and Excelsior, both of which were relegated to the Eerste Divisie. A few clubs stand out for the mismatch between their offensive and defensive strengths. SC Heerenveen has almost the same scoring power as Ajax, but not enough defensive capacity. In contrast, Vitesse does not concede many goals, but lacks the power to score them. Overall the two have about the same strength.

Put differently: if SC Heerenveen played against itself, ignoring home team advantage, it would probably score two or even three goals.

fbpredict(model2,'SC Heerenveen','SC Heerenveen')[[1]]

SC Heerenveen in rows against SC Heerenveen in columns

0 1 2 3 4 5 6 7 8 9

0 0.0060 0.0153 0.0196 0.0167 0.0107 0.0055 0.0023 0.0009 0.0003 0.0001

1 0.0153 0.0391 0.0501 0.0428 0.0274 0.0140 0.0060 0.0022 0.0007 0.0002

2 0.0196 0.0501 0.0641 0.0548 0.0351 0.0180 0.0077 0.0028 0.0009 0.0003

3 0.0167 0.0428 0.0548 0.0467 0.0299 0.0153 0.0065 0.0024 0.0008 0.0002

4 0.0107 0.0274 0.0351 0.0299 0.0192 0.0098 0.0042 0.0015 0.0005 0.0001

5 0.0055 0.0140 0.0180 0.0153 0.0098 0.0050 0.0021 0.0008 0.0003 0.0001

6 0.0023 0.0060 0.0077 0.0065 0.0042 0.0021 0.0009 0.0003 0.0001 0

7 0.0009 0.0022 0.0028 0.0024 0.0015 0.0008 0.0003 0.0001 0 0

8 0.0003 0.0007 0.0009 0.0008 0.0005 0.0003 0.0001 0 0 0

9 0.0001 0.0002 0.0003 0.0002 0.0001 0.0001 0 0 0 0

If Vitesse played against itself, it would score zero or one goal.

fbpredict(model2,'Vitesse','Vitesse')[[1]]

Vitesse in rows against Vitesse in columns

0 1 2 3 4 5 6 7 8 9

0 0.1165 0.1252 0.0673 0.0241 0.0065 0.0014 0.0002 0 0 0

1 0.1252 0.1346 0.0724 0.0259 0.0070 0.0015 0.0003 0 0 0

2 0.0673 0.0724 0.0389 0.0139 0.0037 0.0008 0.0001 0 0 0

3 0.0241 0.0259 0.0139 0.0050 0.0013 0.0003 0.0001 0 0 0

4 0.0065 0.0070 0.0037 0.0013 0.0004 0.0001 0 0 0 0

5 0.0014 0.0015 0.0008 0.0003 0.0001 0 0 0 0 0

6 0.0002 0.0003 0.0001 0.0001 0 0 0 0 0 0

7 0 0 0 0 0 0 0 0 0 0

8 0 0 0 0 0 0 0 0 0 0

9 0 0 0 0 0 0 0 0 0 0
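`fbpredict` is defined elsewhere, so its internals are not shown here. The printed tables are, however, consistent with treating the two goal counts as independent Poisson variables, in which case each cell is the product of two Poisson probabilities. The sketch below rebuilds such tables under that assumption. The rates are backed out of the printed output (for a Poisson distribution, P(1)/P(0) equals the rate) rather than taken from the model object, and `score_table` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: joint probabilities for two independent Poisson counts
score_table <- function(lambda_row, lambda_col, max_goals = 9) {
  goals <- 0:max_goals
  tab <- outer(dpois(goals, lambda_row), dpois(goals, lambda_col))
  dimnames(tab) <- list(goals, goals)
  tab
}

# Rates recovered from the printed tables as P(0,1) / P(0,0)
lambda_heerenveen <- 0.0153 / 0.0060
lambda_vitesse    <- 0.1252 / 0.1165

tab_heerenveen <- score_table(lambda_heerenveen, lambda_heerenveen)
tab_vitesse    <- score_table(lambda_vitesse, lambda_vitesse)
round(tab_heerenveen, 4)
round(tab_vitesse, 4)

# Most likely number of goals for each club against itself
mode_heerenveen <- which.max(dpois(0:9, lambda_heerenveen)) - 1
mode_vitesse    <- which.max(dpois(0:9, lambda_vitesse)) - 1
```

Since the rates come from rounded table entries, the rebuilt tables match the printed ones only approximately.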

### Model extensions

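The residual deviance of model3 is 668.96 on 576 degrees of freedom, somewhat more than the roughly one unit per degree of freedom expected from a well-fitting Poisson model. That might mean some more effects can be found in the data.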
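One rough way to quantify this is the ratio of residual deviance to degrees of freedom, together with a chi-squared goodness-of-fit probability. The chi-squared approximation for the residual deviance is known to be crude for Poisson data with small counts, so the sketch below is an indication rather than a formal test.

```r
resid_dev <- 668.96   # residual deviance of model3, from the table above
resid_df  <- 576      # residual degrees of freedom

# Values clearly above 1 hint at overdispersion or missing effects
dispersion_ratio <- resid_dev / resid_df
dispersion_ratio

# Approximate goodness-of-fit probability
gof_p <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)
gof_p
```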

#### Twelfth man and teams

The first extension is that the home advantage differs between teams, either in how much more a team scores at home or in how much less it concedes. Based on these data, neither seems to be statistically significant.

model4a <- glm(Goals ~OffenseClub*OffThuis + DefenseClub

,data=StartData,family='poisson')

model4b <- glm(Goals ~OffenseClub + DefenseClub*OffThuis

,data=StartData,family='poisson')

model5 <- glm(Goals ~(OffenseClub + DefenseClub)*OffThuis

,data=StartData,family='poisson')

anova (model3,model4a,model5,test='Chisq')

Analysis of Deviance Table

Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis

Model 2: Goals ~ OffenseClub * OffThuis + DefenseClub

Model 3: Goals ~ (OffenseClub + DefenseClub) * OffThuis

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 576 668.96

2 559 649.00 17 19.953 0.2766

3 542 626.77 17 22.236 0.1758

anova (model3,model4b,model5,test='Chisq')

Analysis of Deviance Table

Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis

Model 2: Goals ~ OffenseClub + DefenseClub * OffThuis

Model 3: Goals ~ (OffenseClub + DefenseClub) * OffThuis

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 576 668.96

2 559 647.46 17 21.499 0.2048

3 542 626.77 17 20.690 0.2404
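Both sets of interactions can also be tested jointly by comparing model3 with model5 directly. The sketch below uses only the residual deviances printed above. The AIC difference follows because, for nested Poisson models fitted to the same data, the change in AIC equals the change in deviance plus twice the number of added parameters.

```r
dev_model3 <- 668.96; df_model3 <- 576
dev_model5 <- 626.77; df_model5 <- 542

dev_drop <- dev_model3 - dev_model5    # 42.19
added    <- df_model3 - df_model5      # 34 extra parameters

# Joint likelihood-ratio test for all club-by-home interactions
p_joint <- pchisq(dev_drop, df = added, lower.tail = FALSE)
p_joint

# Change in AIC when moving from model3 to model5 (positive favours model3)
delta_aic <- -dev_drop + 2 * added
delta_aic
```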

#### Before and after winter break

The winter break gives clubs the opportunity to change players, so teams might change in quality over this period. In these data, this effect does not seem to be statistically significant.

StartData$year <- factor(c(substr(old$Datum,1,4),substr(old$Datum,1,4)))

model6 <- glm(Goals ~OffenseClub + DefenseClub + year + OffThuis

,data=StartData,family='poisson')

model7 <- glm(Goals ~(OffenseClub + DefenseClub)*year + OffThuis

,data=StartData,family='poisson')

anova (model3,model6,model7,test='Chisq')

Analysis of Deviance Table

Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis

Model 2: Goals ~ OffenseClub + DefenseClub + year + OffThuis

Model 3: Goals ~ (OffenseClub + DefenseClub) * year + OffThuis

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 576 668.96

2 575 668.82 1 0.135 0.7129

3 541 625.48 34 43.345 0.1308
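The `year` factor above serves as a stand-in for "before or after the winter break". A small sketch of how it is built, assuming `Datum` holds date strings that start with a four-digit year; the dates below are hypothetical examples, not rows from the data. The vector is repeated because, as in the original code, each match appears to contribute two rows to `StartData` (one per scoring team).

```r
# Hypothetical match dates in 'YYYY-MM-DD' form (an assumption about Datum)
datum <- c('2011-08-05', '2011-12-18', '2012-01-20', '2012-05-06')

# First four characters give the calendar year; repeat once per team row
year <- factor(c(substr(datum, 1, 4), substr(datum, 1, 4)))
table(year)
```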

Could we do the same with individual players?

That would be difficult if not impossible. The smaller problem is getting data. Who played in which matches? The bigger problem is the amount of data per player. Some players may have played only a few games, hence very imprecise results. Other players may have played a lot together; who is causing their combined effect? It is certainly outside the scope I envisioned.

I really like your post. Did you include the goals of the current matchday (the goals you want to explain) in your independent variables, or did you exclude them?

At this point I have included all data of season 2011/2012. I am looking at understanding the data: what explains the number of goals? The predictions shown I would not call predictions, but rather illustrations of the model and its output.

Obviously it is also interesting to add the current season and predict coming matches. That is something I still need to build.

I tried it in the German context. The models are a lot poorer. Ordinal regression worked better than Poisson models. Models on the game level also worked better than models on the game × team level. I can predict about 30% of the results (not the goals, just the direction), which is not that good. Bookies are better ;-)