I’d like to explain some of my recent football (soccer) predictions that have appeared on the @DoctorFootie Twitter account. First, what separates my model from other models is that possession forms the basis for all predictions. Why possession? Simply, because you can’t shoot without the ball, and if you don’t shoot you (usually) don’t score. With possession as a basis, we can predict other, more useful statistics.
Let’s take a look at how we can predict possession. I decided on a simple model from chemistry class, specifically chemical equilibrium equations, to determine how strong a team might be at retaining possession. In chemistry class, we learn how a reaction can go forward or backward. Molecule A can turn into molecule B, and B can turn back into A. The same basic principle applies to football: Team A can give the ball up to Team B, and B can give the ball back to A. Equation 1 shows how a team’s “strength” or “selfishness” of possession is proportional to its k constant.
Eq. 1: k constants determine how strongly a team wants to hold possession.
Taking the equilibrium concept one step further, the ratio of the two teams’ k constants determines who dominates possession.
Eq. 2: Ratio of k constants determines who gets more possession.
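One way to write down the relationship these captions describe, assuming each team’s share of possession is proportional to its k constant. The symbols P_A, P_B, k_A and k_B are illustrative notation, and this is a sketch of the idea rather than necessarily the exact form behind the predictions:

```latex
% Assumed form: possession shares stand in the same ratio as the k constants
\frac{P_A}{P_B} = \frac{k_A}{k_B}, \qquad P_A + P_B = 1
\quad\Longrightarrow\quad
P_A = \frac{k_A}{k_A + k_B}, \qquad P_B = \frac{k_B}{k_A + k_B}
```

Under this reading, two teams with equal k constants split possession 50/50, and a team whose k is twice its opponent’s holds the ball about two thirds of the time.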
Equation 2 then lets us solve for two k constants for every team in the Premier League: one home and one away, or 40 k constants in total. With only a few weeks of possession data, we can solve for meaningful k values that minimize the error between predicted possession (Eq. 2) and actual match possession. Here’s an example of how one week’s possession predictions worked out using this method.
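For readers who want to try this, here is a minimal sketch of the fitting step. It assumes the home side’s possession share is k_home / (k_home + k_away), which is one plausible reading of Eq. 2, and it minimizes squared error with plain gradient descent. The match data, the function names and the choice of optimizer are all made up for illustration.

```python
import math

# Toy data, invented for illustration: (home team, away team, home possession share)
matches = [
    ("A", "B", 0.62), ("A", "C", 0.66),
    ("B", "A", 0.45), ("B", "C", 0.55),
    ("C", "A", 0.40), ("C", "B", 0.47),
]

def predict(log_k_home, log_k_away, home, away):
    """Home possession share under the ASSUMED form k_home / (k_home + k_away)."""
    k_h = math.exp(log_k_home[home])
    k_a = math.exp(log_k_away[away])
    return k_h / (k_h + k_a)

def squared_error(log_k_home, log_k_away, matches):
    """Sum of squared differences between predicted and actual home possession."""
    return sum((predict(log_k_home, log_k_away, h, a) - y) ** 2
               for h, a, y in matches)

def fit(matches, steps=5000, lr=0.2):
    """Fit one home and one away k per team by gradient descent on log(k).

    Working with log(k) keeps every k positive without extra constraints.
    """
    teams = sorted({t for h, a, _ in matches for t in (h, a)})
    log_k_home = {t: 0.0 for t in teams}  # start every k at 1
    log_k_away = {t: 0.0 for t in teams}
    for _ in range(steps):
        grad_home = {t: 0.0 for t in teams}
        grad_away = {t: 0.0 for t in teams}
        for h, a, y in matches:
            p = predict(log_k_home, log_k_away, h, a)
            g = 2.0 * (p - y) * p * (1.0 - p)  # d(error)/d(log k_home)
            grad_home[h] += g
            grad_away[a] -= g  # the away k pushes the prediction the other way
        for t in teams:
            log_k_home[t] -= lr * grad_home[t]
            log_k_away[t] -= lr * grad_away[t]
    return log_k_home, log_k_away

start = {t: 0.0 for t in ("A", "B", "C")}
initial_error = squared_error(start, start, matches)
log_k_home, log_k_away = fit(matches)
final_error = squared_error(log_k_home, log_k_away, matches)
```

Because only ratios of k constants enter the prediction, the fitted values are determined only up to a common scale factor: multiplying every k by the same number leaves all predictions unchanged.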
Now that we have an effective method of predicting possession, we need to use correlations to know what a team might do with that possession. Let’s look at shots in particular. This isn’t necessarily trivial; a given team might play counter-attacking football when it lacks possession (higher shots-per-possession ratio) but take its time with the ball when it dominates possession (lower shots-per-possession ratio). So, each team’s shot-to-possession correlation needs to be established separately for home and away games.
These shot-to-possession correlations were obtained using linear regression, with the opponent’s ‘shots allowed per possession allowed’ as a scaling variable for opponent strength. Using these correlations, the following shot predictions were obtained for gameweek 20.
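As a rough sketch of what such a regression could look like for one team’s home games, the example below fits a straight line of shots against possession scaled by an opponent factor. How the scaling enters is an assumption: here the possession percentage is multiplied by the opponent’s shots allowed per possession allowed, divided by the league average. The numbers and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_shots(intercept, slope, possession_pct, opponent_factor):
    """Predicted shots from possession scaled by opponent strength (ASSUMED form)."""
    return intercept + slope * possession_pct * opponent_factor

# Toy home games for one team, invented for illustration:
# (possession %, opponent factor, shots taken).
# opponent factor = opponent's shots allowed per possession allowed,
# divided by the league average (a hypothetical normalization).
home_games = [
    (58.0, 1.10, 17),
    (46.0, 0.90, 9),
    (63.0, 1.00, 15),
    (51.0, 1.20, 14),
    (40.0, 0.85, 8),
]

scaled_possession = [pct * factor for pct, factor, _ in home_games]
shots = [s for _, _, s in home_games]
intercept, slope = fit_line(scaled_possession, shots)

# Feed in a predicted possession figure (e.g. from the k constants) and the
# upcoming opponent's factor to get an expected shot count.
expected_shots = predict_shots(intercept, slope, 55.0, 1.05)
```

A separate line would be fitted for each team’s away games, since a team may use the ball differently on the road.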
As seen here, we can arrive at some useful statistical predictions using possession as a predictive basis. Shots on target and goals could be predicted in the same way. Of course, there are limitations to all predictive methods, and those limitations come from the high variance of football results. The predictions presented here are an attempt to represent an “average” result.