Markov Chain Monte Carlo and the AUDL Project: Reflections
Published:
For my mathematics senior capstone project, I examine the robustness of a Poisson goal-scoring model proposed by Everson and Goldsmith-Pinkham (2008). My expository capstone paper is found here. The purpose of that paper is to highlight the goal-scoring model, simplify it, and apply it to a similar scenario. In this blog post, I go back and examine the prediction failures of my simplified Markov Chain Monte Carlo (MCMC) algorithm. I will look at two divisions in particular: the Midwest and West divisions.
Last Updated: 06/07/2018
Small edit to the paper: the difference in games played comes from teams making the playoffs vs. not making the playoffs (described below). The AUDL 2017 final standings and stats include playoff plus regular-season stats.
The Problem with my MCMC Model
Today, I took a look back at the Markov Chain Monte Carlo (MCMC) algorithm I built for my mathematics capstone project in college (Math 97). Looking at the final results, some things clearly went wrong, such as terrible teams somehow being predicted to finish first in their division. It may well be that my MCMC is implemented incorrectly, but for a moment, let’s assume that I did it correctly.1
As a brief synopsis, the model I applied to the AUDL data uses MCMC with a Gamma-distributed prior, which describes offensive and defensive strength, and a Poisson likelihood, which describes the goal-scoring process. Because the Gamma prior is conjugate to the Poisson likelihood, we get a Gamma posterior distribution. We start with initial offensive and defensive strength parameters as well as data collected from the AUDL 2017 season on goals scored (offense) and goals allowed (defense). For the prior, we set an uninformative alpha-parameter, which means there is no concentration around the mean, and the prior mean, mu, is set to the mean of total points.
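To make the conjugate structure concrete, here is a minimal sketch in Python of the Gamma-Poisson update under a shape/mean parametrization. This is not the capstone code; the function name, prior values, and example counts are all illustrative.

```python
import numpy as np

# Minimal sketch of the Gamma-Poisson conjugate update (illustrative only).
def gamma_posterior(alpha_prior, mu_prior, goals, games_played):
    """Update a Gamma(shape, rate) prior on a team's per-game scoring rate
    after observing `goals` total goals across `games_played` games."""
    rate_prior = alpha_prior / mu_prior      # Gamma mean = shape / rate
    alpha_post = alpha_prior + goals         # shape absorbs the observed counts
    rate_post = rate_prior + games_played    # rate absorbs the exposure (games)
    return alpha_post, rate_post

# Example: posterior draws for an offensive strength (all numbers hypothetical)
rng = np.random.default_rng(0)
alpha_post, rate_post = gamma_posterior(alpha_prior=1.0, mu_prior=21.0,
                                        goals=346, games_played=16)
samples = rng.gamma(shape=alpha_post, scale=1.0 / rate_post, size=10_000)
print(samples.mean())  # posterior mean goals scored per game
```

Sampling here is direct because the single-team posterior is available in closed form; in the full model the offensive and defensive strengths of opposing teams interact, which is where the MCMC iteration comes in.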
Now we come to the question of why the model failed for certain divisions. I will take a closer look by comparing the actual results of the Midwest and West divisions to the predicted results of those divisions.
Figure 1: Offensive and defensive strength of teams in the Midwest and West divisions compared to their actual results.
West
First, we look at the point differentials. The point differentials were all pretty close together for the top four teams, so that would have made it hard for the algorithm to properly predict the exact rank of each team. In other words, the top four teams seem to be evenly matched across their games played: scoring between 346 and 421 points and allowing between 324 and 404 points. These ranges are wide because of the differences in number of games played, which we account for in the algorithm. Even so, the defensive strengths are still not quite right. According to the algorithm, the Vancouver Riptide have a better defensive strength than the San Francisco FlameThrowers, despite Vancouver being the worst team in the division. This is because Vancouver played fewer games than San Francisco (three fewer). If they had played the same number of games, I would imagine Vancouver’s points against would be much higher than San Francisco’s, thus worsening Vancouver’s defensive strength.
This brings me to my next point: games played. Because the AUDL only reports aggregate statistics for each team, the playoff numbers are folded into the final standings. This is especially annoying since we have no way of isolating the regular-season stats, where games played would be consistent. In other words, there is no way of controlling the scenario: regular-season play tends to be less intense than postseason play. Looking purely from a W-L perspective in the regular season (out of 14 games), the West division is quite evenly matched among its top four: SF (10-4), LA (9-5), San Jose (8-6), SEA (7-7). Thus, if teams are evenly matched, it makes sense that the algorithm is unable to exactly predict the actual ranks of the top four teams, especially when one of our prior parameters is uninformative. In other words, some of the placing noise is due to the random chance of the draws.
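As a toy illustration of that distortion (the numbers below are made up, not actual AUDL figures), raw totals and per-game rates can rank two teams in opposite orders:

```python
# Hypothetical numbers only: why raw totals mislead when games played differ.
teams = {
    # name: (total_points_against, games_played)
    "Team A": (380, 17),  # deep playoff run, more games
    "Team B": (330, 14),  # missed the playoffs, fewer games
}

for name, (pts_against, games) in teams.items():
    print(f"{name}: {pts_against} total allowed, {pts_against / games:.1f} per game")
# Team B allows fewer total points (330 vs. 380) but more per game
# (23.6 vs. 22.4), which is the distortion playoff-inflated totals create.
```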
Midwest
I’m going to be very honest: I don’t know what happened in the Midwest division. For example, why did the Madison Radicals and Pittsburgh Thunderbirds place so poorly? How did the Chicago Wildfire, the second-worst team in the division, get projected to finish first? I want to attribute the errors to lucky draws, but there might be other problems.2
The first thing that jumps out is Pts Against, specifically for the worse teams. Once again, these values are distorted by the differences in games played, which in turn deflates the defensive strength estimates. I’d imagine these estimates would be less egregious if we only used regular-season data.
From a W-L perspective, it is peculiar that the clear top three teams (Madison, Pittsburgh, and Minnesota) did not get lumped together. The algorithm does not take win-loss into account, but we can also clearly see the top three teams using point differential, which is included in the algorithm. This division is unique in terms of its predictions, so it is definitely worth looking into further.
To “troubleshoot” the algorithm, we could try the following:
Run the algorithm within the Midwest division only.
Set a better prior alpha-parameter to describe the scoring variation of the division (or the league); one possible approach is sketched below.
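One concrete way to choose that alpha, sketched here with placeholder numbers rather than real division data, is to match the Gamma prior’s mean and variance to the observed per-game scoring spread (a method-of-moments fit):

```python
import numpy as np

# Hypothetical sketch: pick a more informative alpha by matching the Gamma
# prior's moments to per-game scoring within one division.
division_goals_per_game = np.array([24.1, 22.8, 21.5, 20.9, 18.3])  # placeholder values

mu = division_goals_per_game.mean()
var = division_goals_per_game.var(ddof=1)

# For a Gamma distribution, mean = alpha / beta and var = alpha / beta**2,
# so alpha = mean**2 / var (method of moments).
alpha_informative = mu ** 2 / var
print(f"prior mean mu = {mu:.2f}, informative alpha = {alpha_informative:.1f}")
```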
Conclusion
My algorithm might not be completely wrong. There are definitely instances where the MCMC algorithm properly predicts the actual rank, as in the East and South divisions. The predictions for the West division were also fairly accurate in identifying its top four teams. The problems came about in the Midwest division, where the best three teams were neither clustered together nor properly ranked (except for Minnesota).
The solutions would be as follows: set a better prior alpha-parameter, and dig deeper to find regular-season AUDL data in order to control for games played and, potentially, pace of games. Another potential solution would be to run the algorithm within each division in order to control for the strength of the division, since some divisions are more evenly matched than others.
I will admit that the model I used is too simple: it only accounts for points scored and points against. Ways to modify the model include adding pass completion percentages, time of possession, percent chance of scoring (points for / number of possessions), and the number of blocks versus unforced errors. All of these factors modify the offensive and defensive strengths of a team.
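As a hedged sketch of how such factors could enter, here is one standard option (not the model from my paper): scale the offensive-strength rate by a log-linear term in the extra features, as in Poisson regression. The feature names, centering values, and coefficients below are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: adjust a baseline goals-per-game rate with a
# log-linear term in additional features (Poisson-regression style).
def expected_goals(base_rate, features, feature_means, weights):
    """Scale a baseline rate by exp(w . (x - x_bar))."""
    return base_rate * np.exp(weights @ (features - feature_means))

base_rate = 21.0                        # offensive-strength estimate (goals/game)
features = np.array([0.93, 22.0])       # completion %, possessions per game
feature_means = np.array([0.90, 21.0])  # league averages (illustrative)
weights = np.array([4.0, 0.05])         # illustrative coefficients
print(expected_goals(base_rate, features, feature_means, weights))
# ~24.9 goals/game: above-average completion and possessions nudge the rate up
```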
Last remarks
I don’t know if I’ll look into this problem further3, but it was definitely useful to understand the theoretical aspects of MCMC and attempt to implement it, even if my implementation might not be correct. To counter my own skepticism, the prior and posterior distributions should have been derived correctly (it’s a pretty common conjugate prior/posterior problem), and the posterior distribution of my estimates does check out to follow a gamma distribution.
1 The methodology is described in the paper linked above.
2 The algorithm did nail the Minnesota Wind Chill’s actual rank.
3 Forgive me, but I’m quite lazy.
