It's football season again, hooray! Every year for my friends' football pool I try out a different algorithm. Invariably, my picks are around 60% accurate. Not terrible, but according to NFL Pickwatch (archive, current season), the best pickers get to 68 or 69%. So, an amazing performance—my upper bound—is just under 70%, and the lower bound for a competitive model—the FiveThirtyEight baseline—is 60%.

I've been modeling NFL outcomes for a couple of years, and running linear (predicting point spread) and logistic (predicting win probability) regressions given various team and player data. My best year so far incorporated the Vegas spread into the model, and my biggest disaster so far was an aggressive lasso model on every player in every offensive line, with team defenses lumped as a group. Attempting to track injuries, suspensions, and other changes to the starting lineup was not sustainable for the amount of time I wanted to spend.

Enter Nate Silver's awesome NFL Elo rankings, the aspirational target for this year. What's impressive is that he gets something like 60% accuracy out of literally no information but home field advantage and past scores. I particularly love that it updates weekly to incorporate the new information—this immediately says "Bayesian" and in fact is a lot how people using their intuition are making their picks anyway. A system like his—but with a more straightforward Bayesian model—is the goal of this post.

Bayesian updating for the (NFL) win 🏈¶

Silver's Elo model maintains a running metric of team ability, $Elo_{tm}$, that updates after every game. Here is his formula and source code. The number is used to estimate win probability in almost the same way as you'd use the result from a logistic regression:

$$ Pr(\text{team A wins}) = \frac{1}{10^{-(Elo_A - Elo_B)/400} + 1} $$

Copy Silver's approach, but use a different model¶

The FiveThirtyEight Elo approach is intuitive and simple:

Start with a baseline value for team skill
Update the Elo numbers every week
At the end of each season revert the values back to the mean

This post will do the same thing, but will be updating hyperparameters of known distributions rather than the Elo parameters FiveThirtyEight used. I want to only select from distributions that have analytic solutions. This table on Wikipedia lists those in the exponential family, with columns for the likelihood, its conjugate prior, and their posterior distribution. The two models selected for this blog post are:

A Bernoulli posterior to predict win probability
⇒ A running tally of wins vs. losses for each team is all you need.
A Student's T posterior to predict point spread
⇒ A running game count, mean spread, and variance are all for this model. The theory depends on every future outcome being independent of previous outcomes, which is true so we're fine.

Data¶

FiveThirtyEight's GitHub repository contains a cleaned dataset of historical scores that date back to 1920. We can use to develop the algorithm. The column named "neutral" has two values: "neutral == 1" means the game is on neutral ground (no home field advantage), and "neutral == 0" means the home field advantage goes to "team1". The rest of the column names are straightforward. The goal is to make an online-updating model like FiveThirtyEight's Elo using only prior scores, team, and home field advantage.

Initial setup¶

This blog post is an executable Jupyter notebook. You can reproduce everything by re-running the original notebook. This first cell does the standard imports, then pulls the data from FiveThirtyEight.

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # at least 0.19.2 to read_csv from url
import seaborn as sns
import scipy.stats as stats
from collections import OrderedDict


# Seaborn setup
sns.set_context("notebook")
sns.set_style("white")

url = "https://github.com/fivethirtyeight/nfl-elo-game/raw/master/data/nfl_games.csv"
df = pd.read_csv(url)
print(df.shape)
df.head(5)

(16007, 12)

Explore the data¶

See what's in the dataset. There are over 100 football teams total, but if we cut off at 1980, we get the familiar 32 teams, so that's fine. Also, the FiveThirtyEight folks have made it so teams that changed cities are always named whatever they're called now. Very nice.

For those skeptical about the point spread being normally distributed, you're kind of right, but it's still good enough for me. The plot below contains histograms of each team's point spread at home since 1980. Each bar is 7 points wide, and a normal distribution is superimposed with a dashed line. Some observations:

The Normal distribution isn't terrible for most teams. ⇒ Modeling point spread with a T distribution won't be horrible.
A disproportionate number of games end with spreads that are within 1 touchdown (±7 points). ⇒ Coaching/performance under pressure?
- The Browns (CLE), Lions (DET), Chargers (LAC), Rams (LAR), and Jets (NYJ) often lose by 1 touchdown or less.
- The Cardinals (ARI), Broncos (DEN), Dolphins (MIA), Patriots (NE), and Seahawks (SEA) often win by 1 touchdown or less.

# use toggle

# First create the "spread" column
df['spread'] = df.score1 - df.score2
print("Total teams in the dataset:", len(df.team1.unique()))
print("Total teams after 1980:", len(df[df.season > 1980].team1.unique()))


def color_coded_hist(x, **kwargs):
    """Color the negative and positive bins differently.

    Color scheme from http://colorbrewer2.org/    
    """
    hist_bins = [-75, -42, -35, -28, -21, -14, -7, 0, 7, 14, 21, 28, 35, 42, 75]
    __, __, patches = plt.hist(x, density=True, bins=hist_bins, color="#f1a340")
    # Purple for positive spread:
    i = next(i for i, val in enumerate(hist_bins) if val == 0)
    for p in patches[i:]:
        p.set_facecolor("#998ec3")
        

def best_norm(x, **kwargs):
    """Plot the best normal fit for the data."""
    mu, std = stats.norm.fit(x)
    # Plot the PDF.
    xmin, xmax = min(x), max(x)
    x = np.linspace(xmin, xmax, 100)
    p = stats.norm.pdf(x, mu, std)
    plt.plot(x, p, 'k--', alpha=0.6)


# Plot all of the teams' spreads since 1980
g = sns.FacetGrid(df[df.season > 1980].sort_values('team1'), col="team1", col_wrap=5, height=2)
g = (g.map(color_coded_hist, "spread")
     .map(best_norm, "spread")
     .set_titles("{col_name}")
     .set_axis_labels("spread", "density"))
msg = "Histograms of spread (at home) since 1980. Normal approximation is dashed line."
plt.suptitle(msg, y=1.025, fontsize=14);

Total teams in the dataset: 101
Total teams after 1980: 32

Home field advantage¶

Nate Silver's model bumps the team's Elo rating by a constant to account for home field advantage. Great idea. I'm copying it.

To avoid using unknown future information, the model home field advantage value will be the average from the past three seasons, rather than the average over the entire dataset, saved in the data frame rolling_avg. The plots below show spread on the left and win percent on the right, year over year. The approximation for home advantage is a dotted black line. The central tendency for the data is a solid blue line, and the 95% confidence interval for each season is the light blue region. Other observations about home field advantage:

Starting around the 1970's the overall point spread tightens: home field advantage starts at 10 points on average in 1920 and settles down to 2 to 3 points per game, give or take a couple of points.
This translates to about a 16% greater chance of winning at home (58% at home versus 42% away).
Approximating home field advantage via the average of the past three years (black dotted line) looks fine: it closely tracks the actual median and is always within the confidence interval. Good enough for me.

# Ignore a future warning that comes from scipy.
import warnings
warnings.filterwarnings("ignore")


# Show some of the values
print("~ Overall ~")
print(df[df.neutral == 0][['spread', 'result1']].mean())
print("\n~ Past few seasons ~")
print(df[(df.neutral == 0) & (df.season > 2011)].groupby('season')[['spread', 'result1']].mean())


# Create a data frame with lagged rolling means and combine with the existing data
rolling_avg = (
    df.groupby('season')[['spread', 'result1']].mean()
    .rolling(3, min_periods=1)
    .mean()
    .shift(1)
)
rolling_avg['win_pct_advantage'] =  rolling_avg.result1 - .5
rolling_avg['spread_advantage'] = rolling_avg.spread
rolling_avg = rolling_avg.drop(columns=['result1', 'spread'])


# Plot everything
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14.0, 4.0)
f, ax = plt.subplots(1, 2)
sns.lineplot(x="season", y="spread", data=df[df.neutral == 0], ax=ax[0])
ax[0].plot(rolling_avg.index, rolling_avg.spread_advantage, 'k:')
ax[0].set_ylabel("home spread")

sns.lineplot(x="season", y="result1", data=df[df.neutral == 0], ax=ax[1])
ax[1].plot(rolling_avg.index, rolling_avg.win_pct_advantage + .5, 'k:')
ax[1].set_ylabel("home win percent")
plt.suptitle(
    'Central tendency and confidence interval of spread (left), '
    'win percent (right) by season — lagged moving average is dotted line',
    fontsize=14);

~ Overall ~
spread     2.976579
result1    0.580278
dtype: float64

~ Past few seasons ~
          spread   result1
season                    
2012    2.615094  0.575472
2013    3.174242  0.600379
2014    2.851711  0.583650
2015    1.475285  0.539924
2016    3.057252  0.585878
2017    2.639847  0.574713

Convert from "by-game" to "by-team" format¶

The original FiveThirtyEight dataset has one row per game ("by-game" format), with the home team labelled as "team1." If I want to see all of the games for a single team progress I have to bounce between the "team1" and "team2" columns. Reshape the dataset to "by-team" format, meaning there are two rows per game, adding an extra column "home" to identify the home team. This way I can group by "season" and "team1" and have all of the games for each team. The new dataset will be called by_team.

by-game format: one row per game
season	date	team1	team2	result1
1985	1985-09-15	CHI	NE	1
1985	1985-09-19	MIN	CHI	0

⇨

by-team format: one row per team per game
season	date	home	team1	team2	result1
1985	1985-09-15	1	CHI	NE	1
1985	1985-09-19	0	CHI	MIN	1
1985	1985-09-15	0	NE	CHI	0
1985	1985-09-19	1	MIN	CHI	0

# use toggle
home_games = df[[
    'date', 'season', 'neutral', 'playoff', 'team1', 'team2', 'result1', 'spread'
]][df.season > 1975]
home_games['home'] = 1

# Now swap the teams for "away"
away_games = home_games.rename(columns={'team1':'team2', 'team2':'team1'})
away_games['home'] = 1 - home_games.home
# Remember to switch the meaning of the winning and spread columns too
away_games['result1'] = 1 - home_games.result1
away_games['spread'] = - home_games.spread

by_team = (
    pd.concat([home_games, away_games], ignore_index=True)
    .sort_values(by=['season', 'team1', 'date'])
)

Set up for iterative updating¶

Both the models will follow the same procedure:

Initialize the model with a baseline:
1. Bernoulli win probability = 50%
2. T-distribution spread = 0
After every game, update the hyperparameters with the new information as described in their respective formulas
Use the hyperparameters to calculate the posterior predictive distribution

Make a base class that will keep track of all of the hyperparameters for each team and each game. The parameters can be added as columns to the main dataset for backtesting. In practice, the Updater.lookup attribute would be accessing a database, not a dictionary, and the update step would be run weekly to increment the hyperparameters based on the game results. But for this blog post, it will remain a dictionary.

# use toggle
from abc import ABC, abstractmethod
from collections import namedtuple

class Updater(ABC):
    def __init__(self, *hyperparameters):
        # Form of lookup:
        # {season: {team: [{week1 data}, {week2 data}, ... {weekN data}]}}
        self.lookup = {}
        self.Params = namedtuple('Params', ['date'] + list(hyperparameters))
        
    def iterrows(self):
        for season, teams in self.lookup.items():
            for team, results in teams.items():
                for row in results:
                    yield dict(season=season, team1=team, **row._asdict())
        
    def get_rows(self):
        return [r for r in self.iterrows()]
    
    @abstractmethod
    def revert_to_mean(self, season, team, keep=.3, n_obs=8):
        pass
    
    @abstractmethod
    def update(self, row):
        pass

Bernoulli distribution¶

The probability of winning is equal to:

$$ \text{Pr}(\text{win}) = \frac{\alpha}{\alpha + \beta} \pm \text{home advantage} $$

The hyperparameters for the Bernoulli distribution are:

$$ \begin{aligned} \alpha &= 1 + \text{number of wins} \\ \beta &= 1 + \text{number of losses} \end{aligned} $$

Updates are done by this rule:

A win increments $\alpha$ by 1
A loss increments $\beta$ by 1
A tie increments both $\alpha$ and $\beta$ by 0.5

The default initial value sets $\alpha = \beta$, where you can choose how many "observations" (n_obs) to pretend we have already seen. If n_obs is a large number, then the incremental effect of new games won't be as pronounced. If n_obs is a small number, then the new season's data will quickly overwhelm the past performance. I tuned the parameters for good performance:

Weight $\alpha$ and $\beta$ at the beginning of each year as if they count for 4 games.
Revert $\alpha$ and $\beta$ back to the mean ($\alpha = \beta$) keeping 80% of last year's final value.

class BernoulliUpdater(Updater):
    def __init__(self):
        super().__init__('alpha1', 'beta1')
        
    def revert_to_mean(self, season, team, keep=.8, n_obs=4):
        # default
        alpha = beta = 1 + n_obs * .5
        # or use existing data        
        if season in self.lookup and team in self.lookup[season]:
            last_entry = self.lookup[season][team].pop()
            date, alpha0, beta0 = last_entry
            p = alpha0 / (alpha0 + beta0)
            alpha = 1 + n_obs * (keep * p + (1 - keep) * .5)
            beta = 1 + n_obs * (keep * (1 - p) + (1 - keep) * .5)
            # push back the reverted value to the list
            self.lookup[season][team].append(self.Params(date, alpha, beta))
        return alpha, beta
            
    def update(self, row):
        if row.season not in self.lookup:
            self.lookup[row.season] = {}
        if row.team1 not in self.lookup[row.season]:
            self.lookup[row.season][row.team1] = []
            alpha, beta = self.revert_to_mean(row.season - 1, row.team1)
        else:
            __, alpha, beta = self.lookup[row.season][row.team1][-1]
        # THE UPDATE STEP:
        # a' = a + 1 if win else 0
        # b' = b + 1 if lose
        if row.result1 == 1:  # Won
            alpha_beta_next = self.Params(row.date, alpha + 1, beta)
        elif row.result1 == 0.5:  # Tie
            alpha_beta_next = self.Params(row.date, alpha + .5, beta + .5)
        else:  # Lost
            alpha_beta_next = self.Params(row.date, alpha, beta + 1)
        self.lookup[row.season][row.team1].append(alpha_beta_next)
        return alpha, beta


bernoulli_updater = BernoulliUpdater()
for i, row in by_team.iterrows():
    bernoulli_updater.update(row)

ab = pd.DataFrame(bernoulli_updater.get_rows()).sort_values(['team1','season'])
g = ab.groupby('team1')
ab = ab.assign(alpha1 = g.alpha1.shift(), beta1=g.beta1.shift())

bernoulli_dataset = (
    by_team[[c for c in by_team.columns if c != 'spread']]
    .merge(ab, on=['season', 'date', 'team1'])
    .reindex(columns=[
        'season', 'date', 'home', 'neutral', 'playoff',
        'team1', 'team2', 'result1', 'alpha1', 'beta1'])
)

print("\n\n☆ Da Bears! ☆")
bernoulli_dataset[(bernoulli_dataset.season == 1985) & (bernoulli_dataset.team1 == 'CHI')]


☆ Da Bears! ☆

Convert back from "by-team" to "by-game" format¶

We now have to convert back to the original layout of one row per game. The goal is to combine each team's performance to predict the outcome of the game: team1's hyperparameters $\alpha_1$ and $\beta_1$ are combined with team2's hyperparameters $\beta_1$ and $\beta_2$, so that overall the formula for win probability will be:

$$ \text{Pr}(\text{team}_1 \text{ wins}) = \frac{\alpha}{\alpha + \beta} + \text{team}_1\text{ home advantage} $$

in which $\alpha = \left(\alpha_1 + \beta_2 - 1\right)$ and $ \beta = \left(\beta_1 + \alpha_2 - 1\right)$.

# use toggle
b = (
    bernoulli_dataset[['season', 'date', 'team1', 'alpha1', 'beta1']]
    .rename(columns=dict(team1='team2', alpha1='alpha2', beta1='beta2'))
    .merge(bernoulli_dataset, on=['season', 'date', 'team2'])
    .join(
        rolling_avg[['win_pct_advantage']]
        .rename(columns={'win_pct_advantage':'home_advantage'})
        , on='season')
)

b = (
    b.assign(
        pwin = 
        (b.alpha1 + b.beta2 - 1) / (b.alpha1 + b.beta1 + b.alpha2 + b.beta2 - 2)
        # if at home and not neutral add home advantage
        + b.home * (1 - b.neutral) * b.home_advantage
        # if away and not neutral subtract home advantage
        - (1 - b.home) * (1 - b.neutral) * b.home_advantage
        ,
        success = lambda row:  row.pwin.round() == row.result1
    )
    .reindex(columns=(
        list(bernoulli_dataset.columns)
        + ['alpha2', 'beta2', 'home_advantage', 'pwin', 'success']
    ))
)

print(b.success.mean())
b.tail()

0.6327005036807439

Check model performance¶

One way to visualize the performance of a binary classification model is with a Receiver operating characteristic (ROC) plot (Wikipedia page). The total area under the curve (AUC) can be used as a measure of model performance, with larger AUC identifying better models.

No effect

Mild effect

Amazing

Our best model with the this Bernoulli posterior has about 63% accuracy, and an area under the ROC curve of about 0.68. This is as good as my logistic regression models from prior years that incldued moving averages of penalties, turnovers, and passing and rushing yards. Cool.

# use toggle
def plot_roc(predicted, actual, resolution=100, ax=None):
    """'predicted' and 'actual' are pandas Series."""
    ax = ax or plt.gca()
    cutoff = np.linspace(0, 1, resolution)
    total_pos = (actual == 1).sum()
    total_neg = (actual != 1).sum()
    true_positive_rate = np.fromiter(
        map(lambda c: (actual[predicted > c] == 1).sum() / total_pos, cutoff),
        float)
    false_positive_rate = np.fromiter(
        map(lambda c: (actual[predicted > c] != 1).sum() / total_neg, cutoff),
        float)
    ax.plot(
        false_positive_rate, true_positive_rate,
        linestyle='-', color=sns.color_palette()[0], linewidth=3)
    ax.set_xlim([0,1])
    ax.set_ylim([0,1])
    ax.plot([0,1], [0,1], 'k:')
    # Area under the curve
    auc = sum((true_positive_rate[:-1] + true_positive_rate[1:]) / 2
              * (false_positive_rate[:-1] - false_positive_rate[1:]))
    ax.set_title('ROC curve. AUC = {:0.3f}'.format(auc), fontsize=14);


## Start the actual plot
plt.rcParams['figure.figsize'] = (15.0, 3.0)
f, ax = plt.subplots(1, 3)

summary = b.groupby(['team1', 'season'], as_index=False).success.mean()

# Histogram
sns.distplot(summary.success, ax=ax[0], bins=np.linspace(0, 1, 11))
ax[0].axvline(0.5, color='k', linestyle=':')
ax[0].set_ylabel("frequency count")
ax[0].set_title('Model accuracy (grouped by team, season)', fontsize=14)

# Time series
sns.lineplot(x="season", y="success", data=summary, ax=ax[1])
ax[1].set_ylabel("Model success rate")
ax[1].set_title('Accuracy year over year (mean {:0.0%})'.format(b.success.mean()), fontsize=14)

# ROC
plot_roc(b.pwin, b.result1, resolution=100, ax=ax[2])

T distribution¶

This section is based on the content of this table on the Wikipedia page for conjugate priors. The spread follows a T distribution with parameters:

$$ \text{Pr}(\text{spread}) = t_{2\alpha}\left(\tilde{\text{spread}} | \mu, \sigma^2 = \frac{\beta (\nu + 1)}{\alpha\nu}\right) $$

The hyperparameters for the T distribution are:

$$ \begin{aligned} \nu &= \text{number of games} \\ \mu &= \text{running mean of spread} \\ \alpha &= \nu / 2 \\ \beta &= (\text{running sum of the squared deviations}) / 2 \end{aligned} $$

Updates are done by these rules:

Increment $\nu$ each game, $\nu' = \nu + 1$
Intermediate variable $\delta = x - \mu$
The running mean is $\mu' = \mu + \delta / \nu'$
The running sum of squared deviations is $2\beta = 2\beta + \delta * \mu'$

The default initial value sets $\mu = 0$, and the number of observations n_obs to 3. The parameters for mean reversion were tuned for good performance and are:

Set $\nu$ to 3 games.
Revert $\mu$ back to zero keeping 50% of last year's final value.
Revert $\beta$ back to the mean $\beta$ of all teams in the prior year, scaled down to 3 games. Keep 50% of last year's final value.

class TUpdater(Updater):
    def __init__(self):
        super().__init__('nu1', 'mu1', 'alpha1', 'beta1')
        
    def get_mean_beta(self, season):
        mean_beta = 16**2 / 2  # Default
        if season in self.lookup:
            team_sets = self.lookup[season].values()
            mean_beta = (
                sum(ts[-1].beta1 for ts in team_sets)
                / sum(ts[-1].nu1 for ts in team_sets))
        return mean_beta
        
    def revert_to_mean(self, season, team, keep=.5, n_obs=3):
        mean_beta = self.get_mean_beta(season - 1)  # Default
        nu, mu, alpha, beta = n_obs, 0, n_obs / 2, mean_beta * n_obs
        # or use existing data
        if season in self.lookup and team in self.lookup[season]:
            last_entry = self.lookup[season][team].pop()
            date, nu0, mu0, alpha0, beta0 = last_entry
            mu = keep * mu0
            beta = nu * (keep * beta0 / nu0 + (1 - keep) * mean_beta)
            # push back the reverted value to the list
            self.lookup[season][team].append(self.Params(date, nu, mu, alpha, beta))
        return nu, mu, alpha, beta
            
    def update(self, row):
        if row.season not in self.lookup:
            self.lookup[row.season] = {}
        if row.team1 not in self.lookup[row.season]:
            self.lookup[row.season][row.team1] = []
            nu, mu, alpha, beta = self.revert_to_mean(row.season - 1, row.team1)
        else:
            __, nu, mu, alpha, beta = self.lookup[row.season][row.team1][-1]
        # THE UPDATE STEP:
        delta = row.spread - mu
        nu_mu_alpha_beta_next = self.Params(
            row.date,
            nu + 1,                       # nu' = nu + 1
            mu + delta / (nu + 1),        # mu' = mu + delta / (nu + 1)
            alpha + .5,                   # alpha' = alpha + 1/2
            beta + delta * (mu + delta / (nu + 1)) / 2
                                          # beta' = beta + delta * mu' / 2
        )
        self.lookup[row.season][row.team1].append(nu_mu_alpha_beta_next)
        return nu, mu, alpha, beta


t_updater = TUpdater()
for i, row in by_team.iterrows():
    t_updater.update(row)

nmab = pd.DataFrame(t_updater.get_rows()).sort_values(['team1','season'])
g = nmab.groupby('team1')
nmab = nmab.assign(
    nu1 = g.nu1.shift(),
    mu1 = g.mu1.shift(),
    alpha1 = g.alpha1.shift(),
    beta1=g.beta1.shift())

t_dataset = (
    by_team[[c for c in by_team.columns if c != 'result1']]
    .merge(nmab, on=['season', 'date', 'team1'])
    .reindex(columns=[
        'season', 'date', 'home', 'neutral', 'playoff',
        'team1', 'team2', 'spread', 'nu1', 'mu1', 'alpha1', 'beta1'])
)

print("\n\n☆ Da Bears! ☆")
t_dataset[(t_dataset.season == 1985) & (t_dataset.team1 == 'CHI')]


☆ Da Bears! ☆

Convert back from "by-team" to "by-game" format¶

We will again convert back to the original layout of one row per game. Combining the two teams' hyperparameters is a little more complicated than for the Bernoulli distribution. The distribution of the spread will be:

$$ \text{Pr}(\text{team}_1 \text{ spread}) = t_{2\alpha}\left(\tilde{\text{spread}} | \mu, \sigma^2 = \frac{\beta (\nu + 1)}{\alpha\nu}\right) $$

in which

$$ \begin{aligned} \nu &= \nu_1 + \nu_2 \\ \alpha &= \alpha_1 + \alpha_2 \\ \mu &= \frac{\nu_1 \mu_1 - \nu_2 \mu_2}{\nu_1 + \nu_2} \\ \beta &= \beta_1 + \beta_2 + \frac{\nu_1 \nu_2}{\nu_1 + \nu_2} \frac{(\mu_1 + \mu_2)^2}{2} \end{aligned} $$

t = (
    t_dataset[['season', 'date', 'team1', 'nu1', 'mu1', 'alpha1', 'beta1']]
    .rename(columns=dict(
        team1='team2', nu1='nu2', mu1='mu2', alpha1='alpha2', beta1='beta2'))
    .merge(t_dataset, on=['season', 'date', 'team2'])
    .join(
        rolling_avg[['spread_advantage']]
        .rename(columns={'spread_advantage':'home_advantage'})
        , on='season')
)

t = (
    t.assign(
        pspread =
            (t.nu1 * t.mu1 - t.nu2 * t.mu2) / (t.nu1 + t.nu2)
            # if at home and not neutral add home advantage
            + t.home * (1 - t.neutral) * t.home_advantage
            # if away and not neutral subtract home advantage
            - (1 - t.home) * (1 - t.neutral) * t.home_advantage
        ,
        betaprime =
            t.beta1 + t.beta2
            + (t.nu1 * t.nu2) / (t.nu1 + t.nu2)
            * (t.mu1 + t.mu2)**2 / 2
        ,
        pwin = (
            lambda row: 1 - stats.t.cdf(
                0,
                row.nu1 + row.nu2,
                loc=row.pspread,
                scale=(
                    row.betaprime
                    * (row.nu1 + row.nu2 + 1)
                    / (row.nu1 + row.nu2) / (row.alpha1 + row.alpha2)
                )))
        ,
        success = lambda row: row.pwin.round() == (row.spread > 0)
    )
    .reindex(columns=(
        list(t_dataset.columns)
        + ['nu2', 'mu2', 'alpha2', 'beta2', 'home_advantage', 'pspread', 'pwin', 'success']
    ))
)

print(t.success.mean())
print(t.shape)
t.tail()

0.636139093374661
(20648, 20)

Check model performance¶

So, the model's performance at predicting spread is pretty crappy. But, using a probit link to connect the T distribution to a win/loss prediction, we get performance similar to the Bernoulli distribution, with an AUC of 0.67 under the ROC plot. Not much improvement for all that extra complication.

# use toggle
ss_res = ((t.spread - t.pspread)**2).sum()
ss_tot = ((t.spread - t.spread.mean())**2).sum()
r_squared = 1 -  ss_res/ ss_tot
sns.jointplot("spread", "pspread", data=t, kind="hex", space=0, color="b", ratio=4)
title = "Actual spread vs. mean of  distribution. R squared= {:0.0%}".format(r_squared)
plt.suptitle(title, x=.45, y=1.01, fontsize=14);

# use toggle
plt.rcParams['figure.figsize'] = (15.0, 3.0)
f, ax = plt.subplots(1, 3)

summary = t.groupby(['team1', 'season'], as_index=False).success.mean()

# Histogram
sns.distplot(summary.success, ax=ax[0], bins=np.linspace(0, 1, 11))
ax[0].axvline(0.5, color='k', linestyle=':')
ax[0].set_ylabel("frequency count")
ax[0].set_title('Model accuracy (grouped by team, season)', fontsize=14)

# Time series
sns.lineplot(x="season", y="success", data=summary, ax=ax[1])
ax[1].set_ylabel("Model success rate")
ax[1].set_title('Accuracy year over year (mean {:0.0%})'.format(t.success.mean()), fontsize=14)

# ROC
plot_roc(t.pwin, t.spread > 0, resolution=100, ax=ax[2])

Conclusion¶

It is possible to use nothing but home advantage and per-team historical win/loss data to achieve better than 60% predictive accuracy (on average) year over year for NFL wins. Further, it can be done using an analytical Bayesian approach, at a computational cost of a handful of arithmetic operations per team per week.

It is worth your time to look at simple Bayesian solutions to predictive problems.
Not everything needs numerical methods, people have been solving hard statistical problems for centuries without computers.
Even if you end up using other methods, the hyperparameters from the Bayesian method can be part of your feature engineering pipeline.
- Logit / probit link for a regression that adds injuries and rest between games?
- Computational cost is nearly nothing.
The '85 Bears are the best football team ever to have existed.★ 🐻★

You can track the Bernoulli model's performance under my name in this year's FiveThirtyEight forecasting game leaderboard…or join the fun, and enter the game yourself!

Python libraries used in this notebook¶

The dataset is from the FiveThirtyEight GitHub repo:

Repo page: https://github.com/fivethirtyeight/nfl-elo-game
Raw data direct link: https://github.com/fivethirtyeight/nfl-elo-game/raw/master/data/nfl_games.csv

The rest of the dependencies are regular Python packages, listed below:

import sys, platform, IPython, pandas as pd, numpy as np, seaborn as sns
print("This notebook was created on an %s computer running OSX %s and using:\nPython %s\nIPython %s\nNumPy %s\nPandas %s\nSeaborn %s\n" % (platform.machine(), platform.mac_ver()[0], sys.version[:5], IPython.__version__, np.__version__, pd.__version__, sns.__version__))

This notebook was created on an x86_64 computer running OSX 10.13.6 and using:
Python 3.6.6
IPython 6.5.0
NumPy 1.15.1
Pandas 0.23.4
Seaborn 0.9.0

Vocabulary¶

hyperparameter
The parameters for the conjugate prior are called "prior hyperparameters," and the parameters for the posterior distribution are called "posterior hyperparameters." The only thing called "model parameters" are the parameters of the likelihood. The word "hyperparameter" is used specifically to distinguish between the "model parameters" and everything else. Here is a worked example showing hyperparameters for the Bernoulli distribution.
posterior
In the context of Bayesian probability theory, the posterior is the distribution that is found a posteriori, meaning after you combine your prior knowledge and the likelihood of the observed data using Bayes's theorem: $p(\theta | x) = \frac{p(x|\theta) p(\theta)}{p(x)}$. Wikipedia definition of posterior.
recursive Bayesian estimation
I didn't mention this in the blog, but upon reading the entire Wikipedia page on conjugate priors, I found out there's a name for the online updating method I implemented: recursive Bayesian estimation. It is common in engineering—the Kalman filter is a Bayesian approach used in feedback control that estimates unknown parameters in a linear system. Here is the Wikipedia recursive Bayesian estimation page and the Kalman filter page.
support
In the context of a probability distribution, the support is the set of values for which there is a nonzero probability. For example, the support for a Bernoulli random variable is the set {0, 1}, which contains the only two possible outcomes.

References¶

Bernoulli distribution conjugate prior—Wikipedia's table of discrete conjugate priors. The table lists the parameters and how to update them.
Normal distribution conjugate prior—Also Wikipedia. For continuous distributions.
FiveThirtyEight's NFL Elo rankings, explained—Also sign up for their NFL forecasting game, and check out their GitHub repo.
NFL Pickwatch—The best forecasters' picks, aggregated in one site.
Pro-Football Reference—One of the easiest sites to use for football statistics, including a frequently-updated injury list.
Single-pass variance blog post (no https)—The first blog to actually cite Knuth while giving the algorithm gets my linkback.

All links in this post were accessed on or before September 9, 2018.

Appendix: major changes to NFL rules 🏈¶

I didn't want to mess up the models with games that are totally different thanks to rules changes, so I made a summary of major NFL rules changes from the official NFL operations site, with additional citations as noted.

1933:
- Quarterbacks can now throw from anywhere behind the line of scrimmage (previously at least 5 yards back), drastically increasing the prevalence of the passing attack
By 1940, to increase scoring for a "maximum of entertainment":
- No penalties for multiple incomplete passes in the same series of downs
- Hash marks are at 10 yards, then 15, then 20 yards from the sidelines (Wikipedia)
- 15-yard penalty for roughing the passer
- Ball shape is changed to be more pass-friendly
1972:
- Hash marks move from the sidelines to 70 feet, 9 inches (~23.5 yards) from the sidelines
1974
- Goal posts move from goal line to the end line
- All field goals missed beyond the 20-yard line are a turnover
- Kickoffs are at the 35-yard line (back from 40)
- Offense can't move on a punt until after the kick
- Penalty for holding, illegal use of hands, tripping is 10 yards (down from 15 which was almost unrecoverable)
- A receiver can only be chucked (legal contact? I can't find the definition) once by a given defender after the receiver has gone 3 yards downfield
1978:
- Illegal contact is now anything beyond 5 yards downfield
- Offensive linemen can extend arms and open hands on pass plays
1994:
- Multiple unlisted changes intended to increase scoring (which is at 2 TD/team/game at this point) and reduce injuries
2016:
- Field goal moves to 15 yd from 2 yd, plus multiple changes to 2 point conversion

Actually all of these rules are either for safety or to favor offense and increase scoring. Except the recent change to the distance for an extra point, which just messes up historical rankings of kickers. Since both teams have an offense, and all the data I'm using is win/loss or points, any advantage is a wash. In the end I didn't use this, but didn't want to delete all the typing.

	date	season	team1	team2	elo1	elo2	elo_prob1	score1	result1
0	1920-09-26	1920	RII	STP	1503.947	1300.000	0.824651	48	1.0
1	1920-10-03	1920	AKR	WHE	1503.420	1300.000	0.824212	43	1.0
2	1920-10-03	1920	RCH	ABU	1503.420	1300.000	0.824212	10	1.0
3	1920-10-03	1920	DAY	COL	1493.002	1504.908	0.575819	14	1.0
4	1920-10-03	1920	RII	MUN	1516.108	1478.004	0.644171	45	1.0

	season	date	home	neutral	playoff	team1	team2	result1	alpha1	beta1
3938	1985	1985-09-08	1	0	0	CHI	TB	1.0	3.259186	2.740814
3939	1985	1985-09-15	1	0	0	CHI	NE	1.0	4.259186	2.740814
3940	1985	1985-09-19	0	0	0	CHI	MIN	1.0	5.259186	2.740814
3941	1985	1985-09-29	1	0	0	CHI	WSH	1.0	6.259186	2.740814
3942	1985	1985-10-06	0	0	0	CHI	TB	1.0	7.259186	2.740814
3943	1985	1985-10-13	0	0	0	CHI	SF	1.0	8.259186	2.740814
3944	1985	1985-10-21	1	0	0	CHI	GB	1.0	9.259186	2.740814
3945	1985	1985-10-27	1	0	0	CHI	MIN	1.0	10.259186	2.740814
3946	1985	1985-11-03	0	0	0	CHI	GB	1.0	11.259186	2.740814
3947	1985	1985-11-10	1	0	0	CHI	DET	1.0	12.259186	2.740814
3948	1985	1985-11-17	0	0	0	CHI	DAL	1.0	13.259186	2.740814
3949	1985	1985-11-24	1	0	0	CHI	ATL	1.0	14.259186	2.740814
3950	1985	1985-12-02	0	0	0	CHI	MIA	0.0	15.259186	2.740814
3951	1985	1985-12-08	1	0	0	CHI	IND	1.0	15.259186	3.740814
3952	1985	1985-12-14	0	0	0	CHI	NYJ	1.0	16.259186	3.740814
3953	1985	1985-12-22	0	0	0	CHI	DET	1.0	17.259186	3.740814
3954	1985	1986-01-05	1	0	1	CHI	NYG	1.0	18.259186	3.740814
3955	1985	1986-01-12	1	0	1	CHI	LAR	1.0	19.259186	3.740814
3956	1985	1986-01-26	0	1	1	CHI	NE	1.0	20.259186	3.740814

	season	date	home	team1	team2	result1	alpha1	beta1	alpha2	beta2	home_advantage	pwin	success
20643	2017	2017-11-30	1	DAL	WSH	1.0	8.555930	8.444070	8.068991	8.931009	0.072409	0.587626	True
20644	2017	2017-12-10	1	LAC	WSH	1.0	8.482438	9.517562	8.068991	9.931009	0.072409	0.584570	True
20645	2017	2017-12-17	0	ARI	WSH	0.0	9.031599	9.968401	8.068991	10.931009	0.072409	0.454330	True
20646	2017	2017-12-24	0	DEN	WSH	0.0	8.258948	11.741052	9.068991	10.931009	0.072409	0.406274	True
20647	2017	2017-12-31	1	NYG	WSH	1.0	5.301124	15.698876	10.068991	10.931009	0.072409	0.453213	False

	season	date	home	neutral	playoff	team1	team2	spread	nu1	mu1	alpha1	beta1
3938	1985	1985-09-08	1	0	0	CHI	TB	10	3.0	1.390742	1.5	23.341826
3939	1985	1985-09-15	1	0	0	CHI	NE	13	4.0	3.543057	2.0	38.593370
3940	1985	1985-09-19	0	0	0	CHI	MIN	9	5.0	5.434445	2.5	64.289990
3941	1985	1985-09-29	1	0	0	CHI	WSH	35	6.0	6.028704	3.0	75.037828
3942	1985	1985-10-06	0	0	0	CHI	TB	8	7.0	10.167461	3.5	222.320086
3943	1985	1985-10-13	0	0	0	CHI	SF	16	8.0	9.896528	4.0	211.594917
3944	1985	1985-10-21	1	0	0	CHI	GB	16	9.0	10.574692	4.5	243.866083
3945	1985	1985-10-27	1	0	0	CHI	MIN	18	10.0	11.117223	5.0	274.023262
3946	1985	1985-11-03	0	0	0	CHI	GB	6	11.0	11.742930	5.5	314.435248
3947	1985	1985-11-10	1	0	0	CHI	DET	21	12.0	11.264352	6.0	282.090056
3948	1985	1985-11-17	0	0	0	CHI	DAL	44	13.0	12.013248	6.5	340.568433
3949	1985	1985-11-24	1	0	0	CHI	ATL	36	14.0	14.298016	7.0	569.241980
3950	1985	1985-12-02	0	0	0	CHI	MIA	-14	15.0	15.744815	7.5	740.088842
3951	1985	1985-12-08	1	0	0	CHI	IND	7	16.0	13.885764	8.0	533.574098
3952	1985	1985-12-14	0	0	0	CHI	NYJ	13	17.0	13.480719	8.5	487.161572
3953	1985	1985-12-22	0	0	0	CHI	DET	20	18.0	13.454013	9.0	483.927771
3954	1985	1986-01-05	1	0	1	CHI	NYG	21	19.0	13.798538	9.5	529.090300
3955	1985	1986-01-12	1	0	1	CHI	LAR	24	20.0	14.158611	10.0	580.071649
3956	1985	1986-01-26	0	1	1	CHI	NE	36	21.0	14.627249	10.5	652.047869

	season	date	home	team1	team2	spread	nu1	mu1	alpha1	beta1	nu2	mu2	alpha2	beta2	home_advantage	pspread	pwin	success
20643	2017	2017-11-30	1	DAL	WSH	24	14.0	-1.009843	7.0	122.008579	14.0	-1.220737	7.0	58.007248	2.519351	2.624798	0.570663	True
20644	2017	2017-12-10	1	LAC	WSH	17	15.0	3.632639	7.5	102.490156	15.0	-2.739355	7.5	89.207489	2.519351	5.705348	0.663206	True
20645	2017	2017-12-17	0	ARI	WSH	-5	16.0	-5.046426	8.0	133.114383	16.0	-3.630645	8.0	115.095160	2.519351	-3.227241	0.463974	True
20646	2017	2017-12-24	0	DEN	WSH	-16	17.0	-4.156688	8.5	131.158625	17.0	-3.122960	8.5	101.618580	2.519351	-3.036215	0.456733	True
20647	2017	2017-12-31	1	NYG	WSH	8	18.0	-8.337091	9.0	261.273841	18.0	-2.060573	9.0	81.916448	2.519351	-0.618908	0.494824	False