The Authentic Games Metric

In Tuesday Morning Quarterback’s most recent column, he suggested picking post-season favorites based on what he calls Authentic Games:

Power rankings, strength-of-schedule, likes on Facebook — there are many ways to assess NFL teams. As the home stretch approaches, Tuesday Morning Quarterback makes his annual contribution: the Authentic Games metric.

Authentic Games are those against other potent teams. The regular season is a smorgasbord of strong and weak; in the postseason, only strong opponents trot onto the field. That makes how a team performs against equal-caliber opposition the gauge TMQ likes.

The Authentic metric values most W’s over best percentage. Thus I rank the Denver Broncos at 4-2 ahead of the Cincinnati Bengals and Indianapolis Colts at 3-1. The reasoning is that the more wins a team has versus power opponents, the better prepared the team is for the postseason.

In principle, the Authentic Games Metric makes sense. A great team should be able to beat other great teams rather than just pounding on cupcakes. But I was immediately suspicious because it plugs into what I call the Grand Championship Delusion: the belief that the team that wins the championship is always or even usually the best team. We want desperately to believe that the team that wins the title is not a team that had a good season and then got hot. Or a team that had a good season and then had a few breaks go their way. We want to believe that they possess some ineffable quality (clutchiness, manliness, moxie) that makes them win. And their record in “Authentic Games” is tempting as a way to measure that supposed quality.

However, once my skepticism was aroused, I came up with numerous problems with the Authentic Games Metric:

  • There is a great deal of parity in the NFL. If you opened up the playoffs to all 32 teams, we would doubtless see the occasional one seed upset by the occasional 16 seed. And the likelihood of upsets only increases as the teams become closer in quality. A team’s record in a 16-game season is already subject to plenty of random variation: chance plays, tipped passes and blown calls. When you narrow it down to 2-6 “Authentic Games” between teams of near-equal quality, you’re basically just looking at noise.
  • This is borne out by research that Football Outsiders has done: great teams are usually defined by their ability to dominate lesser teams, not by their ability to win close games. A great team puts games out of reach; a lucky team wins the nail-biters.
  • Even if Authentic Games gave you some read on who is really the best team in the NFL, applying it to playoff results introduces even more uncertainty. You’re now dealing with an even smaller sample: 11 games per postseason, all involving teams that are nearly equal in quality.
  • Basically, I think this is yet another attempt to find the “special sauce” that would enable us to know why some #5 seeds win the Super Bowl while #1 seeds fail. Because, to our simian brains, “football happens” isn’t enough. We don’t want to believe that the winner is a result of team quality convolved with a lot of luck and random chance. We don’t want to believe that a team wins the Super Bowl because they just happen to have three or four good games in a row. No, there has to be a reason behind the madness.
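
To put a rough number on the small-sample problem, here is a minimal sketch. It assumes every game is an independent coin flip with a fixed win probability, which is a simplification of real football, and the function names are made up for illustration.

```python
import math


def win_pct_standard_error(true_win_prob: float, games: int) -> float:
    """Spread (standard deviation) of an observed winning percentage
    over `games` independent games with a fixed win probability."""
    return math.sqrt(true_win_prob * (1 - true_win_prob) / games)


def prob_record_at_least(wins: int, games: int, true_win_prob: float = 0.5) -> float:
    """Chance that a team wins at least `wins` of `games` by luck alone."""
    return sum(
        math.comb(games, k)
        * true_win_prob ** k
        * (1 - true_win_prob) ** (games - k)
        for k in range(wins, games + 1)
    )


if __name__ == "__main__":
    # Compare Authentic-Games-sized samples with a full 16-game season.
    for n in (4, 6, 16):
        print(n, "games:", round(win_pct_standard_error(0.5, n), 3))
    # How often does a perfectly average team go 3-1 or better?
    print("3-1 or better by luck:", prob_record_at_least(3, 4))
```

Under these assumptions, the spread in winning percentage over four games is twice what it is over sixteen, and a dead-average team goes 3-1 or better in a four-game sample nearly a third of the time.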

    Anyway, here’s what I did to test the Authentic Games Metric:

    I took all 60 playoff teams from the last five years. I then went through their schedules and kept track of how they did against other playoff contenders. I then tracked how well this predicted playoff results. In the case of a tie, I went with the team that had more Authentic Games. Since we are subject to noise, I did a second test just looking at strong predictions — where one team was two or more games over or under .500 against fellow playoff teams during the regular season and their opponent was not.

    As a control, I then checked predictions made based purely on their regular-season record (with a tie going to the higher-seeded team) or which team had home-field advantage. I then checked against predictions based on Football Outsiders’ team rankings.
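
The Authentic Games test described above can be sketched roughly as follows. This is an illustration, not the actual tally behind the numbers in this post: the function names are hypothetical, "tie" is read here as a tie in Authentic wins (an assumption), and the sample matchups are just two games mentioned later in the post.

```python
from typing import Optional

Record = tuple[int, int]  # (wins, losses) against other playoff teams


def margin(record: Record) -> int:
    """Games over (positive) or under (negative) .500."""
    wins, losses = record
    return wins - losses


def pick_winner(team_a: str, rec_a: Record, team_b: str, rec_b: Record) -> Optional[str]:
    """Pick the team with more Authentic wins. If wins are tied, pick the
    team that played more Authentic Games (assumed reading of the
    tie-breaker). Return None if the records are identical."""
    key_a = (rec_a[0], sum(rec_a))
    key_b = (rec_b[0], sum(rec_b))
    if key_a == key_b:
        return None
    return team_a if key_a > key_b else team_b


def is_strong_prediction(rec_a: Record, rec_b: Record) -> bool:
    """True when exactly one team is two or more games over or under .500."""
    return (abs(margin(rec_a)) >= 2) != (abs(margin(rec_b)) >= 2)


def tally(games: list[tuple[str, Record, str, Record]]) -> tuple[int, int]:
    """Score picks against results; each game is (winner, record, loser, record)."""
    right = wrong = 0
    for winner, w_rec, loser, l_rec in games:
        pick = pick_winner(winner, w_rec, loser, l_rec)
        if pick is None:
            continue  # no prediction possible
        if pick == winner:
            right += 1
        else:
            wrong += 1
    return right, wrong


if __name__ == "__main__":
    sample = [
        ("San Diego", (0, 5), "Indianapolis", (5, 1)),
        ("Philadelphia", (4, 2), "New York", (4, 2)),
    ]
    print(tally(sample))
```

The control predictors would plug into the same `tally` loop by swapping `pick_winner` for a function that picks by regular-season record, home field, or efficiency rating.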

    The result? It really isn’t even close. Teams that won the most Authentic Games were 25-25 in their matchups. For strong predictions, teams were 17-18. Essentially, the Authentic Games Metric is the same as flipping a coin. Of course, picking by regular-season record went 27-28, which bears out TMQ’s criticism that seeding and the regular season don’t tell you nearly enough about the relative quality of the best teams.

    However, I did find two predictors that were useful. One was home-field advantage. Home teams were 30-20 in the playoffs. Even if you discount home teams in the divisional round, who have had a bye while their opponent was playing, home teams still win 60% of the time (I’m obviously excluding the Super Bowl here).

    Of similar quality were Football Outsiders’ team efficiency ratings, which went 32-23. Not great, but pretty decent all things considered. FO would be the first to admit that predicting the winner in a football game is a fool’s business. Not only do you have the problem of random luck and chance, you have the problem that football is about matchups. A team may be, by some metric, the best. But if they have a weak secondary, they can get torched by a “lesser” team.
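
For a side-by-side look at the predictors, here is a short sketch that turns each record quoted above into a hit rate and asks how surprising it would be from a pure coin flip. The doubled-tail binomial test is one simple choice of yardstick, not something used in the original tally, and the function names are illustrative.

```python
import math

# (right, wrong) records as quoted in this post
RESULTS = {
    "Authentic Games": (25, 25),
    "Authentic Games (strong only)": (17, 18),
    "Regular-season record": (27, 28),
    "Home field": (30, 20),
    "Football Outsiders efficiency": (32, 23),
}


def win_pct(right: int, wrong: int) -> float:
    """Fraction of picks that were correct."""
    return right / (right + wrong)


def coin_flip_p_value(right: int, wrong: int) -> float:
    """Two-sided p-value (doubled tail, capped at 1) for seeing a record
    at least this lopsided if every pick were a fair coin flip."""
    n = right + wrong
    k = max(right, wrong)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for name, (right, wrong) in RESULTS.items():
        print(f"{name:32s} {win_pct(right, wrong):.3f}  p={coin_flip_p_value(right, wrong):.2f}")
```

Even the two useful predictors fall short of conventional significance thresholds by this test, which is a reminder that five postseasons is itself a small sample.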

    Breaking it down by year reveals just how random the Authentic Games Metric is:

  • In 2008, Arizona went 1-4 in Authentic Games and came within a hair of winning the Super Bowl. Meanwhile, Indianapolis (5-1) died in the first round against San Diego (0-5). Philly (4-2) made the conference final but only because they played New York, also 4-2.
  • In 2009, Indianapolis and New Orleans were both 3-1 in Authentic Games, which would seem to give the metric some credence. But Minnesota (4-1) died in the conference final while Baltimore (1-6) made the divisional round. This was actually the best year for the Authentic Games Metric.
  • In 2010, Pittsburgh (2-4) made the Super Bowl while New England (6-1) died in the first round. The AFC final matched two 2-4 teams in Pittsburgh and New York.
  • In 2011, Baltimore and Green Bay went 6-0 in Authentic Games. Only Baltimore even made the conference final. The New York Giants went 1-3 and won the Super Bowl. Detroit went 1-5 and lost in the first round. New Orleans went 5-1 and lost in the divisional round. Atlanta went 1-4 and lost in the first round. San Francisco went 4-1 and lost the conference title game. Instead of a matchup of Baltimore (6-0) and Green Bay (6-0), we got New England (1-2) against New York (1-3).
  • In 2012, Seattle was 4-1 in Authentic Games and lost in the divisional round. Green Bay went 2-4 and lost in the divisional round; Baltimore went 2-4 and won the Super Bowl. Instead of Seattle (4-1) against Indianapolis (3-2), we got Baltimore (2-4) against San Francisco (3-2).

    You see? You can occasionally pick out a team that did well in both Authentic Games and the playoffs, but it’s mostly random. Part of this is, again, the vicissitudes of football. But FO’s rankings don’t do too badly, so I think it’s more of a flaw in the Authentic Games Metric itself: a metric based on 2-6 games is going to be worse, not better, than one based on 16.

    If you want to predict how the NFL post-season will go, here’s my system:

    1) When in doubt, pick the home team or the team with better FO ranking.
    2) Have a lot of doubt.