Wednesday, March 26, 2014

Hypersensitive Probabilities

There's been a lot of talk about Tennessee and their bewilderingly high tempo-free rating, a rating so confusing that Ken Pomeroy wrote a column trying to explain it. To summarize, Tennessee is CHAOSTEAM: they throw a wrench into the ranking system by consistently winning by a lot and losing by a little. That's how a 12-loss team can have a Pythagorean rating of 91.61%.

A team's Pythagorean rating is an estimate of the probability of the team defeating an average team. In Division I in 2013-14, the teams that rate out as most average are Holy Cross, a good team from a weak conference, and USC, a bad team from a power conference. If you have the Pythagorean ratings for two teams, A and B, you can calculate the probability that Team A defeats Team B using the log5 formula rediscovered by Bill James:

    P(A beats B) = (pA - pA*pB) / (pA + pB - 2*pA*pB)

where pA and pB are the Pythagorean ratings of Team A and Team B.

In words, the log5 formula states that the odds in favor of Team A beating Team B are equal to the odds in favor of Team A beating an average team times the odds in favor of an average team beating Team B. This formula is equivalent to the Elo chess rating system and several formal statistical models. If we apply the log5 formula to the Tennessee-Michigan game, we input Tennessee's Pythagorean rating of 91.61% and Michigan's rating of 90.32% and find that the probability of Tennessee winning the game is about 54%.
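
For anyone who wants to check the arithmetic, here is a minimal Python sketch of log5. The function names `log5` and `odds` are my own labels, not anything from kenpom, and the only inputs are the two ratings quoted above. The last line checks the "in words" version: the odds of A beating B equal the odds of A against an average team divided by the odds of B against an average team.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's
    Pythagorean rating (its win probability against an average team)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def odds(p):
    """Convert a probability into odds in favor."""
    return p / (1 - p)


# Ratings quoted in the post.
tennessee, michigan = 0.9161, 0.9032

p = log5(tennessee, michigan)
print(round(p, 3))  # 0.539

# odds(A beats B) = odds(A) * odds(average beats B) = odds(A) / odds(B)
assert abs(odds(p) - odds(tennessee) / odds(michigan)) < 1e-9
```

Note that a team rated exactly .500 drops out of the formula: `log5(p, 0.5)` just returns `p`, which is what "probability of defeating an average team" should mean.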

However, a lot of people are looking askance at these Pythagorean ratings, as it just doesn't seem right that a 24-12 team from the weakest of the power conferences should be the #6 team in the country. Tennessee has been extremely unlucky according to kenpom: they rank 337th out of 351 with a luck score of -.086. Among major conference teams, only Iowa and Oklahoma State have been unluckier.

The word "luck" is a misnomer here as it implies that deviations from the model are due to random chance. For all but the simplest of phenomena, errors in a statistical model are a combination of both random chance and systemic error. No existing model has captured effects like the B Factor. These effects are not necessarily intangible, but they aren't captured by the models and are part of the reason that predictions can go awry.

Let's look at the Final Four probabilities for the Midwest Regional and suppose that some percentage of a team's luck is a real, systemic deviation from their calculated Pythagorean rating. If luck is completely transitory, then Louisville is the clear favorite with a 46% chance of making it to Dallas, followed by Tennessee, Michigan, and then Kentucky.

The ratings and luck for the four teams are:

            Pyth  Luck
Louisville .9543 -.034
Tennessee  .9161 -.086
Michigan   .9032  .054
Kentucky   .8986 -.043

Of these four teams, only Michigan has had positive luck this year. If some of this luck is a real effect, how does luck affect the probability of going to the Final Four? Introduce a luck factor that can vary from 0 to 1, and suppose that each team's actual rating is Pyth+Luck*(luck factor). As the effect of luck (or uncertainty) increases, the chances of making the Final Four change a great deal.

You only need to believe that 10% of that luck is a real systemic error for Tennessee and Michigan to be at even odds. If you're willing to accept that even more luck is real, you can increase the chances of Michigan making the Final Four from 17% to 57%!
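
Here is a rough Python sketch of that calculation, for anyone who wants to play with the luck factor. It assumes the regional semifinals are Louisville-Kentucky and Tennessee-Michigan, that each game is an independent log5 coin flip, and that ratings are adjusted exactly as Pyth+Luck*(luck factor). The names `TEAMS`, `BRACKET`, `adjusted`, and `final_four_probs` are mine and purely illustrative.

```python
def log5(p_a, p_b):
    """Probability that a team rated p_a beats a team rated p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# (Pyth, Luck) pairs copied from the table above.
TEAMS = {
    "Louisville": (0.9543, -0.034),
    "Tennessee": (0.9161, -0.086),
    "Michigan": (0.9032, 0.054),
    "Kentucky": (0.8986, -0.043),
}

# Assumed Sweet Sixteen pairings; the two winners meet in the regional final.
BRACKET = (("Louisville", "Kentucky"), ("Tennessee", "Michigan"))


def adjusted(luck_factor):
    """Rating for each team after crediting a fraction of its luck."""
    return {name: pyth + luck * luck_factor
            for name, (pyth, luck) in TEAMS.items()}


def final_four_probs(luck_factor):
    """Probability that each team wins both of its regional games."""
    r = adjusted(luck_factor)
    probs = {}
    for game, other_game in (BRACKET, BRACKET[::-1]):
        x, y = other_game
        x_advances = log5(r[x], r[y])  # chance x wins the other semifinal
        for team, opponent in (game, game[::-1]):
            win_first = log5(r[team], r[opponent])
            # Weight the regional final by who comes out of the other game.
            win_second = (x_advances * log5(r[team], r[x])
                          + (1 - x_advances) * log5(r[team], r[y]))
            probs[team] = win_first * win_second
    return probs


for factor in (0.0, 0.1, 0.5, 1.0):
    row = ", ".join(f"{team} {p:.1%}"
                    for team, p in final_four_probs(factor).items())
    print(f"luck factor {factor:.1f}: {row}")
```

With the table's numbers, Michigan comes out near 17% at a luck factor of 0 and a little under 58% at 1, in line with the figures quoted above. Since the table values are rounded, other entries can differ by a point or so.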

The issue is that probabilities computed using log5 become very sensitive when each team has a very high Pythagorean rating. This isn't a huge problem in baseball, where teams almost never win less than 30% or more than 70% of their games. But this does cause a problem when calculating the win probabilities between very good teams. Slight changes in the Pythagorean ratings can result in big changes in the log5 output.
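
To put a number on that sensitivity, here is a small Python sketch. It takes two evenly matched teams, bumps one team's rating by 0.01, and reports how far the log5 win probability moves. The helper name `shift` and the 0.01 bump are arbitrary choices of mine.

```python
def log5(p_a, p_b):
    """Probability that a team rated p_a beats a team rated p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def shift(p_a, p_b, bump=0.01):
    """Change in A's win probability when A's rating rises by `bump`."""
    return log5(p_a + bump, p_b) - log5(p_a, p_b)


# Two evenly matched teams at several rating levels.
for rating in (0.50, 0.70, 0.90, 0.95):
    print(f"rating {rating:.2f}: {shift(rating, rating):+.4f}")
```

The same 0.01 bump moves the matchup by one point when both teams are rated .500, but by nearly six points when both are rated .950. The reason is that log5 works on the odds scale, and the slope of the log-odds with respect to the rating is 1/(p(1-p)), which blows up as p approaches 1.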

For comparison, let's turn the four teams in the Midwest Regional into average teams by subtracting 0.4 from their Pythagorean ratings. I've changed the team names to average teams with similar tempo-free numbers:

                Pyth  Luck
Boston College .5543 -.034
Santa Clara    .5161 -.086
Holy Cross     .5032  .054
Brown          .4986 -.043

When the teams are close to average, the log5 probabilities are far less sensitive to changes in the Pythagorean ratings. Holy Cross, as the only lucky team, sees its probability of winning two games increase from 23% to 35% as we increase the role of luck, but this is far less than the 40-point swing we saw with Michigan. The probabilities for Boston College and Brown decrease only slightly, while Santa Clara, the unluckiest team of all, has its probability of winning two games decrease by about 10 percentage points.

Most statistical models are far better at predicting the behavior of average or typical subjects and far weaker at predicting how outliers will behave. Now that we've reached the Sweet Sixteen, it turns out that tempo-free stats are no different: the probability that a given team will make the Final Four is very sensitive to the precise value of the Pythagorean rating, to the point where these ratings can really only give a rough estimate of win probability with very large error bars.

Or, to put things even more simply: I am incapable of predicting what's going to happen in the Sweet Sixteen because no one can predict what's going to happen with a high degree of confidence. The probabilities are just too sensitive to underlying uncertainties and systemic errors. Being uncertain about the outcome makes sports fun. Also terrifying, but fun.
