Monday, March 31, 2014

Sometimes

Mister Fantastic tried his absolute best. (Dustin Johnson/UMHoops.com)
Dustin Johnson captured many amazing shots of the Regional Final, which I recommend checking out here before going forward.  But this one stands out, because it speaks to the hardest part about being a sports fan.

Sometimes, the other guy just makes a play.  Your team didn't do anything wrong; the other guy just does his job and it works.  Aaron Harrison was one of the most highly sought-after recruits in the country last season, and sometimes talent just overwhelms.

Look at what else is going on in that shot.  Look at all of that Kentucky Blue in the stands, punctuated by hints of adidas highlighter yellow.  Kentucky is actually a touch closer to Indianapolis than Michigan is, especially the heart of Michigan country relative to the heart of Big Blue Nation.  Big Blue Nation was out in force.  They had wanted to see the battle royale with their in-state rivals in Louisville, and now they were back here in the House that Peyton Built to see their prized class, the pre-season #1, their beloved Wildcats, try to prove everyone wrong by advancing to the Final Four.  The only team younger than Michigan in the tournament, they had grown up quickly, or so we had been told.  Sometimes talent can find its way.

Sometimes statistics are defied.  Kentucky shot 7-11 on three pointers, or nearly double their season average, whereas Michigan shot just a shade under their usual 40% on threes.  (Side note: If they call LeVert's shot in the first half a three, which could have gone either way, then Michigan shoots 42% for the game, or just a shade above their season average.  It also puts Michigan up one after the putback, which may have changed any number of things, but that is speculation at best.)

Sometimes, your luck runs out.  Michigan had not lost a game by fewer than double digits since their two-point loss to Arizona back in December.  When Michigan had lost, it had lost big: at Indiana, at Iowa, to Wisconsin, to Michigan State, they were all rough outings.  Conversely, Michigan had won thirteen games by single digits, seven by three points or fewer.  Michigan had lived on the knife edge on Friday night against the Vols and survived.  This time, the margin of error that they had had throughout the tournament was not there, and it was over.  It was gone.

Sometimes you don't appreciate what you have until it's gone.  Which is why I am thankful we were able to send off Jordan Morgan on a high note.  Morgan is exactly what we want our players to be: tenacious, hard-working, always working to be better, and, oh yeah, a pretty damn good student to boot.  To see all of the #ThanksJMo tweets after the game is to know that we didn't lose sight of what was going to end when Stauskas's last shot fell short.  We know we're probably also losing some other players, and we'll deal with that when the time comes, but for now, we appreciate what we had, because it was fun.  It was just fun.

So a tip of the cap to Coach Beilein and his staff for making us care again.  A salute to the Wildcats for a well-earned victory.  But most of all, a thanks for the memories that this team created, ones not soon forgotten.

Wednesday, March 26, 2014

Hypersensitive Probabilities

There's been a lot of talk about Tennessee and their bewilderingly high tempo-free rating, a rating so confusing that Ken Pomeroy wrote a column trying to explain it. To summarize, Tennessee is CHAOSTEAM: they throw a wrench into the ranking system by consistently winning by a lot and losing by a little. That's how a 12-loss team can have a Pythagorean rating of 91.61%.

A team's Pythagorean rating is an estimate of the probability of the team defeating an average team. In Division I in 2013-14, the teams that rate out as most average are Holy Cross, a good team from a weak conference, and USC, a bad team from a power conference. If you have the Pythagorean ratings for two teams, A and B, you can calculate the probability that Team A defeats Team B using the log5 formula rediscovered by Bill James:

    P(A beats B) = (A - A*B) / (A + B - 2*A*B)

In words, the log5 formula states that the odds in favor of Team A beating Team B are equal to the odds in favor of Team A beating an average team times the odds in favor of an average team beating Team B. This formula is equivalent to the Elo chess rating system and several formal statistical models. If we apply the log5 formula to the Tennessee-Michigan game, we input Tennessee's Pythagorean rating of 91.61% and Michigan's rating of 90.32% and find that the probability of Tennessee winning the game is about 55%.
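For anyone who wants to check the arithmetic, here is a minimal Python sketch of the log5 calculation. The function names `log5` and `odds` are my own labels for illustration, not anything taken from kenpom, and the only inputs are the two Pythagorean ratings quoted above.

```python
def log5(p_a: float, p_b: float) -> float:
    """Probability that Team A beats Team B, given each team's
    Pythagorean rating (its win probability against an average team)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def odds(p: float) -> float:
    """Convert a probability into odds in favor."""
    return p / (1 - p)


# Pythagorean ratings quoted in the text
tennessee, michigan = 0.9161, 0.9032

p = log5(tennessee, michigan)
print(f"P(Tennessee beats Michigan) = {p:.3f}")

# The "in words" version: odds(A beats B) equals odds(A beats average)
# times odds(average beats B), and odds(average beats B) is 1 / odds(B).
assert abs(odds(p) - odds(tennessee) / odds(michigan)) < 1e-9
```

The assertion at the end simply confirms that the formula and the odds description are the same statement written two ways.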

However, a lot of people are looking askance at these Pythagorean ratings as it just doesn't seem right that a 24-12 team from the weakest of the power conferences should be the #6 team in the country. Tennessee has been extremely unlucky according to kenpom: they rank 337th out of 351 with a luck score of -.086. Among major conference teams, only Iowa and Oklahoma State have been unluckier.

The word "luck" is a misnomer here as it implies that deviations from the model are due to random chance. For all but the simplest of phenomena, errors in a statistical model are a combination of both random chance and systemic error. No existing model has captured effects like the B Factor - these effects are not necessarily intangible, but they aren't captured by the models and are part of the reason that predictions can go awry.

Let's look at the Final Four probabilities for the Midwest Regional and suppose that some percentage of a team's luck is a real, systemic, deviation from their calculated Pythagorean rating. If luck is completely transitory, then Louisville is the clear favorite with a 46% chance of making it to Dallas, followed by Tennessee, Michigan, and then Kentucky.

The ratings and luck for the four teams are:

            Pyth  Luck
Louisville .9543 -.034
Tennessee  .9161 -.086
Michigan   .9032  .054
Kentucky   .8986 -.043

Of these four teams, only Michigan has had positive luck this year. If some of this luck is a real effect, how does luck affect the probability of going to the Final Four? Introduce a luck factor that can vary from 0 to 1, and suppose that each team's actual rating is Pyth+Luck*(luck factor). As the effect of luck (or uncertainty) increases, the chances of making the Final Four change a great deal:
You only need to believe that 10% of that luck is a real systemic error for Tennessee and Michigan to be at even odds. If you're willing to accept that even more luck is real, you can increase the chances of Michigan making the Final Four from 17% to 57%!
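The luck-factor experiment can be sketched in a few lines of Python. This is a rough reconstruction under stated assumptions, not necessarily the exact calculation behind the numbers above: it assumes the Sweet Sixteen pairings are Louisville-Kentucky and Michigan-Tennessee with the winners meeting in the regional final, it uses log5 for every game, and the names `TEAMS`, `SEMIS`, and `final_four_probs` are invented for this example.

```python
def log5(p_a, p_b):
    """Probability that a team rated p_a beats a team rated p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# (Pyth, Luck) for each team, copied from the table above
TEAMS = {
    "Louisville": (0.9543, -0.034),
    "Tennessee": (0.9161, -0.086),
    "Michigan": (0.9032, 0.054),
    "Kentucky": (0.8986, -0.043),
}

# Assumed regional semifinal pairings; the two winners play for the Final Four
SEMIS = [("Louisville", "Kentucky"), ("Michigan", "Tennessee")]


def final_four_probs(luck_factor):
    """Chance each team wins two games when its rating is Pyth + Luck * luck_factor."""
    rating = {t: pyth + luck * luck_factor for t, (pyth, luck) in TEAMS.items()}
    probs = {}
    for i, pair in enumerate(SEMIS):
        x, y = SEMIS[1 - i]  # the two possible regional final opponents
        x_advances = log5(rating[x], rating[y])
        for team, opp in (pair, pair[::-1]):
            win_semi = log5(rating[team], rating[opp])
            # Weight the regional final by who comes out of the other semifinal
            win_final = (x_advances * log5(rating[team], rating[x])
                         + (1 - x_advances) * log5(rating[team], rating[y]))
            probs[team] = win_semi * win_final
    return probs


for factor in (0.0, 0.1, 0.5, 1.0):
    probs = final_four_probs(factor)
    print(factor, {team: round(p, 2) for team, p in probs.items()})
```

Running it with a luck factor of 0 and then 1 shows the same pattern described above: Michigan's chances climb sharply while the three unlucky teams give ground.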

The issue is that probabilities computed using log5 become very sensitive when each team has a very high Pythagorean rating. This isn't a huge problem in baseball, where teams almost never win less than 30% or more than 70% of their games. But this does cause a problem when calculating the win probabilities between very good teams. Slight changes in the Pythagorean ratings can result in big changes in the log5 output.
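To see that sensitivity directly, here is a small Python sketch. It takes two equally rated teams, bumps one team's rating by a hypothetical 0.01, and reports how far the log5 win probability moves. The 0.01 bump and the helper name `swing` are illustrative assumptions, not figures from kenpom.

```python
def log5(p_a, p_b):
    """Probability that a team rated p_a beats a team rated p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


BUMP = 0.01  # hypothetical error in one team's Pythagorean rating


def swing(base):
    """Change in win probability against an equally rated opponent
    when one team's rating moves from base to base + BUMP."""
    return log5(base + BUMP, base) - log5(base, base)


for base in (0.50, 0.70, 0.90, 0.95):
    print(f"rating {base:.2f}: +{BUMP} in rating -> +{swing(base):.3f} in win probability")
```

Between two average teams, a 0.01 change in rating moves the win probability by 0.01. Between two teams rated around 0.90, the same change moves it by nearly three times as much, and the effect keeps growing as the ratings approach 1.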

For comparison, let's turn the four teams in the Midwest Regional into average teams by subtracting 0.4 from their Pythagorean ratings. I've changed the team names to average teams with similar tempo-free numbers:

                Pyth  Luck
Boston College .5543 -.034
Santa Clara    .5161 -.086
Holy Cross     .5032  .054
Brown          .4986 -.043

When the teams are close to average, the log5 probabilities are far less sensitive to changes in the Pythagorean ratings. Holy Cross, as the only lucky team, sees its win probability increase from 23% to 35% as we increase the role of luck, but this is far less than the 40-point swing we saw with Michigan. The probabilities for Boston College and Brown decrease only slightly, while Santa Clara, the unluckiest team of all, has its probability of winning two games decrease by 10 percentage points.

Most statistical models are far better at predicting the behavior of the average or typical subjects and far weaker at predicting how outliers will behave. Now that we've reached the Sweet Sixteen, it turns out that tempo-free stats are no different: the probability that a given team will make the Final Four is very sensitive to the precise value of the Pythagorean rating, to the point where these ratings can really only give a rough estimate of win probability with very large error bars.

Or, to put things even more simply: I am incapable of predicting what's going to happen in the Sweet Sixteen because no one can predict what's going to happen with a high degree of confidence. The probabilities are just too sensitive to underlying uncertainties and systemic errors. Being uncertain about the outcome makes sports fun. Also terrifying, but fun.

Tuesday, March 18, 2014

Tempo-Free Hate 2013-2014: Bo Ryan's Revenge!

Last year, between the Ohio and Wisconsin games, we spent the time wondering: why do we all hate Aaron Craft so much? To answer this question, we proposed the Four Factors of Hate, and used these factors to find the Big Ten leaders in Tempo-Free Hate. This year, the question of why we all hate Aaron Craft was answered once and for all when Doug Gottlieb blamed this on the ball being too slippery:

GIF by Timothy Burke (@bubbaprog)
Even though unwarranted media attention and apologia are good reasons to be annoyed by a player, the Four Factors of Hate try to quantify just what makes a player so annoying using only the stats on the floor. The four factors are:
  • Steals per personal foul (ST/PF). Also known as handchecking ability, or Craftiness.
  • Ability to draw fouls (Free Throw Attempts/(Minutes Played * Usage %)).  Jordan Morgan is not as bad at this as you think.
  • Three-Point Specialization (3FG/FG). The "Just A Shooter" Award.
  • Free Throw Percentage (FT%). For those annoying players who just Win The Game. Since all players are supposed to be able to shoot free throws reasonably well, FT% only counts half as much as the other three.
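
As a rough illustration of the four definitions above, here is a Python sketch that computes the raw factors from a single stat line. Everything in it beyond the four formulas is an assumption: the `StatLine` fields, the reading of 3FG/FG as made threes over made field goals, and the sample numbers, which are invented and do not belong to any real player. How the factors get combined into a final ranking is not spelled out here, so the sketch stops at the raw values and the stated weights.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """Hypothetical season totals for one player."""
    steals: int
    personal_fouls: int
    fta: int            # free throw attempts
    minutes: int        # minutes played
    usage_pct: float    # usage rate, e.g. 20.0 for 20%
    fg3_made: int       # assumed: made three-pointers
    fg_made: int        # assumed: made field goals
    ft_pct: float       # free throw percentage as a fraction


# FT% counts half as much as the other three factors
WEIGHTS = {
    "st_per_pf": 1.0,
    "foul_drawing": 1.0,
    "three_specialization": 1.0,
    "ft_pct": 0.5,
}


def hate_factors(s: StatLine) -> dict:
    """Raw Four Factors of Hate for one stat line."""
    return {
        "st_per_pf": s.steals / s.personal_fouls,
        "foul_drawing": s.fta / (s.minutes * s.usage_pct),
        "three_specialization": s.fg3_made / s.fg_made,
        "ft_pct": s.ft_pct,
    }


# Invented numbers, for illustration only
example = StatLine(steals=50, personal_fouls=60, fta=100, minutes=1000,
                   usage_pct=20.0, fg3_made=40, fg_made=120, ft_pct=0.80)

for name, value in hate_factors(example).items():
    print(f"{name}: {value:.4f} (weight {WEIGHTS[name]})")
```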
All stats come from Sports Reference College Basketball, except for team adjusted tempo, which is pulled from the front page at kenpom. Who is the most annoying player in the Big Ten? Find out after the jump!