The Hoover Street Rag: A golden flash of stupidity

Thursday, November 29, 2012

(Post has been edited and updated in response to ~~Sam~~ Ed Feng's comments. Unnecessary profanity has been removed and/or replaced with red text. Also see below. Whether the title of this post refers to me, Ed, or Wesley Colley is for you to decide.)

So I woke up this morning and got on the Internet, looking for the outcome of the ohio-Duke game -- I was, like all good-hearted people, rooting for both to lose so that Michigan could be ranked #2 -- when I was blindsided by a failure of reading comprehension and an act of statistical innumeracy so massive that I completely forgot what I was doing and said, "I must write a blog post to address this folly!"

I am not familiar with Ed Feng's work at thepowerrank.com, so I do not know if his reputation is good or bad. However, in science as in life, reputation should be irrelevant: if you're spouting B.S., you should expect to get called on your B.S. no matter what your track record is. Feng's attack on the Colley Matrix is so incorrect that it ranks as a sub-Bleacher Report level trolling of Matt Sussman, the state of Oklahoma, and statisticians everywhere. It convinced Stewart Mandel, of course.

In his article, Feng writes:

However, the Colley Matrix, the one fully transparent computer poll, does not use this game-specific information. The system considers a team's win-loss record and strength of schedule; yet, the results of each individual game are not counted as an input. The method doesn't care whether Kent State's loss came against Kentucky or against Rutgers.

This is incorrect. Completely and utterly incorrect. The Colley Matrix does use game-specific information: it takes as its input solely the record of who won or who lost each game. It doesn't take into account whether that game was home, away, or at a neutral site. It doesn't take into account margin of victory and wouldn't even if it were allowed. It doesn't take into account team reputation, hence Colley's referring to the method as "bias-free." All of this information is freely available in the PDF document Wesley Colley provides explaining his method, ~~which Feng either didn't read or didn't understand~~ (Edit: uncalled-for.) For example, the input to the Colley Matrix for the Big Ten conference play would be:

A 1 in your row means you won the game, a -1 means you lost, and a 0 means you didn't play this year. If Illinois had beaten Indiana instead of vice versa, then the -1 in Row 1, Column 2 would become a 1 and the 1 in Row 2, Column 1 would become a -1. Encoded slightly differently, this is the input the Colley Matrix uses. It takes into account the outcome of each game; what Feng wrote was 100%, absolutely, positively, wrong, and SI.com should issue a retraction or correction.
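To make the encoding concrete, here is a minimal Python sketch of that kind of win/loss table. The four teams and their results are invented for illustration (this is not the Big Ten table), the function names `results_matrix` and `flip_game` are hypothetical, and the sketch assumes each pair of teams meets at most once.

```python
# Hypothetical four-team schedule, listed as (winner, loser) pairs.
teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("D", "A"),
         ("B", "C"), ("B", "D"), ("C", "D")]


def results_matrix(teams, games):
    """Return a table where entry [i][j] is 1 if team i beat team j,
    -1 if team i lost to team j, and 0 if they did not play."""
    index = {team: i for i, team in enumerate(teams)}
    table = [[0] * len(teams) for _ in teams]
    for winner, loser in games:
        table[index[winner]][index[loser]] = 1
        table[index[loser]][index[winner]] = -1
    return table


def flip_game(games, winner, loser):
    """Return a copy of the schedule with one game's outcome reversed."""
    return [(l, w) if (w, l) == (winner, loser) else (w, l)
            for w, l in games]


actual = results_matrix(teams, games)
what_if = results_matrix(teams, flip_game(games, "A", "B"))

for row in actual:
    print(row)
```

Reversing the A-B game swaps the 1 and the -1 in the two mirrored cells and leaves every other entry alone, which is the same bookkeeping described above for Illinois and Indiana.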

The source of Feng's outrage is the Colley Matrix's "play god" feature. If you edit the rankings by assuming that Kent State lost to Rutgers and beat Kentucky instead of the other way around, Kent State ends up ranked #17 instead of #16. Let's go to the chart:

Seriously, Dr. Colley, hire someone to redesign your web page.

If we change Kent State's loss to Kentucky to a loss to Rutgers, Kent State drops a spot in the rankings because Rutgers didn't lose to Kent State and is thus ranked higher! The only other change in the Top 25 is that South Carolina and LSU flip, which happens because Skarlina's win over Kentucky would count for less if Kentucky hadn't beaten Kent State.

Feng was full of outrage because he couldn't ~~even be fucking bothered to~~ look at the rightmost column of the chart. The actual raw rating for Kent State is 0.766. In the hypothetical situation, Kent State's raw rating INCREASES to 0.768. This is exactly in line with common sense. If you take away both a good win and a bad loss from a team's record, their rating doesn't change much.

The switch of the outcome of the Kent State-Kentucky game is nothing more than a distraction, sort of the college football equivalent of Bertrand's box paradox. If the only change you make is that Rutgers beats Kent State, then Rutgers is ranked #16 and Kent State is ranked #25. That is, they switch positions as if the outcome of their match-up actually meant something.

(Edit: Mr. Feng's intentions are not in evidence in his article and it was inappropriate for me to cast aspersions on them. I apologize to Mr. Feng.)
Ed Feng's article is a perfect example of anti-science. He started with the conclusion he wanted -- Kent State shouldn't make the BCS -- and cherry-picked data in order to make his point. He ignored all the data that demonstrated he was full of shit, and owes Dr. Colley an apology for misrepresenting his work.

However, this is not to say that I am a fan of the Colley Matrix. While the statistical methodology is sound, the underlying non-mathematical assumptions about football are inane. Colley is proud of the fact that, in his rankings, "score margin does not matter at all" and writes:

Ignoring margin of victory eliminates the need for ad hoc score deflation methods and home/away adjustments. If you have to go to great lengths to deflate scores, why use scores?

You should use scores because they provide more information about the quality of a team! Our post-season opinion of Michigan is based heavily on the fact that Alabama beat Michigan 41-14 and not 24-23. Based on that game, the Colley Matrix concludes only that Alabama was better than Michigan on that day, while the margin of victory tells us that Alabama was far better than Michigan that day. Colley's presentation of his rankings is anti-scientific: he is proud of the fact that he throws away useful information because he doesn't know how to incorporate it into his system.

In contrast, consider Nate Silver's acclaimed electoral prediction system. Silver knows that certain pollsters skew toward either Republicans or Democrats. Does he throw out their polls? No! He calculates, in a rational (not ad hoc) fashion the "house effect" of each poll and adjusts its margins accordingly. With access to full play-by-play data, there is no reason that a rational adjustment of margin of victory could not be applied for use in football rankings. With a little thought, a rational adjustment could not only not reward rolling up the score, but actually punish it as unsportsmanlike.

In summary, if you throw out data for no good reason, you're bad at statistics. I disapprove of Colley because I don't agree with his assumptions, but at least he puts them out for all the world to see. I disapprove of Feng because I don't know what the fuck he was doing, and I don't think he knows what the fuck he was doing either.

UPDATE (1:52 PM):

Ed Feng graciously replied in the comments and noted that the matrix shown above is not the Colley Matrix. This is correct but not germane to my point. The Colley Matrix C for the Big Ten regular season is:

The number 10 appears along the diagonal because it is equal to the number of games played plus two. The -1 values off-diagonal indicate that the two opponents played.

The record for each team is converted into a vector, b, where, for each team, its corresponding element in b is equal to 1 plus 1/2 times (the number of wins - the number of losses).

Colley's ratings are the solution r to the matrix equation Cr = b. For this example, the solution is:

You should agree that Ohio State at #1 and Illinois at #12 makes a lot of sense.
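For readers who want to try the arithmetic themselves, here is a minimal Python sketch of the recipe just described: build C and b from a list of results, then solve Cr = b. The four-team round robin is invented (it is not the Big Ten data), the function names `colley_system` and `solve` are hypothetical, and the hand-rolled exact-fraction solver is only a convenience for a toy example, not anything taken from Colley's paper.

```python
from fractions import Fraction


def colley_system(teams, games):
    """Build C and b from (winner, loser) pairs.
    Diagonal of C: games played + 2. Off-diagonal: -1 per meeting.
    b_i = 1 + (wins_i - losses_i) / 2."""
    n = len(teams)
    index = {team: i for i, team in enumerate(teams)}
    C = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    net = [0] * n  # wins minus losses for each team
    for winner, loser in games:
        i, j = index[winner], index[loser]
        C[i][i] += 1
        C[j][j] += 1
        C[i][j] -= 1
        C[j][i] -= 1
        net[i] += 1
        net[j] -= 1
    b = [1 + Fraction(net[i], 2) for i in range(n)]
    return C, b


def solve(C, b):
    """Solve C r = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(C)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical round robin: A goes 3-0, B 2-1, C 1-2, D 0-3.
teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "D")]

C, b = colley_system(teams, games)
ratings = solve(C, b)
for team, rating in sorted(zip(teams, ratings), key=lambda pair: -pair[1]):
    print(team, rating)
```

In this toy schedule every team plays three games, so the diagonal of C is 5 (three games plus two), and the ratings come out in the same order as the records, from 3/4 for the unbeaten team down to 1/4 for the winless one.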

Colley's method does not explicitly include the outcome of every game, as individual wins and losses are collapsed into the vector b. This makes the method invariant to "circles of death," e.g., three teams that are 1-1 against each other. In the Big Ten this season, Iowa defeated MSU, MSU defeated Indiana, and Indiana defeated Iowa. If you change the outcome of all three of these games so that MSU beat Iowa, Indiana beat MSU, and Iowa defeated Indiana, then the vector b does not change and thus Colley's rankings do not change. This is not a bad feature for a ranking system based solely on wins and losses to have!
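The circle-of-death claim is easy to check in a few lines of Python. This sketch looks only at the three games named above, not the full Big Ten schedule, and the helper names `b_vector` and `pairings` are hypothetical. Since C depends only on who played whom and b only on each team's record, showing that neither changes is enough to show the ratings cannot change.

```python
from fractions import Fraction


def b_vector(teams, games):
    """b for each team: 1 + (wins - losses) / 2, from (winner, loser) pairs."""
    net = {team: 0 for team in teams}
    for winner, loser in games:
        net[winner] += 1
        net[loser] -= 1
    return {team: 1 + Fraction(net[team], 2) for team in teams}


def pairings(games):
    """Who played whom, ignoring who won. This is all that C depends on."""
    return sorted(tuple(sorted(game)) for game in games)


teams = ["Iowa", "MSU", "Indiana"]

# The three results as they happened: (winner, loser).
actual = [("Iowa", "MSU"), ("MSU", "Indiana"), ("Indiana", "Iowa")]

# The same three games with every outcome reversed.
reversed_circle = [(loser, winner) for winner, loser in actual]

print(b_vector(teams, actual) == b_vector(teams, reversed_circle))
print(pairings(actual) == pairings(reversed_circle))
```

Both comparisons come out equal: each team is 1-1 within the circle in either direction, so its entry in b stays put, and the list of matchups is untouched.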

I stand by my statement that Feng's assertion that "[Colley's] method doesn't care whether Kent State's loss came against Kentucky or against Rutgers" is, in isolation, incorrect. I will add the following qualification though. There is a chain between the SEC, Big East and the MAC where:

Kent State lost to Kentucky
Kentucky lost to Louisville
Louisville lost to Connecticut
Connecticut lost to Rutgers
Rutgers lost to Kent State

If you were to reverse the outcome of all five of these games, Colley's method would produce the exact same ratings. In that sense, Colley's method does not "care" whether Kent State lost to Kentucky or Rutgers as part of a larger set of events that the method doesn't care about. It cares that either: a) Kent State lost to Kentucky but has a four-level win chain over Kentucky, or b) Kent State lost to Rutgers but has a four-level win chain over Rutgers, and treats the two situations equivalently. If you switch the outcomes of all five of the above games in Colley's what-if machine, then you get back the exact same ratings.

Feng's attack on the Colley Ratings, as posted on SI.com, is still very misleading because switching only KSU-Kentucky and Rutgers-KSU results in a perfectly reasonable change to the rankings, namely, Rutgers jumps ahead of KSU. A correction or clarification is still in order.
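The difference between flipping the whole five-game chain and flipping only two of its games can be checked mechanically. In this Python sketch (the function name `flip_preserves_records` is hypothetical), a set of games can be reversed without touching b exactly when every team involved has as many wins as losses within that set. Reversing the set changes a team's wins-minus-losses by twice the negative of its net within the set, so a net of zero for everyone means nothing moves.

```python
from collections import Counter


def flip_preserves_records(flipped_games):
    """True if reversing every (winner, loser) game in the list leaves each
    team's wins-minus-losses, and therefore Colley's b, unchanged."""
    net = Counter()
    for winner, loser in flipped_games:
        net[winner] += 1
        net[loser] -= 1
    return all(value == 0 for value in net.values())


# The five-game chain listed above, as (winner, loser).
full_chain = [
    ("Kentucky", "Kent State"),
    ("Louisville", "Kentucky"),
    ("Connecticut", "Louisville"),
    ("Rutgers", "Connecticut"),
    ("Kent State", "Rutgers"),
]

# Only the two games involving Kent State.
kent_state_only = [("Kentucky", "Kent State"), ("Kent State", "Rutgers")]

print(flip_preserves_records(full_chain))       # True
print(flip_preserves_records(kent_state_only))  # False
```

Flipping only the two Kent State games leaves Kent State at one win and one loss but takes a win from Kentucky and hands one to Rutgers, so b changes for those two teams and the ratings have to move.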

7 comments:

Unknown said...: Hi, I'm the author of the SI article. The matrix you state is not the Colley matrix. It is not consistent with equation 19 of his paper. Thank you.; Thursday, November 29, 2012 10:48:00 AM
David said...: I didn't claim that the matrix is the Colley Matrix. I state that it can be used as the input to Colley's algorithm. The example matrix I give is equivalent to the table above Eq. 24 in Colley's paper, not Eq. 19. At each stage of the iteration, the fact that two teams have different ratings affects the update. For example, since team a has initial rating .4 and team b has initial rating .5, team e's win over b counts more than its win over a at the first iteration. By the final iteration, when all teams have different ratings, each individual game outcome affects the ratings differently, as each r_j^i in Eq. (7) is different.

I quoted your statement that the Colley Matrix does not use this game-specific information. The game-specific information is encoded in the ratings vector r as indicated by Eq. (7). The inputs to the algorithm are a matrix like the one I showed in my post and an initial rating r_i = 0.5 for each i that assumes no prior knowledge of any team's ability. Thus, your statement in your article is false and requires correction.; Thursday, November 29, 2012 11:22:00 AM
Unknown said...: David,

Thanks for the updated post. I could have used a better example. You're absolutely correct about the 3 ring chains. I would have used that in the article, but it was more complicated to explain for an already complicated article.

Plus, Stanford, Oregon and Washington were in a 3 ring chain of death, and it makes the Cardinal look bad :); Thursday, November 29, 2012 2:04:00 PM
David said...: Ed (aka Sam-I-Am-Not :) ),

My second favorite team is Oklahoma, so I'm not attacking out of bias here. I think your critique of the Colley Matrix was inaccurate but that you also know far more about the subject than came across in the SI.com article. Plus I love flame wars over statistics.

I think Colley's rankings are bad because he willfully ignores margin of victory, but that's as much the BCS's fault as his. Within the constraints of the BCS, you could do worse (Billingsley).; Thursday, November 29, 2012 2:19:00 PM
dougj said...: Other than your article above, I have not read about the Colley rankings. But based on your description, it indeed does *not* take into account game-specific information. You note the ranking is determined by Cr = b, where b is determined purely by win-loss record and C is essentially a matrix for who has played who (and does not include any information on game outcomes).

So r is entirely determined without using game-specific information.; Thursday, November 29, 2012 3:28:00 PM
David said...: @dougj: The values of the vector b are where the game-specific information dependence lies. b is determined purely by win-loss record, true, but the values of b are not independent of each other - you can't change one value without changing at least one other, because if you change a win to a loss somewhere, you have to change a loss to a win somewhere else. These dependences are caused by the head-to-head matchups.

If you change the outcome of any one game so that A beats B instead of B beats A, it can affect every value of the solution r = inv(C)b provided that the adjacency graph of C is fully-connected, as it is by the end of every CFB season. r is thus indirectly determined using game-specific information through changes in the values of b.

b is invariant to the direction of "circles of death," and in that sense only does it not depend on game-specific information. I don't know if the Colley ratings are invariant only to changing the directions of circles of death; if there are other transformations that don't affect the ranking, they'll likely have more problematic interpretations.; Thursday, November 29, 2012 3:51:00 PM
David said...: So I thought about this problem some more, and the only changes of sets of H2H match-ups for which b is invariant are sets containing only entire circles of death. I think we've already established that having only these cycles is sufficient for invariance, so we just need to show it's necessary.

Proof sketch: Represent the set of game outcomes as a directed graph, with a vertex V_i for each team, and a directed edge E_{i,j} for each game outcome. A "circle of death" is then a sequence of edges that begins and ends at the same vertex. Changing the outcome of a game is the equivalent of removing E_{i,j} and replacing it with E_{j,i}.

Let's choose a set of directed edges to transform that cannot be decomposed into cycles. Then there must exist a path E_{a,b} --> E_{b,c} --> ... --> E_{y,z} that is not a cycle, i.e., V_z ~= V_a. Reversing the edges along this path decreases the out-degree of V_a (i.e., the number of wins for team a) by one and increases the in-degree of V_a (i.e., the number of losses for team a) by one; the reverse is true for team z. Since b_a and b_z are each a function of the difference between the out-degree and the in-degree, and transformations that change the overall degree of a vertex are not allowed, a transformation of outcomes that cannot be decomposed into cycles must change at least two values of b.

Since C is unchanged by any such transformation, any non-cyclical transformation must result in a change to the ratings vector r and then possibly to the overall rankings.

So the Colley ratings will not change if you replace the scenario (Oregon beat Washington beat Stanford beat Oregon) with the scenario (Stanford beat Washington beat Oregon beat Stanford). They will change if you replace (Kentucky beat Kent State beat Rutgers) with (Rutgers beat Kent State beat Kentucky) because that's not a cycle. So, my interpretation is that the ratings do care whether Kent State beat Rutgers or Kentucky unless you include the rest of the cycle.

Dear God, I think I understand the Colley rankings better than Colley now. Their assumptions are flawed for the application to ranking football teams, but the method itself is reasonable in the absence of better information. I'd say they were a good idea whose time has passed now that we're in a more data-rich age. I'd rather have Ed's rankings making the decisions than Colley's or a roomful of NCAA bureaucrats.; Friday, November 30, 2012 7:25:00 AM