(Post has been edited and updated in response to Ed Feng's comments. Unnecessary profanity has been removed and/or replaced with red text. Also see below. Whether the title of this post refers to me, Ed, or Wesley Colley is for you to decide.)
So I woke up this morning and got on the Internet, looking for the outcome of the Ohio-Duke game -- I was, like all good-hearted people, rooting for both to lose so that Michigan could be ranked #2 -- when I was blindsided by a failure of reading comprehension and statistical innumeracy so massive that I completely forgot what I was doing and said, "I must write a blog post to address this folly!"
I am not familiar with Ed Feng's work at thepowerrank.com, so I do not know if his reputation is good or bad. However, in science as in life, reputation should be irrelevant: if you're spouting B.S., you should expect to get called on your B.S. no matter what your track record is. Feng's attack on the Colley Matrix is so incorrect that it ranks as a sub-Bleacher Report level trolling of Matt Sussman, the state of Oklahoma, and statisticians everywhere. It convinced Stewart Mandel, of course.
In his article, Feng writes:
However, the Colley Matrix, the one fully transparent computer poll, does not use this game-specific information. The system considers a team's win-loss record and strength of schedule; yet, the results of each individual game are not counted as an input. The method doesn't care whether Kent State's loss came against Kentucky or against Rutgers.
This is incorrect. Completely and utterly incorrect. The Colley Matrix does use game-specific information: it takes as its input solely the record of who won and who lost each game. It doesn't take into account whether that game was home, away, or at a neutral site. It doesn't take into account margin of victory and wouldn't even if it were allowed. It doesn't take into account team reputation, hence Colley's referring to the method as "bias-free." All of this information is freely available in the PDF document Wesley Colley provides explaining his method, which Feng either didn't read or didn't understand. (Edit: uncalled-for.) For example, the input to the Colley Matrix for Big Ten conference play would be:
A 1 in your row means you won the game, a -1 means you lost, and a 0 means you didn't play this year. If Illinois had beaten Indiana instead of vice versa, then the -1 in Row 1, Column 2 would become a 1 and the 1 in Row 2, Column 1 would become a -1. Encoded slightly differently, this is the input the Colley Matrix uses. It takes into account the outcome of each game; what Feng wrote was 100%, absolutely, positively wrong, and SI.com should issue a retraction or correction.
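To make this concrete, here is a minimal sketch of the encoding just described. The teams and results are made up for illustration (this is not the actual Big Ten data and not Colley's code), but it shows how a list of game outcomes becomes the kind of 1/-1/0 table shown above:

```python
# Minimal sketch of the win/loss encoding described above.
# Teams and results are hypothetical, for illustration only.
teams = ["Illinois", "Indiana", "Iowa"]
results = [("Indiana", "Illinois"),  # (winner, loser)
           ("Iowa", "Indiana")]

n = len(teams)
index = {team: i for i, team in enumerate(teams)}

# outcome[i][j] = 1 if team i beat team j, -1 if team i lost to team j,
# and 0 if the two teams did not play.
outcome = [[0] * n for _ in range(n)]
for winner, loser in results:
    w, l = index[winner], index[loser]
    outcome[w][l] = 1
    outcome[l][w] = -1

for team, row in zip(teams, outcome):
    print(f"{team:>8}: {row}")
```

Flip the hypothetical Indiana-over-Illinois result and the corresponding 1 and -1 swap, exactly as described above.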
The source of Feng's outrage is the Colley Matrix's "play god" feature. If you edit the rankings by assuming that Kent State lost to Rutgers and beat Kentucky instead of the other way around, Kent State ends up ranked #17 instead of #16. Let's go to the chart:
[Chart caption: Seriously, Dr. Colley, hire someone to redesign your web page.]
If we change Kent State's loss to Kentucky to a loss to Rutgers, Kent State drops a spot in the rankings because Rutgers didn't lose to Kent State and is thus ranked higher! The only other change in the Top 25 is that South Carolina and LSU flip, which happens because Skarlina's win over Kentucky would count for less if Kentucky hadn't beaten Kent State.
Feng was full of outrage because he couldn't even be fucking bothered to look at the rightmost column of the chart. The actual raw rating for Kent State is 0.766. In the hypothetical situation, Kent State's raw rating INCREASES to 0.768. This is exactly in line with common sense. If you take away both a good win and a bad loss from a team's record, their rating doesn't change much.
The switch of the outcome of the Kent State-Kentucky game is nothing more than a distraction, sort of the college football equivalent of Bertrand's box paradox. If the only change you make is that Rutgers beats Kent State, then Rutgers is ranked #16 and Kent State is ranked #25. That is, they switch positions as if the outcome of their match-up actually meant something.
(Edit: Mr. Feng's intentions are not in evidence in his article and it was inappropriate for me to cast aspersions on them. I apologize to Mr. Feng.)
Ed Feng's article is a perfect example of anti-science. He started with the conclusion he wanted -- Kent State shouldn't make the BCS -- and cherry-picked data in order to make his point. He ignored all the data that demonstrated he was full of shit, and owes Dr. Colley an apology for misrepresenting his work.
However, this is not to say that I am a fan of the Colley Matrix. While the statistical methodology is sound, the underlying non-mathematical assumptions about football are inane. Colley is proud of the fact that, in his rankings, "score margin does not matter at all" and writes:
Ignoring margin of victory eliminates the need for ad hoc score deflation methods and home/away adjustments. If you have to go to great lengths to deflate scores, why use scores?
You should use scores because they provide more information about the quality of a team! Our post-season opinion of Michigan is based heavily on the fact that Alabama beat Michigan 41-14 and not 24-23. Based on that game, the Colley Matrix concludes only that Alabama was better than Michigan on that day, while the margin of victory tells us that Alabama was far better than Michigan that day. Colley's presentation of his rankings is anti-scientific: he is proud of the fact that he throws away useful information because he doesn't know how to incorporate it into his system.
In contrast, consider Nate Silver's acclaimed electoral prediction system. Silver knows that certain pollsters skew toward either Republicans or Democrats. Does he throw out their polls? No! He calculates, in a rational (not ad hoc) fashion, the "house effect" of each pollster and adjusts its margins accordingly. With access to full play-by-play data, there is no reason that a rational adjustment of margin of victory could not be applied for use in football rankings. With a little thought, a rational adjustment could not only decline to reward rolling up the score, but actually punish it as unsportsmanlike.
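To sketch what such an adjustment might look like (this is purely my own illustration, not Colley's method, Feng's, or Silver's): credit margin of victory on a diminishing scale and cap it, so that the first touchdown of margin matters far more than the fifth, and running up the score past the cap earns nothing. Punishing it outright would just mean subtracting credit past the cap.

```python
import math

def adjusted_margin(points_for, points_against, cap=21):
    """Hypothetical margin-of-victory adjustment (illustration only):
    diminishing returns via a log scale, capped so blowouts beyond
    `cap` points earn no additional credit."""
    raw = points_for - points_against
    sign = 1 if raw >= 0 else -1
    capped = min(abs(raw), cap)
    # log1p gives diminishing returns: the first point of margin is
    # worth more than the twenty-first.
    return sign * math.log1p(capped)

# Alabama 41, Michigan 14 versus a hypothetical 24-23 squeaker:
print(adjusted_margin(41, 14))  # ~3.09
print(adjusted_margin(24, 23))  # ~0.69
```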
In summary, if you throw out data for no good reason, you're bad at statistics. I disapprove of Colley because I don't agree with his assumptions, but at least he puts them out for all the world to see. I disapprove of Feng because I don't know what the fuck he was doing, and I don't think he knows what the fuck he was doing either.
UPDATE (1:52 PM):
Ed Feng graciously replied in the comments and noted that the matrix shown above is not the Colley Matrix. This is correct but not germane to my point. The Colley Matrix C for the Big Ten regular season is:
The number 10 appears along the diagonal because it is equal to the number of games each team played (eight) plus two. Each off-diagonal -1 indicates that the corresponding pair of teams played each other.
The record for each team is converted into a vector b, where each team's corresponding element of b is equal to 1 plus half the difference between its wins and losses; that is, b_i = 1 + (w_i - l_i)/2.
Colley's ratings are the solution r to the matrix equation Cr = b. For this example, the solution is:
You should agree that Ohio State at #1 and Illinois at #12 makes a lot of sense.
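For readers who want to reproduce this kind of computation, here is a minimal sketch of the procedure just described: build C and b from a list of game results and solve Cr = b. The four-team schedule below is made up for illustration, so its output will not match Colley's published Big Ten numbers.

```python
import numpy as np

def colley_ratings(teams, games):
    """Solve Colley's system Cr = b for a list of (winner, loser) games.
    C[i][i] = 2 + games played by team i, C[i][j] = -(games between i and j),
    and b[i] = 1 + (wins_i - losses_i) / 2."""
    idx = {team: i for i, team in enumerate(teams)}
    C = 2.0 * np.eye(len(teams))
    b = np.ones(len(teams))
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w, w] += 1      # each game adds to both teams' game totals...
        C[l, l] += 1
        C[w, l] -= 1      # ...and marks that the two teams played
        C[l, w] -= 1
        b[w] += 0.5       # b_i = 1 + (wins - losses) / 2
        b[l] -= 0.5
    ratings = np.linalg.solve(C, b)
    return sorted(zip(teams, ratings), key=lambda tr: -tr[1])

# Hypothetical four-team round robin, for illustration only.
teams = ["Ohio State", "Michigan", "Nebraska", "Illinois"]
games = [("Ohio State", "Michigan"), ("Ohio State", "Nebraska"),
         ("Ohio State", "Illinois"), ("Michigan", "Nebraska"),
         ("Michigan", "Illinois"), ("Nebraska", "Illinois")]
for team, rating in colley_ratings(teams, games):
    print(f"{team:>10}  {rating:.3f}")
```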
Colley's method does not explicitly include the outcome of every game, as individual wins and losses are collapsed into the vector b. This makes the method invariant to "circles of death," e.g., three teams that are 1-1 against each other. In the Big Ten this season, Iowa defeated MSU, MSU defeated Indiana, and Indiana defeated Iowa. If you change the outcome of all three of these games so that MSU beat Iowa, Indiana beat MSU, and Iowa defeated Indiana, then the vector b does not change and thus Colley's rankings do not change. This is not a bad feature for a ranking system based solely on wins and losses to have!
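Using the colley_ratings sketch above, you can check this invariance directly: reversing every game in the cycle leaves both C (who played whom) and b (each team's win-loss totals) unchanged, so the solution cannot move. The Iowa-MSU-Indiana cycle is the one from this season; the extra Purdue game is hypothetical, added only so the example isn't a pure three-way tie.

```python
# Reverse an entire three-team cycle and confirm the ratings do not move.
teams = ["Iowa", "MSU", "Indiana", "Purdue"]
original = [("Iowa", "MSU"), ("MSU", "Indiana"), ("Indiana", "Iowa"),
            ("Iowa", "Purdue")]
reversed_cycle = [("MSU", "Iowa"), ("Indiana", "MSU"), ("Iowa", "Indiana"),
                  ("Iowa", "Purdue")]

print(colley_ratings(teams, original))
print(colley_ratings(teams, reversed_cycle))  # identical ratings
```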
I stand by my statement that Feng's assertion that "[Colley's] method doesn't care whether Kent State's loss came against Kentucky or against Rutgers" is, in isolation, incorrect. I will add the following qualification, though. There is a chain between the SEC, the Big East, and the MAC where:
- Kent State lost to Kentucky
- Kentucky lost to Louisville
- Louisville lost to Connecticut
- Connecticut lost to Rutgers
- Rutgers lost to Kent State
If you were to reverse the outcome of all five of these games, Colley's method would produce the exact same ratings. In that sense, Colley's method does not "care" whether Kent State lost to Kentucky or Rutgers as part of a larger set of events that the method doesn't care about. It cares that either: a) Kent State lost to Kentucky but has a four-game win chain over Kentucky, or b) Kent State lost to Rutgers but has a four-game win chain over Rutgers, and it treats the two situations equivalently. If you switch the outcomes of all five of the above games in Colley's what-if machine, then you get back the exact same ratings.
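The same check works for the five-game chain above (again using the colley_ratings sketch, with only these five games rather than the full schedule, so every team comes out 1-1 and the numbers are purely illustrative):

```python
chain = [("Kentucky", "Kent State"), ("Louisville", "Kentucky"),
         ("Connecticut", "Louisville"), ("Rutgers", "Connecticut"),
         ("Kent State", "Rutgers")]
flipped = [(loser, winner) for winner, loser in chain]  # reverse all five

teams = ["Kent State", "Kentucky", "Louisville", "Connecticut", "Rutgers"]
print(colley_ratings(teams, chain))
print(colley_ratings(teams, flipped))  # identical: the chain is a five-game cycle
```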
Feng's attack on the Colley Ratings, as posted on SI.com, is still very misleading because switching only KSU-Kentucky and Rutgers-KSU results in a perfectly reasonable change to the rankings, namely, Rutgers jumps ahead of KSU. A correction or clarification is still in order.