Nate Silver, Polls and the RCP 2000 Fiasco

I can’t recall an election cycle when so much attention was paid to polls. We do, of course, have more polling than ever. And the election is likely to be very close, so everyone is riveted on the polls. But it’s not just the attention to the polls: it’s the loud debate over them. I can’t recall seeing so many articles analyzing the polls, adjusting the polls, arguing the polls and selectively quoting polls. This has been especially strong from the Republican side, which has claimed that 1) the polls are skewed; 2) Nate Silver is a gay Obama supporter and can’t be trusted; 3) the polls are skewed; 4) Rasmussen is the only reliable pollster; 5) boy, are those polls skewed.

I don’t think this is a unique function of Republican hysteria or reality denial, incidentally. It is a result of a few models and analyses favoring Obama right now. If they favored Romney, I’m sure we’d be hearing conspiracy theories from the Left.

(The reporting on polls is enough to drive you mad. The bias and misunderstanding of how polls and statistics work would be stunning if I didn’t think it was deliberate. To illustrate how this goes, imagine that Romney and Obama are tied for the purple state of New Ubekibekistanstan. On one day, five polls come out that read like so:

Poll Palace: Tied
We R Polls: Tied
Polls R Us: Romney +1
Republican Poll Man: Romney +2
Liberal Poll Dudes: Obama +3

That’s a tie. But guess which ones the liberal blogs will talk about? Guess which ones the conservative ones will? This is how alternative realities are created.
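For the numerically inclined, here is a quick sketch of that made-up example. The pollster names and margins are the hypothetical ones listed above, expressed as Romney minus Obama; nothing here is real polling data.

```python
from statistics import mean

# Hypothetical margins from the five made-up polls (Romney minus Obama, in points)
polls = {
    "Poll Palace": 0,
    "We R Polls": 0,
    "Polls R Us": 1,
    "Republican Poll Man": 2,
    "Liberal Poll Dudes": -3,
}

average = mean(polls.values())            # simple unweighted average of all five
best_for_romney = max(polls.values())     # the poll a conservative blog would quote
best_for_obama = min(polls.values())      # the poll a liberal blog would quote

print(f"Average margin: {average:+d}")
print(f"Cherry-picked for Romney: {best_for_romney:+d}")
print(f"Cherry-picked for Obama: {best_for_obama:+d}")
```

The average is zero, but the two cherry-picked numbers are five points apart.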

Then there’s the issue of “margin of error”. If a poll comes out showing Romney leading New GOPland by three points with a three-point margin of error, the liberal blogs will say it is essentially tied. But it’s not. 3+-3 means that it’s about 70% likely that Romney leads, and it’s as statistically likely that Romney leads by 6 as that the race is tied.
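The exact probability depends on how you read the "+-3", so here is a rough sketch under stated assumptions rather than a definitive calculation. It assumes normally distributed errors, a two-way race where the error on the lead is about twice the error on one candidate's share, and a parameter `z` (my own name) for how many standard deviations the quoted margin represents. Treating the margin as one sigma on each share gives a figure near 70%; treating it as a conventional 95% margin gives a higher one.

```python
import math
from statistics import NormalDist

def prob_leading(lead, share_moe, z=1.0):
    """Probability that the true lead is above zero under a normal error model.

    lead      -- reported lead in points
    share_moe -- quoted margin of error on each candidate's share
    z         -- assumed number of standard deviations the margin represents
                 (1.0 for one sigma, 1.96 for a 95% margin)
    """
    # Assumption: in a two-way race the lead's error is about twice a share's error
    sigma_lead = 2 * share_moe / z
    return 1 - NormalDist(mu=lead, sigma=sigma_lead).cdf(0)

one_sigma_reading = prob_leading(3, 3, z=1.0)
ninety_five_reading = prob_leading(3, 3, z=1.96)

# The error curve is symmetric around the reported lead, so a true lead of 6
# is exactly as likely as a true tie.
dist = NormalDist(mu=3, sigma=6)
symmetric = math.isclose(dist.pdf(0), dist.pdf(6))

print(f"One-sigma reading: {one_sigma_reading:.0%}")
print(f"95% reading:       {ninety_five_reading:.0%}")
print(f"Lead of 6 as likely as a tie: {symmetric}")
```

Under either reading, a three-point lead is noticeably better than a coin flip.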

Then you compound the two. Imagine New GOPland has three polls released:

Polls R Us: Romney +2 +- 3
We R Polls: Romney +5 +- 2
Poll Palace: Romney +8 +- 3

Assuming there are no biases, Romney actually has a solid lead: five points, give or take two. But the news media will say it’s tied.)
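One common way to combine polls like these is an inverse-variance weighted average, where tighter polls count for more. This is a minimal sketch of that technique, assuming the polls are independent, unbiased, and that each quoted "+-" can be treated as a standard error. The function name `combine` is mine, and the numbers are the made-up New GOPland polls.

```python
import math

def combine(polls):
    """Inverse-variance weighted average of (margin, error) pairs."""
    weights = [1 / err**2 for _, err in polls]   # tighter polls get more weight
    total = sum(weights)
    lead = sum(w * margin for w, (margin, _) in zip(weights, polls)) / total
    err = math.sqrt(1 / total)                   # uncertainty of the combined estimate
    return lead, err

# (Romney lead, quoted error) for the three hypothetical polls
new_gopland = [(2, 3), (5, 2), (8, 3)]
lead, err = combine(new_gopland)

print(f"Combined: Romney +{lead:.1f} +- {err:.1f}")
```

This comes out to Romney +5 with an uncertainty of about 1.5 points, in the same ballpark as the rounded "give or take two". The combined error is smaller than that of any single poll, which is the whole point of averaging.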

I should note that a big reason for the attention to polls is the null difference between the two candidates. If they really had major policy differences, we’d be talking about those. Romney supporters would be talking about how awesome his economic plan is and Obama supporters would be talking about how awesome the economy is. But because they are essentially the same man, we’re talking about polls.

And if we’re talking polls, we’re really talking about Nate Silver. Silver is one of several people who understand statistics and try to incorporate all of the available data into an electoral projection. As of right now, Silver’s model projects Obama as a likely winner, although it is very close. Close enough that one week could shift it either way.

This has prompted a massive response from Romney supporters. Some of the criticism is legitimate. A lot of it is bullshit.

But his critics being full of crap doesn’t make Silver right. Silver came to fame with a dead-on projection of 2008. But 2008 was not a close election. It was, all things considered, a landslide for Obama. Only three states (North Carolina, Missouri and Indiana) were within 1% and Silver missed on Indiana. To be fair, Silver gives probabilities, not certainties, and getting two out of three coin flips right is just fine. 2012 is going to be much closer. And I dare say this will be the real test of Silver’s abilities. Is he going to be proven dead on again? Or will his model be spectacularly wrong?

This year is reminding me an awful lot of Election 2000. It’s not just because of the closeness and the likelihood of an electoral college-popular vote split; it’s because that was the first time anyone attempted to model the electoral outcome. And, as the Wayback Machine reminds us, it failed spectacularly. Real Clear Politics predicted Bush would win by 10 points in the popular vote and with an electoral landslide of 446-92. That … didn’t happen.

I remember the events very clearly. My advisor tipped me to the RCP site as evidence that the media were ignoring Bush’s pending win. But I also remember being highly skeptical, because it seemed to me they were going overboard to try to make Bush win, constantly putting states in “definite Bush” but very few in “definite Gore”.

(Of course, that may have been my natural pessimism: I was a Bush supporter and RCP’s projection seemed too good to be true. If I were supporting Obama this year, I’m sure I would have convinced myself that Silver is wrong in his analysis.)

Here’s a breakdown of how RCP went wrong:

States Bush Would Win: Alaska, Utah, Idaho, Montana, Nebraska, Kansas, Oklahoma, Texas, South Dakota, North Dakota, Wyoming, Colorado, Indiana, Arizona, Virginia, South Carolina, Alabama, Mississippi, Georgia, Louisiana, Kentucky, Ohio, North Carolina. They also had Nevada as a probable win. Bush did win all of these and most of them were not close. Ohio, now a swing state, went to Bush by 170,000 votes. That was not really the problem. The problem was:

States Gore Would Win: DC, New York, Hawaii, Massachusetts, Rhode Island with Connecticut as a probable win. These were the only states they had as definite Gore. California, Maryland and Washington were not seen as definite Gore states. And it was this bias that I was subconsciously picking up: not that they overestimated Bush’s performance, but that they underestimated Gore’s, refusing to accept that people would vote for him. They seriously had Gore polling at 42% nationally. Given the popularity of Clinton and the state of the economy, that was absurd.

Leans Bush: They correctly called Missouri, New Hampshire, Florida, Arkansas, Tennessee and West Virginia. They also had New Mexico and Oregon, which went to Gore but were close. But Washington? Michigan? Pennsylvania? Maine? Gore won them all by 5 or 6 points.

Leans Gore: Maryland and Vermont. Again, we see a reluctance to put things in Gore’s column. Gore won both by double digits. The idea that Maryland “leaned” was laughable.

Slight Bush: Delaware, Iowa, Minnesota, Illinois, California. All were easy wins for Gore. Only Minnesota was within shouting distance.

Slight Gore: New Jersey. Another huge win for Gore.

We can see that it wasn’t just that RCP was wrong; they were wrong everywhere, systematically and massively underestimating Gore’s support.

So what happened? And does this mean we should point and laugh at projections for this year?

Well, first of all, RCP way overestimated Ralph Nader’s influence. This may sound strange to Democrats still bitter about 2000, but RCP estimated Nader at 5.7%, over twice as well as he actually performed. And almost all of his supposed voters went to Gore. This not only skewed the popular vote, it massively skewed the vote in blue states like California.

Second, Bush eventually underperformed the polls by three points. Ted Frank makes the case that this was because of the November Surprise of Bush’s drunk driving arrest. While that’s possible — I thought so at the time — I’m less convinced now. When you get into the last days of the election, most people have decided. I really doubt this shifted the national polls by three points in three days, which is a *very* large and *very* rapid shift so late in the game.

In the end, I think it was all of the above: they overestimated Nader’s support, the polls shifted late and RCP had a bit of a bias. But I also think RCP was simply ahead of its time. In 2000, we did not have the relentless national and state level polls we have now. And we did not have the kind of information that lets someone like Nate Silver tease out the subtle biases and nuances.

Ah, Nate Silver. We keep circling back to him. So what do I think? Is Silver going to be sitting pretty on November 7 or will he have egg on his face?

I don’t know.

I think he’s doing the best job he can, given the difficulty of the data. But when the election is this close, you’re straining the ability of even the most careful analyst to predict the future. I think it’s possible that he will miss. But it’s not because he’s biased or stupid. It’s simply because close elections are difficult to forecast. Even the smallest error — a 1% national offset in the popular vote — could have big implications for the final result. I simply find it hard to believe that any model can predict an election likely to be within the noise.

I will note that if Silver does miss badly, this does not make his critics right. We should never confuse the process with the result. If Silver misses but some guy throwing darts at an electoral college map gets it right, this does not mean dart-throwing is superior. It means that one guy got lucky and the other missed something.

My prediction? I don’t know. This feels like an electoral-popular split since Romney’s red-state support is stronger than Obama’s blue-state support. That may be my own bias playing up: I would love to watch the pundits argue 180 degrees from where they were in 2000 and I would love to see the President, whoever he is, weakened to the point where Congress takes the lead on solving our budget woes.

But right now, no result would surprise me. There are nine days left. There’s a massive hurricane bearing down (natural disasters can hurt incumbents and I expect the GOP to say Obama’s response is incompetent no matter what). Job numbers have yet to come out. Some football teams have yet to play.

To be honest: I just want it to be over, one way or the other. I’m tired of it. I’m tired of one side or the other quoting whichever poll most favors them. I’m tired of the bullshit gotchas. I’m tired of being bashed from one side as an Obama bootlicker and the other as a secret Romney supporter. I’m tired of everything having a political implication.

Hopefully, in a little over a week, we can start getting back to policy and ideas and things that really matter.