Tuesday, October 5, 2010

Are Polls Junk?

Given the sheer volume of polling data made public every day, and given that there sometimes seems to be a lot of variation in what the polls say about individual races, it is not surprising to hear doubts expressed about the overall quality of political polling.  Charlie Cook, for instance, recently expressed a preference for candidate- or party-sponsored polls and a skepticism of "independent" polling, even referring to academic polls and those sponsored by local media as "dime store junk."  At the same time, though, polls are pretty useful in predicting outcomes.  In earlier posts, I've shown that both House and Senate outcomes track very closely with aggregated polls, and Nate Silver also touts the predictive power of polls, using them as a central component in his forecasting models.

My point in this post is not to sort out which types of polls are more accurate than others, though that might come in a later post.  Instead, I'm interested in looking at the average rate of polling error.  Although there are a lot of things one could look at to judge the quality of a poll (question wording, sampling method, etc.), my guess is that most people focus on the bottom line--how close the poll result is to the election outcome--when judging trial-heat polls.  For better or worse, this is likely to be the case.  Toward that end, I examine the error in Senate, House, and gubernatorial polls taken in the last fifteen days of the 2006 and 2008 campaigns.

The histograms below show the distribution of polling error across offices and years.  In all of the figures, I compare the Democratic percent of the two-party vote result to the Democratic percent of the two-party poll result.   Data for all of these graphs come from pollster.com.
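The error measure described above can be sketched in a few lines of code.  The poll numbers below are hypothetical, not drawn from the pollster.com data:

```python
def two_party_dem_share(dem: float, rep: float) -> float:
    """Democratic percent of the two-party total, ignoring third parties."""
    return 100 * dem / (dem + rep)

def poll_error(poll_dem: float, poll_rep: float,
               vote_dem: float, vote_rep: float) -> float:
    """Poll's Democratic two-party share minus the election's."""
    return (two_party_dem_share(poll_dem, poll_rep)
            - two_party_dem_share(vote_dem, vote_rep))

# A hypothetical poll showing 47 D / 45 R, against a 51 D / 49 R result:
# the poll's two-party share (51.1%) is close to the outcome's (51.0%),
# so the error is near zero even though the raw margins look different.
print(round(poll_error(47, 45, 51, 49), 1))
```

Using the two-party share, rather than raw percentages, keeps undecided respondents and third-party candidates from inflating the apparent error.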

To be sure, there are differences across years and across levels of office, but the overall picture is not one of wildly inaccurate polls.   Some of the key findings are:
  • The statewide polls in Senate and gubernatorial elections are generally more accurate than district-level House polls.  House polls in 2008 had the highest error rate.
  • Most polls (90% of Senate polls, 88% of gubernatorial polls, and 81% of House polls) are within five points of the actual outcome.
  • Overall, the election outcomes fall within the margin of error of individual polls 79% of the time.  Theoretically, this should happen 95% of the time. 
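For readers wondering where the 95% benchmark comes from: it is the standard margin of error for a simple random sample.  A minimal sketch, using a hypothetical poll of 600 respondents:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error, in percentage points, for a share p from a
    simple random sample of size n (no design-effect adjustment)."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# A hypothetical poll of 600 respondents with the Democrat at 48%:
# the outcome "falls within the margin of error" if it differs from
# the poll's share by less than this many percentage points.
print(round(margin_of_error(0.48, 600), 1))  # roughly 4 points
```

In theory, 95% of such intervals should cover the true result; the 79% figure above suggests real polls have more error than pure sampling theory implies.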
One thing to bear in mind, though, when looking at these error patterns, is that the deck is stacked somewhat against at least some of the polls.  I used a fifteen-day window to make sure I had a large number of polls for each year.  One consequence of this is that many of the polls are being compared to an outcome that occurred several days after they were conducted.  Assuming that there is at least some real movement in vote intention during this time period, one might expect a greater discrepancy between polls and votes than what is reported above.


  1. With respect to House polling, what do the numbers show for incumbents under 50?

  2. So if I understand this, most Senate polls are within 5% of the actual result (although you don't show that--you show election outcome). And in that table, there are 7 incorrect predictions, out of 36. That's a 20% error rate - pretty unreliable in my opinion. Most of these errors are in the closely contested races, where Dems were expected to garner 40-50% of the vote. A 5% error in that range for contested elections is the whole ball game. I don't care how close the polls are for the elections where the predicted winner wins with 75% of the vote. So what good are polls?

  3. Anon2: I'm not sure where you get the "7 incorrect predictions, out of 36." In fact, if you look at my earlier post you will see that, especially in Senate races, almost all of the polls called the right winner. Even in close races, polls with a five-point error can still call the right winner by erring in favor of the winner.

  4. I've been doing some Bayesian modeling on polling series (Jackman style), and the general result is that most pollsters have design effects of about 1.25. In other words, the standard deviation of poll results is about 24% higher than one would expect from pure sampling error.

    I found this just by looking at poll to poll variance and accounting for house effects and not looking at election results. It's comforting that it tracks your findings pretty well through an independent method.

  5. TH: I assumed the table at the top right was the data you were reporting on. I see that is actually the current predictions. Sorry - my error. BTW, the link to your earlier post doesn't work.