Tuesday, October 5, 2010

Are Polls Junk?

Given the sheer volume of polling data made public every day, and given that there sometimes seems to be a lot of variation in what the polls say about individual races, it is not surprising to hear doubts expressed about the overall quality of political polling.  Charlie Cook, for instance, recently expressed a preference for candidate- or party-sponsored polls and skepticism of "independent" polling, even referring to academic polls and those sponsored by local media as "dime store junk."  At the same time, though, polls are pretty useful in predicting outcomes.  In earlier posts, I've shown that both House and Senate outcomes track very closely with aggregated polls, and Nate Silver also touts the predictive power of polls, using them as a central component in his forecasting models.

My point in this post is not to sort out which types of polls are more accurate than others, though that might come in a later post.  Instead, I'm interested in looking at the average rate of polling error.  Although there are a lot of things one could look at to judge the quality of a poll (question wording, sampling method, etc.), my guess is that most people focus on the bottom line--how close the poll finding is to the election outcome--when judging trial-heat polls.  For better or worse, this is likely to be the case.  Toward that end, I examine the error in Senate, House, and gubernatorial polls taken in the last fifteen days of the 2006 and 2008 campaigns.

The histograms below show the distribution of polling error across offices and years.  In all of the figures, I compare the Democratic percent of the two-party vote result to the Democratic percent of the two-party poll result.   Data for all of these graphs come from pollster.com.
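For readers who want to see the comparison spelled out, here is a minimal sketch of the two-party calculation described above. The function names, the sign convention (poll minus result), and the example numbers are my own assumptions for illustration; they are not taken from the pollster.com data.

```python
def two_party_share(dem, rep):
    """Democratic percent of the two-party total (Democrat + Republican only)."""
    total = dem + rep
    if total <= 0:
        raise ValueError("Democratic and Republican values must sum to a positive number")
    return 100.0 * dem / total


def polling_error(poll_dem, poll_rep, vote_dem, vote_rep):
    """Poll two-party share minus result two-party share, in percentage points.

    Assumed sign convention: positive means the poll overstated the Democrat.
    """
    return two_party_share(poll_dem, poll_rep) - two_party_share(vote_dem, vote_rep)


# Hypothetical race: the poll shows 46-44 with the rest undecided or other;
# the election result is 51-47.
error = polling_error(46, 44, 51, 47)
print(f"Polling error: {error:.2f} points")
```

Converting both the poll and the result to two-party shares removes undecided and third-party respondents from the comparison, so a poll is not penalized simply for reporting a large undecided group.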


To be sure, there are differences across years and across levels of office, but the overall picture is not one of wildly inaccurate polls.  Some of the key findings are:
  • The statewide polls in Senate and gubernatorial elections are generally more accurate than district-level House polls.  House polls in 2008 had the highest error rate.
  • Most polls (90% of Senate polls, 88% of gubernatorial polls, and 81% of House polls) are within five points of the actual outcome.
  • Overall, the election outcomes fall within the margin of error of individual polls 79% of the time.  In theory, if sampling error were the only source of discrepancy, this should happen about 95% of the time.
One thing to bear in mind, though, when looking at these error patterns, is that the deck is stacked somewhat against at least some of the polls.  I used a fifteen-day window to make sure I had a large number of polls for each year.  One consequence of this is that many of the polls are being compared to an outcome that occurred several days after they were conducted.  Assuming that there is at least some real movement in vote intention during this time period, one might expect a greater discrepancy between polls and votes than what is reported above.
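The margin-of-error comparison in the findings above can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook 95% margin of error for a proportion from a simple random sample, which may differ from the margins individual pollsters report, and the poll figures and sample sizes are invented rather than drawn from the data used in this post.

```python
import math


def margin_of_error(share_pct, n, z=1.96):
    """Approximate 95% margin of error, in points, for a share from n respondents.

    Assumes simple random sampling; real polls often have larger effective margins.
    """
    p = share_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


def within_margin(poll_pct, result_pct, n):
    """True if the result lies within the poll's margin of error."""
    return abs(poll_pct - result_pct) <= margin_of_error(poll_pct, n)


def coverage_rate(polls):
    """Fraction of polls whose margin of error contains the result."""
    hits = sum(within_margin(poll, result, n) for poll, result, n in polls)
    return hits / len(polls)


# Hypothetical polls: (poll two-party share, result two-party share, sample size)
polls = [
    (51.1, 52.0, 600),
    (48.0, 53.5, 500),
    (55.0, 54.2, 800),
]
print(f"Share of polls within the margin of error: {coverage_rate(polls):.0%}")
```

A coverage rate below 95% in a calculation like this points to sources of error beyond sampling, such as the gap between the date of the poll and election day.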