Monday, September 27, 2010

Poll-Based Forecasting Models

In my previous post I showed that individual House election outcomes tracked fairly well with district-level poll results.  The implication of this is that district-level polls can be used to forecast House outcomes at the district level.

To get a better sense of this I've generated out-of-sample (OOS) election predictions for 2006 and 2008 based on district-level poll results.  So, for instance,  I used OLS estimates from the relationship between the 2006 polling averages (over the last forty-five days of the campaign) and the actual 2006 House outcomes to predict the 2008 outcomes, based on the 2008 polling values.  For 2006 I used estimates from the relationship between polls and votes in 2008 to do a backwards prediction (a "postdiction," I suppose) of the 2006 outcomes. The idea here is to see how well outcomes in one year can be predicted using parameter estimates from another year.
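For readers who want to try the cross-year procedure themselves, here is a minimal sketch. It assumes a one-variable OLS of vote share on the forty-five day poll average. The function names (`fit_ols`, `predict`) and all of the numbers are made up for illustration and are not the data or code behind the results reported here.

```python
import numpy as np

def fit_ols(poll_share, vote_share):
    """Fit vote = a + b * poll by ordinary least squares; return (a, b)."""
    b, a = np.polyfit(poll_share, vote_share, deg=1)  # highest degree first
    return a, b

def predict(params, poll_share):
    """Apply coefficients estimated in one year to another year's polls."""
    a, b = params
    return a + b * np.asarray(poll_share, dtype=float)

# Made-up numbers for illustration only; NOT the actual poll or vote data.
polls_2006 = np.array([42.0, 47.5, 51.0, 55.0, 60.0])
votes_2006 = np.array([44.0, 48.0, 52.5, 56.0, 61.0])
polls_2008 = np.array([45.0, 49.0, 53.0, 58.0])
votes_2008 = np.array([46.5, 49.5, 54.0, 57.5])

# Forward prediction: 2006 coefficients applied to 2008 polls.
pred_2008 = predict(fit_ols(polls_2006, votes_2006), polls_2008)

# Backward prediction ("postdiction"): 2008 coefficients applied to 2006 polls.
pred_2006 = predict(fit_ols(polls_2008, votes_2008), polls_2006)
```

The key point is that neither set of predictions uses outcomes from the year being predicted; the coefficients always come from the other year.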

Here's how closely the OOS predicted values followed the actual House election outcomes for 2006 and 2008, pooled:

The correlation between predicted and actual outcomes is .89, and 83% of all outcomes (win/lose) are called correctly (using just the point estimate), compared to just 52% using the modal outcome. The important thing to remember here is that this figure illustrates the accuracy of predictions for outcomes in one year based on the relationship between polls and votes in another year.  
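The three summary numbers above (the correlation, the share of win/lose calls that are correct, and the modal-outcome baseline) can be computed along the following lines. This is a rough sketch: it assumes a candidate "wins" when the vote share exceeds 50%, and the helper name `evaluate_calls` is hypothetical.

```python
import numpy as np

def evaluate_calls(predicted, actual, threshold=50.0):
    """Return (correlation, share called correctly, modal-outcome baseline)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Pearson correlation between predicted and actual vote shares.
    corr = np.corrcoef(predicted, actual)[0, 1]

    # Win/lose calls based only on the point estimate.
    pred_win = predicted > threshold
    actual_win = actual > threshold
    pct_correct = float(np.mean(pred_win == actual_win))

    # Baseline: always guess whichever outcome (win or lose) is more common.
    win_rate = float(np.mean(actual_win))
    modal_baseline = max(win_rate, 1.0 - win_rate)

    return corr, pct_correct, modal_baseline
```

Comparing `pct_correct` to `modal_baseline` shows how much the polls add over simply guessing the more common outcome for every contest.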

The same method is used for Senate elections from 2006 and 2008 (pooled), with similar but stronger results:

In this case the correlation between the OOS predictions and actual votes is .97, and in only one out of 63 contests was the outcome (win/lose) called incorrectly.  To be sure,  in both the Senate and House contests a lot of the predicted votes hover right around 50% and the contests could easily tip one way or another.  But using the simple point estimates generates pretty fair predictions of winners and losers.

So, what does this tell us about the 2010 outcomes? It's not very complicated: candidates who maintain a lead in the polls during the fall campaign generally go on to win. This is especially clear in Senate races.

Estimates based on regression models of the relationship between polls and outcomes from other election years (2006 and 2008) should give us a pretty good basis for predicting the 2010 outcomes. This is a very simple approach and may suffer somewhat from ignoring a whole host of factors, such as the expected accuracy of the polling organizations or district-specific characteristics that may play an important role in 2010. But that's sort of the point--to provide a quick and simple way to predict outcomes.

Toward that end I've posted forecasting boxes in the right-hand column of the blog. Right now, I have the forecasts for Senate elections and will post the House forecasts as soon as I'm able to gather the existing 2010 district-level polls (hey, if you've already gathered these data I'd be glad to borrow them from you!).

Important note: these forecasts are based on current information and will change as new data come in.  Be sure to stop back and take a look every once in a while.

Tuesday, September 21, 2010

Followers of politics these days are inundated with poll numbers, mostly of the horse-race variety, routinely reported at a couple of my favorite sites, politicalwire.com and pollster.com, as well as a number of other places. Although the national generic ballot is given some attention due to its predictive capacity, we know relatively little about how trial-heat polls in individual races can be used to predict district-level outcomes. If we assume the vast share of congressional districts are easily predicted, based on incumbency and party strength, then trial-heat polls could be a valuable tool in predicting outcomes in the "toss-up" and "leaning" districts.

Dan Hopkins' post last week on (the lack of) partisan bias in pre-election polling gives us some sense of the overall accuracy of trial-heat polls, albeit in Senate and Gubernatorial races.  What I want to look at today is how well district-level trial-heat polls perform in U.S. House elections, with an eye toward eventually using them to predict outcomes in individual races.

Using data  provided by the folks at pollster.com, I've taken average poll readings over the last forty-five days of the campaign from 91 contests in 2006 and 89 contests in 2008.  The figure below shows the relationship between the averages and the eventual outcome.  In both years, there is a strong relationship, suggesting that poll leaders generally go on to win House elections.
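As a rough illustration of the averaging step, the sketch below takes a simple unweighted mean of every poll in a contest that falls inside the window. The unweighted mean, the record layout `(contest, days_before_election, poll_share)`, and the function name `window_average` are all assumptions made for the example, not a description of how the pollster.com data are organized.

```python
from collections import defaultdict

def window_average(polls, window_days=45):
    """Average poll share per contest over the last `window_days` days.

    `polls` is an iterable of (contest, days_before_election, poll_share)
    tuples; this layout is hypothetical.
    """
    totals = defaultdict(lambda: [0.0, 0])  # contest -> [sum, count]
    for contest, days_out, share in polls:
        if 0 <= days_out <= window_days:
            totals[contest][0] += share
            totals[contest][1] += 1
    # Contests with no polls inside the window are simply left out.
    return {contest: s / n for contest, (s, n) in totals.items()}
```

Calling the same function with `window_days=14` gives a two-week average. Contests without a poll in that shorter window drop out of the result, so the two sets of cases need not match.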


In fact, in terms of the bottom line, across both years 85% of those candidates whose average poll share was greater than 50% went on to win their election.  
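That bottom-line figure is just a conditional share: among candidates with a poll average above 50%, what fraction ended up with more than 50% of the vote? A small sketch, with a hypothetical helper name and the 50% cutoff for both leading and winning taken as an assumption:

```python
def leader_win_rate(poll_avg, vote_share, threshold=50.0):
    """Share of poll leaders (poll average > threshold) who went on to win."""
    leaders = [(p, v) for p, v in zip(poll_avg, vote_share) if p > threshold]
    if not leaders:
        return None  # no poll leaders, so the rate is undefined
    wins = sum(1 for _, v in leaders if v > threshold)
    return wins / len(leaders)
```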

Do  pre-election House polls track better with outcomes as election day approaches?  Yes, but not always very much.




Certainly it is the case that in 2006 the polls became better predictors of final outcomes as election day approached. In 2008, however, there wasn't a very steep increase in the predictive accuracy of polls as the election drew nearer. More to the point, the overall correlation across the forty-five days prior to the election (first figure) is virtually identical to the correlation found for the last two weeks of polling. Also (recognizing that the set of cases differs, because the analysis is restricted to districts with polls in the last two weeks), only 77% of poll leaders (combining both years) in the last two weeks of the campaign went on to win their elections, compared to 85% using the forty-five day average.


Although these data illustrate a fairly obvious point--that candidates who lead in the polls tend to win--it is comforting to see the strong connection between polls and outcomes, especially given that the data come from many different types of organizations, polling in a variety of different local contexts. No doubt, taking some of those factors into account might shed light on conditions that make polls better or worse predictors of outcomes. But that will have to wait for another day.