Editor's Note
Tilting at Windmills


August 13, 2008
By: Kevin Drum

MARGIN OF ERROR....Taegan Goddard sez:

According to a new Pew Research poll, Sen. Barack Obama's national lead over Sen. John McCain has disappeared. The race is now a statistical tie, with Obama barely edging McCain, 46% to 43%.

This comes via Nick Beaudrot, who claims not only that this is wrong, but that you can go ask Kevin Drum if you don't believe him. And it's true. I don't think the "statistical tie" trope is ever going to go away, but that still doesn't make it right.

I originally wrote about this back in 2004, but here it is again. The idea of a "statistical tie" is based on the theory that (a) statistical results are credible only if they are at least 95% certain to be accurate, and (b) any lead less than the MOE is less than 95% certain.

There are two problems with this: first, 95% is not some kind of magic cutoff point, and second, the idea that the MOE represents 95% certainty is wrong anyway. A poll's MOE does represent a 95% confidence interval for each candidate's individual percentage, but it doesn't represent a 95% confidence interval for the difference between the two, and that's what we're really interested in.

In fact, what we're really interested in is the probability that the difference is greater than zero — in other words, that one candidate is genuinely ahead of the other. But this probability isn't a cutoff, it's a continuum: the bigger the lead, the more likely that someone is ahead and that the result isn't just a polling fluke. So instead of lazily reporting any result within the MOE as a "tie," which is statistically wrong anyway, it would be more informative to just go ahead and tell us how probable it is that a candidate is really ahead. Here's a table that gives you the answer to within a point or two:

So in the poll quoted above, how probable is it that Obama is really ahead? Pew contacted 2414 registered voters, which means the MOE of the poll is about 2%, and they report that Obama's lead is 3 percentage points. So go to the top row and then read the number from the 3% column. Answer: there's a 93% probability that Obama is genuinely ahead of McCain (i.e., that his lead in the poll isn't just due to sampling error).
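For anyone who would rather compute than read a table, here is a minimal Python sketch of this kind of calculation. It is not necessarily the formula the table was built from. It assumes the reported MOE is a 95% interval (1.96 standard errors of one candidate's share) and that the two shares add up to nearly 100%, so the standard error of the lead is roughly twice the standard error of a single share. The function name `prob_ahead` is made up for illustration.

```python
import math

def prob_ahead(lead_pts, moe_pts):
    """Approximate probability that the poll leader is really ahead.

    Assumptions (not taken from the post): the MOE is a 95% interval,
    i.e. 1.96 standard errors of one candidate's share, and the
    standard error of the lead is about twice that of one share.
    """
    se_share = moe_pts / 1.96      # standard error of one share
    se_lead = 2.0 * se_share       # rough standard error of the lead
    z = lead_pts / se_lead
    # Standard normal CDF, written with the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Pew example: 3-point lead, MOE of about 2 points
print(round(100 * prob_ahead(3, 2)))   # about 93
```

Under the same assumptions, a lead exactly equal to the MOE comes out at roughly 84%, and a lead of zero at exactly 50%.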

Generally speaking, national polls use sample sizes of about 1,100, which translates to an MOE of 3%. State polls often use a sample of 600, which produces an MOE of 4%. Subsets of polls sometimes have MOEs of 5% or higher.
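The link between sample size and MOE can be checked with the textbook formula for a simple random sample. This is a sketch under that assumption: real polls weight their samples, which this ignores, and the function name `margin_of_error` is just illustrative.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """95% sampling margin of error for one candidate's share.

    share=0.5 is the worst case (widest interval); a simple random
    sample is assumed, which real polls only approximate.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)

# Sample sizes mentioned in the post
for n in (2414, 1100, 600):
    print(n, round(100 * margin_of_error(n), 1))
```

These work out to roughly 2, 3 and 4 percentage points, in line with the figures above.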

Now, there are plenty of reasons other than sampling error to take polls with a grain of salt: they're just snapshots in time, the results are often sensitive to question wording or question ordering, it's increasingly hard to get representative samples these days, etc. etc. But from a pure statistical standpoint, a lead is a lead and it's always better to be ahead than behind.

ACKNOWLEDGMENTS: Thanks to Nancy Carter and Neil Schwertman, Professors of Mathematics and Statistics at California State University, Chico, for providing me with the formulas used to generate the table and the spreadsheet.

Kevin Drum 6:31 PM Permalink | Trackbacks | Comments (35)

Comments

I think I'll vote for Pat Paulsen.

Posted by: The Conservative Deflator on August 13, 2008 at 6:40 PM | PERMALINK

Maybe more to the point, why should anybody care a whit about national polling data when the president isn't elected by popular vote?

Posted by: junebug on August 13, 2008 at 6:42 PM | PERMALINK

Great post. Maybe this will put the lie to the notion that the Pew results mean Obama and McCain are statistically tied. They aren't. Too bad, bad reporters are going to continue spreading the word.

Posted by: Ron Byers on August 13, 2008 at 6:44 PM | PERMALINK

Go read the Pew poll at http://pewresearch.org/pubs/924/presidential-race-draws-even

Their narrative starts out stating that Obama's lead has disappeared. It doesn't say they are statistically tied.

The breakdown in voters' preferences is worth taking note of and shows us who we need to target so we can win this fall.

Posted by: optical weenie on August 13, 2008 at 6:46 PM | PERMALINK

I'm no statistician, but it seems to me that if there's going to be an error in reporting, it's better if they make Obama's lead appear less secure. This makes overconfidence less likely, and that can only be good for all of us.

Posted by: thersites on August 13, 2008 at 6:50 PM | PERMALINK

is this graph based on this equation?

Posted by: bend on August 13, 2008 at 7:02 PM | PERMALINK

I don't know why we are fixated on the MOE, or more specifically the "sampling margin of error." That is, it just measures the error introduced by the fact that your sample doesn't perfectly reflect the population. Maybe we use it because we can measure it.

But because all the big pollers use different methodologies for choosing and tweaking their samples, if you look at all the polls, this cancels out and becomes irrelevant.

More important, the other sources of error swamp MOE. As you say, "often sensitive to question wording or question ordering." In fact, they are by and large more important than MOE. Talking about MOE all the time makes us forget this.

Posted by: anandine on August 13, 2008 at 7:07 PM | PERMALINK

Talking about MOE all the time makes us forget this.

Curly and Larry say the same thing.

Posted by: thersites on August 13, 2008 at 7:10 PM | PERMALINK

Talking Points Memo seems to be particularly addicted to this fatuous "statistical tie" nonsense. I once tried explaining it to them in an email, as I am sure many others have too, but they just carry right on - I guess innumeracy is no longer something to be embarrassed about.

Posted by: Anonymous on August 13, 2008 at 7:16 PM | PERMALINK

Well, this may help:

On top right now at Drudge: Colin Powell will endorse Obama at the convention.

We've got to fight the new swiftboating - by the same creep!

Posted by: Neil B. ☼ on August 13, 2008 at 7:17 PM | PERMALINK

God I hate this polling crapola. Earlier today I saw reference to that Pew Poll (aptly named) at talkingpointsmemo and it said 47 - 42 which is basically 1 point different from the prior poll of 48 - 40.

Posted by: ckelly on August 13, 2008 at 7:18 PM | PERMALINK

BTW, "statistically tied" is supposed to mean, the separation is no more than the margin of error. So if e.g. the margin of error was 4% and one candidate goes from being 7% ahead to only 3% ahead, that candidate goes from being "ahead" to being "statistically tied" with the other candidate. It's not just a trick or a way to help McCain, etc.

Posted by: Neil B ♪ ♫ on August 13, 2008 at 7:19 PM | PERMALINK

Neil B - did you even read Kevin's post? As I said above, innumeracy is obviously no longer something to be embarrassed about.

Posted by: Anonymous on August 13, 2008 at 7:23 PM | PERMALINK

I actually heard the polling guy from Pew refer to the result as a "statistical tie" on NPR this afternoon.

Posted by: bdbd on August 13, 2008 at 7:31 PM | PERMALINK

Neil, yes, we know that that's how people use it.

Those people are wrong.

That's the entire point of Kevin's post.

In the future, you really, really should read the post before you respond to it.

Posted by: on August 13, 2008 at 7:32 PM | PERMALINK

part 2. It was Andrew Kohut of Pew who referred to the new poll results (46-43) as a "statistical tie" -- here is the story/interview on NPR ATC this afternoon (titled "Obama losing ground" or something like that, with lead graf saying "candidates are now even" !!!) http://www.npr.org/templates/story/story.php?storyId=93575211

Posted by: bdbd on August 13, 2008 at 7:35 PM | PERMALINK

Thanks, it is so tiresome hearing the daily polling, and I feel orwellian half the time.
I not only grow weary, it seems overdue for the Bushies to drum up support for duct tape.
Seeing Tom Ridge in the news with McSame makes me think we will be seeing colors soon.
What a bunch of enablers.

Posted by: consider wisely always on August 13, 2008 at 7:36 PM | PERMALINK

The other issue is that when you have results from many different polls using different samples and similar but not identical questions and they all return similar results, then margins within the MOE of any one poll are much more likely to be significant.

Posted by: tanstaafl on August 13, 2008 at 7:38 PM | PERMALINK

Yeah I'm springing around and being a bit careless, but that conventional wisdom isn't simply plain wrong either. The chances of really being ahead are of course proportional, they couldn't suddenly change right at the boundary. But notice that even when the point difference is *equal* to the MOE, the combined confidence of being ahead is only 84% (because if superposed standard distributions are separated by 95% the ebbing slopes still intersect rather much.) Well guess what - that percentage is even less than the 95% certainty level that defines "reasonable certainty" as a convention, and so by that standard shouldn't be called "being ahead" - in fact, even being one point ahead of the MOE doesn't usually cut it! Oh, the irony ...but I concede that "statistically tied" isn't a good way to put it.

BTW don't be too hopeful yet about the Colin Powell "endorsement", and it may be a trick to raise and then deflate expectations for Obama:

http://andrewsullivan.theatlantic.com/the_daily_dish/2008/08/a-kristol-bluff.html

Posted by: Neil B. on August 13, 2008 at 7:40 PM | PERMALINK

What I love is that the poll of polls has been fixed since June.

Obama up 3 - 5%. End of story.

This election is basically people over 60 versus people under 40. It's not going to change much because it has nothing to do with politics. It's a census, not an election. Most people are not political bloggers. Most people are voting personality and identity, not issues.

Posted by: Adam on August 13, 2008 at 7:40 PM | PERMALINK

I agree that the so-called Powell endorsement is fraught with failure in the form of drowning in hypothetical waters. It kind of dropped in the news tonight like bird poop

Posted by: consider wisely always on August 13, 2008 at 8:12 PM | PERMALINK

Just counting the sampling error factor, you can bet that many of the polls we've seen this year are outside their MOE merely because of which folks got sampled.

If you take 100 polls at a 95% confidence interval, you would expect 5 to be wrong by more than the MOE - and how many polls have we seen so far?

Posted by: MobiusKlein on August 13, 2008 at 8:33 PM | PERMALINK

We need to remember, polls are only statistical ties when the democrat is in the lead.

Posted by: Keith Frohreich on August 13, 2008 at 9:05 PM | PERMALINK

For months I've believed Obama would win the election simply because Americans were so sick of Bush and the Republicans.

I've changed my mind. Obama is coming to be seen as vacuous as he really is. He's a "nice guy," I guess. And I'll still vote for him. And if we're lucky, and he wins, he'll bring some sharpies to Washington with him. That's what I'm counting on.

But, I have even less enthusiasm for Obama than I had for Kerry.

Posted by: alibubba on August 13, 2008 at 9:30 PM | PERMALINK

obama is just another sell-out dem. It doesn't matter if he is ahead in the polls or not, he represents the same interests.

EITHER WAY - THE FOLKS THAT BROUGHT US DUR CHIMPERFOR WIN

He has sold us out on every progressive issue he has initially supported - has a long history of it, going back to when he sold out the poor in Chicago -


But don't take my word for it, hear what they have to say about Obama's Shameful Housing Record in Chicago:

More obama lies:

*He sold us out on FISA

*Has gone from "Iraq war was wrong and a mistake" to "let's continue the war crimes and crimes against humanity for at least another year"

*Advocates endless wars for Israel in the Middle East

*Does not support meaningful healthcare reform, just want to give handouts to large healthcare providers and insurance companies

*Has already indicated he will start the Social Security bamboozle tour again by promoting the lie that it is in "crisis"

*Instead of providing any meaningful leadership or answers to high concentrations of poverty and dysfunctional schools in urban communities with large populations of African Americans, he plays the "personal responsibility" card, blaming victims of structural poverty

Margins of error don't matter - neither does obama versus mccain - they dance for the same masters.

Posted by: on August 13, 2008 at 9:30 PM | PERMALINK

The constant thing in all the polls is that McCain is in the low 40's.

None of the sound and fury changes that.

Posted by: snoey on August 13, 2008 at 9:36 PM | PERMALINK

Hey Kevin,

In high school and early in college, students are exposed to their first encounter with "heavy" math - symbolic math, variables, functions, all that good stuff. Some love it, some muddle through, and not a few just hit a brick wall. Those in the latter category have to decide their major, and it sure ain't gonna be science or engineering, and this is the population of students from which comes our "journalism" majors.

The NY Times and other major papers have decent science reporters, but in my personal experience in encounters with people who are in the media, their mathematical acumen could be written on the head of a pin, in Cyrillic.

Kev, since you may occasionally meet political reporters personally, ask them if their reporting on poll margins of error is based on their understanding of Poissonian statistics, and in particular, the discrete quantization of the variance. I'll bet you they'll give you a look like a dog does when you make a funny noise, with the head cocked and one ear up, going "whaaaaaa?"

Posted by: Greg in FL on August 13, 2008 at 10:06 PM | PERMALINK

This election is basically people over 60 versus people under 40.
I guess that means I won't be voting. Imagine my surprise...

Snark aside, every generational generalization about boomers like me has been wrong. What makes you suppose that it's different this time?

Posted by: thersites on August 13, 2008 at 10:11 PM | PERMALINK

if e.g. the margin of error was 4% and one candidate goes from being 7% ahead to only 3% ahead, that candidate goes from being "ahead" to being "statistically tied" with the other candidate.

Assuming the validity of 'statistically tied' for a moment, this would be wrong. The MOE of the difference between two statistics is bigger than the MOE of either one, unless they've got a strong positive correlation.

If two statistics with the same MOE are independent of one another, then you get the MOE of a difference by multiplying the individual MOEs by sqrt(2). But if they're as strongly negatively correlated as polls of two-candidate contests tend to be, then the MOE of a difference is more like 1.8 or 1.9 times the individual MOE.

The formula is MOE(diff) = sqrt[MOE1^2 + MOE2^2 -2*r*MOE1*MOE2].

It's a straightforward modification of formula (3) on page G-12 of this 341-page Census PDF.
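A quick numerical check of that formula, as a Python sketch. The helper `two_way_correlation` is an added assumption, not part of the formula above: it is the standard correlation between two shares drawn from one simple random sample. Both function names are illustrative.

```python
import math

def moe_of_difference(moe1, moe2, r):
    """MOE(diff) = sqrt(MOE1^2 + MOE2^2 - 2*r*MOE1*MOE2)."""
    radicand = moe1**2 + moe2**2 - 2.0 * r * moe1 * moe2
    return math.sqrt(max(radicand, 0.0))   # guard against rounding below zero

def two_way_correlation(p1, p2):
    """Correlation between two candidates' shares in a single sample.

    Assumption: simple random sampling, so the shares are multinomial.
    """
    return -math.sqrt(p1 * p2 / ((1.0 - p1) * (1.0 - p2)))

r = two_way_correlation(0.46, 0.43)        # roughly -0.8
print(round(r, 2), round(moe_of_difference(3, 3, r) / 3, 2))
```

For a 46-43 split the multiplier lands close to 1.9, and with r = 0 it falls back to sqrt(2), about 1.41.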

Posted by: low-tech cyclist on August 13, 2008 at 10:27 PM | PERMALINK

Forgot to add: r = correlation coefficient. For non-math types, -1 ≤ r ≤ 1.

Positive correlation: when one stat increases, the other is more likely than not to increase. r>0. If they move up or down in lockstep, r=1.

Negative correlation: when one stat increases, the other is more likely than not to decrease. r<0.

If they're independent of one another, then r=0.

Posted by: low-tech cyclist on August 13, 2008 at 10:32 PM | PERMALINK

Crap. Forgot about the effect of angle brackets in a HTML environment. Too tired to fix now.

Posted by: low-tech cyclist on August 13, 2008 at 10:34 PM | PERMALINK

If still there LTC, I forgot to think about MOEs versus "half-margins" i.e. if "4 % MOE" that really means 2% on each side, right? That changes the issue of whether the difference between candidates is inside "MOE" or not. But in any case you are right that we need to take into account the negative correlation and not treat preference statements as independently selectable events (like people could just as easily say "A" to one pollster as "B" to another.) But if that's true, then is the table in the post really accurate? I'd want to think the statisticians would get it right, but after the episode of "Marilyn vos Savant and the three door problem" I'm not so sure.

Posted by: Neil B. on August 13, 2008 at 11:05 PM | PERMALINK

In other words, a 46%-43% split with a 3% margin means that it's just as likely that it's at 49%-40% as it is 43%-43%, and probably somewhere in between, with obama winning.

Posted by: rea on August 14, 2008 at 7:33 AM | PERMALINK

I forgot to think about MOEs versus "half-margins" i.e. if "4 % MOE" that really means 2% on each side, right?

No, MOE is radius, not diameter - in other words, if the estimate is 46% and the MOE is 4%, then the 'true' value has a 95% chance of being between 42% and 50%.

Posted by: low-tech cyclist on August 14, 2008 at 9:42 AM | PERMALINK

There is really no good reason to be quibbling over the statistics of cherry-picked individual polls, which involve all sorts of hidden assumptions, when so many polls are being taken. A scatter diagram, like this one

http://pollkatz.homestead.com/files/approval-data_files/zzzmainGRAPHICS_14808_image001.gif

for Bush approval rating, gives a good idea of the variability of polls. Such a thing may exist for 2008 election preferences, but I don't know where to find it.

Posted by: skeptonomist on August 14, 2008 at 10:01 AM | PERMALINK




 

 
