Editor's Note
Tilting at Windmills

June 30, 2008
By: Kevin Drum

CORRELATION, CAUSATION, AND BLOGGING....Kieran Healy is annoyed:

You only have to hang around the world of social science research- or policy-related blogging for a few hours before you come across someone willing to snottily inform you, or some other luckless interlocutor, that although the finding of this or that paper may appeal to you, nevertheless don't you know that Correlation Is Not Causation. Often this seems to be the only thing they know about statistics.

I grudgingly admit that it's a plausible-sounding rule, and in the textbooks and stuff. But, to be honest, I read it too many times in various posts and comments threads the other day, and in my raging pique I found myself thinking that the next time it happened I would say, "That's completely backwards: in fact, causation is just correlation" and fling a copy of Hume's first Enquiry at their head. Or at the screen, I suppose, but that image is less satisfying, because now who's the crank on the internet, etc.

OK, he's kidding. Sort of. But not really. Bloggers are fond of yelling "correlation is not causation" at any piece of research that comes to a conclusion they find distasteful, but what they almost never do is actually read the paper in question, which invariably addresses most of their concerns: research methodology; alternate explanations; potential intervening variables; results of similar studies in the past; shortcomings in the data set; etc. That's not to say that researchers always take every possible problem seriously enough, or that social science papers don't deserve heightened scrutiny. But it is to say that if, in 30 seconds, some possible problem with the research program occurs to you, it's almost a dead certainty that the person with a PhD who performed the study also thought of the same thing. And discusses it in the paper.

At least, that's what I've found on virtually every occasion when I've cracked open one of these things. The discussion isn't always great, and sometimes it leaves a variety of questions hanging, but it's almost always there. It's true that correlations don't always imply causation, especially if the research is poorly done or the statistical analysis is mangled, but it also turns out, surprisingly enough, that people with doctorates mostly understand this stuff almost as well as bloggers who read the New York Times.

Kevin Drum 11:44 AM Permalink | Trackbacks | Comments (34)
 
Comments

I first heard that (not those exact words, but the same idea) in a High School economics class in 1966.

All I did was ask if the economic expansion of the 1950's had anything to do with the huge increases in defense spending. I was not the teacher's pet.

Posted by: thersites on June 30, 2008 at 11:56 AM | PERMALINK

The other one to watch out for: "you can't prove a negative". While true, in the strictly formal logic sense, this is often a canard. In the infinitely more practical Bayesian logic sense, you can "prove a negative" by reducing its probability to a vanishingly small amount (with the appropriate studies). Thus, it informs maybe not the "truth" in a strict sense, but certainly what you should believe in light of the evidence.
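
To make the Bayesian point concrete, here is a minimal sketch in Python. The prior, the study power, and the false positive rate are made-up numbers chosen for illustration, not figures from any real study, and the function name is hypothetical:

```python
def posterior_after_null_studies(prior, power, false_positive_rate, n_studies):
    """Probability the effect is real after n studies that all found nothing.

    prior: assumed starting probability that the effect exists.
    power: chance a single study detects the effect if it exists.
    false_positive_rate: chance a study reports an effect that is not there.
    """
    p = prior
    for _ in range(n_studies):
        # Likelihood of a null result under each hypothesis.
        null_if_real = 1.0 - power
        null_if_absent = 1.0 - false_positive_rate
        # Bayes' rule: P(effect is real | this study found nothing).
        p = (null_if_real * p) / (null_if_real * p + null_if_absent * (1.0 - p))
    return p


# Illustrative assumptions: 50/50 prior, 80% power, 5% false positive rate.
for n in (0, 1, 3, 5, 10):
    print(n, posterior_after_null_studies(0.5, 0.8, 0.05, n))
```

Under these assumed numbers each null result multiplies the odds that the effect is real by about 0.21, so the probability never reaches zero but shrinks quickly as well-powered studies keep finding nothing.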

Posted by: MLE on June 30, 2008 at 11:57 AM | PERMALINK

For many scientific papers, it is true that the authors do consider this question, and if they don't (and they don't always do this, unfortunately), much shame on the referees.

However, there is a distinction that needs to be made here: many bloggers, also, don't read the papers they cite as evidence for their own personal interpretations, and they rarely, if ever, consider the question of cause and correlation. It is usually the difference between a writer and a scientist.

Posted by: Yancey Ward on June 30, 2008 at 12:02 PM | PERMALINK

Even though authors of the papers show that the effect is causal, only the best (and longest) pop science articles mention how they did so. The average news consumer doesn't care but bloggers do care.

Editors can't find room for caveats and nuance -- this both reflects and causes our ignorance of the scientific method. (Wow, dual causality?!)

Posted by: SM on June 30, 2008 at 12:03 PM | PERMALINK

"Thus, it informs maybe not the "truth" in a strict sense, but certainly what you should believe in light of the evidence."

Does the "truth" really even exist in the strict sense? It seems all theories in physics really come down to probability and not absolute certainty. It seems Bayesian logic is the only real way to approach modern science. The only absolute certainty we have is that we can never achieve absolute certainty. That may be discomforting to many, but it's the truth.

Posted by: fostert on June 30, 2008 at 12:18 PM | PERMALINK

I think it is important to also mention that one of the ways to ferret out what may be causation rather than simply correlation is to see if the author of the premise that event(s) A are the cause of event(s) B has provided a plausible (within the current understanding of how things work) mechanism connecting event(s) A to event(s) B. I think it is almost always, if not always, justifiable to be skeptical of a claim of causality that does not contain a plausible mechanism.

I loved the reference to Hume, but using his line of thinking to question a particular instance of a claim of causation is, I believe, misplaced. Hume's argument is a doomsday device as far as all claims of causation are concerned. You might win the argument in the particular but to do so you have to lay waste to all claims of causality - probably not what most people have in mind when they are concerned that a particular claim of causality is bogus.

Posted by: Ted King on June 30, 2008 at 12:23 PM | PERMALINK

No matter how cautious scientists are in their publications - and they have to be, given the way reviewers critique manuscripts - their caution in a scientific journal is no protection against the ways media outlets run with stories based on those publications. The popular accounts typically don't provide all the caveats the scientists do (or gloss over them if they're there) and end up sounding much more prescriptive than the original publications. Then NYT readers come along and read what the reporter said about the original finding, and it looks to them like the scientists were being cavalier with their interpretations.

As SM noted, it's tough to do nuance in a short newspaper article.

Posted by: Colin on June 30, 2008 at 12:28 PM | PERMALINK

OK, I won't suggest that lay people who don't read the papers can do any better -- but some fields of research require so much expertise that the authors and two or three referees sometimes aren't enough.

There's also a number of other factors involved in publication:

1) Many journals like Science and Nature are driven toward the new, exciting, and controversial. Findings are more likely to be proved spectacularly wrong after further study.
2) Citation rankings don't reflect how many times people bring up your argument as a counterpoint or essentially call you a moron. Getting hired or getting tenure is more likely to depend on the ranking alone because the rest of the department will not have the expertise or time to dig deeper. Controversy sells.
3) Authors are hesitant to emphasize (or mention) certain complexities or problems with techniques or theories. They don't want to crap on their own past work or the work of their colleagues and reviewers. There's a certain amount of salesmanship involved and in today's funding environment folks tend to do a weird dance of shameless promotion and anonymous backstabbing.
. . .

Suffice it to say that PhDs do not all have all the expertise we all wish they had and they are going to ignore or bury certain considerations that are not in their favor. If you're interested in the details, you'd be helped immensely by paying attention to the nature/history of the journal and the scientist.

Posted by: asdf on June 30, 2008 at 12:34 PM | PERMALINK

re: thersites 1st comment

The huge expansion of the 1950s wasn't.

The growth was relatively anemic and there were 2 recessions.

It looked like an expansion because we weren't getting killed in huge numbers and employment numbers beat the 30s and 40s all hollow.

Posted by: Jeffrey Davis on June 30, 2008 at 1:05 PM | PERMALINK

Call me crazy, but if we're talking public health policy, and there hasn't been a bunch of intervention studies to verify an epidemiologist's correlation then it's incredibly stupid to implement some plan assuming it's a causal relationship.

I understand it's used as a way to ignore things you don't like, but it's not a thing that can ever be undersold IMHO.

Posted by: J.W. Hamner on June 30, 2008 at 1:09 PM | PERMALINK

In 1972 I worked as a statistician in the NIH sponsored Lipid Research Clinics test of the lipid hypothesis. It was well known that high cholesterol (LDL) was associated with heart attacks (correlation) but not known if lowering LDL cholesterol would reduce their incidence (causation). The lipid hypothesis was this: lowering LDL cholesterol in men with high LDL cholesterol would lower the incidence of heart attack. The null hypothesis was that there would be no change in incidence. A double blind study was designed in which men with high levels of LDL cholesterol (Type II hyperlipidemia) between the ages of 35 and 60 were recruited via multi-clinics throughout the US and Canada. Half the men received a cholesterol lowering drug (cholestyramine) and half were given a placebo.

The first results came out six to seven years after the study began, and they clearly proved that the incidence of heart attack drops as cholesterol levels are lowered, and the null hypothesis was rejected. If this study, or a similar one, had not been carried out we would still be wondering what impact cholesterol-lowering meds would have on adults with elevated cholesterol.

Posted by: Dilbert on June 30, 2008 at 1:34 PM | PERMALINK

I usually respond by pointing out that there is only a correlation between smoking and lung cancer, but causation has never been proven.

Posted by: Luther on June 30, 2008 at 1:34 PM | PERMALINK

The problem with misunderstanding statistics probably starts with our mathematics education. For the "average person" statistics and probability are far more important than a second year of algebra or even geometry in giving people the ability to interpret information. An understanding of statistics gives people a handle to use when dealing with uncertainty or ambiguity, such as "how certain you are about something" or "how you can make decisions with a limited amount of information".

People will be better served and more tolerant once they know that a chance of error is always with us:
either Type 1 - saying it is so when it really isn't so [false positive]
or Type 2 - saying it isn't so when it actually is so [false negative]

and that a probability of .05 means that there is a 1 in 20 chance of a false positive/Type 1 error when there is really nothing there. This is the level where one is justified in jumping to some conclusion.
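
By the usual textbook convention, the .05 threshold limits Type 1 errors (false positives): when there is truly no effect, about 1 experiment in 20 will still look "significant". The Type 2 (false negative) rate is a separate quantity that depends on sample size and on how big the real effect is. A minimal simulation sketch, assuming a simple two-sided z-test on normally distributed data with known standard deviation (the function name, sample size, and seed are arbitrary choices for illustration):

```python
import math
import random


def false_positive_rate(trials=20000, n=30, z_crit=1.96, seed=0):
    """Fraction of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample for which the null hypothesis is true: mean 0, sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for the sample mean, using the known sd of 1.
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > z_crit:  # two-sided test at the .05 level
            rejections += 1
    return rejections / trials


print(false_positive_rate())
```

The printed fraction should land close to 0.05, which is the "1 in 20" figure read as a false positive rate.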

For many people, all this goes above their heads, but it should be a part of their education.

And the rejoinder to "correlation doesn't mean causation" is "if there is no correlation, there is no causation". Meaning that finding a correlation is the starting point for further investigation and that one should be prepared for the possibility of finding something that may go against one's preconceptions.

Posted by: natural cynic on June 30, 2008 at 1:47 PM | PERMALINK

Bah, you philosopher wannabe scientists are wasting your time as usual. Practical people like engineers know that the relationship between correlation and causation is moot, because correlation is just an abstract mathematical concept that can't be measured. If you do try to measure it, the gremlins have fun changing things just as you're making a measurement.

Posted by: alex on June 30, 2008 at 1:54 PM | PERMALINK

Causation is impossible to prove. Ever. When someone says something "causes" something, what they mean is that there is a very high degree of probability/correlation.

Posted by: duffy on June 30, 2008 at 2:01 PM | PERMALINK

In other words, when you say something "causes" something, you are saying every time the one thing happened, this other thing happened also. But there is no way to be completely certain it will happen again that way. You can only be pretty certain.

Posted by: duffy on June 30, 2008 at 2:05 PM | PERMALINK

I usually respond by pointing out that there is only a correlation between smoking and lung cancer, but causation has never been proven.

My understanding is that this used to be the case, but in 1996 a mechanism at the cellular level was identified, here.

Posted by: RSA on June 30, 2008 at 2:08 PM | PERMALINK

And, like Kieran, many of those researchers, like yours truly, have read Hume.

Hamner: The problem is in the high degree of "looseness" in medical research vs. that in the natural sciences.

Everybody, and Kieran by osmosis: Hume also may not have meant what everybody claims he meant on the is/ought distinction, too.

Posted by: SocraticGadfly on June 30, 2008 at 2:23 PM | PERMALINK

When you say "bloggers" say this, by far the biggest source of this comes from modern feminist bloggers who will attack any study anytime anywhere that is reported in the paper. Just examine Broadsheet, or Echidne (who refers to Evolutionary Psychology as "Evo Psychos" and claims most of them hate women (including presumably, the women Ph.Ds amongst them)) or Amanda or any of them.

Very rarely do they read the actual paper. Very rarely do they acknowledge that a news reporter's summary is most likely inaccurate.

It's just attack any source of information that disagrees with their own pet theories.

Are we the reality based party? Not that I can tell.

Posted by: jerry on June 30, 2008 at 2:55 PM | PERMALINK

Actually, fundamental philosophic issues a la Hume about what causation really is, etc. are not the main problem with correlation and causation. The main problem is, many things can vary together coincidentally because there are just so many variables and effects around. Think, if A increased markedly since 1970 there is likely, among parameters B, C, D, E, ..., some parameter say "M" that increased at about the same time in roughly the same way but with no genuine causal connection. (And the latter means, if we could increase A at will it would force M to increase, no tail wagging dependency.) Not only that, but the "which way the causation goes" question remains even when we have a link established.

Posted by: Neil B on June 30, 2008 at 3:12 PM | PERMALINK

*

Posted by: mhr on June 30, 2008 at 3:19 PM | PERMALINK

mhr: Liberals often get confused over correlation/causation.

That's true. By contrast, conservatives eliminate the confusion by eschewing all empirical data.

Posted by: alex on June 30, 2008 at 3:42 PM | PERMALINK

I think too often that people have re-interpreted "correlation doesn't mean causation" to "correlation never means causation".

Posted by: BombIranForChrist on June 30, 2008 at 4:03 PM | PERMALINK

Philosophy aside, as a student in a methodologically intense social science graduate program, I can say that, having read hundreds of fancy econ, political science, and sociology papers, for the good ones

a) the authors always try to disentangle causation from correlation
and
b) they are almost always far too ready to claim causation when, sadly, many possible correlation explanations (i.e., both X and Y caused by some unmeasured factor Z) remain un-disproved.
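
The "both X and Y caused by Z" case can be sketched with simulated data. This is an illustration under stated assumptions, not anyone's actual method: the data are synthetic, X never influences Y, and the helper names `pearson` and `residuals` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, zs):
    """Remove the part of ys that a straight-line fit on zs explains."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(5000)]   # the hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]       # Y depends on Z only, never on X

raw = pearson(x, y)                                   # correlation ignoring Z
partial = pearson(residuals(x, z), residuals(y, z))   # correlation after removing Z
print(raw, partial)
```

The raw correlation comes out clearly positive even though X has no effect on Y, and it largely disappears once Z is accounted for. The catch is that this adjustment only works when Z has been measured; with an unmeasured Z, the raw correlation is all the analyst sees.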

The best scholars I know decide for about 90% of the papers they read -- papers in the best journals in the field -- that the authors have failed to adequately demonstrate causation. It's not really their fault, though -- without reproducible experiments, causation is really hard to show.

Posted by: JD on June 30, 2008 at 5:12 PM | PERMALINK

"The best scholars I know decide for about 90% of the papers they read -- papers in the best journals in the field -- that the authors have failed to adequately demonstrate causation. It's not really their fault, though -- without reproducible experiments, causation is really hard to show."

If they are claiming causation and they have not disproven correlation, they are certainly in the wrong for having done so.

Posted by: jerry on June 30, 2008 at 5:44 PM | PERMALINK

I haven't seen a lot of these blog comments about causation; so I can't judge. But I have seen a lot of really dumb social science studies that try to prove that marriage causes financial success. I've also seen a lot of nonsense about correlations between marijuana use and all kinds of other things that rebellious kids try. More seriously, I've also seen a lot of nonsense about physical neurological states causing psychological states, when all that's ever demonstrated is a coincidence or parallel of the physical and psychological states.

Posted by: Gary Sugar on June 30, 2008 at 6:49 PM | PERMALINK

I almost always read the paper if it's online. Kevin is correct if no question of political correctness is involved. But if which way the arrow of causality points has an identity politics angle, then it's common for researchers to simply assume the politically correct chain of causation and not even mention the alternative.

Posted by: Steve Sailer on June 30, 2008 at 7:22 PM | PERMALINK

Kevin,

You had it right until you said

It's true that correlations don't always imply causation,

In fact, correlations DO imply causation, or at least suggest it. They simply don't show in which direction cause and effect might work, nor do they exclude other possible causes for the effects measured. The problem is that correlations do not PROVE causation.

Posted by: Rick B on June 30, 2008 at 11:03 PM | PERMALINK

Bloggers are fond of yelling "correlation is not causation" at any piece of research that comes to a conclusion they find distasteful, but what they almost never do is actually read the paper in question, which invariably addresses most of their concerns...

Kevin, you have highlighted an interesting dilemma. I believe it has to do with information overload and scarce time resources to do the proper research and analysis on the blogger's part. There may be exponentially increasing salient information available on short notice via the internet, but determining how salient the information is will take too much time because we are swamped with a larger proportion of "chaff" to sort through with respect to the kernels that can be gleaned. What's happening is that there are more needles to find in a much larger haystack.

Posted by: Doc at the Radar Station on June 30, 2008 at 11:19 PM | PERMALINK

Serendipitously, this is the place to drop in today's factoid via Paul Kedrosky's blog:
mobile phone ownership leads to a decline in monthly tobacco consumption. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1152484

Posted by: mcdruid on July 1, 2008 at 3:08 AM | PERMALINK

Uh, I think you messed this up:

"It's true that correlations don't always imply causation"

Actually I think the definition of "imply" means that it does - it may well be a misfire, but it implies that there is something there. That's why we try to correlate things in the first place, to know where to focus further examination of the subject in question.

It just doesn't show it, is what you were going for I think.

Posted by: doesn't matter on July 1, 2008 at 8:01 AM | PERMALINK

err, in other words what Rick B. and natural cynic said....

Sorry for the over-quick skimming.

However, note since all three of us said almost exactly the same thing, there is a hard correlation but I'm not sure there is any causation at all!

Posted by: doesn't matter on July 1, 2008 at 8:07 AM | PERMALINK

For many social science problems, a really strong (magnitude, not just significance) correlation is as good as you're going to get with our current state of knowledge. If you ignore correlation, you are going to be very ignorant about social problems. Randomized controlled field experiments are either not possible or not ethical for a lot of social science, and the underlying mechanisms that could show causation would be from the distant future of psychology or neurology.

Posted by: Safron on July 1, 2008 at 10:26 AM | PERMALINK

I've gotten an earful or two on this point over the years, as I've stuck to my hypothesis "Bush's approval is driven by gasoline prices." (Which has proven immensely robust over the years: click my name below to go to pollkatz.com.)

"Correlation does not imply causation" is a chant, not an argument. A is correlated with B exactly as much as B is correlated with A. Of course you can't infer much from that.

But -- you can ask yourself, "Which is the more plausible story: A implies B, B implies A, or A and B are consequences of the elusive C?"

And then there's Granger causality, which enjoyed a vogue in econometrics some years back (possibly continuing -- I haven't kept up). Granger reasoned that if A and B are correlated, but that in (specially formulated VAR) regressions A serves to explain the variance of B but B does not do the same for A, then as a practical matter we can say A causes B. That may not be up to philosophical snuff, but it sure has pragmatic appeal.
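
The idea can be sketched in a much simplified form. The code below is not the specially formulated VAR setup mentioned above: it assumes a single lag, uses synthetic data in which A really does drive B, and all function names are hypothetical. It compares a regression of B on its own past with one that also includes A's past, and reports an F statistic for the improvement.

```python
import random


def ols_rss(rows, ys):
    """Residual sum of squares from least squares of ys on rows plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, y in zip(X, ys))


def granger_f(cause, effect):
    """F statistic: does cause[t-1] help predict effect[t] beyond effect[t-1]?"""
    y = effect[1:]
    own = effect[:-1]
    other = cause[:-1]
    rss_restricted = ols_rss([[o] for o in own], y)
    rss_full = ols_rss([[o, c] for o, c in zip(own, other)], y)
    dof = len(y) - 3  # observations minus the three fitted coefficients
    return (rss_restricted - rss_full) / (rss_full / dof)


# Synthetic series: a evolves on its own; b depends on the previous a.
rng = random.Random(2)
a, b = [0.0], [0.0]
for _ in range(500):
    # b is computed from the previous a before a itself is advanced.
    b.append(0.5 * b[-1] + 0.8 * a[-1] + rng.gauss(0, 1))
    a.append(0.5 * a[-1] + rng.gauss(0, 1))

print(granger_f(a, b), granger_f(b, a))
```

In this constructed example the first statistic (A's past explaining B) should be large and the second (B's past explaining A) small, which is the asymmetry the Granger argument relies on. A real analysis would choose the lag length carefully and compare each statistic with the appropriate F distribution.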

Posted by: Stuart Eugene Thiel on July 1, 2008 at 11:26 AM | PERMALINK