Editore"s Note
Tilting at Windmills

Email Newsletter icon, E-mail Newsletter icon, Email List icon, E-mail List icon Sign up for Free News & Updates

September 20, 2006
By: Kevin Drum

LIES, DAMN LIES, AND....Via Kieran Healy, here's something way off the beaten path: a new paper by Alan Gerber and Neil Malhotra titled "Can political science literatures be believed? A study of publication bias in the APSR and the AJPS." It is, at first glance, just what it says it is: a study of publication bias, the tendency of academic journals to publish studies that find positive results but not to publish studies that fail to find results. The reason this is a problem is that it makes positive results look more positive than they really are. If two researchers do a study, and one finds a significant result (say, tall people earn more money than short people) while the other finds nothing, seeing both studies will make you skeptical of the first paper's result. But if the only paper you see is the first one, you'll probably think there's something to it.

The chart on the right shows G&M's basic result. In statistics jargon, a significant result is anything with a "z-score" higher than 1.96, and if journals accepted articles based solely on the quality of the work, with no regard to z-scores, you'd expect the distribution of z-scores to resemble a bell curve. But that's not what Gerber and Malhotra found. Above a z-score of 1.96, the results fit the bell curve pretty well, but below a z-score of 1.96 there are far fewer studies than you'd expect. Apparently, studies that fail to show significant results have a hard time getting published.

So far, this is unsurprising. Publication bias is a well-known and widely studied effect, and it would be surprising if G&M hadn't found evidence of it. But take a closer look at the graph. In particular, take a look at the two bars directly adjacent to the magic number of 1.96. That's kind of funny, isn't it? They should be roughly the same height, but they aren't even close. There are a lot of studies that just barely show significant results, and there are hardly any that fall just barely short of significance. There's a pretty obvious conclusion here, and it has nothing to do with publication bias: data is being massaged on a wide scale. A lot of researchers who almost find significant results are fiddling with the data to get themselves just over the line into significance.
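
(A toy simulation, not from the Gerber and Malhotra paper and with made-up numbers throughout, shows how that signature arises: publish everything significant, occasionally publish a null result, and let a fraction of barely insignificant results get nudged over the line. The histogram of published z-scores ends up with a dip just below 1.96 and a pile-up just above it.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of studies: half true nulls, half real effects.
true_effect = np.concatenate([np.zeros(5000), rng.uniform(0.5, 4.0, 5000)])
z = np.abs(rng.normal(loc=true_effect, scale=1.0))  # observed |z| per study

published = []
for stat in z:
    if stat >= 1.96:
        published.append(stat)                        # significant: publish
    elif stat >= 1.5 and rng.random() < 0.3:
        published.append(1.96 + 0.2 * rng.random())   # "nudged" just over the line
    elif rng.random() < 0.1:
        published.append(stat)                        # null result: occasionally published

published = np.array(published)
below = np.sum((published >= 1.7) & (published < 1.96))
above = np.sum((published >= 1.96) & (published < 2.2))
print(f"published just below 1.96: {below}, just above: {above}")
```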

And looky here. In properly sober language, that's exactly what G&M say:

It is sometimes suggested that insignificant findings end up in file drawers, but we observe many results with z-statistics between zero and the critical value. There is, however, no way to know how many studies are missing. If scholars tweak regression specifications and samples to move barely insignificant results above conventional thresholds, then there may be many z-statistics below the critical value, but an inordinate number barely above the critical value and not very many barely below it. We see that pattern in the data.

We see that pattern in the data. Message to political science professors: you are being watched. And if you report results just barely above the significance level, we want to see your work.

Class dismissed.

Kevin Drum 12:12 AM

Comments

Al?

Posted by: dr sardonicus on September 20, 2006 at 12:19 AM | PERMALINK

Insightful post, Kevin. A prime example of why your blog is one of the most interesting reads on the internet.

Posted by: McCord on September 20, 2006 at 12:20 AM | PERMALINK

It's just this type of nuanced nit-picking that gives Commandante Bush (and his legion of zombies) a headache. In a black and white world, why would anyone want to know that things are insignificant?

Posted by: Michael7843853 G-O in 08! on September 20, 2006 at 12:34 AM | PERMALINK

Hint: this doesn't just apply to political science.

Posted by: billjones on September 20, 2006 at 12:39 AM | PERMALINK

(Caveat: I haven't read the article yet.)

The assumption that just-barely-significant results tend to come from "fiddling with the data" is unjustified. While some results may be borne of statistical trickery, it's just as likely that many of them come from an alteration of the research questions or hypotheses to account for limitations of the data. Although this is something that's often frowned upon by researchers, particularly heavy theory types, it is not necessarily something to be alarmed about in terms of publication bias.

As an example, two papers I recently co-authored looked at the effects of ethical and strategic frames in the presentation of stem cell news. We looked first at the difference between each frame and the control condition in which neither was present. What we found there approached significance, but didn't quite get there, most likely due to our sample size. However, when we compared the frames with each other directly, we did find significant results, as we did also when we examined a particular sub-group (in this case it happened to be blog readers) in the frame vs. control analysis. The papers reflected this -- the hypotheses we ultimately tested and discussed dealt with frame vs. frame in the full sample and frame vs. control in the sub-sample. They'll go out for submission next month, so I don't yet know if or where they'll be published, but they are certainly not unrepresentative of how quantitative social science research is produced.

Posted by: Aaron S. Veenstra on September 20, 2006 at 12:41 AM | PERMALINK

This is what the corporatization of the universities gets you. Faculty are pressured to publish so that they are more competitive for grant funding. To get published you have to show significant findings. Voila! This will come as no surprise in academia. As long as the reward system is based on QUANTITY of articles published in peer reviewed journals we will continue to get this. The publish or perish system is way overdue for an overhaul...and don't even get me started on the proliferation of crappy journals!

Posted by: eb on September 20, 2006 at 12:46 AM | PERMALINK

People study stuff? Wow!

Posted by: Al on September 20, 2006 at 12:48 AM | PERMALINK

Is "political science" an oxymoron?

Posted by: craigie on September 20, 2006 at 1:08 AM | PERMALINK

That one is a toss-up, craigie; the case can be made either way. My favorites are still "jumbo shrimp" and "student teacher."

And with the jokes, I say goodnight. I have been up since five yesterday morning, and I am freakin' exhausted.

See you tomorrow.

Posted by: Global Citizen on September 20, 2006 at 1:13 AM | PERMALINK

[caveat: didn't read the article, probably never will] Formal hypothesis testing within a classical statistical framework meeting the litany of normative assumptions is flawed for the simple reason that it assumes the absence of a priori information, which is in fact false. We have nearly 200,000 years of human prehistory (hence knowledge) to presage our conclusions. Unfortunately, Bayes' theorem doesn't come equipped with a tidy z-score or "pee" value to hinge our reported results.

Posted by: blackhawk on September 20, 2006 at 1:15 AM | PERMALINK

blackhawk, I once heard a lecture by a famous statistician named Tukey. He said the widely used 95% standard wasn't always the right one, for the reasons you give. However, he saw value in requiring that the research include a properly done, properly vetted statistical analysis. There's a lot of bad statistical work done by people who aren't experts in the field.

He was talking mostly about medical research. I don't know whether his conclusions would apply to social science research.

When medical research includes statistical analysis, it can't get published unless a qualified statistician signs off on the statistical analysis. Does anyone know whether social science journals have a similar requirement?

Posted by: ex-liberal on September 20, 2006 at 1:40 AM | PERMALINK

I'm seeing another pattern here, and it ain't "pretty".

Posted by: KathyF on September 20, 2006 at 2:17 AM | PERMALINK

What we really need is more publication space - more pages - and fewer file drawers.

Articles published should be checked against those file drawers - because many studies in the drawer can outweigh the one that shows a result.

I.e., publication needs to be more scientific and less about money. We need to fund unsuccessful research, because more research is unsuccessful than successful!

Posted by: Crissa on September 20, 2006 at 3:30 AM | PERMALINK

Good post, Kevin.

Of course, nobody wants to publish insignificant results. There was (maybe still is) in the psychology field a journal called "The Journal of Nonsignificant Results" ("It's dull, of course", one of my professors remarked).

Perhaps the remedy to this is for journals to add to every study they publish a sidebar citing non-significant result studies on the same topic.

Posted by: captcrisis on September 20, 2006 at 3:36 AM | PERMALINK

Kevin, I don't always agree with you, but I like your blog. This post is one of the reasons why. You have done a real public service.

Posted by: jimbo on September 20, 2006 at 3:52 AM | PERMALINK

I know nothing about political science methods, but in areas I know, a journal will usually not publish a negative result except as a counter to an already published positive result.

Also, using your 1.96 cutoff, "not quite significant", is the land of ambiguity. Re-evaluation of the data may either raise or lower the value, but research that is close to 1.96 is discomforting and undergoes far greater scrutiny than results that are clearly negative or positive. I am not surprised that such results are not often submitted for publication.

Posted by: Mudge on September 20, 2006 at 4:36 AM | PERMALINK

"Publication bias" has been noted in the natural sciences, too. Scenario: Researcher A conducts an experiment, finds a result and publishes a paper regarding the experiment and the result. Researchers B and C read A's paper and subsequently conduct the same experiment. Researcher B finds the same result, confirming A's thesis, and writes a paper. Researcher C does not find the same result, which casts doubt on A's thesis, and writes a paper to that effect. Researcher B's paper is far more likely to be accepted for publication than Researcher C's. It has been hypothsized that the reason for the difference is that editors of scientific publications want--perhaps sub-consciously--to publish papers that provide evidence of some results, as opposed to papers that provide no results, even if the later casts doubt on results claimed in previously-published papers.

Posted by: raj on September 20, 2006 at 5:11 AM | PERMALINK

There's a bias in academia toward finding results -- period. Arguments get framed around the expectation that gathering certain statistics will achieve something statistically significant. We've known this since Charles Murray published "The Bell Curve."

Social phenomena like discrimination are difficult to put a number on so they rarely factor into explanations of income and achievement gaps. Who would publicly admit to being a racist? Yet conservatives today are still fascinated by the "Curve" like it's one of the lost Gospels. (They conveniently overlook the fact that Murray himself admits to having burned a cross in an African American's yard as a youth. Maybe he wasn't the most credible person to conduct the study in the first place.)

In many medical and scientific journals articles don't get published until after peer review and after findings have been replicated. Social science researchers in the same field rarely get an opportunity to comment on why one team found results and another didn't.

Posted by: pj in jesusland on September 20, 2006 at 6:35 AM | PERMALINK

I doubt that this is unique to political science. Also, the z score of 1.96 is an arbitrary standard for significance. Type II errors are likely to be more prevalent because of it.

Posted by: Ba'al on September 20, 2006 at 6:57 AM | PERMALINK

Ba'al, the z score of 1.96 is only semi-arbitrary in terms of significance levels. If those who performed the study chose a higher standard, the results Kevin reported would have been even more skewed.

And if anyone thinks publication bias is a problem, now consider how prevalent the practice of massaging numbers is in order to even get a study with significant results. Having degrees in Anthropology and Economics, with a strong statistics background and a focus in political anthropology, I've long been skeptical of 99% of the bullshit published in academic journals, let alone those studies rejected for publication and consequently filtered into the mainstream media, which lacks even a smidgen of those standards. Welcome to cynicism.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 7:50 AM | PERMALINK

I love this blog.

Posted by: calling all toasters on September 20, 2006 at 7:52 AM | PERMALINK

Or you could do the short form. Almost any study drops in importance by an order of magnitude when you consider the context, or whether it even makes any sense at all.

Reading the fine print usually adds a few more reasons not to run screaming into the street.

Posted by: serial catowner on September 20, 2006 at 8:10 AM | PERMALINK

Without a doubt, this was the most boring post of all time.

Posted by: Homer on September 20, 2006 at 8:14 AM | PERMALINK

Okay, time for an electronic show of hands. How many of you academics have sat through the following discussion with one or more of your colleagues:

"Wow, if we could show that XXX causes YYY I know where we could get some funding!"

Posted by: pj in jesusland on September 20, 2006 at 8:21 AM | PERMALINK

The thing that intrigues me about all of this is the notion that there has to be a statistical correlation for a result to have significance. The absence of a correlation is still a result--a result that tells the researcher he needs to reexamine his assumptions. That kind of result is important.

The problem is one of language. There are two ways to use the word "significance" in this context and we tend to confuse them.

Posted by: Ron Byers on September 20, 2006 at 8:29 AM | PERMALINK

What am I missing here? If the vertical axis is the number of papers, and the horizontal axis is "z-score," it looks as though half or more of the papers have a score below 1.96, and the reason it doesn't look like a bell curve is that the distance between zero and 1.96 is so much shorter than the distance between 1.96 and 13.66.

Posted by: anandine on September 20, 2006 at 8:30 AM | PERMALINK

As a statistician, I occasionally see stuff like this.

It is idiotic.

It is nonsense.

It is totally moronic.

The point about SIGNIFICANT results is that it saves us from an OCEAN of CRAP. Significant results are hard to achieve. They demonstrate that you have found something of at least minimal interest.

Without this screen, we will have a MOUNTAIN of totally useless studies, demonstrating that there is NO RELATIONSHIP between all sorts of things.

That's stupid, no doubt about it.

And publication bias is similarily idiotic.

Posted by: POed Lib on September 20, 2006 at 8:32 AM | PERMALINK

POed Lib,

You're a moron. Did you even read the article? No one is arguing that significance levels don't save us from an "OCEAN of CRAP"; the argument is about publication bias, which you say, at the end of your idiotic rant, is similarily (sic) idiotic. Obviously it is not similarly idiotic if it is the entire fucking point.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 8:44 AM | PERMALINK

There are subjects in my field that are so complicated that no one has been able to make sense of them in a review article.

Instead the publications are limited to those that don't try to make sense of the data and those that happen to find a data set that appears to make sense. An electronic journal of negative results would save everyone a lot of trouble. At this point we only have a bunch of old professors in far away places who, if asked, will steer you away from these quagmires.

BTW, I don't know what this curve would look like in other fields. The problem in ours is not finding significant results, but finding interpretable results.

Posted by: B on September 20, 2006 at 8:46 AM | PERMALINK

An ocean of published crap is not better than an ocean of unpublished crap. The point of a significance level is to separate out interesting results.

We can spend eternity determining that things are not related. That would be idiotic.

Posted by: POed Lib on September 20, 2006 at 8:49 AM | PERMALINK

We can spend eternity determining that things are not related. That would be idiotic.

What might be more idiotic is tens of research groups over several decades determining that the same two things are not related.

Posted by: B on September 20, 2006 at 8:54 AM | PERMALINK

The quantity of space in journals is limited. The patience of researchers to read stuff is not unlimited either.

When you have an infinite supply of possible results, you need some rule to determine what should be published. The rule has traditionally been the significant result.

There are, additionally, many reasons why non-significant studies are non-significant. Prior to 1980 or so, this was usually insufficient power. Recently, a strong emphasis on power has resulted in studies adequately powered at design time. Thus, if they are not significant, they are not interesting.

The assumption behind all of this is "People attempt to publish research about POTENTIAL relationships." Thus, in this case, a non-significant result would be interesting. However, we have no way of determining what MAY be interesting.

Replications are more likely to be interesting, and I agree that there may be some bias there. Replications are probably more likely to be underpowered, however.

Posted by: POed Lib on September 20, 2006 at 8:59 AM | PERMALINK

Thanks Kevin, I think you just convinced me to submit some negative results for publication. They aren't exactly negative I guess -- but I found a correspondence with the mundane when I was looking for a correspondence with the dramatic.

Posted by: B on September 20, 2006 at 9:00 AM | PERMALINK

POed Liberal

I agree with your general point as to why we should be seeking statistical significance, but we have a mountain of totally useless studies showing some sort of non-relevant statistical correlations between all sorts of stuff.

The real advances in science are made when people look behind their assumptions. Unfortunately that doesn't happen in the "social sciences" all that often, if ever. Usually the "social scientist" pulls some notion out of his ass that seems to fit a popular preconception and designs a study intended to show that the notion is relevant.

Most of the time the study is done for the precise purpose of obtaining funding. To quote pj in Jesusland. "Wow, if we could show that XXX causes YYY I know where we could get some funding!" That is the basic assumption of nearly all work done in the social sciences. Hell, it is the basic assumption of nearly all work done in science everywhere.

Everybody has to eat.

Posted by: Ron Byers on September 20, 2006 at 9:05 AM | PERMALINK

If we're done discussing Methods 101, can we discuss something relevant, like the article in question?

Posted by: Ack Ack Ack Ack on September 20, 2006 at 9:09 AM | PERMALINK

Ron:

You clearly know little about the social sciences. As a defrocked psychologist working in medical research and reviewing studies and research for NIH, NIMH and other institutes, I do know something about it.

They don't pull things out of their butts. They build on research. Much of the research is crap, I agree. Much is idiotic. But it is not pulled from their butts.

Real advances are not made when people look behind the assumptions. Assumptions are seldom that. They are usually built on years of research. While this sometimes is true (BF Skinner and the behaviorist revolution; evolutionary psychology; the cognitive sciences backlash against behaviorism; etc), it is less common than you think. After all, assumptions are made because they are plausible, not because they are easy.

Posted by: POed Lib on September 20, 2006 at 9:12 AM | PERMALINK

Medical research doesn't fall under the social sciences, you moron.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 9:25 AM | PERMALINK

So, let's say that we decide to divide studies into three groups:

Significant at .05
Significant JUST ABOVE .05 (to .10, for instance)
Non-significant.

We agree to publish studies in groups 1 and 2.

At that point, the level of significance has just been redefined to .10, and the same criticism would be leveled: studies with p-values of .11 should be published, since they are ALMOST significant.

It's a useless criticism.

Posted by: POed Lib on September 20, 2006 at 9:27 AM | PERMALINK

Medical research doesn't fall under the social sciences, you moron.

Demonstrating that you know nothing of medical research.

It's half social science, half chemistry, half physics, half psychology and the remaining half is clinical practice.

Posted by: POed Lib on September 20, 2006 at 9:28 AM | PERMALINK

I do know a little about the social sciences; my undergraduate degree was in sociology. I have been reading social science claptrap for a lifetime. I actually was forced to endure a semester of BF Skinner from one of his altar boys.

No, I am not a professional social scientist, but I have known enough of them to realize that the state of social science is about the same as the state of astronomy in the centuries before the telescope. Mostly it is the social science equivalent of astrology.

Posted by: Ron Byers on September 20, 2006 at 9:29 AM | PERMALINK

That's a lot of halves... and you say you're in statistics?

And you say the only part that is a social science is, um, social science? WTF. You do realize that "the social sciences" is a generic term for disciplines like Anthropology, Sociology, etc., right?

Moron. Stop posting.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 9:35 AM | PERMALINK

No, I am not a professional social scientist, but I have known enough of them to realize that the state of social science is about the same as the state of astronomy in the centuries before the telescope. Mostly it is the social science equivalent of astrology.

You are being unfair here. In the social sciences, we have statistical causes. Possibly. Or put another way, we have multiple causes for multiple outcomes.

What that means is that we are continually trying to 1) determine who is in what groups and 2) determine the relationship between Factor A and Outcome B. For Group 1, Factor A * 2 = Outcome B. For Group 2, Factor A / 6 = Outcome B. For Group 3, Factor A is not involved.

So, the difficulty is that 1) people are not molecules. But 2) to determine if Factor A is related to Outcome B would require a sample size of 100,000.

Posted by: POed Lib on September 20, 2006 at 9:36 AM | PERMALINK

For the record, I know of nobody who considers Medical Research as anything other than a Natural Science... for obvious reasons.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 9:37 AM | PERMALINK

Moron. Stop posting.

Look, bozo, what's with the moron? I didn't call you a moron.

And, since you know little, I'll explain that "social science" means everything from psychology to economics to sociology to social work to polisci to anthropology and a few that I've left out.

Hey, I sit on the review panels for a bunch of these. So, yes, I do know.

How about you? What do you review?

Posted by: POed Lib on September 20, 2006 at 9:39 AM | PERMALINK

I can think of one explanation for this that has nothing to do with publication bias. As a social scientist, I often run experiments on people -- and because it takes a certain amount of resources to "run" each subject, it's standard to stop running soon after the results become significant. (In other words, if I can get statistically significant results with N=12 subjects, I won't waste the resources to run more; but if it takes N=16 or N=24 then that's how many I'll run. Obviously if it gets up to, say, N=32 or N=40 and there are no significant results, then we'll stop).

As a result of this, there are probably many studies published that are just over significance. But it has to do with good resource management, not massaging the data or other unethical practices.

Posted by: Rayven on September 20, 2006 at 9:41 AM | PERMALINK
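
(The stop-when-significant rule described above is easy to simulate: draw data with no real effect, test after each batch of subjects, and stop as soon as p < .05 or at some maximum N. Below is a minimal sketch with made-up batch sizes, an illustration of the general procedure rather than of any particular study.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def run_until_significant(start_n=12, step=4, max_n=40, alpha=0.05):
    """One simulated experiment under a true null (two identical groups).

    Test after start_n subjects per group; if not significant, add `step`
    more per group and test again, stopping at max_n. Returns True if the
    t-test ever crossed the alpha threshold along the way.
    """
    a = list(rng.normal(size=start_n))
    b = list(rng.normal(size=start_n))
    while True:
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True
        if len(a) >= max_n:
            return False
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))

hits = sum(run_until_significant() for _ in range(2000))
print(f"false-positive rate with repeated peeking: {hits / 2000:.3f}")
# Compare with a single fixed-N test, which comes in near the nominal 0.05.
```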

Ack Ack Ack:

Medical research isn't social science?

What about behaviorial/risk group disease vectors?

Bob

Posted by: rmck1 on September 20, 2006 at 9:42 AM | PERMALINK

One really does have to be on guard against selection biases, in all fields. I recall hearing a very humorous (and sarcastic) conference talk on such abuses as:

"How to use meta-analysis to turn ten one-sigma results into one ten-sigma result"

There was also a real gem of a line:

"keep taking data until you get a three-sigma result, then publish ASAP, before it goes away!"

Eventually good results get replicated, and bad results discarded, but there sure can be a lot of nonsense before "eventually" rolls around.

And very occasionally the 'nonsense' turns out to be a surprising and interesting result. So some (grudging) toleration of the whole inefficient research process is needed. It's about exploring an infinite-dimensional phase-space with a random walk. Efficient, it's not.

Posted by: Grumpy Physicist on September 20, 2006 at 9:44 AM | PERMALINK

Another problem with publication of "almost significant results" is that they are cited as "significant results." If it gets into the literature, it is cited by someone else as support for his/her research.

I hold the line at .05. It partially screens out the crap.

Posted by: POed Lib on September 20, 2006 at 9:50 AM | PERMALINK

For the record, I know of nobody who considers Medical Research as anything other than a Natural Science... for obvious reasons.

Geeezzuuussss, what a bozo.

Medical research is very much social science. People interacting with doctors, weight loss, placebo effects, medication compliance, ....

In many studies that I do, the main issue is not "Does the drug help with the disease?" The issue is "How can we determine if they took the dang medication?" That is psychology.

Do you work in marketing or something?

Posted by: POed Lib on September 20, 2006 at 9:59 AM | PERMALINK

I'm saying Medical Research doesn't fall under the Social Sciences, not that it doesn't contain aspects of the Social Sciences. Aside from Economics, a messy discipline if ever there was one, the difference between the pure Social Sciences and the Natural Sciences is in methodology--the replicability of quantifiable results and the ability to pursue objectivity. You cannot quantify behavior, only results of behavior like disease patterning--which tells you more about disease than it does behavior.

For the record, I have degrees in Anthropology & Economics with a strong background in statistics.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 10:01 AM | PERMALINK

Assuming of course that a majority of papers submitted to political science journals use classic linear regression...

Which they don't.

Most papers use Maximum Likelihood Estimation and more and more are using Bayesian estimation...there is significant skepticism towards using the normal distribution as the sampling distribution for most questions relating to political science.

But I am sure your authors also looked at those types of estimation papers as well...

Yes?

Lies, damn lies and what again?

Posted by: Nazgul35 on September 20, 2006 at 10:05 AM | PERMALINK

Why is a bell curve expected in the first place? Wouldn't that assume that whether results are significant or not is random, with the majority not random? That would mean that what to study is chosen randomly, a pretty silly concept.

Posted by: Cindy on September 20, 2006 at 10:05 AM | PERMALINK

As someone pointed out on one of Kevin's threads several months ago, science is the process of eliminating error. The private sector may be good at creating wealth and setting prices but businesses have a long track record of overlooking methodological errors and promoting false significance in order to bring drugs and medical technology to market faster.

Now that companies like Squibb, Merck, J&J and Glaxo are underwriting so much academic research we need to be very aware of inherent statistical biases in published studies. Choosing which studies are used to justify FDA approval of new HIV or cancer drugs can literally be life and death decisions.

Posted by: pj in jesusland on September 20, 2006 at 10:07 AM | PERMALINK

I think Grumpy made the salient point. Eventually good results get replicated. I'm not a scientist -- but isn't the idea to find a bunch of interesting hypotheses to further refine with other studies using different data in different contexts, until eventually you say okay, X causes Y under such-and-such circumstances? Even tweaked data or defined-down "significance" gets weeded out with enough attempts at replication, though "eventually" is (as per Grumpy) quite the journey most of the time.

This is particularly problematic in political science, which is neither exactly political nor quite a science. I think the problem here is less publication-hungry researchers than that somebody like Kevin (or his counterpart on the right) will skim the literature, see something that confirms (or disconfirms) his expectations, and cite the research under color of "scientific authority."

Thus you gussy up interpretive conjectures with nifty numbers you have to squint really hard at in order to properly evaluate -- and polemicists are allergic to much heavy squinting.

Bob

Posted by: rmck1 on September 20, 2006 at 10:07 AM | PERMALINK

POed Lib

Let me tell you the study I would like to read. I would like to see if there is a statistical correlation between the introduction of all the powerful new mood altering drugs and the percentage of the population diagnosed with bipolar disorder.

When I was young bipolar disorder (manic depression) was a very rare and very serious disease. I knew a guy who had the disease and it was devastating. It was hard to miss.

Now bipolar seems to be the disease of choice. Feeling a little depressed. Take this drug. Life hit you with mood swings, take that drug.

I watched the news the other day as some twit was interviewed by Matt Lauer. She had sex with a 14-year-old student. That is wrong. Teachers should never have sex with their students. Her excuse? She was bipolar and had "borderline personality" disorder or some such crap. Tell me psychology isn't just like astrology. Don't they both tell suckers willing to pay what they want to hear?

Posted by: Ron Byers on September 20, 2006 at 10:08 AM | PERMALINK

Medical research is very much social science. People interacting with doctors, weight loss, placebo effects, medication compliance, ....

In many studies that I do, the main issue is not "Does the drug help with the disease?" The issue is "How can we determine if they took the dang medication?" That is psychology.

Again, Medical Research is very much a Natural Science. That people can complicate understandings that would otherwise be quantifiable and objective does not make it not a Natural Science. It simply means behavior complicates the various aspects of Medical Research.

And what you're talking about is actually a subdiscipline of Anthropology and it's called Medical Anthropology. And it is very specific and very separate from Medical Research. Think of it like Six Sigma performance improvement in engineering; you wouldn't say engineering is Six Sigma, but you would say Six Sigma benefits engineering.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 10:11 AM | PERMALINK

Ack Ack Ack Ack:

A related question: How do you (personally) define the difference between anthropology and sociology?

Bob

Posted by: rmck1 on September 20, 2006 at 10:23 AM | PERMALINK

How do you (personally) define the difference between anthropology and sociology?

This isn't an easy question.

First, I think anthropology is a more holistic discipline when you consider its four subdisciplines: biological (physical), linguistics, cultural anthro, and archaeology.

Second, I find cultural anthropology itself to be more varied/holistic than traditional sociology in that it can, at once, encompass diachronic social processes/change as well as synchronic social analyses, not to mention ecological schools of thought, as well as acceptance of the more theoretical aspects like postmodernism that call into question historical processes as well as simply how we think of ourselves.

And third, there is the art of ethnography and participant observation.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 10:39 AM | PERMALINK

Ack Ack Ack Ack, stop digging now before you bury yourself. You're trying too hard.

Also, stop calling people 'names' all the time.

Posted by: GOD on September 20, 2006 at 10:41 AM | PERMALINK

Ron Byers:

Bipolar disorder is still a very serious malady, treated with very serious drugs. The "designer angst" you're talking about is more like "social anxiety disorder" (what used to be called, uhh, shyness), which nobody ever heard of until a certain ad campaign for Paxil.

Personality (or character) disorders are a whole different ballgame. Most of them aren't, properly speaking, treatable by drugs. Borderline Personality Disorder can produce near-psychotic states in short bursts, but you don't treat it with anti-psychotic medicine. Borderline, Narcissistic, Avoidant, Histrionic, etc. personality disorders are some of the biggest frustrations in the therapeutic community. They're like diseases of the self.

Freud's day is past. We no longer have many neurotics in this culture, because we're no longer sexually repressed. With a neurosis, there's hope that you can get the person to understand the root cause (the fear, the unconscious desire) and they'll get better. Personality disorders aren't very amenable to either drugs or talking therapy -- and we have so many of them for a variety of cultural reasons that don't speak too well of our civilization ...

Bob

Posted by: rmck1 on September 20, 2006 at 10:43 AM | PERMALINK

nice paper and nice comments. Thank you. Things like this (and some others that I mention from time to time) bring me back to this site.

Posted by: republicrat on September 20, 2006 at 11:41 AM | PERMALINK

> Publication bias is a well-known and widely studied effect, and it would be surprising if G&M hadn't found evidence of it.

Ah, but would their results have been published in that case?

Posted by: Dan on September 20, 2006 at 11:42 AM | PERMALINK

Aaron S. Veenstra: What we found there approached significance, but didn't quite get there, most likely due to our sample size. However, when we compared the frames with each other directly, we did find significant results, as we did also when we examined a particular sub-group (in this case it happened to be blog readers) in the frame vs. control analysis. The papers reflected this -- the hypotheses we ultimately tested and discussed dealt with frame vs. frame in the full sample and frame vs. control in the sub-sample.

A perfect example of selective reporting. Also a perfect example of "testing" hypotheses after inspections of the data reveal which "hypotheses" would have been supported by data had they actually been "hypothesized" in advance. This is also an example of subgroup selection, which is known to be very unreliable.

People frequently fail to distinguish among "testing hypotheses", "generating hypotheses while inspecting the data", and "rescuing hypotheses".

Posted by: republicrat on September 20, 2006 at 11:48 AM | PERMALINK

Perhaps a solution is for journals considering publication of an article to put out a request for articles (or at least abstracts with data) relevant to the one under consideration. Something to the effect of, "We have in our review queue an article we like on the correlation between age and wisdom; if you have any similar findings on the subject we'd appreciate hearing about them." That way, a preface of some kind to the article could mention that 44% of research along the same lines disagreed with the article published.

Posted by: no1 on September 20, 2006 at 11:49 AM | PERMALINK

Bob

I think you are correct about bipolar disorder, but you need to convince all the pop psychologists. I ran into a woman a few months ago who insisted her husband was divorcing her because he was bipolar and was off his meds. Turned out he had a girlfriend on the side. They got married after the divorce. He has been pretty normal since the new marriage. I don't know if he is still on bipolar medication or not. I suspect not. His former spouse could have easily driven anybody crazy.

A better pop psychology one size fits all disorder is attention deficit disorder. I never encountered ADD (or whatever they are calling it now) until sometime in the late 1980s. There have always been kids who were challenges for their parents but mostly they grew out of the disorder. Many of those kids were very successful and very creative. Well mom went to work and the drug companies came out with ADD treatments. Suddenly every challenging child has ADD and needs those high power drugs. I don't know how many little brains have been permanently damaged over the last couple of decades by doctors giving in to parents screaming that little Jimmy has ADD. I do know there are a lot of young people wandering around like zombies.

I am still convinced that many modern people diagnosed with "psychological disorders" are within the normal human range. There is simply more money to be made selling drugs to millions as opposed to thousands of people diagnosed with some mental illness or other.

Want a drug company "disease" driven by drug company marketing money? Try erectile dysfunction. Some old guy has a hard time getting it up for whatever reason and suddenly everybody has to buy Viagra.

And no I am not a scientologist. I do believe there are people who really suffer from mental illnesses. Drugs are needed for those people. The number of truly psychotic, however, is not nearly as high as we are led to believe.

Posted by: Ron Byers on September 20, 2006 at 11:57 AM | PERMALINK

I work in clinical research and have seen a dramatic improvement in statistical standards for publication over the past 20 years. In systematic reviews, when there is concern for publication bias, people will compare results in the larger, higher quality studies to the total. In many cases a trend for the better studies to show no statistical significance or less statistical significance than the entire systematic review informs every thoughtful reader to be skeptical.

Hypotheses need to be stated before the analysis to avoid a problem of repeated measures. If the social scientist, faced with a statistically insignificant result, tweaks the hypothesis until the results cross the significance threshold, s/he is undermining the meaning of statistical significance. For instance, if I alter the hypothesis 20 times and find one that meets the 0.05 threshold (5% chance that the difference is due to random error) I have not truly found anything significant at all.

Lastly, a finding that is not statistically significant can be very significant. To choose a politically charged topic: if a large and well designed study showed no relationship between having gay parents and adverse psychological outcomes, that would be very significant. Publication should be based on the degree of interest in the research question and the quality of the methodology, not the results.

Posted by: Mark on September 20, 2006 at 11:58 AM | PERMALINK
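
(Mark's twenty-hypotheses point is easy to make concrete. Assuming, for simplicity, that the tests are independent, the chance of at least one spurious hit at the .05 level among k true-null tests is 1 - 0.95^k.)

```python
# Family-wise chance of at least one "significant" result among k
# independent tests of true null hypotheses (independence is a simplification).
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {1 - (1 - alpha) ** k:.2f}")
# With 20 tries, a spurious p < .05 turns up roughly 64% of the time.
```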

Who knew I was more intelligent than God...

Posted by: Ack Ack Ack Ack on September 20, 2006 at 12:36 PM | PERMALINK

I work in areas of Psychology (Personality and Industrial) that have been very concerned about problems with null hypothesis significance testing for some time. There are a number of problems here. First, since behavior is multiply determined, virtually everything of interest in psychology is related to everything else in psychology to some minimal degree, so if you want to study something the chance that there is no effect is functionally zero (Lykken or Meehl's crud factor). Secondly, p-values are driven partially by the size of the effect and by the size of the sample. Therefore, say the correlation between phenomenon A and phenomenon B is miniscule (e.g., r = .01), but I have a large probability sample of the US population like the NLSY or Project TALENT. Sample sizes of 3,000 or more will make even very small effects statistically significant.

Thus, part of the problem is the traditional reliance in the social sciences on p-values as a proxy for "importance" of the effect, when an index of the effect size would do better. When considering a bivariate relationship, the effect size estimate (correlation, mean difference, contrast, etc.) will stabilize in relatively small samples, whereas the p-value keeps shrinking as the sample size grows.

Some claim that this issue is solved by meta-analysis, but like all research, meta-analyses can be done well and poorly. Selecting biased samples of research, inappropriate corrections, applying fixed effect models when random effects are appropriate, and so on can lead to spurious conclusions that have great confidence.

Achieving statistical significance does not protect us from bad research, as bad research is just as capable of hitting the hallowed p < .05 threshold as good research is.

Well-designed research studies that consider power before being conducted, and that report observed power, and that report effect sizes and confidence intervals, and that ASK INTERESTING QUESTIONS are where the salvation of any social science lies. Over-reliance on Fisherian significance testing has been especially harmful in psychology, long dominated by 2x2 ANOVA designs which do not readily produce effect size estimates. Other research endeavors that do (medical research often focuses on odds-ratios, for instance) may be less harmed, because effect sizes communicate the information that has long been improperly derived from the statistical significance.

Editors and reviewers should pay more attention to a. the relevance of the research question itself, b. the quality of the research design, and c. the effect size, than to the statistical significance. These are far more informative of the actual contribution the research makes.

Posted by: Seth on September 20, 2006 at 12:49 PM | PERMALINK
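
(Seth's sample-size point can be illustrated in a few lines: hold a small observed correlation fixed and watch the p-value collapse as N grows, even though the effect itself never gets any bigger. The correlation and sample sizes below are arbitrary.)

```python
import numpy as np
from scipy import stats

r = 0.05  # a small correlation, held constant across rows
for n in (100, 1_000, 10_000, 100_000):
    z = np.arctanh(r) * np.sqrt(n - 3)   # Fisher z test of H0: rho = 0
    p = 2 * stats.norm.sf(z)             # two-sided p-value
    print(f"N = {n:>7}   p = {p:.2g}")
# Same effect size in every row; only the p-value (and hence "significance") changes.
```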

The "social sciences" are inaptly named. There is nothing scientific about sociology or political science and increasingly anthropology is becoming a playground for activists with axes to grind.

Science can be defined as "systemized knowledge in general." So yes, the social sciences can be understood as scientific. Now, you may like to disagree, but doing so just makes you sound like an activist with an axe to grind.

College students are still subjected to the "studies" Margaret Mead made about sexual behavior in Samoa, all of which were debunked by Derek Freeman who revisited the Pacific Island and interviewed Mead's subjects. He concluded that Mead got from her interviewees the answers she hoped she would get- that sexual libertinism was the norm in Samoa. Which was a lie.

It may come as a surprise, but college students are also "subjected" to Derek Freeman's debunking of Mead's research.

Anthropology's current "scientific" contribution is that "race" does not exist.

Actually, there is a debate in Anthropology over whether race exists. What is real is the fact that race exists as a social construct.

And Franz Boas concluded that "all cultures are equal."

Yep. And he did so at a time when "The Great Chain of Being" was all the rage.

Much of the confusion of what passes for liberal thinking derives from the sad fact that they accept this nonsense as true.

O'Really.

If one begins with false premises any conclusions based on those premises will be false.

Indeed.

Posted by: Ack Ack Ack Ack on September 20, 2006 at 12:52 PM | PERMALINK

Ron:

Well ... I dunno. It's complex. Modern life certainly does make people crazy, at least in a vernacular sense.

There are definitely more people claiming to have mental problems now than there were 30 years ago, and there's also a proliferation of drugs to treat conditions that were thought to be other things. Are the new drugs driving the diagnoses and creating new categories of "mental illness" that never existed before?

Yes and no. No question people who've perused the DSM volumes that categorize mental illness (and the DSM V will be out shortly) have seen a continual enlargement of categories. Some say this is for insurance purposes -- to find the most specific list of symptoms possible in order to make a firm diagnosis to facilitate some kind of treatment. And they argue this is bad, because most mental problems involve a combination of factors and don't easily fit into these slots. So the slots proliferate ...

There's also no question that the success of MAOIs and (especially) SSRIs (the marvelously named Prozac, Paxil, Wellbutrin, Effexor, Zoloft, etc.) has had a real effect on what was once intractable depression -- and there are people who swear by these medications. But it's also true that shyness, general anxiety, mood swings (not to the level of bi-polar), dysthymia or anhedonia (not getting enough "zest out of life") have been elevated into the categories of genuine mental maladies with a specific name-brand drug treatment associated with them. Is this progress -- or decadence? It's hard to tell, truthfully.

Also, don't confuse any of this (including bipolar disorder) with psychosis -- which involves a dissociative break with reality, and can be quite dangerous, both to the sufferer and other people. My first girlfriend was a paranoid schizophrenic in remission for the summer I first knew her. Then she flipped out when she went to college. Not a fun condition, and nothing remotely romantic or Dickensonian about the "higher poetic truths" in the visions of the clinically insane.

ADD is real, trust me. When I was a kid in the late 60s, it was called hyperactivity. You think Ritalin is an iffy substance to stick prepubescent children on? They put me on Dexedrine. Nice to be a legal speed freak before the age of twelve. And yes -- kids grow out of it. But can you blame the parents for being driven to distraction by their kid bouncing off the wall for all those years in the meantime? Creativity, sure -- but the ability to concentrate in high school means getting into a better college. It's a damn good thing that these drugs have gotten more subtle and sophisticated over the years -- and harder to abuse.

As for your larger point about personal responsibility? I have mixed feelings about that. On the one hand, I'm glad anytime that a new treatment removes a burden of guilt. We got nowhere when we considered alcoholism and drug addiction the result of "bad character." I'd rather have these potential treatments available than not. If it leads to a sexual predator blaming her love affair with a 14-year-old on "bipolar disorder" she's now on meds for, I can live with that, considering the consequences of no treatment at all.

Bob

Posted by: rmck1 on September 20, 2006 at 12:59 PM | PERMALINK

Bob

Excellent thoughts. Thanks.

Posted by: Ron Byers on September 20, 2006 at 1:29 PM | PERMALINK

Antiquated point. There are several power programs which work fine with ANOVA designs, including RM methods. One is unifypow and the other is GLMPOWER, a SAS PROC.

Posted by: POed Lib on September 20, 2006 at 1:52 PM | PERMALINK

In my last post, I omitted by some error the following:

Over-reliance on Fisherian significance testing has been especially harmful in psychology, long dominated by 2x2 ANOVA designs which do not readily produce effect size estimates.

Current power methods like those I mention work fine with ANOVA.

Psychology is now going to other methods. Unfortunately, some of them have much poorer power computation methods (SEM, etc).

Posted by: POed Lib on September 20, 2006 at 2:00 PM | PERMALINK

I think Cindy's question above deserves an answer from Kevin or one of you other smart guys. Here it is:

Why is a bell curve expected in the first place? Wouldn't that assume that whether results are significant or not is random, with the majority not random? That would mean that what to study is chosen randomly, a pretty silly concept.

I don't see why a bell curve should be expected either, although I'm not sure I buy Cindy's reasoning, exactly. It might be a bell curve if all possible studies were done -- chance would put most near zero significance and fewer at higher levels. But all possible studies are not done, many being obviously fruitless. And social science is famous for studies that confirm the obvious.

On the other hand, this may not affect Kevin's point. That little trough just to the left of 1.96 is fascinating, like the bulge in basketball winning margins just below the point spread.

Posted by: David in NY on September 20, 2006 at 4:58 PM | PERMALINK

Actually a bell curve is not what we expect, in point of fact.

Most statisticians believe that p-values are uniformly distributed. That means that if you did every possible study in the world, you would be equally likely, picking one of the billion studies, to have a p-value of .01, .05, .50, .94 or .99.

p-values are uniformly distributed. The underlying data may be normally distributed or log-normal or whatever, but the distribution of the results of statistical tests is uniform.

The distribution that Kevin showed is not a bell curve. It is an exponential of some sort.

Posted by: POed Lib on September 20, 2006 at 5:08 PM | PERMALINK
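
(That claim is easy to check by simulation, with one caveat: p-values are uniform, and |z| is half-normal rather than bell-shaped, only when the null hypothesis is actually true. A quick sketch using two-sample t-tests on pure noise:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# 10,000 "studies" of a true null: both groups drawn from the same distribution.
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(10_000)
])

# Under the null the p-values spread roughly evenly over [0, 1] ...
print(np.histogram(pvals, bins=10, range=(0, 1))[0])

# ... and the implied |z| piles up near zero (half-normal), with ~5% above 1.96.
z = stats.norm.isf(pvals / 2)
print(f"share of |z| above 1.96: {np.mean(z > 1.96):.3f}")
```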

Thanks POed Lib, the uniform distribution was actually my own instinct (for what instinct is worth in this area, especially from a statistical near-illiterate). I'm not sure where Kevin's curve comes from but I took it to be a half-bell whatever that would mean statistically (not much, I would think). Have to try to figure out the article, I guess, if it comes from there.

Posted by: David in NY on September 20, 2006 at 5:59 PM | PERMALINK

1: I disagree with the first statement, that the figure reflects publication bias. If social scientists aren't stupid, they don't ask blind random questions. [We don't in biology: at least we can do slightly better than random.] Therefore, the pool of z-scores from tests of those questions should NOT be a bell curve, but should be a pdf convolving the distribution of how well social scientists pose strong hypotheses (conjectures) that happen to be correct with the z distribution. The z scores of all statistical tests should follow the Z distribution IFF all questions were completely arbitrary and the statistical tests were random and the null hypotheses were true. Even if the posed hypotheses were a random sample of some specified distribution of all possible questions, if some of the hypothesized causes were real, the distribution of z scores wouldn't follow the z distribution even without publication bias or social scientists being able to pose better than random hypotheses.

More simply: in frequentist statistics, the p value is the probability, IF THE NULL HYPOTHESIS IS TRUE, of obtaining the observed or more extreme results. The z-scores of the pooled studies should only follow a z distribution if all of the null hypotheses were true. (z-score to z distribution is equivalent to p value to uniform [0,1].) {There's a whole literature on tails of distributions that explains the "fit" for the histogram above 2: there are only 3 shapes of tails of distributions.}

2: The deficit of p just greater than .05 and excess of p just less than .05 is evidence of something rotten in the state of published social science. Unique to that field: not so much.

Posted by: tomp on September 20, 2006 at 6:48 PM | PERMALINK

3: What Aaron S. Veenstra describes above is post-hoc testing. Given his procedure of slicing and dicing data until the magic p < .05 appears, the reported significance levels can't be taken at face value.

4: In biology and environmental science, we don't need more publication venues as much as we need the raw data available for re-analysis.

5: A counter-example for captcrisis: generic drug companies want to publish non-significant results: that their generic does not significantly differ from the brand-name drug. [Same for environmental monitoring, etc.] The trivial way would be to reduce the sample size and thus power to detect a difference. Therefore, bioequivalence tests turn it around: prove that there is less than a 5% chance that the difference is greater than a tiny threshold.

6: When results are quantitative effect sizes or strengths (z-score for effect > 0), publishing the effect size and confidence or credibility interval rather than the p value or z score resolves this problem. Crap studies with no power to detect anything have huge confidence intervals and aren't publishable. Powerful studies with "negative" results are publishable: they provide strong evidence that if there is an effect, its magnitude is less than the high end of the confidence interval and thus can be ignored (e.g., #5 above).

Posted by: tomp on September 20, 2006 at 6:52 PM | PERMALINK
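
(The equivalence-testing idea in point 5 is usually implemented as TOST, two one-sided tests: instead of asking whether the difference is zero, you ask whether it is demonstrably inside a pre-set margin. A rough sketch, with a hypothetical margin and made-up data; a real bioequivalence analysis would be more careful about the test statistic and design.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def tost(a, b, margin, alpha=0.05):
    """Two one-sided tests: is the mean difference demonstrably within +/- margin?

    Both one-sided nulls ("difference <= -margin" and "difference >= +margin")
    must be rejected; an underpowered study just comes out inconclusive,
    it does not get a free pass.
    """
    diff = np.mean(a) - np.mean(b)
    se = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    df = len(a) + len(b) - 2                        # simple approximation
    p_low = stats.t.sf((diff + margin) / se, df)    # H0: diff <= -margin
    p_high = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    return diff, max(p_low, p_high) < alpha

brand = rng.normal(loc=10.0, scale=2.0, size=200)    # hypothetical measurements
generic = rng.normal(loc=10.1, scale=2.0, size=200)
diff, equivalent = tost(brand, generic, margin=0.5)
print(f"observed difference {diff:.2f}, declared equivalent: {equivalent}")
```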

Actually, that dotted line on Kevin's graph doesn't appear on the similar graphs in the article. But the comparison of over-under the significance level is really striking.

It really is like the bulge in the winning margins of college basketball teams just below the point spread. Some of those college players aren't trying real hard to cover the spread; some political scientists are.

Posted by: David in NY on September 20, 2006 at 8:08 PM | PERMALINK

The distribution that Kevin showed is not a bell curve. It is an exponential of some sort.

So's the normal distribution (Bell curve). It's from the exponential family.

Posted by: Nazgul35 on September 20, 2006 at 10:30 PM | PERMALINK

One thing is certain...those guys will get their paper published...

Posted by: Pepe on September 21, 2006 at 12:34 AM | PERMALINK

So many of you commenters sound WAY knowledgeable about this, but it is mostly over my head.

The question that Kevin is posting has to do with bias in political science publishing, but the whole point brings to my mind the global warming issue and bias that I perceive there. Raising this issue out loud has a habit of pushing a lot of people's buttons, so I mention it with some trepidation.

My point is that in doing a LOT of reading on this issue, I find very few available - though SOME - papers showing no anthropogenesis in global warming. And those I find tend to be by older climatologists, which implies bias in the younger climatologists.

The underlying thing is that I have YET to find even ONE paper that rules out non-anthropogenesis for global warming (such as natural variation or oscillating cycles), so it is difficult for me to accept the plethora of papers that seem to find significant rises in temperatures, with human causation being asserted, yet to my mind not shown to be warranted by the data or methodology.

If anyone asks me for citations, I cannot do so. I am writing here from memory - a frustrated memory. If the allegations were adequate to convince me, I would listen and join the majority, but, despite the NUMBER of papers, the data as presented is inadequate, to my mind. On the contrary, the papers I have read contra-indicating anthropogenesis in global warming are much more convincing to me. FYI, I am a non-credentialled but highly interested student of climatology.

My underlying questions here, then, would be: Is Kevin's point here applicable to global warming? Does bias toward significant relationships get more of those articles published and get those scientists' future studies funded?

The search for funding has been one of my suspicions all along - that once the global warming bandwagon got rolling, positive findings were seen as the way to get funding, which I perceive as skewing the playing field.

Posted by: SteveGinIL on September 21, 2006 at 2:45 AM | PERMALINK

Conclusion: Check for robustness.

Those who peer-review studies with borderline significance should demand sensitivity analysis.

---
That aside, there's nothing wrong with publishing a paper with borderline significant results, as long as your caveats go beyond the usual boilerplate.

Posted by: Measure for Measure on September 21, 2006 at 6:02 PM | PERMALINK

The implication of this story is that data are being falsified. This conclusion is not supported by the analyses discussed, and, moreover, is almost certainly false. There is a much more likely answer that matches the facts given. The p-value is a function of two factors: the effect size and the sample size. When data are gathered and found to lie near, but short of, significance, it is typical for researchers to increase the sample size just enough to push the findings over that threshold. Doing so is not entirely valid, but not entirely invalid either. Rather, it implies misplaced emphasis.

A greater focus should be placed on effect sizes and less on traditional p-values.

Posted by: Jeff on September 21, 2006 at 7:45 PM | PERMALINK




 

 
