Special Report

May/June 2012

A Test Worth Teaching To

The race to fix America’s broken system of standardized exams.

By Susan Headden

In America, high-caliber tests like AP exams are usually the province of elite, high-achieving students on the college track. And so you might think that this two-tier assessment system is an inevitable result of inequality—that underprivileged kids wouldn’t have a prayer on these more demanding tests. Yet the industrialized nations that consistently outshine the United States on measures of educational achievement—countries like Singapore and Australia—have used such assessments for students across the educational and socioeconomic spectrum for years. Although some are multiple-choice tests, most are made up of open-ended questions that demand extensive writing, analysis, and demonstration of sound reasoning—like AP tests. “There is no country with a consistent record of superior education performance that embraces multiple-choice, machine-scored tests to a degree remotely approaching our national obsession with this testing methodology,” says Marc Tucker, the president of the National Center on Education and the Economy and an expert on testing. “They recognize that the only way to find out if a student can write a competent twenty-page history research paper is to ask that the student write one.” In other words, the kind of knowledge you can measure with a multiple-choice test is ultimately not the kind of knowledge that matters very much.

For that reason, experts have for years pleaded for the U.S. to adopt the kinds of tests that measure and advance higher-order skills for all students. You won’t be surprised to hear they’ve been frustrated. Part of the reason they haven’t gotten their way is economic. Viewed in a certain light, “basic skills tests” are in fact just “cheaply measurable skills” tests. According to Tucker, superior assessments cost up to three times more than a typical state accountability test. Quite simply, scoring essay questions and short answers is expensive.

The other big problem is that the American testing market is fragmented. If there were some unified, standard curriculum across states—like an AP course in “What You Need to Know by the End of Third Grade”—then states would be able to pool their resources to pay for a worthwhile test they could all share, and test makers would be able to set up economies of scale, bringing prices down. The rest of the industrialized world operates much like this: countries examine their students to see how well they have mastered a certain standard national curriculum. For various political reasons, we do not have a standard national curriculum. And so we have tests like the DC CAS, which establish a de facto curriculum in schools like Hart—a curriculum of “basic skills.”

Having said all that, here’s some astonishing news: quietly, over the past few years, forty-five American states plus Washington, D.C., have been working to establish something called the common core standards in math and English. While not a unified national curriculum, the common core will lay down a set of high, unified standards—rubrics that define what students should be able to know and do by, say, the end of third grade. Those standards will be enough to defragment the American testing market. With them will come a set of completely new, interactive, computerized tests that promise to be much like what you’d find in Singapore or Australia or an AP classroom—exams that test higher-order thinking by asking students to show, in a variety of different ways, whether they have mastered a set of working concepts. If this sounds like the kind of thing that might actually debut around the time we all drive electric cars, think again: these new assessments will start field testing next year, and are due to land in most American classrooms in 2014.

Most of what you know about school testing is about to change. That much is relatively certain. What remains to be seen is whether that change will be so dramatic that it overloads the current system.

American schoolchildren have been taking achievement tests for decades. In the 1950s, they used their well-sharpened number 2 pencils on something called the Iowa Test of Basic Skills, which is still in use and is almost exclusively multiple choice. Tests of this period were of the low-stakes variety—indeed, they usually weren’t required at all—and they were “norm referenced,” meaning that students were rated as they compared to each other. (Nancy was in the 90th percentile, Susie in the 70th, and so on.) When the Russians launched the Sputnik satellite in 1957, U.S. schools came under pressure to up their game. The Elementary and Secondary Education Act of 1965 (ESEA), the precursor to No Child Left Behind, focused federal funding on poor schools with low-achieving students. Meanwhile, there was a growing feeling among the public that all students should be striving for well-defined learning goals and be tested on that basis. Some of this demand for data on students’ achievement was met by the National Assessment of Educational Progress, popularly known as the Nation’s Report Card, which was first administered in 1969. The NAEP measured just a sampling of students, and it didn’t break out state results as it does now, but it marked a trend toward using tests to monitor performance.

Worries about the caliber of the nation’s schools cropped up again in the mid-1970s when the College Board revealed that average SAT scores of American students had been falling since the mid-1960s. The public started to demand proof that schools were doing their jobs, and the states responded by requiring students to take minimal competency tests in order to graduate from high school. These so-called exit exams set several important precedents: they started a trend toward more accountability; they led to more statewide testing; and they began a shift away from measuring students’ performance relative to each other and toward a new regime of measuring how well students individually met strict standards. In psychometric terms, norm-referenced tests were giving way to “criterion-referenced” tests.

Yet, for political reasons—mostly in the form of resistance by local school boards, teachers’ unions, and parents—the bar for passing these exit exams was almost universally low. According to Eric Hanushek, a senior fellow at the Hoover Institution of Stanford University, no state before 1990 administered an exit exam that even reached the ninth-grade level.

The ineffectiveness of these tests became obvious in 1983 with the publication of A Nation at Risk, the landmark federal study that warned of a rising tide of mediocrity in the nation’s schools. A number of states responded to the report by pushing for higher standards and mandating tests. Then, in the 1990s, President Bill Clinton nudged the movement along further with legislation that gave grants to state and local governments to set new standards and create tests to measure how well students were meeting them. Most states took advantage of the grants, but the legislation provided no mechanisms to punish schools that failed to make progress. To the extent that there was accountability, it was unevenly adopted by the states.

Susan Headden, a Pulitzer Prize-winning journalist, is a senior writer/editor at Education Sector, a Washington, D.C., think tank.


  • Caroline Grannan on May 09, 2012 3:08 PM:

    Education Sector is a partisan organization that promotes the currently popular package of policies known as "education reform," not an impartial source. This article needs a disclaimer cautioning that it is intended to promote the organization's viewpoint.

  • David on May 10, 2012 4:20 PM:

    Very interesting article. Thanks

  • Janet on May 18, 2012 2:51 AM:

    Standardized assessment is not a bad thing -- but in itself, it does not address two of the largest problems in the American education system.

    First, that impoverished students who (on average) are least prepared to do well in school, will find themselves in schools with the fewest resources for teaching them.

    Second, that teachers who might be willing to take on the huge challenge of teaching and inspiring students with learning disabilities or those whose homes and families haven't given them a solid foundation for school, risk low evaluations because in a school year they may help students make enormous progress and build a basis for future success, but they're not likely to have many students who score above grade level; they start too far behind.

    This article doesn't seriously address either of these problems.

  • Ritsumei on May 30, 2012 11:16 AM:

    I took one of those AP tests (US History) and did well. I remember next to nothing. US history and the philosophy of the Founders has, in the past several years, become a topic of particular interest to me. It's very clear to me that the sort of cramming for the test that year's history course was made of was useful for nothing. I didn't retain ANY knowledge to speak of, and we simply didn't cover much of what made the Founding Era great: it wasn't on the test. I'm NOT impressed with the AP tests. They are useful only as coupons for reduced-cost college credits. The teaching to the test, in my experience, guaranteed that the retention simply wasn't there.

    It is also worthy of note that all powers not delegated to the federal government are reserved to the states or to the people - thus all federal involvement is unconstitutional, as it is in no place in the Constitution delegated to the national government. Federal involvement is usurpation of rights that belong with parents, plain and simple. I find the "common core" movement deeply disturbing, as it relates to our freedom to educate our children, and to freedom in general. This sort of top-down, government-centric educational model is incompatible with our system in which sovereignty rests with "We The People," rather than the ruler. These so-called "common core" initiatives fill me with dread for the implications to our freedoms, and I say kudos to Virginia and any other state that refrains from participation.

    Frankly, putting government in charge of education - arguably the single most important leash on government excess over generations - is no different from putting the fox in charge of the hen house.

  • v98max on May 31, 2012 8:00 AM:

    When my dad's school district first flirted with competency testing, every member of the school board was given the citizenship test for legal immigrants. They all failed. Needless to say, it was quickly determined that the test must be too hard.

  • Liz Wisniewski on June 02, 2012 12:10 PM:

    And we continue to focus on weighing the pig........As a fourth grade teacher, I am encouraged to know that the tests will be improving, and yet as I read this I started smiling. The truth is that using the tests for teachers' information is not really necessary as any halfway decent teacher already knows what their students can do. Spend every day with 21 kids teaching them for month after month and you know them as learners, you know what they can do and what they can't. If a teacher needs standardized test results to know if a student cannot do multi-digit multiplication I would suggest that someone check what she is smoking in the outside smoking area.

    If only all this time and money was spent on helping children be "present for learning" and on making sure we hire intellectually energized and well trained people as teachers. Yet, we continue to think that weighing the pig is going to make it fatter.....sigh......

  • Bob Ellingsen on June 04, 2012 2:00 AM:

    I taught AP US History for twenty years, and I think the AP program represented the paradigm for what education ought to be. It kept my feet to the fire; I had to cover a rigorous curriculum and couldn't waste a minute. If I wanted my students to do well on the three essays on the AP test, they had to practice writing all year, and I had to read what they were writing and offer feedback. Moreover, even the multiple-choice questions on an AP exam usually require the student to do more than just recall facts. Finally, the presence of a high-stakes test that has meaning for the student changes the dynamic in the classroom. In a very real sense, the student and I were "on the same side." If he or she did well on the test, both of us would be very happy. "Teaching to the test" is as good or as bad as the test itself.