Special Report

May/June 2012

A Test Worth Teaching To

The race to fix America’s broken system of standardized exams.

By Susan Headden

That changed in 2001 when President George W. Bush signed the No Child Left Behind law, under which, for the first time, the federal government itself was demanding that school districts be held responsible for the performance of their students. The tests that gauged this performance measured minimum competency, and they required states to show that their students were making yearly progress toward the goal of becoming proficient. They also led to a fundamental change, many would say for the worse, in the relationship between testing and instruction. Whereas the original goal of achievement tests was to improve instruction by providing educators with useful information, says Daniel Koretz, an assessment expert with the Harvard Graduate School of Education, the new goal was to improve instruction by holding someone accountable for results. Koretz calls this shift “the single most important change in testing in the past half century.”

NCLB has had some successes. Because it requires states to break down data among racial and other demographic groups, it has identified significant achievement gaps, and in most states those gaps are narrowing. It deserves credit for inducing broad gains in achievement in the key subjects of reading and math (even as it crowds out other subjects), and it has encouraged teachers to use data to shape instruction. But more than ten years after the law’s passage, 50 percent of schools are not making the Adequate Yearly Progress required by the law.

The greatest drawback of NCLB, meanwhile, is the one that so unnerves Caryn Voskuil: the tests it spawned ask students to restate and recall facts rather than to analyze and interpret them. It turns out that this is largely a legacy problem: because the stakes of standardized tests in America before NCLB were historically very low, states had no interest in paying much for them. And when those stakes got higher and states did need measures of accountability, they simply used or replicated the cheap tests they already had. They did so, in part, because each state had its own standards, and thus needed its own tests. That fragmented demand, along with the need for lightning-fast scoring, led to a shortage of experts to build the tests, as well as downward pressure on the profit margins of testing companies. The troubles in the industry, according to Thomas Toch, a senior fellow with the Carnegie Foundation for the Advancement of Teaching, created a strong incentive for states and testing contractors to write tests that measure largely low-level skills.

When President Obama took office in 2009, he inherited all the flaws of NCLB and standardized testing. But just as frustration with the law was reaching its height, he was also handed an opportunity: the common core movement, an initiative of state governors and the heads of large school systems, was guiding the states toward uniform academic standards, thus solving one of the biggest obstacles to improving tests and raising achievement. Using the vast pool of money established by the 2009 federal stimulus package, Obama prodded the movement along. Specifically, he allocated $330 million for the states to design a cutting-edge, state-of-the-art, nationwide test. In laying out this challenge, the administration established the following guidelines: the tests should be aligned to the new high standards; they should measure deeper learning; they should be computerized; and they should be capable of being used to evaluate not just students but educators. Oh—and they would have to be up and running in classrooms by 2014.

The states banded together to embrace the challenge, and eventually they winnowed themselves into two pioneering R&D teams (with, alas, anesthetically long names): the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC). Each team, or consortium, is producing its own tests, but their scoring systems will be comparable—as those of the ACT and the SAT are to each other—so there will essentially be one national benchmark of readiness for college and careers. Over the past several months, these networks of state assessment directors, teachers, college administrators, content experts, and psychometricians have been racking up frequent-flyer miles and phone minutes, hashing out all the intricacies of twenty-first-century assessment. It’s not exactly astronauts and rocket scientists in The Right Stuff—but their efforts may ultimately have as much or more of an impact on the country.

Because the consortia are still letting out contracts, they won’t have test prototypes until this summer. But already the outlines of the two projects are taking shape. Both groups are designing interactive computerized tests that will have far more essays and open-response questions, more practical math exercises, and more word problems than current models. They will both use more nonfiction and informational text in addition to literary text. Both also call for fast machine scoring. The groups have similar goals for the long term, but PARCC, whose assessments won’t even be fully computerized until 2016, is less ambitious and more practical in the short term.

Since they are required by NCLB, most of the tests offered by both consortia will be “summative,” meaning that they summarize the development of learners at a particular time. PARCC, which represents a collaboration between twenty-one different states, is focusing on these kinds of tests, which states will use to hold educators accountable and to judge students’ readiness for college. It will have two assessments, and in a big departure from current practice, one will include performance tasks, such as asking a student to analyze a text using evidence to support claims or having him apply math skills and processes to solve real-world problems. At the end of the year, it will combine these into one summative score. PARCC will also have a speaking and listening test graded by a teacher.

The SBAC, a collaboration between twenty-six different states, will also create summative tests, but it will also develop “formative” assessments—tests that are used to gauge student progress in midstream and help teachers make course corrections. A formative test, which takes place during a sequence of instruction, can consist of anything from calling on a student in class to giving him a math quiz or assigning him a lab report. In each case, the teacher uses the resulting information to adjust her instruction. Designers of the next-generation tests believe that some standardized assessments can be formative, as well.

Susan Headden, a Pulitzer Prize-winning journalist, is a senior writer/editor at Education Sector, a Washington, D.C., think tank.


  • Caroline Grannan on May 09, 2012 3:08 PM:

    Education Sector is a partisan organization that promotes the currently popular package of policies known as "education reform," not an impartial source. This article needs a disclaimer cautioning that it is intended to promote the organization's viewpoint.

  • David on May 10, 2012 4:20 PM:

    Very interesting article. Thanks

  • Janet on May 18, 2012 2:51 AM:

    Standardized assessment is not a bad thing -- but in itself, it does not address two of the largest problems in the American education system.

    First, that impoverished students who (on average) are least prepared to do well in school, will find themselves in schools with the fewest resources for teaching them.

    Second, that teachers who might be willing to take on the huge challenge of teaching and inspiring students with learning disabilities or those whose homes and families haven't given them a solid foundation for school, risk low evaluations because in a school year they may help students make enormous progress and build a basis for future success, but they're not likely to have many students who score above grade level; they start too far behind.

    This article doesn't seriously address either of these problems.

  • Ritsumei on May 30, 2012 11:16 AM:

    I took one of those AP tests (US History) and did well. I remember next to nothing. US history and the philosophy of the Founders has, in the past several years, become a topic of particular interest to me. It's very clear to me that the sort of cramming for the test that year's history course was made of was useful for nothing. I didn't retain ANY knowledge to speak of, and we simply didn't cover much of what made the Founding Era great: it wasn't on the test. I'm NOT impressed with the AP tests. They are useful only as coupons for reduced-cost college credits. The teaching to the test, in my experience, guaranteed that the retention simply wasn't there.

    It is also worthy of note that all powers not delegated to the federal government are reserved to the states or to the people - thus all federal involvement is unconstitutional, as it is in no place in the Constitution delegated to the national government. Federal involvement is usurpation of rights that belong with parents, plain and simple. I find the "common core" movement deeply disturbing, as it relates to our freedom to educate our children, and to freedom in general. This sort of top-down, government-centric educational model is incompatible with our system in which sovereignty rests with "We The People," rather than the ruler. These so-called "common core" initiatives fill me with dread for the implications to our freedoms, and I say kudos to Virginia and any other state that refrains from participation.

    Frankly, putting government in charge of education - arguably the single most important leash on government excess over generations - is no different from putting the fox in charge of the hen house.

  • v98max on May 31, 2012 8:00 AM:

    When my dad's school district first flirted with competency testing, every member of the school board was given the citizenship test for legal immigrants. They all failed. Needless to say, it was quickly determined that the test must be too hard.

  • Liz Wisniewski on June 02, 2012 12:10 PM:

    And we continue to focus on weighing the pig........As a fourth grade teacher, I am encouraged to know that the tests will be improving, and yet as I read this I started smiling. The truth is that using the tests for teachers' information is not really necessary as any halfway decent teacher already knows what their students can do. Spend every day with 21 kids teaching them for month after month and you know them as learners, you know what they can do and what they can't. If a teacher needs standardized test results to know if a student cannot do multi-digit multiplication I would suggest that someone check what she is smoking in the outside smoking area.

    If only all this time and money was spent on helping children be "present for learning" and on making sure we hire intellectually energized and well trained people as teachers. Yet, we continue to think that weighing the pig is going to make it fatter.....sigh......

  • Bob Ellingsen on June 04, 2012 2:00 AM:

    I taught AP US History for twenty years, and I think the AP program represented the paradigm for what education ought to be. It kept my feet to the fire; I had to cover a rigorous curriculum and couldn't waste a minute. If I wanted my students to do well on the three essays on the AP test, they had to practice writing all year, and I had to read what they were writing and offer feedback. Moreover, even the multiple-choice questions on an AP exam usually require the student to do more than just recall facts. Finally, the presence of a high-stakes test that has meaning for the student changes the dynamic in the classroom. In a very real sense, the student and I were "on the same side." If he or she did well on the test, both of us would be very happy. "Teaching to the test" is as good or as bad as the test itself.