Special Report

May/June 2012

A Test Worth Teaching To

The race to fix America’s broken system of standardized exams.

By Susan Headden

One problem with today’s standardized tests is that they are virtually useless when given to children who are not performing at grade level. The sixth-grade DC CAS, for instance, doesn’t tell Voskuil much about her many students who are barely reading at the third-grade level; it just says that they’re failing. (By the same token, many students who are way ahead of the curve simply register as “proficient.”) The SBAC addresses this challenge with a new kind of test: one whose questions change based on individual student performance. If the student does well, the questions get progressively harder; if he does poorly, they get easier. These so-called computer adaptive tests are more costly to create, but the beauty of them is that they can pinpoint where students really are in their abilities. The SBAC will also offer two optional interim assessments, which will ask students to perform such tasks as making an oral presentation or writing a long article. These exercises, which would take students one or two class periods to complete, will require students to use other materials or work with other people.
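The adaptive idea described above (a harder question after a correct answer, an easier one after a miss) can be illustrated with a very small sketch. This is not the SBAC's actual method, which the article does not describe; operational adaptive tests generally rely on statistical models rather than a fixed step. The one-level "staircase" rule, the 1-to-10 difficulty scale, the starting level, and the function names below are all assumptions made for illustration.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Move one level up after a correct answer, one level down after a miss.

    The 1-10 scale and the step of one level are illustrative assumptions.
    """
    step = 1 if was_correct else -1
    # Clamp so the level never leaves the available range of questions.
    return max(lowest, min(highest, current + step))


def run_adaptive_test(answers, start=5, lowest=1, highest=10):
    """Simulate a test for a sequence of right/wrong answers.

    Returns the difficulty level presented for each question and the
    level the student would see next (a rough estimate of ability).
    """
    level = start
    presented = []
    for was_correct in answers:
        presented.append(level)
        level = next_difficulty(level, was_correct, lowest, highest)
    return presented, level


if __name__ == "__main__":
    # Hypothetical student: right, right, wrong, right.
    levels, estimate = run_adaptive_test([True, True, False, True])
    print(levels)    # [5, 6, 7, 6]
    print(estimate)  # 7
```

Even in this toy version, a student who keeps missing questions drifts toward the bottom of the scale instead of simply registering as failing, which is the property the article highlights.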

Another problem with most current testing regimes is that they consist almost entirely of big tests administered at the end of the year. By the time a teacher learns that her students were having trouble with double-digit multiplication, the kids are already off to summer camp. Thus the new common core system will include more frequent assessments, which will measure skills that have recently been taught, allowing teachers to make mid-course corrections. Assessment, says Margaret Heritage, a testing expert with the University of California, Los Angeles, “needs to be a moving picture, a video stream.”

While it is still too early to describe any of these common core tests in detail, some testing companies have developed prototypes using the same kind of interactive assessment models that the two R&D teams are talking about. One of these prototypes is being developed by the Educational Testing Service (ETS) and piloted in a number of schools. Watching a student use this prototype offers a more concrete glimpse of what the near future of testing might look like.

In a ninth-grade classroom in North Brunswick, New Jersey, a student logs on to a computer. As if viewing an online slide show, he clicks on an aerial photograph of drought-stricken Lake Mead. A pop-up box tells him that his task is to determine what water conservation measures are necessary. Photographs and sketches depict a spillway, a river, a dam, and the lake that the dam has created. Next to these illustrations is a sketch of a sink with a faucet and a stopper. The prototype then asks the student to draw analogies between the pictures: between the stopper and the dam, the faucet and the river, the sink and the lake.

After showing the capacity of the sink in gallons, the prototype asks the student to perform a number of calculations (onscreen and using a built-in calculator) that determine the water’s flow rate and speed, then to plot them on graphs using the mouse and cursor. It even asks the student to explain some of his choices in writing: for instance, how can you tell from the graph that the slope is three gallons per minute? What is remarkable about this test (aside from the fact that all these calculations actually feed into a simulation of water flowing, just so, into the sink) is how much time it devotes to one subject. It goes deep, in other words, and presents the kind of problem a student might see real value in solving.
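The graph question above comes down to reading a slope: on a plot of gallons against minutes, the flow rate is the rise divided by the run. The short sketch below works through that arithmetic. Only the figure of three gallons per minute comes from the article; the data points, the six-gallon sink capacity, and the function names are invented for illustration.

```python
def flow_rate(points):
    """Slope of a volume-versus-time line, in gallons per minute.

    `points` is a list of (minutes, gallons) pairs read off a graph;
    the first and last points are used to compute rise over run.
    """
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def minutes_to_fill(capacity_gallons, rate_gallons_per_minute):
    """Time needed to fill an empty sink at a constant flow rate."""
    return capacity_gallons / rate_gallons_per_minute


if __name__ == "__main__":
    # Hypothetical readings: 0 gallons at 0 minutes, 12 gallons at 4 minutes.
    readings = [(0, 0), (1, 3), (2, 6), (4, 12)]
    rate = flow_rate(readings)
    print(rate)                      # 3.0
    print(minutes_to_fill(6, rate))  # 2.0
```

A student's written explanation would follow the same reasoning: every extra minute on the horizontal axis adds three gallons on the vertical axis.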

The ETS prototype’s writing exam gets at the same kind of deeper learning. On this test, students consider whether junk food should be sold in schools. They must do some quick research using materials supplied by the test, summarize and analyze arguments from those materials, and evaluate their logic. The test even does some teaching along the way, reminding students what the qualities of a good summary are and defining certain words as the cursor rolls over them. The test provides writing samples, such as letters to the editor written by a principal and a student. Do these samples display the qualities of a good summary? The test asks the student to explain why. Is the writer’s logic sound? The student must prove he knows the answer to that question too. Does certain evidence support or weaken a position? The test taker checks off which.

According to ETS researchers, exercises like these are effective at both assessing and encouraging deeper learning. Teachers seem to agree. “The test improves motivation because students make the connection between the assessment and the classroom,” says Amy Rafano, an English teacher in North Brunswick. “The scaffolding is right in the test. Rather than having students just write an essay, the task encourages them to read source materials and adjust [their thinking] while writing. They have to understand where information comes from. This is real-world problem solving, and it gives the students a sense of why these skills are important.” At the very least, exercises like these mark a distinct departure from the generic prompts that serve as essay items on many current tests.

If experts agree on the need for radically different tests, they also agree on how difficult it’s going to be to implement them, especially under the timetable and cost constraints dictated by the Obama administration. Just designing the tests themselves is a monumental job: the writing exercise on the ETS prototype in North Brunswick, for instance, took a team of developers several weeks to create. PARCC and the SBAC must craft hundreds of similar exercises while also making sure they work well together.

Designers of the new tests must also decide how the items should be weighted. Should syntax be counted more than punctuation? Should multiplying fractions be stressed more than graphing linear functions? In addition, educators agree on the need for more open-ended questions, such as those being tested in North Brunswick, but open-ended tests have drawbacks of their own. They are less reliable than multiple-choice exams (an acceptable response can take several different forms, whereas there is only one correct response to a multiple-choice question), and they are “memorable,” meaning they can’t be reused very often if the test is to have any level of security. Most important, scoring open-ended tests is more difficult and time-consuming than scoring fill-in-the-bubble tests. To ensure consistency among the raters, each item must be reviewed many times over. Scoring a short open response that consists of a sentence or two might take a minute, compared to a fraction of a second for a machine-scored multiple-choice item, and scoring an essay could take an hour.

Susan Headden, a Pulitzer Prize-winning journalist, is a senior writer/editor at Education Sector, a Washington, D.C., think tank.


  • Caroline Grannan on May 09, 2012 3:08 PM:

    Education Sector is a partisan organization that promotes the currently popular package of policies known as "education reform," not an impartial source. This article needs a disclaimer cautioning that it is intended to promote the organization's viewpoint.

  • David on May 10, 2012 4:20 PM:

    Very interesting article. Thanks

  • Janet on May 18, 2012 2:51 AM:

    Standardized assessment is not a bad thing -- but in itself, it does not address two of the largest problems in the American education system.

    First, that impoverished students who (on average) are least prepared to do well in school, will find themselves in schools with the fewest resources for teaching them.

    Second, that teachers who might be willing to take on the huge challenge of teaching and inspiring students with learning disabilities or those whose homes and families haven't given them a solid foundation for school, risk low evaluations because in a school year they may help students make enormous progress and build a basis for future success, but they're not likely to have many students who score above grade level; they start too far behind.

    This article doesn't seriously address either of these problems.

  • Ritsumei on May 30, 2012 11:16 AM:

    I took one of those AP tests (US History) and did well. I remember next to nothing. US history and the philosophy of the Founders has, in the past several years, become a topic of particular interest to me. It's very clear to me that the sort of cramming for the test that year's history course was made of was useful for nothing. I didn't retain ANY knowledge to speak of, and we simply didn't cover much of what made the Founding Era great: it wasn't on the test. I'm NOT impressed with the AP tests. They are useful only as coupons for reduced-cost college credits. The teaching to the test, in my experience, guaranteed that the retention simply wasn't there.

    It is also worthy of note that all powers not delegated to the federal government are reserved to the states or to the people - thus all federal involvement is unconstitutional, as it is in no place in the Constitution delegated to the national government. Federal involvement is usurpation of rights that belong with parents, plain and simple. I find the "common core" movement deeply disturbing, as it relates to our freedom to educate our children, and to freedom in general. This sort of top-down, government-centric educational model is incompatible with our system in which sovereignty rests with "We The People," rather than the ruler. These so-called "common core" initiatives fill me with dread for the implications to our freedoms, and I say kudos to Virginia and any other state that refrains from participation.

    Frankly, putting government in charge of education - arguably the single most important leash on government excess over generations - is no different from putting the fox in charge of the hen house.

  • v98max on May 31, 2012 8:00 AM:

    When my dad's school district first flirted with competency testing, every member of the school board was given the citizenship test for legal immigrants. They all failed. Needless to say, it was quickly determined that the test must be too hard.

  • Liz Wisniewski on June 02, 2012 12:10 PM:

    And we continue to focus on weighing the pig........As a fourth grade teacher, I am encouraged to know that the tests will be improving, and yet as I read this I started smiling. The truth is that using the tests for teachers' information is not really necessary as any halfway decent teacher already knows what their students can do. Spend every day with 21 kids teaching them for month after month and you know them as learners, you know what they can do and what they can't. If a teacher needs standardized test results to know if a student cannot do multi-digit multiplication I would suggest that someone check what she is smoking in the outside smoking area.

    If only all this time and money was spent on helping children be "present for learning" and on making sure we hire intellectually energized and well trained people as teachers. Yet, we continue to think that weighing the pig is going to make it fatter.....sigh......

  • Bob Ellingsen on June 04, 2012 2:00 AM:

    I taught AP US History for twenty years, and I think the AP program represented the paradigm for what education ought to be. It kept my feet to the fire; I had to cover a rigorous curriculum and couldn't waste a minute. If I wanted my students to do well on the three essays on the AP test, they had to practice writing all year, and I had to read what they were writing and offer feedback. Moreover, even the multiple-choice questions on an AP exam usually require the student to do more than just recall facts. Finally, the presence of a high-stakes test that has meaning for the student changes the dynamic in the classroom. In a very real sense, the student and I were "on the same side." If he or she did well on the test, both of us would be very happy. "Teaching to the test" is as good or as bad as the test itself.