Special Report

May/June 2012 Grand Test Auto

The end of testing.

By Bill Tucker

But the solution may be self-generating. While both stealth assessments and GPS systems must start from an initial map, they also share another critical capability: the potential to become more accurate over time. As GPS software records millions of data points on destinations and routes, it begins to detect otherwise unknown traffic patterns, leading to better and better routing. The same potential holds for stealth assessments. Researchers can use student performance data from across a variety of tasks to update conceptual models and better understand how students learn. David Kuntz, vice president of research at Knewton, one of the companies developing new “adaptive” learning platforms, notes that, just as data collection improves the recommendations of a GPS system, collecting large data sets in the classroom can help to confirm or disconfirm hypotheses about how students learn. And, by comparing how similar students perform when given different types of content or instructional activities, researchers can also begin to understand which learning interventions work for which students, under which conditions.

Education, of course, can’t be reduced to a series of online games. More than just a set of concepts to be learned, it’s also a complex set of relationships: between students, teachers, and the environment in which they learn. Florida State University professor Valerie Shute, who coined the term “stealth assessment,” agrees. Her assessments aren’t meant to replace human teachers, but to help teachers understand student misconceptions and provide recommendations for action. She sees automated scoring and machine-based reasoning techniques as tools for teachers to “infer things that would be too hard for humans.” Just as a pilot uses a navigational computer to crunch vast amounts of data for use in flight, teachers should use these tools to play an ever more active role, reviewing students’ progress and providing better-informed guidance and assistance as they solve problems.

While computers have long been used for drilling facts or equations, Shute is designing her assessments to keep tabs on a deeper kind of learning—the kind that takes greater care and effort to measure but is essential for making sound progress. In math, for example, heavy drilling may help students pass a quiz on, say, fractions. But in order for them to put their knowledge of fractions to good use later on in, say, algebra class, students need a real conceptual grasp of what fractions are and how they work.

Shute’s new project involves building and embedding stealth assessments in the game Crayon Physics Deluxe (CPD). The project is meant to cultivate and measure just that kind of mastery. In CPD’s virtual world, students must discover and/or apply their knowledge of the principles of physics, such as gravity, kinetic energy, and inertia, to propel a red ball through various puzzles toward its destination, marked by a yellow star. But in this world, just as in the real world, students aren’t handed problems that have a single predefined solution. Instead, students experiment with different approaches in a world largely of their own creation. Using a virtual crayon, they draw their solutions. In one instance, they might draw a ramp to roll the ball across an obstacle. In another, they draw a rock that falls on a lever to thrust the ball upward. The game encourages students to continually refine their approach, rewarding not just what it calls “old school” solutions but also more “elegant” ways to move the ball toward its destination.

As they play, CPD is assessing their performance constantly, collecting information on both simple indicators, such as the time spent on a particular problem, and complex information, such as the agents of force and motion (a springboard, say) that students use to accomplish a task. As students play, the assessment draws on more and more of these data points, which are constantly mapped against a model to update an estimate of the student’s competencies. A teacher could use CPD alongside more traditional instruction, ensuring that students understand not only the mathematical equations in physics but also the concepts underlying them.
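To make the idea of "updating an estimate" concrete, here is a minimal sketch of one way such an update could work, using Bayes' rule on a single yes/no competency. It is an illustration only, not a description of the model Shute's team uses: the function name `update_mastery` and both probability parameters are assumptions invented for this example.

```python
def update_mastery(prior, solved, p_solve_if_mastered=0.8, p_solve_if_not=0.3):
    """Return the updated probability that a student has mastered a concept.

    prior  -- current estimate of P(mastered), between 0 and 1
    solved -- True if the student solved the latest puzzle
    The two likelihood parameters are hypothetical values, not from the article.
    """
    # Likelihood of the observation under each hypothesis.
    like_mastered = p_solve_if_mastered if solved else 1 - p_solve_if_mastered
    like_not = p_solve_if_not if solved else 1 - p_solve_if_not
    # Bayes' rule: posterior is proportional to likelihood times prior.
    evidence = like_mastered * prior + like_not * (1 - prior)
    return like_mastered * prior / evidence


# Each new observation nudges the running estimate up or down.
estimate = 0.5
for solved in [True, True, False, True]:
    estimate = update_mastery(estimate, solved)
```

A real assessment would track many interrelated competencies and far richer evidence than solved-or-not, but the principle of revising an estimate after every observation is the same.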

Another stealth assessment in CPD strives to measure students on their care, organization, and persistence in trying to solve problems—what researchers call “conscientiousness.” Research consistently shows that these skills can predict academic achievement but are independent from intelligence or cognitive ability. They are also essential to success in school and life. In CPD, data on persistence, for example, comes primarily from problems that students have trouble solving. CPD tracks the number of times a student tries to solve each problem and the overall time spent on each try. And the assessment is designed so that even the cleverest students are given problems that challenge them; that way, all students are measured on their level of persistence. Of course, there are various pencil-and-paper tests that can measure these skills. But those tests typically involve self-reported items and are taken in isolation from the learning process, as if persistence were a static quality, unrelated to the actual task or content at hand. Shute’s goal is to build stealth assessments that can be inserted into almost any game or interactive lesson, allowing both students and teachers to see how qualities like persistence and creativity relate to their overall performance throughout the course of learning, and may even be improved over time as a function of game play.
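The two persistence indicators mentioned above, the number of tries per problem and the time spent on them, could be tallied along the following lines. This is a hedged sketch: the log format (`problem_id`, `seconds_spent`, `solved`) and the function name `persistence_summary` are assumptions made for illustration, not details of CPD itself.

```python
from collections import defaultdict


def persistence_summary(attempts):
    """Summarize tries and total time per problem.

    attempts -- iterable of (problem_id, seconds_spent, solved) tuples,
                a hypothetical log format assumed for this example.
    """
    tries = defaultdict(int)
    seconds = defaultdict(float)
    for problem_id, seconds_spent, _solved in attempts:
        tries[problem_id] += 1                 # one more try on this problem
        seconds[problem_id] += seconds_spent   # accumulate time across tries
    return {p: {"tries": tries[p], "total_seconds": seconds[p]} for p in tries}


log = [("ramp", 30.0, False), ("ramp", 45.0, True), ("lever", 20.0, True)]
summary = persistence_summary(log)
```

Here the student needed two tries on the hypothetical "ramp" puzzle, so that puzzle contributes more evidence about persistence than the "lever" puzzle solved on the first attempt.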

So far, all these stealth assessment prototypes fall into the category of what educators call “formative assessments.” That means they are functionally analogous to the kinds of short-term tests, like chapter quizzes, that teachers use for diagnostic purposes: to gauge whether students grasp the lesson just taught, so that teachers can adjust their instruction in real time. This sets such tests apart from “summative assessments,” the weightier, more stress-inducing tests taken at semester’s or year’s end to judge the performance of students, teachers, and schools. It’s reasonably clear that stealth technology can someday be used for formative assessments. The big question is whether this technology can also eliminate the need for annual summative testing.

The answer, in theory anyway, is yes. If done correctly, stealth assessments could help educators amass much greater evidence, over time, and at a deeper level, of what a student knows and is able to do. But doing so will require major changes in instruction—changes that would probably be beneficial for a whole host of reasons.

Today, a calendar defines what students learn and how they progress. An eighth-grade U.S. history course fits into two eighteen-week semesters, with a test at the end of each. And no matter what knowledge students walk in with or what they manage to absorb in the first eighteen weeks, the teacher must move on to the second eighteen weeks’ worth of content when the schedule dictates. This is what some educators call a “time-based” approach to education; a “competency-based” approach flips this paradigm. In this model, rather than wait for an end-of-year test, students can demonstrate their competency in a subject over time, allowing them to move on as they are deemed ready. Learning, instead of seat time, defines progress.

Bill Tucker, since 2005 the managing director of Education Sector, a D.C.-based think tank, will soon be joining the Bill & Melinda Gates Foundation as deputy director, policy development, U.S. Program. He has written about education technology, innovation, and policy for publications including Education Next, Education Week, and Educational Leadership.


  • Judy Willis, M.D., M.Ed. on May 09, 2012 2:42 PM:

    PREDICTION from a neurologist who then became a teacher (2nd, 5th, 7th) and now does professional development: Within five years in some countries (five to ten in others) open internet access for information acquisition will be available on standardized tests. This access will significantly reduce the quantity of data designated for rote memorization.

    The Current Information Load Is Too BIG
    Recall that before 1994 a student would be expelled from the SAT exams for bringing any type of calculator. Starting in 1994, calculators were not only permitted, but were essentially required. The driving factors were the level of mathematics taught and tested and the availability of graphing calculator technology. This change gave students the appropriate tool for accuracy and efficiency (and the one used by most professionals who used mathematics beyond basic arithmetic). Consider also that calculator access for these standardized tests did not reduce the instruction in and development of arithmetic automaticity. Mental access, without a calculator, to such facts and procedures as the multiplication tables and manipulation of fractions remains a valued goal for all students.
    We are now in the same nexus of advancement of information and technology to make the equivalent jump for other subjects. Access to the internet for information acquisition during tests (and learning) is the appropriate response now, just as the calculator access was in mathematics almost two decades ago.
    As technology and globalization exponentially increase the available facts and knowledge base of all subjects and professions, the response in education has been to incorporate more and more information into the requirements for each school year. The current system of "if it's information, teach it and test it" can no longer support the volume of information. Textbooks cannot get much bigger, and the impact of the increasing demands on students to memorize data is increasingly counterproductive.
    In the "real world," professionals in all specialties and businesses use the superiority of the web over the human brain to accurately hold and retrieve facts and to keep up as "facts" change too quickly for even eBooks to be current and accurate by the time they are released.
    Many practicing physicians do not rely on memory, or even on textbooks or the latest journals, for the most current, accurate information about diagnostic testing, best treatments, and other facts that change daily. For example, before prescribing a medication, they search the Medscape or Epocrates websites for the most current facts that could have significant impact on a patient's reactions to the medication. Even for a medication that was evaluated for cross reactions with other medications when it was tested and when the FDA product information was most recently reported, new information can be critical. That medication could have just been found to cause problems when taken by patients also taking a different medication for another medical condition. Thanks to the physician having access to that new information before prescribing medications, the risk of potential complications is reduced.


  • Judy Willis, M.D., M.Ed. on May 09, 2012 2:48 PM:


    Memorization Breaking Point
    Boredom, frustration, negativity, apathy, self-doubt, and the behavioral manifestations of these brain stressors have increased in the past decade. As facts increase, the over-packed curriculum expands, and demands for rote memorization for high-stakes testing grow, the brains of our students have reacted to the increased stress. High stress, including that provoked by sustained or frequent boredom or frustration, detours brain processing away from the higher, rational, prefrontal cortex. In the stress state, the lower, reactive brain is in control. Retrievable memory is not formed and behavioral responses are limited to involuntary fight/flight/freeze, seen in the classroom as act-out, zone-out, or drop-out.

    Students Don't Get the Brains They Need
    Even if the medical, social, psychological, and ethical problems do not promote the change in testing, the economic demands as to what employers want as employee skill sets will inevitably topple the factory model of education.
    The factory model of memorization of facts and procedures that was preparation for assembly line work cannot keep up with the information age requirements for an educated workforce. With the growth in the information base, employers in global industries that develop new products or systems already report they are more interested in a potential employee's abilities to respond quickly and successfully to frequent change, and to communicate, lead, and collaborate, than they are in comparable work experience. Desirable employees are those capable of making use of new information and technology to solve new problems and innovate ahead of the competition.
    The lives our students will live and the jobs for which they'll compete will not be about answering questions correctly, but about how they use knowledge and respond to changes. Yet currently the time sacrificed to fact memorization and test prep is resulting in more high school dropouts and students graduating from the secondary system without the preparation to succeed in college or employment, or to lead fulfilling lives.

    Freedom from an excessive focus on rote fact memorization means teachers can be creative individually and as professional learning communities. There will be a reduction of the "management" problems that currently result from stressed-brain reactive behavior. Educators will be able to develop and use more engaging, relevant, and equitable learning experiences enhancing cross-curricular skills and competences. More access to foundational facts, which are not equally acquired by some students with language or learning differences, will mean they are not held back from applying other strengths to build conceptual knowledge and understanding. As students are guided with learning opportunities that develop their executive functions, they will develop understanding beyond just knowing. Their extended neural networks will empower them to transfer knowledge to new applications as we help them build the brains to achieve their greatest creative potentials.

    Read Complete Comment in my upcoming staff blog for EDUTOPIA.org Education's Next Big Bang May 11. WEBSITE www.RADTeach.com

  • skeptonomist on June 07, 2012 10:34 AM:

    What might actually be done to evaluate student performance is largely irrelevant, because the current testing regime has been imposed for largely ulterior motives - the desire to break teachers' unions, cut down on expenses, prove that public schools are inferior and allow higher-income people to send their children to private schools without paying school taxes, etc. Thus the support for real improvements in teaching is actually limited, especially as funds are being choked off for public schools. For-profit schools are generally not going to spend a lot of money on novel methods.