Faulty Teacher Evaluations Come Under Fire at Education Conference
by Derrick Haynes
A controversial teacher evaluation method was put on trial at the opening panels of a New York education conference. Panelists, ranging from the D.C. Public Schools chancellor to a one-time candidate for Secretary of Education, denounced the practice as an ineffective and “humiliating” way to bring accountability to the education reform movement.
The New York Times’ second annual Schools for Tomorrow conference opened with a panel on raising the low social status of teaching. Linda Darling-Hammond of Stanford University recounted how often students interested in teaching are discouraged from entering the profession. The question “Why are you going into teaching when you can ... [insert higher-earning career]?” is the weapon of choice wielded by admonishers.
“Other college graduates earn 60 percent [more than] the average teacher makes,” Darling-Hammond told David Brooks, Times columnist and panel moderator. “In Singapore, they earn what a beginning doctor makes.”
For the next panel, others joined Darling-Hammond, including Brian Crosby, a teacher and co-chair of the English Department at Hoover High School in Glendale, Calif.
Crosby spoke passionately about the feeling that teachers aren’t respected in society and that school administrators don’t value their opinions.
“I’m not viewed as an expert,” he said. “I’m not viewed as a consultant.”
He also attributed part of administrators’ disregard for teachers to the field still being dominated by women, who make up 76 percent of U.S. public school teachers.
As the discussion progressed, with mentions of the ongoing Chicago strike sprinkled throughout, teacher evaluations quickly became the chief topic.
Darling-Hammond argued against value-added teacher evaluations, saying that the metric suffers from large distortions that lead to inaccurate results. No professional would want to be evaluated solely by a value-added score, she said.
Value-added modeling (or VAM) is a statistical method that uses students’ test score data, usually in a year-to-year comparison, to estimate a teacher’s effect (“added value”) on student performance. Some teachers and education experts have argued against using the measure in deciding whether to grant tenure to teachers.
Darling-Hammond brought up the case of Carolyn Abbott as an example of what can go wrong when value-added evaluations are weighed heavily in tenure decisions, especially when the results are published.
Abbott, now an ex-teacher of mathematics for gifted children, began to question her future as a teacher after New York published its Teacher Data Reports for the 2009-2010 school year in February. When Abbott’s seventh-graders took the state test in 2008, they aced it, scoring in the 98th percentile. The following year, as eighth-graders who already knew what high schools they would attend, they scored in the 89th percentile. The drop of nine percentile points placed Abbott near the bottom of the rankings and complicated her bid for tenure, despite support from the school’s principal.
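To make the year-to-year comparison concrete, here is a deliberately simplified sketch of a gain score, the crudest form of a value-added calculation. It is not New York’s actual formula, which the article does not describe; real value-added models typically use regression with controls for student background. The function name `naive_value_added` and the second classroom are hypothetical, included only for illustration. The first classroom uses the percentiles reported for Abbott’s students.

```python
from statistics import mean


def naive_value_added(prior_percentiles, current_percentiles):
    """Average change in percentile between two years for one classroom.

    A simplification for illustration only: it compares last year's
    scores with this year's and ignores everything else.
    """
    if len(prior_percentiles) != len(current_percentiles):
        raise ValueError("each prior score needs a matching current score")
    if not prior_percentiles:
        raise ValueError("at least one pair of scores is required")
    gains = [cur - prev for prev, cur in zip(prior_percentiles, current_percentiles)]
    return mean(gains)


# Class-level percentiles from the article: 98th, then 89th the next year.
high_scoring_class = naive_value_added([98], [89])

# Hypothetical comparison class (assumed numbers, not from the article).
mid_scoring_class = naive_value_added([40], [49])

print(high_scoring_class, mid_scoring_class)
```

Under this crude measure, a class that starts in the 98th percentile has almost no room to show a gain, so any slip registers as negative “added value,” while a class starting in the middle can post a gain of the same size in the other direction.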
The next panel focused entirely on teacher evaluations, with Bill Keller, former executive editor of The New York Times, swapping places with Brooks as moderator.
“Let me lay that grenade out,” Keller said to begin the panel discussion after reading a letter from a teacher friend. In the letter, Keller’s friend said the Chicago strike showed what happens when schools are micromanaged by people who have never been in a classroom: the “foot soldiers” rebel.
While Monty Neill of FairTest, a nonprofit that advocates for more accurate standardized testing, tried to use the letter’s focus on test-based evaluations to rally opposition to value-added assessments, one fellow panelist objected.
“We can have a reasonable conversation about what the weight on student achievement gains should be,” said Thomas Kane of Harvard’s Graduate School of Education, before adding, “It’s an extreme statement with absolutely no empirical base to say student achievement gains…have no role to play. It would be a mistake.”
Kaya Henderson, the DCPS chancellor, said the district’s three-year-old IMPACT evaluation system, which uses a weighted value-added criterion, is supported by D.C.’s teachers union because she worked with the union to reduce the weight of value-added scores from 50 percent to 35 percent.
Henderson said that the goal of the evaluation system was not simply to “weed out” bad teachers, but to figure out who the good teachers are in order to pair them with inexperienced teachers as mentors.
While the panelists debated how teacher evaluations should be done, they agreed unanimously on one point: evaluations should not be made public. When Keller asked about publishing evaluations, using New York as an example, every education expert on the stage shook their head in a resounding “No.”
“Humiliating” and “embarrassing” (for teachers) were two adjectives used repeatedly. Yet none of the panelists spelled out the unintended cost of publishing rankings based on unreliable statistics: driving away good teachers like Abbott. A 2010 Mathematica Policy Research study found that using a three-year model for value-added scores resulted in one out of every four teachers being rated incorrectly. If policymakers want to actually raise the status of teachers in society, they can start by not forcing out good teachers on the basis of faulty statistics.