Just yesterday I said about the Chicago teachers’ strike: “In my city (Chicago) there is a teachers’ strike currently stalling the opening of the school year. This is one of very, very few strikes in this era of crippled unions. The surrounding discourse is very disappointing. Mostly there are newspaper editorials that are shallow, often appealing to doing our ‘best for our children.’ The real issues are nowhere to be seen.”
Well, today Eric Zorn of the Chicago Tribune made a liar out of me with the thoughtful (and fact-based!) column I reblog below. (I do not reblog stuff as a rule, but Mr. Zorn graciously gave his permission.)
Why teachers have test anxiety, too
by Eric Zorn, Chicago Tribune
The statement of the obvious: Bad teachers are afraid of being evaluated based on how well their students perform on standardized tests. When they fail their students, their students fail them.
The question: But why are so many presumptively good teachers also afraid? Why has the role of testing in teacher evaluations been a major sticking point in the public schools strike in Chicago?
The short answer: Because student test scores provide unreliable and erratic measurements of teacher quality. Because studies show that from subject to subject and from year to year, the same teacher can look alternately like a golden apple and a rotting fig.
The background: Statisticians have known for years that end-of-year student test scores alone aren’t a good gauge of teacher performance and have sought instead to try to measure the degree to which year-to-year improvements (or decreases) in student achievement can be attributed to the individual teacher.
This is the “value-added” approach that Chicago Public Schools have proposed to use for up to 40 percent of teacher-evaluation scores.
Refinements over the years have tried to take into account more and more of the different educational challenges, even within the same school, that can distort the scores. Add in a few extra pupils with learning disabilities, behavioral issues or language difficulties, for example, and even the best teachers will struggle to add value.
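To make the “value-added” idea concrete, here is a minimal sketch of one common formulation. This is my own simplified illustration with made-up numbers, not the model CPS proposes: predict each student’s end-of-year score from their prior-year score, then credit each teacher with the average amount by which their students beat the prediction.

```python
# Minimal illustration of a value-added estimate (simplified; not CPS's actual model).
# Idea: predict each student's score from last year's score, then credit
# the teacher with the average amount their students beat that prediction.
import random

random.seed(0)

# Simulate 3 hypothetical teachers, 25 students each.
# true_effect is the (unobservable) real teacher contribution.
teachers = {"A": 2.0, "B": 0.0, "C": -2.0}

students = []
for name, true_effect in teachers.items():
    for _ in range(25):
        prior = random.gauss(50, 10)           # last year's score
        noise = random.gauss(0, 8)             # everything else: home life, luck...
        current = prior + true_effect + noise  # this year's score
        students.append((name, prior, current))

# Fit a least-squares line current ≈ a + b * prior, pooling all students.
n = len(students)
mean_p = sum(s[1] for s in students) / n
mean_c = sum(s[2] for s in students) / n
b = (sum((s[1] - mean_p) * (s[2] - mean_c) for s in students)
     / sum((s[1] - mean_p) ** 2 for s in students))
a = mean_c - b * mean_p

# A teacher's "value-added" = mean residual (actual minus predicted) of their students.
va = {name: 0.0 for name in teachers}
counts = {name: 0 for name in teachers}
for name, prior, current in students:
    va[name] += current - (a + b * prior)
    counts[name] += 1
for name in va:
    va[name] /= counts[name]

for name in sorted(va):
    print(name, round(va[name], 2))
```

Note the built-in fragility: with 25 students and classroom noise comparable in size to the teacher effect itself, the estimated rankings can easily scramble from one simulated year to the next, which is exactly the instability the column goes on to describe.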
The analogy: A fertilizer test.
In a critical takedown of the value-added approach (.pdf) published this year in Notices of the American Mathematical Society, John Ewing, president of Math for America, an organization dedicated to improving high school math education, invited readers to consider the way scientists might compare the effectiveness of an array of fertilizers on different plants under various types of soil conditions.
Scientists would mix and match on dozens of plots of land and chart the growth and health of the plants over time. And with luck, in the end, they’d come up with simple fertilizer ratings that gardeners and farmers could use with confidence, year after year, on plants and in conditions not specifically measured by the test.
The bad luck: It turns out that when you chart the achievement growth of students (plants in our analogy) and try to take into account the socioeconomic factors (soil conditions) that affect educational attainment, there still are too many variables to yield a reliable, consistent measurement of the quality of teachers (the fertilizers).
Ewing quotes from a 2010 report from the Economic Policy Institute:
Analyses of (value-added model) results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers. (Value-added model) estimates have proven to be unstable across statistical models, years and classes that teachers teach.
One study found that across five large urban districts, among teachers who were ranked in the top 20 percent of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40 percent.
Another found that teachers’ effectiveness ratings in one year could only predict from 4 percent to 16 percent of the variation in such ratings in the following year.
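Some back-of-envelope arithmetic shows how damning that last figure is. Predicting 4 to 16 percent of the variation corresponds to a year-to-year correlation of roughly 0.2 to 0.4. The sketch below (my own simulation with an assumed correlation of 0.35, not data from the cited studies) shows what such a correlation does to top-20-percent rankings of the kind described above.

```python
# Back-of-envelope illustration (mine, not from the studies cited):
# "predicts 4 to 16 percent of the variation" means a year-to-year
# correlation of roughly sqrt(0.04) = 0.2 to sqrt(0.16) = 0.4.
# Simulate what an assumed correlation of 0.35 does to top-20% rankings.
import math
import random

random.seed(1)
r = 0.35       # assumed year-to-year correlation of ratings
n = 10000      # simulated teachers

# Each year's rating = shared "true quality" component + fresh noise,
# mixed so that corr(year1, year2) = r and each year has unit variance.
quality = [random.gauss(0, 1) for _ in range(n)]
year1 = [math.sqrt(r) * q + math.sqrt(1 - r) * random.gauss(0, 1) for q in quality]
year2 = [math.sqrt(r) * q + math.sqrt(1 - r) * random.gauss(0, 1) for q in quality]

# Top-20% cutoffs in each year.
cutoff1 = sorted(year1, reverse=True)[n // 5 - 1]
cutoff2 = sorted(year2, reverse=True)[n // 5 - 1]

top1 = [i for i in range(n) if year1[i] >= cutoff1]
stayed = sum(1 for i in top1 if year2[i] >= cutoff2)
print(f"Of year-1 top-20% teachers, {100 * stayed / len(top1):.0f}% stayed in the top 20%")
```

In runs like this, well under half of the year-one “top” teachers stay on top the next year, broadly consistent with the five-district study quoted above, in which fewer than a third did.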
The confirmation: Last year, 10 leading academics in the field of educational testing wrote a letter that said value-added measurements “are too unstable and too vulnerable to many sources of error to be used as a major part of teacher evaluation.”
It concluded, “Proposals that would place significant emphasis on this untested strategy . . . could have serious negative consequences for teachers and for the most vulnerable students.”
Author and independent education researcher Gary Rubinstein published on the Teach for Us blog a five-part analysis of 2007-10 value-added data collected on individual New York City teachers (Part 1, Part 2, Part 3, Part 4 and Part 5).
He found so many startling and absurd results that he begged his readers to “spread the word, since calculations like these will soon be used in nearly every state.”
The statement of the obvious, part deux: School officials need to find ways to identify and weed out bad teachers. But they, and the good teachers in their charge, should be very wary of using test scores.
(links at http://blogs.chicagotribune.com/news_columnists_ezorn/2012/09/test-anxiety.html)
Problems with the use of student test scores to evaluate teachers — Economic Policy Institute, 2010
Analyzing Released NYC Value-Added Data Part 1 by Gary Rubinstein, Teach for Us. See also Parts 2, 3, 4 and 5.
“Here’s the letter that 10 assessment experts sent to the New York State Board of Regents (in 2011) urging it not to approve a system that links student standardized test scores to the evaluations of teachers and principals” … Valerie Strauss, Washington Post
Mathematical Intimidation: Driven by the Data (.pdf) by John Ewing, Math for America
Standardized test scores are worst way to evaluate teachers by Isabel Nunez, associate professor at the Center for Policy Studies and Social Justice at Concordia University Chicago (Sun-Times)
The Toxic Trifecta, Bad Measurement & Evolving Teacher Evaluation Policies by Bruce D. Baker, professor in the Graduate School of Education at Rutgers (School Finance 101 blog)
Performance or Effectiveness? A Critical Distinction for Teacher Evaluation – Rod McCloy & Andrea Sinclair, Education Week
Why Standardized Tests Don’t Measure Educational Quality by W. James Popham, emeritus professor, UCLA Graduate School of Education and Information Studies (ASCD –formerly the Association for Supervision and Curriculum Development — “an educational leadership organization dedicated to advancing best practices and policies for the success of each learner”)
Director of Private School Where Rahm Sends His Kids Opposes Using Testing for Teacher Evaluations In These Times
Writing on the University of Chicago’s Lab School website two years ago, (Chicago Lab School director David) Magill noted, “Measuring outcomes through standardized testing and referring to those results as the evidence of learning and the bottom line is, in my opinion, misguided and, unfortunately, continues to be advocated under a new name and supported by the current [Obama] administration.”
Review of Learning About Teaching, National Education Policy Center
The data in fact indicate that a teacher’s value-added for the state test is not strongly related to her effectiveness in a broader sense. … Many teachers whose value-added for one test is low are in fact quite effective when judged by the other. … There is every reason to think that the problems with value-added measures … would be worse in a high-stakes environment.