Rethinking high-stakes testing
Early in my teaching career I realized how easy it would be to cheat on standardized tests. I realized this by accident, due to my lack of a poker face. The day after a big end-of-unit assessment, a parent of a third-grade student told me her son had gone home upset, because he knew he was not going to get a good grade. She asked him how he knew, and he told her I was making faces that told him he needed to change his answers.
Because I wear my feelings all over my face, I’m sure he was telling the truth. I was probably thinking, “But Timmy, we worked on decomposing three-digit numbers for two weeks, and you demonstrated mastery! Why are you mixing up your tens and your ones?!”
Sure enough, when we looked at his answer sheet we saw many erasures. Timmy was trying his best to please me. Had I been deliberately signaling that his answers were wrong (rather than being unaware of my expression), that would have been cheating.
That’s just one of countless ways teachers can manipulate testing outcomes.[1] The tactics can be intentional, as in Atlanta (latex gloves to avoid leaving fingerprints when answer sheets were changed?), or unintentional, as with my experience.[2]
In today’s educational climate, testing scandals are commonplace. Ironically, the testing-heavy No Child Left Behind federal law (2001) was based on reform efforts called the “Houston Miracle.” Houston was the first recipient of the prestigious Broad Prize in 2002 for dramatic increases in student achievement. Yet in 2004, a Dallas Morning News investigation revealed Houston’s “miracle” to be similar to Atlanta’s: endemic testing fraud.
The National Center for Fair & Open Testing released a report last month documenting testing fraud in 37 states and the District of Columbia.[3]
Many educators see the details emerging from Atlanta not as proof that high-stakes testing is destructive to schools, but rather as evidence that leaders must be extra vigilant about testing security and integrity, focusing on rooting out cheating rather than questioning the tests themselves.[4]
Diane Ravitch, a senior fellow at the Brookings Institution, sees testing manipulation as evidence that such accountability-based school reforms are harmful. She contends that “high-stakes testing is ruining education.”[5] Advocates for the end of high-stakes testing, including Ravitch,[6] often mention “Campbell’s Law,” referring to the work of social scientist Donald Campbell from the 1970s. He wrote:
The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to measure.[7]
In other words, relying on only one measure for decision-making can lead to distortions and bad decisions. Essentially, using high-stakes tests perverts what we are trying to measure – student achievement.
Anti-testing enthusiasts cite Campbell’s Law as evidence that testing is inherently harmful to education. Yet Campbell was a data guy who spent his career as an evaluator. He identified the limitations of quantitative data but believed in the power of assessment. In a less-quoted section of his paper, he goes on to state:
From my own point of view, achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence.[8]
As an evaluator, he acknowledged that measuring progress is essential. Knowing that indicators can become distorted or manipulated is not a reason to stop measuring. We can prevent distortion and manipulation in ways that don’t involve videotaping test environments or using armored cars to carry secure test booklets from classrooms to independent testing agencies.
We must use common-sense policies to test under “normal,” or pressure-free, conditions. That means not tying incentives (like the $500,000 Atlanta’s former school superintendent was paid) or life-altering consequences (such as a promotion, job loss or failing a grade) to standardized test outcomes. And no “miracle” should be declared until a school or district shows sustained academic growth over a long period of time.
To combat the limitations of testing, Campbell endorses one simple methodological tool: multiple measures. To evaluate teaching and learning, we must use data that tell a story with words, based on observations and experiences (qualitative data), while simultaneously telling a story based on numbers (quantitative data). When the data streams tell a similar story, the result is what researchers call triangulation, meaning the data points converge to tell the same story in different ways.
For example, I assume Campbell would support quantitative measures of student academic growth as part of a teacher’s evaluation, not as the only indicator but as one indicator within a larger picture of performance. Campbell is clear: Assessment, evaluation and standardized measures are not destructive by design, but they become destructive when there is no attempt to mitigate their perversion.
As we move past disgust at the latest testing scandal, we must distinguish between assessment and high-stakes testing. We must use data to inform our decisions, but we must realize the inherent limitations of that same data and recognize the harm that undoubtedly comes from over-relying on arbitrary measures to make high-stakes decisions.
Views expressed in this commentary are those of the author and do not necessarily represent the views of the UNC Charlotte Urban Institute or the University of North Carolina at Charlotte.
[1] http://fairtest.org/sites/ui.charlottewp.psapp.dev/files/media/articles/Cheating-50WaysSchoolsManipulateTestScores.pdf
[2] http://www.nytimes.com/2013/03/30/us/former-school-chief-in-atlanta-indicted-in-cheating-scandal.html?pagewanted=all&_r=1&