********************************
From the New York Times [NYTimes.com], Monday, May 21, 2001. See http://www.nytimes.com/2001/05/21/business/21EXAM.html
-----------------------------------
This is Part I of two parts.
********************************

When a Test Fails the Schools, Careers and Reputations Suffer

By Jacques Steinberg and Diana B. Henriques

Sitting in his cramped office in Fort Wayne, Ind., with his calculator running, John Kline became the first to suspect that a major test publisher had erred in computing the standardized test scores of thousands of his students.

As testing director for the local school system, Mr. Kline quickly alerted the company, CTB/McGraw-Hill, but it did not fully investigate his complaint at the time.

If it had, CTB would have discovered a crippling programming error in time to prevent it from upending the lives of students, parents and educators as it rippled across the nation over the first eight months of 1999.

This mishap, the most far-reaching in the recent history of school testing, jolted school districts in at least six states, including New York City, where it mistakenly sent nearly 9,000 students packing off to summer school.

A post-mortem of how this error spread unimpeded for so long lays bare a basic truth of standardized testing: school districts lack the ability to uncover serious testing errors on their own, and must rely on the testing companies to do so voluntarily. Because the testing industry has succeeded in fending off various proposals for federal oversight, the companies themselves decide what they will disclose and when.

CTB's error hit hardest in New York City, the nation's largest school system. Apart from the children, the most prominent victim may have been the city's schools chancellor, Rudy Crew. The error showed - incorrectly - that reading scores citywide had stagnated after rising for two years, raising questions about Dr. Crew's leadership. Within months, he was out of a job.

Before the mistake was discovered, Dr. Crew had been a leading advocate for using standardized tests to hold students and educators accountable. But now, as Congress is poised to vote on a presidential proposal that would sharply increase the nation's reliance on standardized testing, Dr. Crew says he has been chastened by his personal experience with the testing industry.

"The answer is not to use test scores as the sole source of information about a student's performance," he said. "These are human errors. They're going to happen again."

The issue, then, is how the test companies handle mistakes once they occur, educators say.

A New York Times examination of CTB's error shows that the company had been warned repeatedly by testing officials in Indiana, New York City and other districts that their percentile scores seemed wrong. While CTB told each not to worry, the company did not mention the other complaints. Then, after finding an error, CTB officials waited seven weeks before passing that critical information on to New York City and other school districts.

When told of these findings, Dr. Crew, who begins work next month at an education foundation in San Francisco, expressed disappointment and anger. "What CTB did was lie," he said.

CTB officials say they did their best to uncover a deeply imbedded software problem. Once the problem was located, the officials say, they did not immediately alert any school districts because they wanted to be absolutely sure of the damage it had caused.

"It was hard to see this," David M. Taggart, the company president, said. "But, and I think this speaks to the integrity of our company, we didn't stop looking."

Robert Tobias, the longtime testing director in New York City, does not accept the company's explanation, particularly in light of the early warnings that CTB received. "They clearly did not check carefully enough," he said. "It's that simple."

Dr. Crew sees a broader problem.
"The largest testing companies are guilty of what most people accuse public schools of," he said. "They've actually got a monopoly." In Indiana: The First Indication of a Costly Error CTB has its headquarters in a tan fortress perched atop a hill overlooking California's idyllic Monterey Peninsula. Founded in 1926 by a Los Angeles public school official and his wife, CTB grew into an industry giant after being acquired in 1965 by McGraw-Hill, a financial information and publishing company. CTB's biggest rival, NCS Pearson, might score more student tests - about one in every two nationwide - but CTB is an industry giant, too, providing test design as well as scoring. By 1998, nine million students were taking CTB tests annually, about 40 percent of the market. Each spring, answer sheets descend on Monterey like a steady rain, with postmarks from as far away as American military bases in Japan. Once scored, the results are shipped back to the schools in boxes full of numbers that are regarded as the definitive educational measure of children and teachers and schools. Though CTB's work is widely praised by educators, the company did make two errors in 1998: one resulted in wrong math scores for a number of Missouri school districts; the other affected the math scores of a small number of Florida students who took the company's tests. Still, as the 1999 testing season began, CTB was the envy of the testing industry. The company could claim nearly 20 states as customers, all under contract for several years. Indiana was one state that believed in CTB, hiring the company to test about 320,000 students in grades 3, 6, 8 and 10. But when Mr. Kline, the testing director in Fort Wayne, got his district's scores in early 1999, he saw that they had plunged unexpectedly. "I felt sick," he said. "How am I going to explain it to the superintendent?" 

Although Indiana did not use the test to promote students, as many states do, the scores gave politicians and educators a yardstick to measure student progress. Bad test scores, Mr. Kline knew, would echo through the city like a tornado warning, causing parents to worry and teachers to wonder what they had done wrong.

Before releasing the bad news, Mr. Kline called half a dozen other testing directors to see how they fared. To his surprise, each described nearly identical drops in scoring.

"It was almost unbelievable how similar the patterns were," Mr. Kline recalled.

It did not make sense, Mr. Kline thought, for so many students in so many places to fail by nearly the same margin. So he called the testing company.

CTB officials were not particularly alarmed to hear Mr. Kline's complaint, because they knew that when test scores drop, the first and easiest reaction of school officials is to blame the test. But CTB did agree to look into Indiana's scores, and within days it found a problem.

In trying to compare Indiana students with the rest of the country, CTB had used an old formula. When the problem was fixed, most student scores rose, some as much as 10 percentage points.

But Mr. Kline still was not satisfied. He and his colleagues told CTB that the error did not account for other large, unexplained drops.

"Our feeling was, `There is still more to it, there's something out there that no one's been able to explain,' " he said.

By now, Mr. Kline had come to suspect that the scoring drop could be traced to an arcane area of test design called equating. This process is necessary so scores one year can be compared with those from previous years, even if different questions are used. States ask for new questions because they are worried the old questions will leak out.

CTB told Indiana that its sophisticated software program had insured that the current test was comparable, or equated, to the previous year's test.
But just to be sure, the company agreed to take another look. This time, the company said it found nothing wrong.

"Our confidence in the accuracy of the equating was reconfirmed," CTB told Indiana in a memorandum on Jan. 18, 1999.

CTB even sent its president, Mr. Taggart, to Indiana in early March, to personally assure educators that the test scores were solid. In a follow-up letter, though, the company said it was developing "procedures to improve quality control in the future."

Reluctantly, Fort Wayne distributed the results to its schools, but not before Mr. Kline had ordered them stamped: "May contain inaccurate scores."

Then, with no options left, Mr. Kline gave up, assuming he had heard the last of the matter.

In New York: Unearned Tickets to Summer School

In April, about the time Mr. Kline was conceding his fight, 300,000 students in New York City's public schools were taking their reading and math tests in grades 3, 5, 6 and 7.

Those tests, too, were designed by CTB. And though many of the multiple-choice questions were different from Indiana's, both school systems drew some of their questions from the same versions of the company's flagship test, Terra Nova.

But the New York City Board of Education and its chancellor, Dr. Crew, had decided to attach a much greater value to CTB's tests than Indiana did. For the first time that spring, students in grades 3 and 6 were required to pass CTB's test, or attend summer school. And if they did poorly in summer school, they would be held back.

Making such decisions based on a single test score violates the testing industry's standards, and both CTB and city school officials agree that the company advised the city against putting such a premium on its test. But the board forged ahead anyway.

Dr. Crew raised the stakes not only for children but also for school principals and superintendents of the city's 32 neighborhood school districts.
He announced that, for the first time, school officials would be judged by how well their students did on the CTB tests. Those educators whose students scored poorly faced the loss of their jobs.

Dr. Crew's future was also at stake. For two years, Dr. Crew had managed to do something that had eluded his predecessor, Ramon C. Cortines: forge a warm relationship with Mayor Rudolph W. Giuliani. But that was changing.

The issue: school vouchers. Mr. Giuliani said he believed that taxpayer money should help finance private-school tuition for thousands of students who were attending failing public schools. Dr. Crew disagreed with the mayor, and he did so publicly.

So long as test scores kept going up, Dr. Crew felt that he could defend his position. If the scores were bad, Dr. Crew's own job would be on the line.

When the eagerly awaited reading scores arrived from Monterey in early May, Mr. Tobias, the New York system's testing director, was among the first to see them. The news was not good. As in Indiana, many of the students' scores had dipped sharply from the previous year - so steeply and uniformly as to appear improbable, Mr. Tobias thought.

Knowing how high the stakes were this year, Mr. Tobias directed his staff to ask CTB whether it had made a mistake. The company's response, Mr. Tobias recalls, was as swift as it was definitive: "We can't find anything wrong."

Mr. Tobias continued to press CTB, eventually calling the company himself to make an argument the company had already heard: perhaps the tests from one year to the next were not quite equal. No one told him that he was echoing Indiana's earlier suspicions.

Still, CTB held firm. "If we were not comfortable, we would have advised them not to release the data," said Mr. Taggart, CTB's president.

Unsure of what to do, Mr. Tobias held off releasing the results until June 8, the last possible day the scores could be used to make summer-school assignments. As the date approached, Mr. Tobias finally told Dr.
Crew about his doubts.

Dr. Crew says he seriously considered calling the press to disavow the results. But as a national spokesman for the movement toward standardized assessment, Dr. Crew decided his credibility would be lost. He thought he would be seen as a crybaby.

Mr. Tobias concurred. "Errors of measurement are a fact of life in this business," Mr. Tobias said in an interview. "There are times you can explain them. Other times you just bite the bullet and accept the data as they are."

And so, Dr. Crew summoned reporters to deliver the disappointing news: two years of progress in reading had apparently stalled. The mayor said he was "very alarmed and concerned." And Dr. Crew knew he had some homework to do.

In Tennessee: State Officials Seek Review of Test

Most school districts, including New York City, gauge progress by comparing students in a particular grade with their predecessors in the same grade a year earlier.

But Tennessee has long used a more sophisticated approach: it compares a student's test scores as a first grader with that same student's scores as a second grader, third grader, and so on through school. This approach was pioneered and overseen by William Sanders, a longtime professor at the University of Tennessee, who was curious about how class size and teaching styles influenced student performance.

In early May 1999, when Professor Sanders received Tennessee's scores from CTB, he knew from his own data that they could not be right, state testing officials said. The drops were much too sharp.

Again, state officials recall the company saying not to worry - the scores were accurate.

But Tennessee had something that Indiana and New York City did not: a treasure trove of data on the performance of actual children going back six years or more. CTB's results broke patterns in individual students' scores that had been uninterrupted for years.

Professor Sanders was so insistent that there was a problem that he told the company he would call a news conference to challenge the results, Tennessee school officials said.

Then CTB did something that it would not do in any other state: it simply raised the comparative rankings of many Tennessee students, and lowered some others, to conform with Mr. Sanders's statistical models - even though the company could find no error to justify those changes.

The company made this adjustment in late May or early June, just as it was assuring New York City that its results were correct.

CTB did not tell any of its other customers what it had done for Tennessee. CTB considers its relationship with each state or district to be confidential, even if the products that state uses are similar to others, said Mr. Taggart, the company president.

Moreover, Mr. Taggart said, CTB's researchers had not yet detected any similarity in the complaints from New York City, Tennessee, Indiana and another state, Nevada, which had contacted the company around the same time. Finding a common thread was difficult, Mr. Taggart said, because each had used a customized version of the same basic test.

But after certifying New York City's results as accurate, and altering Tennessee's results, CTB began to have its own doubts, the company now says. In June and into July, unbeknown to its customers, CTB assigned an army of researchers to investigate its results.

---------------------------
PART II WILL FOLLOW SHORTLY.
***************************************************
--
Jerry P. Becker
Department of Curriculum & Instruction
Southern Illinois University
Carbondale, IL 62901-4610 USA
----------------------------------------------------
This is the CPS Mathematics Teacher Discussion List.