********************************
 From the New York Times [NYTimes.com], Monday, May 21, 2001.  See
http://www.nytimes.com/2001/05/21/business/21EXAM.html
-----------------------------------
This is Part I of two parts.
********************************
When a Test Fails the Schools, Careers and Reputations Suffer

By Jacques Steinberg and Diana B. Henriques

Sitting in his cramped office in Fort Wayne, Ind., with his 
calculator running, John Kline became the first to suspect that a 
major test publisher had erred in computing the standardized test 
scores of thousands of his students.

As testing director for the local school system, Mr. Kline quickly 
alerted the company, CTB/McGraw-Hill, but it did not fully 
investigate his complaint at the time.

If it had, CTB would have discovered a crippling programming error in 
time to prevent it from upending the lives of students, parents and 
educators as it rippled across the nation over the first eight months 
of 1999. This mishap, the most far-reaching in the recent history of 
school testing, jolted school districts in at least six states, 
including New York City, where it mistakenly sent nearly 9,000 
students packing off to summer school.

A post-mortem of how this error spread unimpeded for so long lays 
bare a basic truth of standardized testing: school districts lack the 
ability to uncover serious testing errors on their own, and must rely 
on the testing companies to do so voluntarily.

Because the testing industry has succeeded in fending off various 
proposals for federal oversight, the companies themselves decide what 
they will disclose and when.

CTB's error hit hardest in New York City, the nation's largest school 
system. Apart from the children, the most prominent victim may have 
been the city's schools chancellor, Rudy Crew. The error showed - 
incorrectly - that reading scores citywide had stagnated after rising 
for two years, raising questions about Dr. Crew's leadership. Within 
months, he was out of a job.

Before the mistake was discovered, Dr. Crew had been a leading 
advocate for using standardized tests to hold students and educators 
accountable. But now, as Congress is poised to vote on a presidential 
proposal that would sharply increase the nation's reliance on 
standardized testing, Dr. Crew says he has been chastened by his 
personal experience with the testing industry.

"The answer is not to use test scores as the sole source of 
information about a student's performance," he said. "These are human 
errors. They're going to happen again."

The issue, then, is how the test companies handle mistakes once they 
occur, educators say. A New York Times examination of CTB's error 
shows that the company had been warned repeatedly by testing 
officials in Indiana, New York City and other districts that their 
percentile scores seemed wrong. While CTB told each not to worry, the 
company did not mention the other complaints.

Then, after finding an error, CTB officials waited seven weeks before 
passing that critical information on to New York City and other 
school districts.

When told of these findings, Dr. Crew, who begins work next month at 
an education foundation in San Francisco, expressed disappointment 
and anger.

"What CTB did was lie," he said.

CTB officials say they did their best to uncover a deeply embedded 
software problem. Once the problem was located, the officials say, 
they did not immediately alert any school districts because they 
wanted to be absolutely sure of the damage it had caused.

"It was hard to see this," David M. Taggart, the company president, 
said. "But, and I think this speaks to the integrity of our company, 
we didn't stop looking."

Robert Tobias, the longtime testing director in New York City, does 
not accept the company's explanation, particularly in light of the 
early warnings that CTB received.

"They clearly did not check carefully enough," he said. "It's that simple."

Dr. Crew sees a broader problem. "The largest testing companies are 
guilty of what most people accuse public schools of," he said. 
"They've actually got a monopoly."

In Indiana: The First Indication of a Costly Error

CTB has its headquarters in a tan fortress perched atop a hill 
overlooking California's idyllic Monterey Peninsula. Founded in 1926 
by a Los Angeles public school official and his wife, CTB grew into 
an industry giant after being acquired in 1965 by McGraw-Hill, a 
financial information and publishing company.

CTB's biggest rival, NCS Pearson, might score more student tests - 
about one in every two nationwide - but CTB is an industry giant, 
too, providing test design as well as scoring. By 1998, nine million 
students were taking CTB tests annually, about 40 percent of the 
market.

Each spring, answer sheets descend on Monterey like a steady rain, 
with postmarks from as far away as American military bases in Japan. 
Once scored, the results are shipped back to the schools in boxes 
full of numbers that are regarded as the definitive educational 
measure of children and teachers and schools.

Though CTB's work is widely praised by educators, the company did 
make two errors in 1998: one resulted in wrong math scores for a 
number of Missouri school districts; the other affected the math 
scores of a small number of Florida students who took the company's 
tests.

Still, as the 1999 testing season began, CTB was the envy of the 
testing industry. The company could claim nearly 20 states as 
customers, all under contract for several years.

Indiana was one state that believed in CTB, hiring the company to 
test about 320,000 students in grades 3, 6, 8 and 10. But when Mr. 
Kline, the testing director in Fort Wayne, got his district's scores 
in early 1999, he saw that they had plunged unexpectedly.

"I felt sick," he said. "How am I going to explain it to the 
superintendent?" Although Indiana did not use the test to promote 
students, as many states do, the scores gave politicians and 
educators a yardstick to measure student progress. Bad test scores, 
Mr. Kline knew, would echo through the city like a tornado warning, 
causing parents to worry and teachers to wonder what they had done 
wrong.

Before releasing the bad news, Mr. Kline called half a dozen other 
testing directors to see how they fared. To his surprise, each 
described nearly identical drops in scoring. "It was almost 
unbelievable how similar the patterns were," Mr. Kline recalled.

It did not make sense, Mr. Kline thought, for so many students in so 
many places to fail by nearly the same margin. So he called the 
testing company.

CTB officials were not particularly alarmed to hear Mr. Kline's 
complaint, because they knew that when test scores drop, the first 
and easiest reaction of school officials is to blame the test.

But CTB did agree to look into Indiana's scores, and within days it 
found a problem. In trying to compare Indiana students with the rest 
of the country, CTB had used an old formula. When the problem was 
fixed, most student scores rose, some by as much as 10 percentage points.

But Mr. Kline still was not satisfied. He and his colleagues told CTB 
that the error did not account for other large, unexplained drops. 
"Our feeling was, `There is still more to it, there's something out 
there that no one's been able to explain,' " he said.

By now, Mr. Kline had come to suspect that the scoring drop could be 
traced to an arcane area of test design called equating.

This process is necessary so scores one year can be compared with 
those from previous years, even if different questions are used. 
States ask for new questions because they are worried the old 
questions will leak out.

CTB told Indiana that its sophisticated software program had ensured 
that the current test was comparable, or equated, to the previous 
year's test. But just to be sure, the company agreed to take another 
look. This time, the company said it found nothing wrong. "Our 
confidence in the accuracy of the equating was reconfirmed," CTB told 
Indiana in a memorandum on Jan. 18, 1999.

CTB even sent its president, Mr. Taggart, to Indiana in early March, 
to personally assure educators that the test scores were solid. In a 
follow-up letter, though, the company said it was developing 
"procedures to improve quality control in the future."

Reluctantly, Fort Wayne distributed the results to its schools, but 
not before Mr. Kline had ordered them stamped: "May contain 
inaccurate scores."

Then, with no options left, Mr. Kline gave up, assuming he had heard 
the last of the matter.

In New York: Unearned Tickets to Summer School

In April, about the time Mr. Kline was conceding his fight, 300,000 
students in New York City's public schools were taking their reading 
and math tests in grades 3, 5, 6 and 7. Those tests, too, were 
designed by CTB. And though many of the multiple-choice questions 
were different from Indiana's, both school systems drew some of their 
questions from the same versions of the company's flagship test, 
Terra Nova.

But the New York City Board of Education and its chancellor, Dr. 
Crew, had decided to attach a much greater value to CTB's tests than 
Indiana did. For the first time that spring, students in grades 3 and 
6 were required to pass CTB's test, or attend summer school. And if 
they did poorly in summer school, they would be held back.

Making such decisions based on a single test score violates the 
testing industry's standards, and both CTB and city school officials 
agree that the company advised the city against putting such a 
premium on its test. But the board forged ahead anyway.

Dr. Crew raised the stakes not only for children but also for school 
principals and superintendents of the city's 32 neighborhood school 
districts. He announced that, for the first time, school officials 
would be judged by how well their students did on the CTB tests. 
Those educators whose students scored poorly faced the loss of their 
jobs.

Dr. Crew's future was also at stake. For two years, Dr. Crew had 
managed to do something that had eluded his predecessor, Ramon C. 
Cortines: forge a warm relationship with Mayor Rudolph W. Giuliani. 
But that was changing. The issue: school vouchers.

Mr. Giuliani said he believed that taxpayer money should help finance 
private-school tuition for thousands of students who were attending 
failing public schools. Dr. Crew disagreed with the mayor, and he did 
so publicly.

So long as test scores kept going up, Dr. Crew felt that he could 
defend his position. If the scores were bad, Dr. Crew's own job would 
be on the line.

When the eagerly awaited reading scores arrived from Monterey in 
early May, Mr. Tobias, the New York system's testing director, was 
among the first to see them.

The news was not good. As in Indiana, many of the students' scores 
had dipped sharply from the previous year - so steeply and uniformly 
as to appear improbable, Mr. Tobias thought. Knowing how high the 
stakes were this year, Mr. Tobias directed his staff to ask CTB 
whether it had made a mistake. The company's response, Mr. Tobias 
recalls, was as swift as it was definitive: "We can't find anything 
wrong."

Mr. Tobias continued to press CTB, eventually calling the company 
himself to make an argument the company had already heard: perhaps 
the tests from one year to the next were not quite equal. No one told 
him that he was echoing Indiana's earlier suspicions.

Still, CTB held firm. "If we were not comfortable, we would have 
advised them not to release the data," said Mr. Taggart, CTB's 
president.

Unsure of what to do, Mr. Tobias held off releasing the results until 
June 8, the last possible day the scores could be used to make 
summer-school assignments.

As the date approached, Mr. Tobias finally told Dr. Crew about his 
doubts. Dr. Crew says he seriously considered calling the press to 
disavow the results. But as a national spokesman for the movement 
toward standardized assessment, Dr. Crew decided his credibility 
would be lost. He thought he would be seen as a crybaby.

Mr. Tobias concurred.

"Errors of measurement are a fact of life in this business," Mr. 
Tobias said in an interview. "There are times you can explain them. 
Other times you just bite the bullet and accept the data as they are."

And so, Dr. Crew summoned reporters to deliver the disappointing 
news: two years of progress in reading had apparently stalled.

The mayor said he was "very alarmed and concerned." And Dr. Crew knew 
he had some homework to do.

In Tennessee: State Officials Seek Review of Test

Most school districts, including New York City, gauge progress by 
comparing students in a particular grade with their predecessors in 
the same grade a year earlier. But Tennessee has long used a more 
sophisticated approach: it compares a student's test scores as a 
first grader with that same student's scores as a second grader, 
third grader, and so on through school.

This approach was pioneered and overseen by William Sanders, a 
longtime professor at the University of Tennessee, who was curious 
about how class size and teaching styles influenced student 
performance.

In early May 1999, when Professor Sanders received Tennessee's scores 
from CTB, he knew from his own data that they could not be right, 
state testing officials said. The drops were much too sharp.

Again, state officials recall the company saying not to worry - the 
scores were accurate. But Tennessee had something that Indiana and 
New York City did not: a treasure trove of data on the performance of 
actual children going back six years or more. CTB's results broke 
patterns in individual students' scores that had been uninterrupted 
for years.

Professor Sanders was so insistent that there was a problem that he 
told the company he would call a news conference to challenge the 
results, Tennessee school officials said.

Then CTB did something that it would not do in any other state: it 
simply raised the comparative rankings of many Tennessee students, 
and lowered some others, to conform with Professor Sanders's statistical 
models - even though the company could find no error to justify those 
changes.

The company made this adjustment in late May or early June, just as 
it was assuring New York City that its results were correct.

CTB did not tell any of its other customers what it had done for 
Tennessee. CTB considers its relationship with each state or district 
to be confidential, even if the products that state uses are similar 
to others, said Mr. Taggart, the company president.

Moreover, Mr. Taggart said, CTB's researchers had not yet detected 
any similarity in the complaints from New York City, Tennessee, 
Indiana and another state, Nevada, which had contacted the company 
around the same time. Finding a common thread was difficult, Mr. 
Taggart said, because each had used a customized version of the same 
basic test.

But after certifying New York City's results as accurate, and 
altering Tennessee's results, CTB began to have its own doubts, the 
company now says. In June and into July, unbeknown to its customers, 
CTB assigned an army of researchers to investigate its results.
---------------------------
PART II WILL FOLLOW SHORTLY.
***************************************************
-- 
Jerry P.Becker
Department of Curriculum & Instruction
Southern Illinois University
Carbondale, IL  62901-4610  USA
Phone:  (618) 453-4241  [O]
             (618)  457-8903 [H]
Fax:      (618) 453-4244
E-mail:   [EMAIL PROTECTED]
