----- Original Message ----- 
> Subject: definition of "standardized test"
> From: Ken Steele <[EMAIL PROTECTED]>
> Date: Fri, 20 May 2005 13:32:18 -0400
>
> Here is a definition I have never seen before:
>
> "By 'standardized test,' I mean simply a test administered under
> controlled conditions and carefully monitored to prevent cheating."
>
> Richard C. Atkinson,
>
> from "College Admissions and the SAT: A personal perspective."
> APS Observer, May 2005, 18 (5), p. 15
>
> Ken

NOTE:  I receive TiPS in digest format and was able to
read follow-up responses to the original post provided
above.

My first reaction to the post above is: And?
There appears to be a question here but because it is
implicitly stated (something like: "I've never seen
standardized tests defined this way, have you?")
it is not clear whether you're seeking a social consensus
(i.e., there are other people who have "never" seen
this kind of definition for "standardized tests") or
whether you're asking if the definition is technically
incorrect.  The responses that I've read so far seem
to try to address both points, raising additional issues
about what Atkinson might be talking about but
I'm not really sure they're on point (however, since
the question is unclear, why should I quibble about
the answers?).

However, I do have some familiarity with the situation
involving the use of the SAT and Atkinson's role
in reviewing how the U of California system should
use such a test.  I've also been involved in test
assessment and development myself, having analyzed
SAT scores in different contexts (particularly with
its relationship to age of English acquisition), and,
if the foregoing aren't sufficient qualifications, I've
dated a female research scientist from the College
Board who was involved in some aspects of the SAT
program (actually, the last point was thrown in only
for humor even though I really did date that woman).

It's useful to keep in mind some historical points
regarding psychological testing and the SAT (much
is based on Nicholas Lemann's "The Big Test"
and other refs that problems with source memory
prevent me from acknowledging):

(1)  The concept of "aptitude",  that is, what we today
might define as the ability to perform certain types of
tasks right now but more importantly serving as a predictor
of performance in the future, derives from the intelligence
testing tradition which has historically viewed intelligence
and aptitude from a eugenicist perspective, that is,
intelligence and aptitude are racially/genetically based.

Depending upon one's metaphysical, philosophical,
political, and theoretical orientations, "aptitude" may
be a purely genetic phenomenon (i.e., an aspect of
"native intelligence"),  a purely environmental/experiential
phenomenon (i.e., a  traditional behaviorist/empiricist
perspective), or some combination of these two positions
(i.e., a hopelessly confused but possibly more realistic
perspective).  Aptitude, therefore, seems less like a
psychometric conception and more like an assumption
about the source(s) of some demonstrated performance.

However, if "ability" (as represented by performance on
some test or measure) involves an interaction between
genetically based ability and environmentally based
experience, then "aptitude" by itself is not a terribly
meaningful concept unless one can demonstrate that there
are tasks or real-life situations that rely solely on "aptitude"
instead of one's experience/familiarity with them.

(2)  The Scholastic Aptitude Test (SAT) was initially
developed to be a variation on IQ tests, hence the
people behind the development of the SAT (mostly
eugenicists) assumed that they were measuring
innate capabilities or "native intelligence" (the failure
of these assumptions is partially shown in the College
Board's change of the test's name, first to "Scholastic
Assessment Test" in an attempt to discourage any link
between the test and genetically based ability, and then
to today's usage, where "SAT" is simply the "brand name"
of the test).

The major players in developing the SAT according
to Lemann include:

James Bryan Conant [president of Harvard in the 1930s
who wanted admission to Harvard based on academic merit
(i.e., innate intelligence and ability) instead of environmental
factors (i.e., family wealth and social connections); the way
to do so would be to establish a scholarship program for
supporting students solely on the basis of their "aptitude",
the need for which the SAT would fill]

Henry Chauncey [an assistant dean at Harvard during the
1930s who would implement Conant's plans and spearhead
what would become the SAT program, also becoming the
first president of ETS, the company charged by the College
Board to administer the SAT testing program],

William Bender [another Harvard assistant dean who
would assist Chauncey in converting Harvard into a
meritocracy]

and  the psychologist Carl Campbell Brigham.

Brigham had worked with Yerkes in conducting the mass
IQ testing program during WWI, would go on to become
a Princeton psychology professor, and would actually
author the Scholastic Aptitude Test.
More precisely, he upgraded versions of the Army IQ tests
to have tougher questions and tested them on Princeton
undergraduates (Lemann dates the first SAT by Brigham
as 1926 -- see his pp. 30-32).  Though Brigham was a
eugenicist early in his life and wrote the eugenicist-based
"A Study of American Intelligence", in his later years he
would discount the connection between one's racial/genetic
components and one's intellectual abilities.

However, Chauncey's eugenicist perspective led him to
want an "aptitude" test that was really a variation on the
old IQ tests, which is one reason Brigham was selected
for the development of the SAT instead of a competitor named
Ben Wood whose background was that of an achievement
tester (i.e., measurement of mastery of material in a subject
area in contrast to innate mental ability).  Wood had been
a tester with Yerkes as well, studied with Thorndike at
Columbia, and would go on to oversee the New York
State Regents exam program (a high school graduation
and scholarship program for all NY high schoolers) and
to create the GRE (for the Carnegie Corporation) among
other things.  Again, the SAT was not supposed to reflect
what one had learned or experienced but the concept of
innate ability.

Final word on Chauncey:  he was actually interested in
developing something called the "Census of Abilities"
that would provide a broad profile of native intellectual
abilities of Americans so that one could then assign
individuals to appropriate schools, jobs, "positions in
life" solely on the basis of test scores.  This would be
a true "meritocracy" where everyone's place in society
would be identified for them (something like Huxley's
"Brave New World" but instead of creating different
"genetic castes", testing would identify "natural genetic
castes").  The SAT is just a snapshot of the abilities
that Chauncey thought people had and though he tried,
he never was able to implement his plan for a national
"census of abilities" (see Lemann, pp. 4-5, 70-72).
Chauncey is memorialized at ETS through its "Chauncey
Group" which specializes in testing for particular professions.

(3)  Although concerns about implementing a meritocracy
at Harvard and at other Ivy League schools served as a
primary impetus for the development of the SAT, other
considerations were working to develop a form of nation-
wide "standardized testing".  The IQ testing during WWI
was perceived by some as demonstrating significant problems
with the American educational system (a eugenicist might
re-phrase this into saying that the IQ results just showed
how much deterioration had occurred in the nation's genetic
stock).

Ben Wood (Brigham's competitor for the SAT job) had
supervised a study funded by the Carnegie Corporation
in the late 1920s and early 1930s on the state of high schools
and colleges in Pennsylvania.  The conclusion was that
the schools were all a mess, primarily because one could
pass a course just by showing up for class and because of the
practice that today we would call "social promotion".  There was
simply no way to independently demonstrate that students
had learned anything in the courses they had taken (see
Lemann, p. 22).  This is one source for the development of
"standardized achievement testing" (NOTE:  Anastasi
& Urbina's Psych Testing [11th Ed] has an index entry
for "standardized achievement testing" but none for
"standardized test" or "standardized testing" though
there is an entry for "standardization, test").

The mechanism that Wood saw as the basis for developing
this accountability was mass-testing of mastery of material
that was presumably provided in the context of a school
course (i.e., achievement testing).  However, to ensure that
testing conditions did not affect performance on the test,
test-taking conditions had to be standardized, that is, made
identical for all students, thus minimizing
variance in the test scores due to test-taking conditions
(I believe that this is the sense that Atkinson is
relying upon most in the statement above).

Wood's goals here were:
(a)  identify students who, on the basis of their achievement
in prior coursework, would be likely to perform well in college, and
(b) "take away the absolute power arbitrary power of teachers
by creating a way for students to show they had mastered
a subject" (Lemann, p. 36, 2nd paragraph from bottom).

This left the technical problem of how to actually test and grade
large numbers of students.  Long story short:  a school science
teacher named Reynold B. Johnson developed a device called
the Markograph which electronically detected pencil marks on
paper because the carbon in the pencil mark conducted electricity which
allowed the machine to sense where the pencil mark was on the
page, thus allowing one to identify "correctly placed" and "wrongly
placed" marks.  In 1936, IBM, having bought the Markograph
machine and having hired Johnson, released its own machine which
was used to grade the NYS Regents exams as well as exams in
the public schools of Rhode Island (see Lemann, pp. 37-38).

(4)  For anyone still reading this, let me try to summarize what I
say above in the following statements:

(a)  the aptitude-achievement distinction is not really a psychometrically
based one, rather it depends upon the kinds of assumptions one is
willing to make about the sources for demonstrated "ability":
(i)  aptitude, with the source being genetic or "native intelligence"
(or, in an attempt to avoid a genetic/racial linkage, some diffuse set
of historical or environmental experiences unrelated to schoolish
or classroom learning), or
(ii) achievement, with the source being experiences linked to specific
activities both inside and outside of a classroom, all serving the
development of knowledge consistent with an academic subject.

The simplest position to take is that performance on a test like the
SAT is due to aptitude.  Clearly, the early developers and proponents
of the SAT thought performance on it was due to aptitude or "native
intelligence".  But from this perspective, the recognized racial, gender,
and class disparities in performance on the SAT would have to be
interpreted as reflecting the "innate intelligence" of these different
groups.  Needless to say, the "aptitude only" assumption lacks not
only scientific credibility but political viability as well.  If the SAT is
not an aptitude test, then what is it?  An achievement test?  A reflection
of an aptitude-achievement interaction? Or a model like the following:

SAT Score = contrib(aptitude) + contrib(achiev) + contrib(apt * achiev)

That is, performance on the SAT is due to the independent contributions
of aptitude and achievement as well as their interaction.  The real question
is why does this even matter?  What does the SAT tell us, if anything,
about whether or not a person will stay in college and perform well,
especially if we can't condition such predictions on the basis of one's
aptitude and/or achievement, as defined above?
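
To make the role of the interaction term concrete, here is a minimal
sketch of the model above.  Everything in it is an assumption for
illustration: the function names, the linear form of each contribution,
and the weights (b_apt, b_ach, b_int) are made up and are not estimated
from any SAT data.

```python
# Hypothetical illustration only: the weights below are invented,
# not estimated from any real SAT data.

def sat_score(aptitude, achievement, b_apt=1.0, b_ach=1.0, b_int=0.5):
    """Additive model with an interaction term (no intercept, no error)."""
    return (b_apt * aptitude
            + b_ach * achievement
            + b_int * aptitude * achievement)


def achievement_gain(aptitude, b_int):
    """Score change from one extra unit of achievement at a given aptitude."""
    return (sat_score(aptitude, 1.0, b_int=b_int)
            - sat_score(aptitude, 0.0, b_int=b_int))


if __name__ == "__main__":
    # Compare the gain with no interaction against the gain with one.
    for apt in (0.0, 1.0, 2.0):
        print(apt, achievement_gain(apt, b_int=0.0),
              achievement_gain(apt, b_int=0.5))
```

With b_int set to zero, a unit of achievement is worth the same amount
at every level of aptitude; with a nonzero b_int, its worth changes
with aptitude, which is why the two contributions cannot be cleanly
separated once an interaction is present.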

(b)  Atkinson's use of "standardized testing" is more consistent
with Wood's notion that some form of independent testing under
consistent conditions for all students everywhere should occur if
we want to have an adequate or accurate indication of what a
child has learned in school.  This is consistent with achievement
testing because course curricula should specify what children
should learn and know by the end of a course and how to fairly
test all students who went through the curriculum.  I haven't
commented on it before this point but I am assuming that any test
used in the above situations has demonstrated reliability and validity
without which testing under standardized conditions would probably
be meaningless.

Mike Palij
New York University
[EMAIL PROTECTED]





