ABSTRACT: Pre/post testing is commonly dismissed as a gauge of
course effectiveness on the grounds that change scores are
unreliable, as claimed by Cronbach & Furby in their influential "How
we should measure 'change'- or should we?" The work of David Rogosa
and others is cited as casting considerable doubt on the
Cronbach/Furby thesis, consistent with the fact that *formative*
pre/post testing has been quite effective in promoting the
improvement of introductory physics courses worldwide.
If you reply to this long (17kB) post, please don't hit the reply
button unless you prune the copy of this post that may appear in your
reply down to a few relevant lines; otherwise the entire
already-archived post may be needlessly resent to subscribers.
In an AERA-D post of 22 March titled "The Value of Pre/post Testing"
[Hake (2006a)] I quoted Carnegie scholar Lloyd Bond:
"Psychometricians don't like 'change' or 'difference' scores in
statistical analyses because, among other things, they tend to have
lower reliability than the original measures themselves. Their
objection to change scores is embodied in the very title of a famous
paper by Cronbach and Furby (1970) 'How should we measure change, or
should we?' "
A subscriber to Gerald Bracey's "Education Disinformation Detection
and Reporting Agency" (EDDRA) <http://groups.yahoo.com/group/eddra/> wrote
to me privately: ". . .the testing situation of Cronbach and Furby
(1970) . . . was such a special case that it almost never arose in
real life. The most important [criticism] was by a prof at Stanford,
about whom I can only recall that his first name is David. . ."
The Stanford professor is David Rogosa
<http://ed.stanford.edu/suse/faculty/displayRecord.php?suid=rag>
and <http://www.stanford.edu/~rag/>.
In "Design-Based Research: A Primer for Physics Education
Researchers" [Hake (2004a)] I wrote:
". . . . the canonical anti-pre/post arguments by the psychometric
authorities Lord (1956, 1958) and Cronbach & Furby (1970) that gain
scores are unreliable, have been called into question by e.g., Werner
Wittmann (1997), former Cronbach student David Rogosa (1995), Rogosa
& Willett (1983, 1985), Zimmerman & Williams (1982), and Collins and
Horn (1991). All this more recent work should (but does not) serve as
an antidote for the emotional pre/post paranoia that grips many
educational researchers."
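The reliability question at issue can be made concrete. Under classical
test theory the reliability of a difference score (post minus pre)
depends on the pre- and posttest reliabilities, their standard
deviations, and the pre/post correlation; the familiar "gain scores are
unreliable" conclusion follows only for particular parameter choices,
which is the point pressed by Zimmerman & Williams (1982). The sketch
below (Python; my own illustration, not code from any of the cited
papers) evaluates the standard formula for two scenarios:

```python
def gain_score_reliability(rho_pre, rho_post, rho_xy, sd_pre, sd_post):
    """Classical-test-theory reliability of the difference score D = post - pre.

    Standard difference-score formula:
    (sd_pre^2 * rho_pre + sd_post^2 * rho_post - 2 * sd_pre * sd_post * rho_xy)
    / (sd_pre^2 + sd_post^2 - 2 * sd_pre * sd_post * rho_xy)
    where rho_pre, rho_post are the test reliabilities and rho_xy is the
    observed pre/post correlation.
    """
    num = (sd_pre**2 * rho_pre + sd_post**2 * rho_post
           - 2 * sd_pre * sd_post * rho_xy)
    den = sd_pre**2 + sd_post**2 - 2 * sd_pre * sd_post * rho_xy
    return num / den

# Scenario behind the canonical objection: pre and post scores highly
# correlated (little differential change), equal spreads -> the gain
# score looks unreliable.
print(gain_score_reliability(0.9, 0.9, 0.8, 10, 10))  # -> 0.5

# Scenario closer to an effective course: instruction shuffles students'
# relative standing, so the pre/post correlation is modest -> the gain
# score is quite reliable.
print(gain_score_reliability(0.9, 0.9, 0.3, 10, 10))  # -> ~0.857
```

The illustrative parameter values are mine; the point is simply that the
same formula yields high or low gain-score reliability depending on the
assumed pre/post correlation and standard deviations.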
And in "The Physics Education Reform Effort: A Possible Model for
Higher Education" [Hake (2005)], I wrote [bracketed by lines of
"HHHHHHH. . ."]:
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
In my opinion, the physics-education reform model - measurement and
improvement of cognitive gains by faculty disciplinary experts in
their own courses - can provide a crucial complement to the top-down
approaches of Hersh (2005) and Klein et al. (2005). Such pre/post
testing, pioneered by economists [Paden & Moyer (1969)] and
physicists [Halloun & Hestenes (1985a,b)], is rarely employed in
higher education, in part because of the tired old canonical
objections recently lodged by Suskie (2004) and countered by Hake
(2004b) and Scriven (2004). Despite the nay-sayers, pre/post testing
is gradually gaining a foothold in introductory astronomy, biology,
chemistry, computer science, economics, engineering, and
physics courses [see Hake (2004c) for references].
It should be emphasized that such low-stakes formative pre/post
testing is the polar opposite of the high-stakes summative testing
mandated by the U.S. Department of Education's "No Child Left Behind
Act" for K-12 (USDE 2005a) that is now contemplated for higher
education (USDE 2005b). As the NCLB experience shows, such testing
often falls victim to "Campbell's Law" (Campbell 1975, Nichols &
Berliner 2005):
"THE MORE ANY QUANTITATIVE SOCIAL INDICATOR IS USED FOR SOCIAL
DECISION MAKING, THE MORE SUBJECT IT WILL BE TO CORRUPTION PRESSURES
AND THE MORE APT IT WILL BE TO DISTORT AND CORRUPT THE SOCIAL
PROCESSES IT IS INTENDED TO MONITOR."
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Why is the pre/post testing discussed above regarded as formative? In
the disciplines indicated above, both teachers' "action research" and
education researchers' scientific research are carried out to improve
classroom teaching and learning, NOT to rate instructors or students.
Thus it's "formative" as defined by JCSEE (1994): "Formative
evaluation is evaluation designed and used to improve an object,
especially when it is still being developed."
BTW, for a recent book on value-added assessment see Lissitz (2005).
I have not seen the book but I suspect that the contributors -
evidently mostly from the PEP (Psychology/Education/Psychometric)
community - are all either oblivious to or dismissive of the pre/post
testing research being carried out in such fields as astronomy,
economics, biology, chemistry, computer science, engineering, and
physics.
Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<[EMAIL PROTECTED]>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>
REFERENCES [Tiny URL's courtesy <http://tinyurl.com/create.php>]
Bond, L. 2006. "Who Has the Lowest Prices?" online at
<http://www.carnegiefoundation.org/perspectives/sub.asp?key=245&subkey=569>.
Campbell, D.T. 1975. "Assessing the impact of planned social change,"
in G. Lyons, ed., Social research and public policies: The
Dartmouth/OECD Conference, Chapter 1, pp. 3-45. Dartmouth College
Public Affairs Center, p. 35; online at
<http://www.wmich.edu/evalctr/pubs/ops/ops08.pdf> (196 kB).
Collins, L.M. & J.L. Horn. 1991. "Best methods for the analysis of
change," American Psychological Association. See also Collins & Sayer
(2001).
Collins, L.M. & A.G. Sayer. 2001. "New Methods for the Analysis of
Change." American Psychological Association. APA information at
<http://www.apa.org/books/4318991.html>, including the Table of
Contents <http://www.apa.org/books/4318991t.html>.
Cronbach, L. & L. Furby. 1970. "How we should measure 'change'- or
should we?" Psychological Bulletin 74: 68-80.
Hake, R.R. 2004a. "Design-Based Research: A Primer for Physics
Education Researchers," submitted to the "American Journal of
Physics" on 10 June 2004 (NO RESPONSE!!); online as reference 34 at
<http://www.physics.indiana.edu/~hake>, or download directly as a
310kB pdf by clicking on
<http://www.physics.indiana.edu/~hake/DBR-AJP-6.pdf>.
Hake, R.R. 2004b. "Re: pre-post testing in assessment," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0408&L=pod&P=R9135&I=-3>.
Post of 19 Aug 2004 13:56:07-0700 to POD.
Hake, R.R. 2004c. "Re: Measuring Content Knowledge," POD posts of 14
&15 Mar 2004, online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13279&I=-3> and
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13963&I=-3>.
Hake, R. R. 2005. "The Physics Education Reform Effort: A Possible
Model for Higher Education," online at
<http://www.physics.indiana.edu/~hake/NTLF42.pdf> (100 kB). This is a
slightly edited version of an article that was (a) published in the
National Teaching and Learning Forum 15(1), December 2005, online to
subscribers at
<http://www.ntlf.com/FTPSite/issues/v15n1/physics.htm>, and (b)
disseminated by the Tomorrow's Professor list
<http://ctl.stanford.edu/Tomprof/postings.html> as Msg. 698 on 14 Feb
2006.
Hake, R.R. 2006a. "The Value of Pre/post Testing," online at
<http://lists.asu.edu/cgi-bin/wa?A2=ind0603&L=aera-d&T=0&F=&S=&X=175E691B545B6F570B&Y=rrhake%40earthlink.net&P=3335>,
or more compactly at <http://tinyurl.com/jnfvc>. Post of 22 Mar 2006
09:20:26-0800 to AERA-D, AERA-L, & EdStat. See also Hake (2006b).
Hake, R.R. 2006b. "The Value of Pre/post Testing (was preschool),"
online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0603&L=pod&O=D&P=22220>.
Post of 19 March to AERA-D, AERA-L, ARN-L, ASSESS, EDDRA, EdStat,
EvalTalk, PhysLrnR, POD, & STLHE-L.
Halloun, I. & D. Hestenes. 1985a. "The initial knowledge state of
college physics students," Am. J. Phys. 53: 1043-1055; online at
<http://modeling.asu.edu/R&E/Research.html>. Contains the "Mechanics
Diagnostic" test (omitted from the online version), precursor to the
widely used "Force Concept Inventory" [Hestenes et al. (1992)].
Halloun, I. & D. Hestenes. 1985b. "Common sense concepts about
motion," Am. J. Phys. 53: 1056-1065; online at
<http://modeling.asu.edu/R&E/Research.html>.
Hersh, R.H. 2005. "What Does College Teach? It's time to put an end
to 'faith-based' acceptance of higher education's quality," Atlantic
Monthly 296(4): 140-143, November; freely online to (a) subscribers
of the Atlantic Monthly at <http://tinyurl.com/dwss8>, and (b) (with
hot-linked academic references) to educators at
<http://tinyurl.com/9nqon> (scroll to the APPENDIX).
Hestenes, D., M. Wells, & G. Swackhamer, 1992. "Force Concept
Inventory," Phys. Teach. 30: 141-158; online (except for the test
itself) at
<http://modeling.asu.edu/R&E/Research.html>. The 1995 revision by
Halloun, Hake, Mosca, & Hestenes is online (password protected) at
the same URL, and is available in English, Spanish, German,
Malaysian, Chinese, Finnish, French, Turkish, Swedish, and Russian.
JCSEE. 1994. Joint Committee on Standards for Educational
Evaluation; A.R. Gullickson, Chair; "Glossary of Program Evaluation
Terms" in "The Program Evaluation Standards," 2nd ed. Sage; online
with permission from the publisher at
<http://ec.wmich.edu/glossary/prog-glossary.htf>.
Klein, S.P., G.D. Kuh, M.Chun, L. Hamilton, & R. Shavelson. 2005. "An
Approach to Measuring Cognitive Outcomes Across Higher Education
Institutions." Research in Higher Education 46(3): 251-276; online at
<http://www.stanford.edu/dept/SUSE/SEAL/> // "Reports/Papers" scroll
to "Higher Education," where "//" means "click on."
Lissitz, R.W., ed. 2005. "Value Added Models in Education: Theory
and Applications." JAM Press. Contents and ordering information are
at the Journal of Applied Measurement web site
<http://www.jampress.org> / "JAM Press Books!" where "/" means
"click on."
Lord, F.M. 1956. "The measurement of growth," Educational and
Psychological Measurement 16: 421-437.
Lord, F.M. 1958. "Further problems in the measurement of growth,"
Educational and Psychological Measurement 18: 437-454.
Nichols, S.L & D.C. Berliner. 2005. "The Inevitable Corruption of
Indicators and Educators Through High-Stakes Testing," Arizona State
Univ. Education Policy Studies Laboratory, online at
<http://tinyurl.com/7butg> (1.7 MB).
Paden, D.W. & M.E. Moyer. 1969. "The Relative Effectiveness of
Teaching Principles of Economics," Journal of Economic Education 1:
33-45.
Rogosa, D.R. & J.B. Willett. 1983. "Demonstrating the reliability of
the difference score in the measurement of change," Journal of
Educational Measurement 20: 335-343.
Rogosa, D.R. & J.B. Willett. 1985. "Understanding correlates of
change by modeling individual differences in growth," Psychometrika
50: 203-228.
Rogosa, D.R. 1995. "Myth and methods: 'Myths about longitudinal
research' plus supplemental questions," in J.M. Gottman, ed., "The
Analysis of Change," pp. 3-66. Erlbaum; examples from this paper are
online at
<http://www.stanford.edu/~rag/Myths/myths.html>.
Scriven, M. 2004. "Re: pre- post testing in assessment," AERA-D post
of 15 Sept 2004 19:27:14-0400; online at <http://tinyurl.com/942u8>.
Suskie, L. 2004. "Re: pre- post testing in assessment," ASSESS post
of 19 Aug 2004 08:19:53-0400; online at <http://tinyurl.com/akz23>.
USDE. 2005a. U.S. Department of Education, No Child Left Behind Act,
online at <http://www.ed.gov/nclb/landing.jhtml?src=pb>.
USDE. 2005b. U.S. Dept. of Education, "Secretary Spellings Announces
New Commission on the Future of Higher Education," press release
online at <http://tinyurl.com/cxgfz>: "Spellings noted that the
achievement gap is closing and test scores are rising among our
nation's younger students, due largely to the high standards and
accountability measures called for by the No Child Left Behind Act.
More and more students are going to graduate ready for the challenges
of college, she said, and we must make sure our higher education
system is accessible and affordable for all these students."
Wittmann, W.W. 1997. "The reliability of change scores: many
misinterpretations of Lord and Cronbach by many others; revisiting
some basics for longitudinal research," online at
<http://www.psychologie.uni-mannheim.de/psycho2/psycho2.en.php3?language=en>
/ "Publications" / "Papers and preprints" where "/" means "click on."
Zimmerman, D.W. & R.H. Williams. 1982. "Gain scores in research can
be highly reliable," Journal of Educational Measurement 19: 149-154.
The abstract <http://mypage.direct.ca/z/zimmerma/earlier.html> reads:
"The common belief that gain scores are unreliable is based on
certain assumptions about the values of parameters in a well known
formula for the reliability of differences. In this paper we show
that a reliability coefficient calculated from the formula can be
high, provided one makes other assumptions about the values of
pretest and posttest reliability coefficients and standard
deviations. Furthermore, there is reason to believe that the revised
assumptions are more realistic than the usual ones in testing
practice."