ABSTRACT: Pre/post testing is commonly dismissed as a gauge of
course effectiveness on the grounds that change scores are
unreliable, as claimed by Cronbach & Furby in their influential "How
we should measure 'change'- or should we?" The work of David Rogosa
and others is cited as casting considerable doubt on the
Cronbach/Furby thesis, consistent with the fact that *formative*
pre/post testing has been quite effective in promoting the
improvement of introductory physics courses worldwide.
If you reply to this long (17kB) post, please don't hit the reply
button unless you prune the copy of this post that may appear in your
reply down to a few relevant lines; otherwise the entire
already-archived post may be needlessly resent to subscribers.
In an AERA-D post of 22 March titled "The Value of Pre/post Testing"
[Hake (2006a)] I quoted Carnegie scholar Lloyd Bond:
"Psychometricians don't like 'change' or 'difference' scores in
statistical analyses because, among other things, they tend to have
lower reliability than the original measures themselves. Their
objection to change scores is embodied in the very title of a famous
paper by Cronbach and Furby (1970) 'How should we measure change, or
should we?' "
A subscriber to Gerald Bracey's "Education Disinformation Detection
and Reporting Agency" (EDDRA) <http://groups.yahoo.com/group/eddra/> wrote
to me privately: ". . .the testing situation of Cronbach and Furby
(1970) . . . was such a special case that it almost never arose in
real life. The most important [criticism] was by a prof at Stanford,
about whom I can only recall that his first name is David. . ."
The Stanford professor is David Rogosa
<http://ed.stanford.edu/suse/faculty/displayRecord.php?suid=rag>
and <http://www.stanford.edu/~rag/>.
In "Design-Based Research: A Primer for Physics Education
Researchers" [Hake (2004a)] I wrote:
". . . . the canonical anti-pre/post arguments by the psychometric
authorities Lord (1956, 1958) and Cronbach & Furby (1970) that gain
scores are unreliable, have been called into question by e.g., Werner
Wittmann (1997), former Cronbach student David Rogosa (1995), Rogosa
& Willett (1983, 1985), Zimmerman & Williams (1982), and Collins and
Horn (1991). All this more recent work should (but does not) serve as
an antidote for the emotional pre/post paranoia that grips many
educational researchers."
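The reliability question at issue can be made concrete. Under classical
test theory the reliability of a difference score (post minus pre)
depends on the pre- and posttest reliabilities, their standard
deviations, and the pre/post correlation; the familiar "gain scores are
unreliable" conclusion follows only for particular parameter choices,
which is the point pressed by Zimmerman & Williams (1982). The sketch
below (Python; my own illustration, not code from any of the cited
papers) evaluates the standard formula for two scenarios:

```python
def gain_score_reliability(rho_pre, rho_post, rho_xy, sd_pre, sd_post):
    """Classical-test-theory reliability of the difference score D = post - pre.

    Standard difference-score formula:
    (sd_pre^2 * rho_pre + sd_post^2 * rho_post - 2 * sd_pre * sd_post * rho_xy)
    / (sd_pre^2 + sd_post^2 - 2 * sd_pre * sd_post * rho_xy)
    where rho_pre, rho_post are the test reliabilities and rho_xy is the
    observed pre/post correlation.
    """
    num = (sd_pre**2 * rho_pre + sd_post**2 * rho_post
           - 2 * sd_pre * sd_post * rho_xy)
    den = sd_pre**2 + sd_post**2 - 2 * sd_pre * sd_post * rho_xy
    return num / den

# Scenario behind the canonical objection: pre and post scores highly
# correlated (little differential change), equal spreads -> the gain
# score looks unreliable.
print(gain_score_reliability(0.9, 0.9, 0.8, 10, 10))  # -> 0.5

# Scenario closer to an effective course: instruction shuffles students'
# relative standing, so the pre/post correlation is modest -> the gain
# score is quite reliable.
print(gain_score_reliability(0.9, 0.9, 0.3, 10, 10))  # -> ~0.857
```

The illustrative parameter values are mine; the point is simply that the
same formula yields high or low gain-score reliability depending on the
assumed pre/post correlation and standard deviations.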
And in "The Physics Education Reform Effort: A Possible Model for
Higher Education" [Hake (2005)], I wrote [bracketed by lines of
"HHHHHHH. . ."]:
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
In my opinion, the physics-education reform model - measurement and
improvement of cognitive gains by faculty disciplinary experts in
their own courses - can provide a crucial complement to the top-down
approaches of Hersh (2005) and Klein et al. (2005). Such pre/post
testing, pioneered by economists [Paden & Moyer (1969)] and
physicists [Halloun & Hestenes (1985a,b)], is rarely employed in
higher education, in part because of the tired old canonical
objections recently lodged by Suskie (2004) and countered by Hake
(2004b) and Scriven (2004). Despite the nay-sayers, pre/post testing
is gradually gaining a foothold in introductory astronomy, biology,
chemistry, computer science, economics, engineering, and
physics courses [see Hake (2004c) for references].
It should be emphasized that such low-stakes formative pre/post
testing is the polar opposite of the high-stakes summative testing
mandated by the U.S. Department of Education's "No Child Left Behind
Act" for K-12 (USDE 2005a) that is now contemplated for higher
education (USDE 2005b). As the NCLB experience shows, such testing
often falls victim to "Campbell's Law" (Campbell 1975, Nichols &
Berliner 2005):
"THE MORE ANY QUANTITATIVE SOCIAL INDICATOR IS USED FOR SOCIAL
DECISION MAKING, THE MORE SUBJECT IT WILL BE TO CORRUPTION PRESSURES
AND THE MORE APT IT WILL BE TO DISTORT AND CORRUPT THE SOCIAL
PROCESSES IT IS INTENDED TO MONITOR."
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Why is the pre/post testing discussed above regarded as formative? In
the disciplines indicated above, both teachers' "action research" and
education researchers' scientific research are carried out to improve
classroom teaching and learning, NOT to rate instructors or students.
Thus it's "formative" as defined by JCSEE (1994): "Formative
evaluation is evaluation designed and used to improve an object,
especially when it is still being developed."
BTW, for a recent book on value-added assessment see Lissitz (2005).
I have not seen the book but I suspect that the contributors -
evidently mostly from the PEP (Psychology/Education/Psychometric)
community - are all either oblivious to or dismissive of the pre/post
testing research being carried out in such fields as astronomy,
economics, biology, chemistry, computer science, engineering, and
physics.
Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<[EMAIL PROTECTED]>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>
REFERENCES [Tiny URL's courtesy <http://tinyurl.com/create.php>]
Bond, L. 2006. "Who Has the Lowest Prices?" online at
<http://www.carnegiefoundation.org/perspectives/sub.asp?key=245&subkey=569>.
Campbell, D.T. 1975. "Assessing the impact of planned social change,"
in G. Lyons, ed., Social research and public policies: The
Dartmouth/OECD Conference, Chapter 1, pp. 3-45. Dartmouth College
Public Affairs Center, p. 35; online at
<http://www.wmich.edu/evalctr/pubs/ops/ops08.pdf> (196 kB).
Collins, L.M. & J.L. Horn. 1991. "Best methods for the analysis of
change," American Psychological Association. See also Collins & Sayer
(2001).
Collins, L.M. & A.G. Sayer. 2001. "New Methods for the Analysis of
Change." American Psychological Association. APA information at
<http://www.apa.org/books/4318991.html>, including the Table of
Contents <http://www.apa.org/books/4318991t.html>.
Cronbach, L. & L. Furby. 1970. "How we should measure 'change'- or
should we?" Psychological Bulletin 74: 68-80.
Hake, R.R. 2004a. "Design-Based Research: A Primer for Physics
Education Researchers," submitted to the "American Journal of
Physics" on 10 June 2004 (NO RESPONSE!!); online as reference 34 at
<http://www.physics.indiana.edu/~hake>, or download directly as a
310kB pdf by clicking on
<http://www.physics.indiana.edu/~hake/DBR-AJP-6.pdf>.
Hake, R.R. 2004b. "Re: pre-post testing in assessment," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0408&L=pod&P=R9135&I=-3>.
Post of 19 Aug 2004 13:56:07-0700 to POD.
Hake, R.R. 2004c. "Re: Measuring Content Knowledge," POD posts of 14
&15 Mar 2004, online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13279&I=-3> and
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13963&I=-3>.
Hake, R. R. 2005. "The Physics Education Reform Effort: A Possible
Model for Higher Education," online at
<http://www.physics.indiana.edu/~hake/NTLF42.pdf> (100 kB). This is a
slightly edited version of an article that was (a) published in the
National Teaching and Learning Forum 15(1), December 2005, online to
subscribers at
<http://www.ntlf.com/FTPSite/issues/v15n1/physics.htm>, and (b)
disseminated by the Tomorrow's Professor list
<http://ctl.stanford.edu/Tomprof/postings.html> as Msg. 698 on 14 Feb
2006.
Hake, R.R. 2006a. "The Value of Pre/post Testing," online at
<http://lists.asu.edu/cgi-bin/wa?A2=ind0603&L=aera-d&T=0&F=&S=&X=175E691B545B6F570B&Y=rrhake%40earthlink.net&P=3335>,
or more compactly at <http://tinyurl.com/jnfvc>. Post of 22 Mar 2006
09:20:26-0800 to AERA-D, AERA-L, & EdStat. See also Hake (2006b).
Hake, R.R. 2006b. "The Value of Pre/post Testing (was preschool),"
online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0603&L=pod&O=D&P=22220>.
Post of 19 March to AERA-D, AERA-L, ARN-L, ASSESS, EDDRA, EdStat,
EvalTalk, PhysLrnR, POD, & STLHE-L.
Halloun, I. & D. Hestenes. 1985a. "The initial knowledge state of
college physics students," Am. J. Phys. 53: 1043-1055; online at
<http://modeling.asu.edu/R&E/Research.html>. Contains the "Mechanics
Diagnostic" test (omitted from the online version), precursor to the
widely used "Force Concept Inventory" [Hestenes et al. (1992)].
Halloun, I. & D. Hestenes. 1985b. "Common sense concepts about
motion," Am. J. Phys. 53: 1056-1065; online at
<http://modeling.asu.edu/R&E/Research.html>.
Hersh, R.H. 2005. "What Does College Teach? It's time to put an end
to 'faith-based' acceptance of higher education's quality," Atlantic
Monthly 296(4): 140-143, November; freely online to (a) subscribers
of the Atlantic Monthly at <http://tinyurl.com/dwss8>, and (b) (with
hot-linked academic references) to educators at
<http://tinyurl.com/9nqon> (scroll to the APPENDIX).
Hestenes, D., M. Wells, & G. Swackhamer, 1992. "Force Concept
Inventory," Phys. Teach. 30: 141-158; online (except for the test
itself) at
<http://modeling.asu.edu/R&E/Research.html>. The 1995 revision by
Halloun, Hake, Mosca, & Hestenes is online (password protected) at
the same URL, and is available in English, Spanish, German,
Malaysian, Chinese, Finnish, French, Turkish, Swedish, and Russian.
JCSEE. 1994. Joint Committee on Standards for Educational
Evaluation; A.R. Gullickson, Chair; "Glossary of Program Evaluation
Terms" in "The Program Evaluation Standards," 2nd ed. Sage; online
with permission from the publisher at
<http://ec.wmich.edu/glossary/prog-glossary.htf>.
Klein, S.P., G.D. Kuh, M.Chun, L. Hamilton, & R. Shavelson. 2005. "An
Approach to Measuring Cognitive Outcomes Across Higher Education
Institutions." Research in Higher Education 46(3): 251-276; online at
<http://www.stanford.edu/dept/SUSE/SEAL/> // "Reports/Papers" scroll
to "Higher Education," where "//" means "click on."
Lissitz, R.W., ed. 2005. "Value Added Models in Education: Theory
and Applications." JAM Press. Contents and ordering information are
at the Journal of Applied Measurement web site
<http://www.jampress.org> / "JAM Press Books!" where "/" means
"click on."
Lord, F.M. 1956. "The measurement of growth," Educational and
Psychological Measurement 16: 421-437.
Lord, F.M. 1958. "Further problems in the measurement of growth,"
Educational and Psychological Measurement 18: 437-454.
Nichols, S.L & D.C. Berliner. 2005. "The Inevitable Corruption of
Indicators and Educators Through High-Stakes Testing," Arizona State
Univ. Education Policy Studies Laboratory, online at
<http://tinyurl.com/7butg> (1.7 MB).
Paden, D.W. & M.E. Moyer. 1969. "The Relative Effectiveness of
Teaching Principles of Economics," Journal of Economic Education 1:
33-45.
Rogosa, D.R. & J.B. Willett. 1983. "Demonstrating the reliability of
the difference score in the measurement of change," Journal of
Educational Measurement 20: 335-343.
Rogosa, D.R. & J.B. Willett. 1985. "Understanding correlates of
change by modeling individual differences in growth," Psychometrika
50: 203-228.
Rogosa, D.R. 1995. "Myth and methods: 'Myths about longitudinal
research' plus supplemental questions," in J.M. Gottman, ed., "The
Analysis of Change," pp. 3-66. Erlbaum; examples from this paper are
online at
<http://www.stanford.edu/~rag/Myths/myths.html>.
Scriven, M. 2004. "Re: pre- post testing in assessment," AERA-D post
of 15 Sept 2004 19:27:14-0400; online at <http://tinyurl.com/942u8>.
Suskie, L. 2004. "Re: pre- post testing in assessment," ASSESS post
of 19 Aug 2004 08:19:53-0400; online at <http://tinyurl.com/akz23>.
USDE. 2005a. U.S. Department of Education, No Child Left Behind Act,
online at <http://www.ed.gov/nclb/landing.jhtml?src=pb>.
USDE. 2005b. U.S. Dept. of Education, "Secretary Spellings Announces
New Commission on the Future of Higher Education," press release
online at <http://tinyurl.com/cxgfz>: "Spellings noted that the
achievement gap is closing and test scores are rising among our
nation's younger students, due largely to the high standards and
accountability measures called for by the No Child Left Behind Act.
More and more students are going to graduate ready for the challenges
of college, she said, and we must make sure our higher education
system is accessible and affordable for all these students."
Wittmann, W.W. 1997. "The reliability of change scores: many
misinterpretations of Lord and Cronbach by many others; revisiting
some basics for longitudinal research," online at
<http://www.psychologie.uni-mannheim.de/psycho2/psycho2.en.php3?language=en>
/ "Publications" / "Papers and preprints" where "/" means "click on."
Zimmerman, D.W. & R.H. Williams. 1982. "Gain scores in research can
be highly reliable," Journal of Educational Measurement 19: 149-154.
The abstract <http://mypage.direct.ca/z/zimmerma/earlier.html> reads:
"The common belief that gain scores are unreliable is based on
certain assumptions about the values of parameters in a well known
formula for the reliability of differences. In this paper we show
that a reliability coefficient calculated from the formula can be
high, provided one makes other assumptions about the values of
pretest and posttest reliability coefficients and standard
deviations. Furthermore, there is reason to believe that the revised
assumptions are more realistic than the usual ones in testing
practice."