ABSTRACT: Pre/post testing is commonly dismissed as a valid gauge of course effectiveness on the grounds that change scores are unreliable, as claimed by Cronbach & Furby in their influential "How we should measure 'change' - or should we?" The work of David Rogosa and others is cited as casting considerable doubt on the Cronbach/Furby thesis, consistent with the fact that *formative* pre/post testing has been quite effective in promoting the improvement of introductory physics courses worldwide.

If you reply to this long (17kB) post, please prune the copy of this post that may appear in your reply down to a few relevant lines; otherwise the entire, already archived post may be needlessly resent to subscribers.

In an AERA-D post of 22 March titled "The Value of Pre/post Testing" [Hake (2006a)], I quoted Carnegie scholar Lloyd Bond:

"Psychometricians don't like 'change' or 'difference' scores in statistical analyses because, among other things, they tend to have lower reliability than the original measures themselves. Their objection to change scores is embodied in the very title of a famous paper by Cronbach and Furby (1970) 'How should we measure change, or should we?' "

A subscriber to Gerald Bracey's "Education Disinformation Detection and Reporting Agency" (EDDRA) <http://groups.yahoo.com/group/eddra/> wrote to me privately: ". . .the testing situation of Cronbach and Furby (1970) . . . was such a special case that it almost never arose in real life. The most important [criticism] was by a prof at Stanford, about whom I can only recall that his first name is David. . ."

The Stanford professor is David Rogosa <http://ed.stanford.edu/suse/faculty/displayRecord.php?suid=rag> and <http://www.stanford.edu/~rag/>.

In "Design-Based Research: A Primer for Physics Education Researchers" [Hake (2004a)] I wrote:

". . . . the canonical anti-pre/post arguments by the psychometric authorities Lord (1956, 1958) and Cronbach & Furby (1970) that gain scores are unreliable, have been called into question by e.g., Werner Wittmann (1997), former Cronbach student David Rogosa (1995), Rogosa & Willett (1983, 1985), Zimmerman & Williams (1982), and Collins and Horn (1991). All this more recent work should (but does not) serve as an antidote for the emotional pre/post paranoia that grips many educational researchers."
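The dispute above turns on the "well known formula" for the reliability of a difference (gain) score, quoted in the Zimmerman & Williams (1982) abstract in the references. A minimal sketch of that classical formula, with illustrative parameter values of my own choosing (not taken from any of the cited studies), shows how the conclusion flips with the assumptions:

```python
def diff_score_reliability(r_xx, r_yy, r_xy, s_x, s_y):
    """Classical reliability of the difference score D = Y - X.

    r_xx, r_yy: reliabilities of pretest X and posttest Y
    r_xy:       pretest-posttest correlation
    s_x, s_y:   standard deviations of X and Y
    """
    num = s_x**2 * r_xx + s_y**2 * r_yy - 2 * r_xy * s_x * s_y
    den = s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y
    return num / den

# Pessimistic assumptions (equal SDs, high pre/post correlation):
# the gain-score reliability comes out low.
print(round(diff_score_reliability(0.80, 0.80, 0.70, 10, 10), 2))  # -> 0.33

# Different but arguably more realistic assumptions (modest pre/post
# correlation, unequal SDs): the gain-score reliability is substantial.
print(round(diff_score_reliability(0.85, 0.85, 0.30, 8, 12), 2))   # -> 0.79
```

The formula itself is standard; the point of the two hypothetical cases is Zimmerman & Williams' argument that "unreliable gain scores" follow from the parameter assumptions one feeds in, not from the formula.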

And in "The Physics Education Reform Effort: A Possible Model for Higher Education" [Hake (2005)], I wrote [bracketed by lines "HHHHHHH. . ."]:

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
In my opinion, the physics-education reform model - measurement and improvement of cognitive gains by faculty disciplinary experts in their own courses - can provide a crucial complement to the top-down approaches of Hersh (2005) and Klein et al. (2005). Such pre/post testing, pioneered by economists [Paden & Moyer (1969)] and physicists [Halloun & Hestenes (1985a,b)], is rarely employed in higher education, in part because of the tired old canonical objections recently lodged by Suskie (2004) and countered by Hake (2004b) and Scriven (2004). Despite the nay-sayers, pre/post testing is gradually gaining a foothold in introductory astronomy, biology, chemistry, computer science, economics, engineering, and physics courses [see Hake (2004c) for references].

It should be emphasized that such low-stakes formative pre/post testing is the polar opposite of the high-stakes summative testing mandated by the U.S. Department of Education's "No Child Left Behind Act" for K-12 (USDE 2005a) that is now contemplated for higher education (USDE 2005b). As the NCLB experience shows, such testing often falls victim to "Campbell's Law" (Campbell 1975, Nichols & Berliner 2005):

"THE MORE ANY QUANTITATIVE SOCIAL INDICATOR IS USED FOR SOCIAL DECISION MAKING, THE MORE SUBJECT IT WILL BE TO CORRUPTION PRESSURES AND THE MORE APT IT WILL BE TO DISTORT AND CORRUPT THE SOCIAL PROCESSES IT IS INTENDED TO MONITOR."
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Why is the pre/post testing discussed above regarded as formative? In the disciplines indicated above, both teachers' "action research" and education researchers' scientific research are carried out to improve classroom teaching and learning, NOT to rate instructors or students. Thus it's "formative" as defined by JCSEE (1994): "Formative evaluation is evaluation designed and used to improve an object, especially when it is still being developed."

BTW, for a recent book on value-added assessment see Lissitz (2005). I have not seen the book, but I suspect that the contributors - evidently mostly from the PEP (Psychology/Education/Psychometric) community - are all either oblivious to or dismissive of the pre/post testing research being carried out in such fields as astronomy, biology, chemistry, computer science, economics, engineering, and physics.

Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<[EMAIL PROTECTED]>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>

REFERENCES [Tiny URL's courtesy <http://tinyurl.com/create.php>]
Bond, L. 2006. "Who Has the Lowest Prices?" online at
<http://www.carnegiefoundation.org/perspectives/sub.asp?key=245&subkey=569>.

Campbell, D.T. 1975. "Assessing the impact of planned social change," in G. Lyons, ed., Social research and public policies: The Dartmouth/OECD Conference, Chapter 1, pp. 3-45. Dartmouth College Public Affairs Center, p. 35; online at <http://www.wmich.edu/evalctr/pubs/ops/ops08.pdf> (196 kB).

Collins, L.M. & J.L. Horn. 1991. "Best methods for the analysis of change," American Psychological Association. See also Collins & Sayer (2001).

Collins, L.M. & A.G. Sayer. 2001. "New Methods for the Analysis of Change." American Psychological Association. APA information at <http://www.apa.org/books/4318991.html>, including the Table of Contents <http://www.apa.org/books/4318991t.html>.

Cronbach, L. & L. Furby. 1970. "How we should measure 'change'- or should we?" Psychological Bulletin 74: 68-80.

Hake, R.R. 2004a. "Design-Based Research: A Primer for Physics Education Researchers," submitted to the "American Journal of Physics" on 10 June 2004 (NO RESPONSE!!); online as reference 34 at <http://www.physics.indiana.edu/~hake>, or download directly as a 310kB pdf by clicking on
<http://www.physics.indiana.edu/~hake/DBR-AJP-6.pdf>.

Hake, R.R. 2004b. "Re: pre-post testing in assessment," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0408&L=pod&P=R9135&I=-3>. Post of 19 Aug 2004 13:56:07-0700 to POD.

Hake, R.R. 2004c. "Re: Measuring Content Knowledge," POD posts of 14 &15 Mar 2004, online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13279&I=-3> and
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0403&L=pod&P=R13963&I=-3>.

Hake, R. R. 2005. "The Physics Education Reform Effort: A Possible Model for Higher Education," online at <http://www.physics.indiana.edu/~hake/NTLF42.pdf> (100 kB). This is a slightly edited version of an article that was (a) published in the National Teaching and Learning Forum 15(1), December 2005, online to subscribers at <http://www.ntlf.com/FTPSite/issues/v15n1/physics.htm>, and (b) disseminated by the Tomorrow's Professor list <http://ctl.stanford.edu/Tomprof/postings.html> as Msg. 698 on 14 Feb 2006.

Hake, R.R. 2006a. "The Value of Pre/post Testing," online at <http://lists.asu.edu/cgi-bin/wa?A2=ind0603&L=aera-d&T=0&F=&S=&X=175E691B545B6F570B&Y=rrhake%40earthlink.net&P=3335>, or more compactly at <http://tinyurl.com/jnfvc>. Post of 22 Mar 2006 09:20:26-0800 to AERA-D, AERA-L, & EdStat. See also Hake (2006b).

Hake, R.R. 2006b. "The Value of Pre/post Testing (was preschool)," online at <http://listserv.nd.edu/cgi-bin/wa?A2=ind0603&L=pod&O=D&P=22220>. Post of 19 March to AERA-D, AERA-L, ARN-L, ASSESS, EDDRA, EdStat, EvalTalk, PhysLrnR, POD, & STLHE-L.

Halloun, I. & D. Hestenes. 1985a. "The initial knowledge state of college physics students," Am. J. Phys. 53: 1043-1055; online at <http://modeling.asu.edu/R&E/Research.html>. Contains the "Mechanics Diagnostic" test (omitted from the online version), precursor to the widely used "Force Concept Inventory" [Hestenes et al. (1992)].

Halloun, I. & D. Hestenes. 1985b. "Common sense concepts about motion," Am. J. Phys. 53: 1056-1065; online at <http://modeling.asu.edu/R&E/Research.html>.

Hersh, R.H. 2005. "What Does College Teach? It's time to put an end to 'faith-based' acceptance of higher education's quality," Atlantic Monthly 296(4): 140-143, November; freely online to (a) subscribers of the Atlantic Monthly at <http://tinyurl.com/dwss8>, and (b) (with hot-linked academic references) to educators at <http://tinyurl.com/9nqon> (scroll to the APPENDIX).

Hestenes, D., M. Wells, & G. Swackhamer, 1992. "Force Concept Inventory," Phys. Teach. 30: 141-158; online (except for the test itself) at <http://modeling.asu.edu/R&E/Research.html>. The 1995 revision by Halloun, Hake, Mosca, & Hestenes is online (password protected) at the same URL, and is available in English, Spanish, German, Malaysian, Chinese, Finnish, French, Turkish, Swedish, and Russian.

JCSEE. 1994. Joint Committee on Standards for Educational Evaluation; A.R. Gullickson, Chair; "Glossary of Program Evaluation Terms" in "The Program Evaluation Standards," 2nd ed. Sage; online with permission from the publisher at <http://ec.wmich.edu/glossary/prog-glossary.htf>.

Klein, S.P., G.D. Kuh, M.Chun, L. Hamilton, & R. Shavelson. 2005. "An Approach to Measuring Cognitive Outcomes Across Higher Education Institutions." Research in Higher Education 46(3): 251-276; online at <http://www.stanford.edu/dept/SUSE/SEAL/> // "Reports/Papers" scroll to "Higher Education," where "//" means "click on."

Lissitz, R.W., ed. 2005. "Value Added Models in Education: Theory and Applications." JAM Press. Contents and ordering information are at the Journal of Applied Measurement web site <http://www.jampress.org> / "JAM Press Books!" where "/" means "click on."

Lord, F.M. 1956. "The measurement of growth," Educational and Psychological Measurement 16: 421-437.

Lord, F.M. 1958. "Further problems in the measurement of growth," Educational and Psychological Measurement 18: 437-454.

Nichols, S.L & D.C. Berliner. 2005. "The Inevitable Corruption of Indicators and Educators Through High-Stakes Testing," Arizona State Univ. Education Policy Studies Laboratory, online at <http://tinyurl.com/7butg> (1.7 MB).

Paden, D.W. & M.E. Moyer. 1969. "The Relative Effectiveness of Teaching Principles of Economics," Journal of Economic Education 1: 33-45.

Rogosa, D.R., & J.B. Willett. 1983. "Demonstrating the reliability of the difference score in the measurement of change," Journal of Educational Measurement 20: 335- 343.

Rogosa, D.R. & J.B. Willett. 1985. "Understanding correlates of change by modeling individual differences in growth," Psychometrika 50: 203-228.

Rogosa, D.R. 1995. "Myths and methods: 'Myths about longitudinal research' plus supplemental questions," in J.M. Gottman, ed., "The Analysis of Change," pp. 3-66. Erlbaum; examples from this paper are online at
<http://www.stanford.edu/~rag/Myths/myths.html>.

Scriven, M. 2004. "Re: pre- post testing in assessment," AERA-D post of 15 Sept 2004 19:27:14-0400; online at <http://tinyurl.com/942u8>.

Suskie, L. 2004. "Re: pre- post testing in assessment," ASSESS post of 19 Aug 2004 08:19:53-0400; online at <http://tinyurl.com/akz23>.

USDE. 2005a. U.S. Department of Education, No Child Left Behind Act, online at <http://www.ed.gov/nclb/landing.jhtml?src=pb>.

USDE. 2005b. U.S. Dept. of Education, "Secretary Spellings Announces New Commission on the Future of Higher Education," press release online at <http://tinyurl.com/cxgfz>: "Spellings noted that the achievement gap is closing and test scores are rising among our nation's younger students, due largely to the high standards and accountability measures called for by the No Child Left Behind Act. More and more students are going to graduate ready for the challenges of college, she said, and we must make sure our higher education system is accessible and affordable for all these students."

Wittmann, W.W. 1997. "The reliability of change scores: many misinterpretations of Lord and Cronbach by many others; revisiting some basics for longitudinal research," online at
<http://www.psychologie.uni-mannheim.de/psycho2/psycho2.en.php3?language=en>
/ "Publications" / "Papers and preprints" where "/" means "click on."

Zimmerman, D.W. & R.H. Williams. 1982. "Gain scores in research can be highly reliable," Journal of Educational Measurement 19: 149-154. The abstract <http://mypage.direct.ca/z/zimmerma/earlier.html> reads: "The common belief that gain scores are unreliable is based on certain assumptions about the values of parameters in a well known formula for the reliability of differences. In this paper we show that a reliability coefficient calculated from the formula can be high, provided one makes other assumptions about the values of pretest and posttest reliability coefficients and standard deviations. Furthermore, there is reason to believe that the revised assumptions are more realistic than the usual ones in testing practice."
