Hi folks! FYI, the Scriven excerpt below...

Excerpted from: Michael Scriven (1993). The Validity of Student Ratings. In: Teacher Evaluation, Evaluation & Development Group, AERA.

-------------------------->> ....Begin longish Scriven excerpt ...

9. The existence of a positive correlation (even a correlation of 1.0) between the scores on several forms does not show the presence of a common property; there must also be logical or theoretical grounds for the identification, and usually also further factual evidence for it. See "Fallacies of Statistical Substitution" by the present author in Argumentation, 1987, pp. 333-349, D. Reidel.
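[A minimal illustrative sketch, not part of Scriven's text, of the point in note 9, using invented data: two hypothetical rating "forms" are built to track different properties (individual attention vs. audibility), yet their scores correlate almost perfectly because both are driven by a shared nuisance factor, here class size. The correlation by itself cannot distinguish this situation from a genuine common property; that takes the logical or theoretical grounds Scriven mentions.]

    # Illustrative sketch only: all names and numbers are hypothetical.
    import random
    import statistics

    random.seed(1)

    def pearson_r(xs, ys):
        # Plain Pearson correlation using population statistics.
        mx, my = statistics.mean(xs), statistics.mean(ys)
        sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
        return cov / (sx * sy)

    class_sizes = [random.randint(10, 200) for _ in range(100)]

    # Form A: rating of "individual attention received" (falls as class size grows).
    form_a = [5.0 - 0.020 * n + random.gauss(0, 0.1) for n in class_sizes]
    # Form B: rating of "audibility at the back of the room" (also falls with size).
    form_b = [4.8 - 0.018 * n + random.gauss(0, 0.1) for n in class_sizes]

    # Prints a value close to 1.0, although the two forms were built to
    # measure different properties of the same classes.
    print(round(pearson_r(form_a, form_b), 3))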
10. (i) Comparative ratings of teachers can't support the claim that the worst are bad or the best good, and without such conclusions few personnel actions can be supported. (ii) Friends with similar interests may be committed to other programs because of different career choices, hence no recommendation would be appropriate (there are other weaknesses in this question). (iii) Course preferences are not usually relevant when evaluating teachers.

11. More than that will stand up in legal hearings at the moment, because the law has hardly started on questions of validity in personnel actions, only questions of due process. But the barrier is eroding, as Judge Rebell points out in his contribution to the Millman and Darling-Hammond volume, New Handbook of Teacher Evaluation (Sage, 1990). A 'serious' hearing is one in which the state of the art in evaluation is combined with serious ethical analysis; the law is often well behind the leading edge of these issues, but we would presumably want our fellow-teachers to receive the benefit of the best investigation we can mount, not just one that meets the minimum standards the law requires.

12. For example, where the teacher is trying to match a certain, perfectly acceptable but not obligatory, model of teaching such as the Socratic or non-directive (questions-only) model.

13. Apart from validity, long forms massively increase processing costs and raise serious problems about dilution of impact.

14. Common examples of questions about optional courses to which students want to know the answer include: (i) how heavy the work load is, compared to other courses; (ii) whether grading is easy; (iii) whether it is really necessary to buy all the 'required' texts; (iv) what style of teaching is employed (discussion vs. lecturing, for example); (v) whether the course is 'relevant' or 'too academic'. In the limit, failure to attend to these concerns can lead student government to set up a duplicate system, with consequent waste of resources, especially classroom time, and can lead to refusals to fill in the official form, or fatigue effects from doing both.

15. The usual ones concern: (i) expected grade in this course; (ii) overall grade in school to date; (iii) whether the student has been required to take the course. These are inappropriate, partly because the evidence is that the answers are not reliable (determined at the college level by comparing the registrar's figures with the class reports, and also by asking students to say, anonymously, whether they lie on such questions), but mainly because they encourage faculty in the entirely improper response of disregarding the complaints of the weaker students.

16. The obvious example is requesting their names. One still hears faculty arguing that if students haven't the courage to sign their evaluations, the evaluations should not be taken seriously. This is reminiscent of dictators who say their door is always open for dissidents who wish to complain. But it is also extremely unprofessional unless one has very well-thought-out procedures to ensure that one's grading or letters of recommendation will not be influenced by complaints or praise, and has proven to the satisfaction of the students that those procedures will be enforced. Applications for the prize for the first person to meet those conditions are welcomed.

17. It's not enough just to say this. To mean it entails that the following kinds of evidence or procedures be guaranteed: (i) there has to be space on the form for suggestions about how to improve the forms and the evaluation process; (ii) student government must have some input into procedures and content; (iii) the results are at least sometimes used to improve the course whose students are asked to fill in the forms, e.g. by running a mid-term version for improvement as well as the end-of-term one for the record; (iv) the students are informed about how their ratings are weighted in the faculty evaluation process, and student government is assisted in verifying any such claims. Absent a policy which addresses these considerations, one must face the problem that students have good reasons for running their own ratings system. This involves a considerable duplication of production resources and of class time, and a reduction of student interest. It may involve open hostility.

18. There is no great reason to object to teachers distributing the forms and appointing a student to take them in to the department head or, better, the central office, as long as students are informed in some other way about the importance of the process. But having an administrative assistant or secretary, or the staff of a faculty development center, do the whole business is in general preferable. If teachers are to do it, students must be independently counseled and asked about possible abuses of the system, perhaps by means of some remarks on the form itself, and provided with space on the form to register a belief that there has been an attempt to use inappropriate influence; sob stories about family circumstances are one of the problems. It is usually not necessary to ask staff to absent themselves from the room while the forms are being filled in, but it is one way to keep students informed about changes in the process, and to provide them with a chance for asking questions without risk.

19. There must of course be a warning to faculty that this is unprofessional behavior which will be treated very seriously in personnel evaluation.

20. This was a serious problem in Berkeley in the late sixties, when some of the radical left got control of the student evaluation process. There are various ways to detect and control such conspiracies, but the mere possibility of them is a strong reason for allowing appeals against student ratings.

21. In a sophisticated system, there could be reasons for unannounced visits, but in general it is better to encourage the presence of those who wish to put in rating forms.

22. This requires that the evaluators: (i) know the current enrollment figure; (ii) ensure that those present do fill out the form, if they have the authority to do this; (iii) do something about absentee ballots. My inclination is not to accept less than 95% return rates, and I have found this to be achievable; certainly anything below 80% is very hard on validity.
23. Too early rules out reactions to grading procedures and feedback; too late loses input from those who drop the course because they think it's bad. I prefer the following arrangement. Distribute forms on the first day, with envelopes, and request that those dropping the course fill them in before or after doing so, using campus mail to return them. Request returns after the mid-term tests have been handed back, and use them to improve the course. Arrange for the final to be given on the last day of class. Require attendance at the time scheduled for the final in the exam period, as a condition of getting a grade. At that session, return the marked exams for immediate study, with comments, or at least a demonstration or examples of answers that would have received an A. Allow questions and then protests about the grades. Collect the exams, for the archives, and distribute the rating forms. Collect the forms, and check off that every student has turned one in, as well as returned an exam. This procedure has a salutary effect on any headroom problems; but it is also a good way to avoid wasting the final exam as a learning experience.

24. Bi-modal distributions tell a very different story from bell-shaped curves with the same mean, so at least the standard deviation is required; but in fact a graph is considerably more informative, especially but not only for those lacking statistical sophistication.
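[Again a minimal illustrative sketch, not part of Scriven's text, of the point in note 24, using invented ratings: a bimodal class (split between 1s and 5s) and a lukewarm class (clustered around 3) have exactly the same mean, so the mean alone hides the difference; the standard deviation separates them, and even a crude text histogram shows more still.]

    # Illustrative sketch only: the rating data are invented.
    import statistics
    from collections import Counter

    bimodal = [1] * 20 + [5] * 20            # half the class loves it, half hates it
    unimodal = [2] * 5 + [3] * 30 + [4] * 5  # most of the class is lukewarm

    for name, data in [("bimodal", bimodal), ("unimodal", unimodal)]:
        counts = Counter(data)
        # Same mean (3.00) for both, but very different standard deviations.
        print(f"{name}: mean={statistics.mean(data):.2f}  sd={statistics.pstdev(data):.2f}")
        # A rough text histogram: the 'graph' that is more informative still.
        for rating in range(1, 6):
            print(f"  {rating}: {'#' * counts.get(rating, 0)}")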
25. Most of the other alleged differences between difficult technical courses and the humanities courses, and between required and optional courses, turn out to be either non-existent or less than a serious threat in a well-run system.

26. An early version of this was in "Duties-Based Teacher Evaluation" in Vol. 1, No. 3 of the Journal of Personnel Evaluation in Education, 1987. This list readily generates student rating forms to match the parts of it that students can rate, and converts easily to cover post-secondary duties. A considerably revised later version is in another chapter of this book.

27. It might be mentioned that one rarely sees post-secondary institutions using the hard-won discipline of program evaluation to evaluate their programs. Even the use of alumni to help with the needs assessment is extremely rare. This is a replay of colleges not using what was long known about how to evaluate teaching (and quite like their limited application of what is known about how to evaluate student performance). The track record is quite different on admissions; a cynic would presumably say the explanation is that the latter exercise does not require the institutions to look critically at themselves.

28. I do not here discuss most of the very serious technical objections to the use of gain scores that have been raised by measurement specialists (including, for example, the problem of regression to the mean) and special problems of reliability and validity that arise with any tests. The problems discussed here are conceptual problems that apply even if the technical problems can be dismissed, as is possible in cases of massive or near-zero gains.

29. The 'Harvard fallacy' is the fallacy of supposing that the teaching at Harvard must be good because its graduates do very well in later life, in proportion to their numbers. All that one can infer from that data is that Harvard does not usually inflict permanent brain damage. The rest of the trick lies in selecting a talented entering class and not getting in the way of their use of the library, labs, peer tutoring, and family influence, to which brand-name reverence adds a good deal. The contribution from the faculty, if any, is the residue after factoring out the non-faculty influences on the academic side, and the effect of the 'old boys network' and brand recognition on job selection and promotion. While Harvard is demonstrably a great university, it is certainly not demonstrably a great teaching university, just a well-equipped one.

30. Self-evaluation is, of course, not evaluation without input from others, but evaluation that is self-initiated and directed; the use of anonymous student ratings is a sine qua non of any self-evaluation by teachers. Naturally, any systematic process for the evaluation of faculty should reward serious self-evaluation and systematic self-development based on it. In teaching as in any profession, the combination of these two practices is the hallmark of professionalism, the minimum standard for social tolerance of the practitioner.

31. That is, all the teachers being compared may be weak, or the worst of them may be very good and the others better still, etc. Student ratings, especially in upper secondary and post-secondary contexts, are based on a much wider range of comparisons than this, which provides a better approximation to criterion-referencing.

32. At earlier grade levels, student evaluations need to be supported by a substantial effort at prior training for the students (something of considerable educational merit in its own right), and would not be sound well down into lower primary. But we need more experimentation in that area, as well as legitimation by leadership use.

33. There are also conceptions of teaching which make them less important, even in extreme cases completely inappropriate, for example the conception of teaching as creating a climate for learning rather than as transmitting it.

34. In a common college situation, there is one visit selected from 30-120 sessions; at the school level, it is one or two, or rarely as many as five, visits, usually for less than a complete period, out of 500-1000 class meetings. Given the way in which individual class meetings vary, as every instructor knows, both for idiosyncratic reasons and as the term goes on, as the first or the final test looms, or as the topics vary in interest, or as visitors are present, this cannot be thought of as an adequate sample. One should also take into account the way in which visitors' ratings can change as they come to 'see through' the teacher's style, a process which may continue over a large number of visits. (Studies at the University of California at Davis make clear that this effect can be very substantial, and I know of none that found it to be small; Wilson et al., College Professors and Their Impact on Students, Wiley, 1975.)

35. There is some reduction of the impact of these criticisms if we videotape all sessions of a course and select a substantial random sample of these to evaluate. But the cost and connotations of this approach are worrying, and we lack experience with it.

36. There are some extreme cases where the visitor can make a reliable judgement of pedagogical skill. The validity of these judgements is skewed along the merit axis; it is easier to identify deep trouble than great merit. But of course the students can make the same judgement with the same or greater validity, so the visit is unnecessary.
37. The teaching materials and the test or project work done by the students will better serve that purpose.

38. Students are similarly in a uniquely strong position to rate the presence of immediately identifiable benefits from the material and skills acquired from a teacher, but this is arguably not crucial in evaluating teaching merit. (Technically, it's relevant to rating worth, not merit; one can't blame the Latin teacher because the subject isn't seen as immediately beneficial.) However, this kind of rating can be useful for formative evaluation, telling you whether you should be spending more time persuading the class of the importance of the subject to them, if you believe that it is. It is also significant for many discussions of a department's curriculum. Hence it should be considered for inclusion on a 'general purpose' form. This is a different matter from rating the eventual or long-term value of the course to the student's needs (e.g. professional needs), about which the students are usually not in a good position to pass judgement.

39. Particularly in light of this point, it seems sensible to use student rating forms in a two-stage procedure. In the first stage, a good summative-valid form is used, administered in a summative-valid way (security procedures, etc.). Only if someone does so badly on that stage as to jeopardize their job, or offend their own sense of satisfaction with the quality of their work, should they then move to the use of a second form. That second form can simply call for a more detailed analysis of the duties (expanding on the type of questions mentioned in D). But it could also, if the teacher wishes, ask the student to answer questions about style (as in E), so that the information provided by the style literature as to models that have worked well for many teachers can be applied.

40. The usual distinction here is between merit and worth (or suitability). Both are legitimate in the evaluation of faculty, within limits. It is worth and not just merit that leads to a position being advertised in the first place, and to the verdict of redundancy. Worth can sometimes be used, properly, to justify gender preference; and it is often used, improperly, to rationalize political discrimination. A longer discussion of it is provided in a paper on teacher selection in the New Handbook of Teacher Evaluation: Assessing Elementary and Secondary School Teachers, J. Millman and L. Darling-Hammond, eds. (Sage, 1990).

41. For example, in "Validity in Personnel Evaluation" in the Journal of Personnel Evaluation in Education, vol. 1, no. 1, 1987; and in earlier chapters in this book.

42. This viewpoint was expressed in the late sixties, by the radical left, in the bitter phrase 'the student as nigger'.

43. Provided, of course, that the results of the student evaluations do carry weight in the decisions made. A monitor from student government on the committee is desirable here, and appropriate controls of anonymity are possible.
44. A prompt is simply a hint as to something that the respondent may wish to underscore as significant, or comment on, or simply take into account when selecting an answer to the One Big Question, but which does not require a response. We get more than 50 prompts in small print on our one-page, four-question rating form; but the average time to complete is still around 3 minutes, compared to 10-15 for a 50-question form. Other advantages are: (i) coding for summative evaluation is simpler (and it's no more difficult for formative evaluation); (ii) the integration of multiple considerations is done by the respondent, not by the evaluator, who lacks good reasons for any particular relative weighting; (iii) it uses relatively little paper, time, and computer processing. Readers are welcome to a copy of the form we use upon request.

45. Which are based on the idea that if someone is not doing well at teaching, they must be 'going about it the wrong way', i.e., they need to have their teaching style improved. The correct approach would be to see, first, if their discharge of teaching obligations needed improvement. There are many such obligations that need improvement in most teachers, and they can easily be improved, as one can immediately show. You can't prove that the usual recommendations for improving style actually improve teaching in a given individual (without massive testing with follow-ups). A common sign of this is that we usually can't get good agreement between two independent observers as to the best style; but even if we could, the size of the benefits is not demonstrable.

46. Bad temper is an obvious example, but excessive repetition, reading from texts, and the failure to ask questions except in the presence of visitors are others.

47. Certainly, since the visitor cannot avoid seeing the style features, and hence cannot guarantee not being influenced by them, visitors cannot be used for input on personnel decisions.

48. As well as possible benefits for campus morale, as already mentioned.

49. The usual problem of initial faculty opposition to weighting student ratings sometimes turns into the opposite one; after a few years' use, there is a tendency towards overweighting student evaluations. A major source of benefit comes from improved faculty self-evaluation, resulting from the need to face up to and discuss the student ratings of their work, which in other institutions will only occur for those who actively and independently undertake to get their performance rated by students.

50. Some suggestions about such a system are provided in "Summative Teacher Evaluation" in Handbook of Teacher Evaluation, ed. J. Millman (Sage, 1981).

---------------------------------------->> ....End Scriven excerpt ...

....John C. Damron, PhD
....Douglas College, DLC
....P.O. Box 2503
....New Westminster, British Columbia
....Canada V3L 5B2
....FAX: (604) 527-5969
....e-mail: [EMAIL PROTECTED]

Re: Student Ratings......

"It cannot be emphasized strongly enough that the evaluation questionnaires of the type that we are discussing here measure only the attitudes of the students towards the class and the instructor. They do not measure the amount of learning which has taken place...." -- From: CAUT (Canadian Association of University Teachers).

---------------------------------------------------------------->>
http://www.douglas.bc.ca/
http://www.douglas.bc.ca/psychd/index.html

Student Ratings Critique:
http://www.mankato.msus.edu/dept/psych/Damron_politics.html