Do teaching evaluations lead to lower standards in class?

Jacob and Levitt have an article in the Quarterly Journal of Economics that looks at teacher cheating in public schools driven by compensation based on class performance. They find that teachers will take steps to help their students get higher grades when it affects their compensation.

Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating. Quarterly Journal of Economics. 2003.

An article by Nelson and Lynch looks at the relationship between grade inflation and teaching evaluations, suggesting that professors buy better evaluations with grades.

Grade Inflation, Real Income, Simultaneity, and Teaching Evaluations. The Journal of Economic Education. 1984.


It depends on what they are evaluating, and how.

I studied at a university in a mess of a country that was recovering from a period of war. The educational system was not just depressingly dated, it was also falling apart at the seams. Enthusiasts were trying to reform the system, and one of the bigger pushes in the right direction was achieved through course evaluations. This evaluation had questions such as these:

  • How often does the lecturer show up for class?
  • Does each lesson have a clear topic?
  • Is it clear which parts of the printed course materials are covered in which lecture?
  • Were all the exam questions linked to some printed course material?
  • Does the lecturer answer students' questions?
  • Is the lecturer available to students at any point outside the lectures?
  • Does the lecturer use e-mail to correspond with students?
  • Do you feel that the lecturer treated you unfairly at some point? How so?
  • Do you feel that the lecturer engages in any problematic behaviors during class? Please describe.
  • Did the lecturer ask you for any favors in return for a higher grade?
  • What are, in your opinion, the good aspects of this course?
  • What are the bad aspects?

...etc.

There were more questions - many were about lecturing style, for example; these are just off the top of my head. This evaluation made lecturers begin to come to class, made them finally pick textbooks, forced them to choose a topic for every lesson (rather than just rambling on), and forced them to tell students which part of the book corresponded to which lecture, so that students could read the materials in parallel. It also rapidly cut down on truly problematic behaviors such as smoking in class. Furthermore, it helped lecturers improve their performance by providing feedback on the strong and weak points of the course, at least as students saw them. Here, I think the evaluations very clearly helped improve standards in class, especially in truly problematic departments. They helped for two reasons: (1) there was a lot of room for improvement, and (2) the questions were well thought out, i.e., each question was linked to a particular goal of the educational reform.

I've also studied at a wonderful, well organized university where most of these questions would be completely ridiculous. There, the evaluations had questions such as:

  • How many hours per week did you study for this course?
  • How important would you say this course is for your overall academic development?
  • Would you say this course was easy, just right, or difficult in terms of content?
  • Do you think the lecturers evaluate students' knowledge fairly?

...etc.

I honestly have no clue what is gained by such an evaluation, and I hope nobody's salary depends on it. With the right (i.e., wrong) questions, I'm sure you could lower teaching standards by giving lecturers a financial incentive to score well. The question, then, boils down to what the evaluation sheets look like. To the best of my knowledge, these are not standardized across universities, so the results may vary a lot.


Grade inflation has been an issue in the US since the mid-1970s, so welcome to the club. See http://www.endgradeinflation.org/. None of the attempts to curb it have been successful so far; the practice of student evaluations is deep-rooted in US colleges and cannot be easily changed.

The uphill battle against grade inflation has been spearheaded by the University of North Carolina at Chapel Hill, one of the top 5 large public US universities. They put a rather extensive research effort into figuring out the patterns of grade inflation. The cause, as you observed, is what economists call a market failure: the self-interested actions of the players lead to outcomes that are worse for everybody. The employers of the graduates, and the grad programs they apply to, suffer the most, as they cannot distinguish good students from bad ones. Organizations and student societies that rely solely on GPA (grade point average) discover great differences between disciplines: the humanities end of the spectrum has been hit the hardest by grade inflation, while engineering and the sciences, which have more specific assessment and evaluation criteria, tend to produce lower grades. The opening page of this 2000 report gives a specific figure to answer your question: about a 15% increase in student evaluations is associated with a 1-standard-deviation increase in the course's average grade. That standard deviation was 0.4 on the American scale that runs from 0 to 4; at the time the report was written, the average GPA at UNC was 3.18.

In the mid-2000s, UNC came up with the idea of an effective grade, called the achievement index. In very simplistic terms, it essentially normalizes each class to have the same GPA. Each student is mapped onto the percentile implied by his grade in a given class, relative to the distribution of grades in that class; the percentiles across all classes a student took are aggregated; and the student's ultimate achievement GPA is reported based on a normative judgment of what the university wants the average GPA and the range of grades to be. The idea is grounded in item response theory, or can alternatively be explained in Bayesian terms (a maximum a posteriori estimate of student ability). As you can imagine, this caused student unrest the likes of which UNC had not seen since the civil rights movement of the 1960s (o tempora, o mores... how petty motives are these days), so the faculty chickened out and ruled against it.
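To make the normalization concrete, here is a toy sketch of the percentile-and-aggregate step described above. The function names and the tie-handling rule are my own simplifications; UNC's actual achievement index rests on a more sophisticated item-response/Bayesian model, not a plain average of percentiles.

```python
# Toy sketch of the "achievement index" idea: map each grade to a
# within-class percentile, then aggregate percentiles per student.
# (My own simplification, not UNC's actual algorithm.)

from statistics import mean

def percentile_in_class(grade, class_grades):
    """Fraction of the class strictly below this grade; ties count half."""
    below = sum(g < grade for g in class_grades)
    ties = sum(g == grade for g in class_grades)
    return (below + 0.5 * ties) / len(class_grades)

def achievement_index(transcript, gradebooks):
    """transcript: {class_id: grade}; gradebooks: {class_id: [all grades]}."""
    pcts = [percentile_in_class(grade, gradebooks[c])
            for c, grade in transcript.items()]
    return mean(pcts)  # 0.5 corresponds to a consistently median student

# An easy class where everyone gets a 4.0 contributes nothing special,
# while topping a hard class pushes the index well above 0.5:
gradebooks = {"easy": [4.0, 4.0, 4.0, 4.0],
              "hard": [2.0, 2.5, 3.0, 3.5]}
print(achievement_index({"easy": 4.0, "hard": 3.5}, gradebooks))  # 0.6875
```

The point of the example is the one that upset the students: an A in a class where everyone gets an A lands you exactly at the 50th percentile, no better than a C in a class where C is the median grade.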

Still, UNC has found a way to put grades into context by augmenting the transcript with the average GPA of the other students who took each particular class, the student's percentile within each class, and the "schedule point average" = the average GPA of all the students in the classes that the student took. The above link shows a clear picture of somebody with a nominal GPA of 3.6, way above the classmates' average GPA of 3.0, consistently performing above the median (7 grades above the median, 5 at the median, 0 below), vs. somebody who was only able to achieve a GPA of 2.5 in easier classes with an average GPA of 3.2 (1 grade above the median, 3 at the median, 9 below).
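These augmented-transcript numbers are simple to compute; here is a minimal sketch (names are my own, and I approximate the schedule point average with an unweighted mean of per-class average GPAs, which UNC may well weight by class size):

```python
# Sketch of the augmented-transcript statistics described above:
# nominal GPA, "schedule point average", and above/at/below-median counts.
from statistics import mean, median

def augmented_transcript(transcript, gradebooks):
    """transcript: {class_id: grade}; gradebooks: {class_id: [all grades]}."""
    nominal_gpa = mean(transcript.values())
    # Average GPA of the students in the classes this student took
    # (unweighted across classes -- an assumption on my part).
    schedule_point_avg = mean(mean(gradebooks[c]) for c in transcript)
    above = sum(transcript[c] > median(gradebooks[c]) for c in transcript)
    at = sum(transcript[c] == median(gradebooks[c]) for c in transcript)
    below = sum(transcript[c] < median(gradebooks[c]) for c in transcript)
    return nominal_gpa, schedule_point_avg, (above, at, below)

gradebooks = {"a": [3.0, 4.0], "b": [2.0, 3.0, 4.0]}
print(augmented_transcript({"a": 4.0, "b": 3.0}, gradebooks))
```

Seen side by side, the nominal GPA and the schedule point average tell the reader of the transcript whether the student outperformed the classes they chose, which is exactly the contrast in the 3.6-vs-3.0 and 2.5-vs-3.2 examples above.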

The dramatic timeline (if you know how to read between the lines... I grew up in the Soviet Union and have this unfortunate skill) of UNC's attempts to deal with grade inflation is available here. Some other institutions are likely to use these or similar ideas, including another high-profile public school, Berkeley. (The administrators' claim that the university's computer system cannot handle the additional evaluation method is ridiculous; I could do these numbers on my laptop.)