Grading curves in quantitative subjects

I do not ask the opinion-based question of how an instructor should curve his or her course. Rather, I ask the fact-based, or at any rate experience-based, question of how a normal, responsible, experienced instructor does curve his or her course.

I think the following would be a typical story of the process. Jane is hired to a tenure-track faculty position in physics, and she is going to teach freshman mechanics her first semester. To find out what is considered the normal academic level of the course at this school, she looks at available information such as the text being used, the lab manual, the problem sets assigned by previous instructors, and, if possible, exams set by previous instructors. She looks at myedu.com to see histograms of grades given by other teachers at this school in this course.

Based on this information, she tries to set standards that will seem somewhat easier than average at this school. She wants to make sure that her grading isn't seen as too harsh, which could result in poor teaching evaluations and endanger her hopes of tenure. She can't afford to be as stringent a grader as some of the senior faculty, because she's not insulated by tenure as they are. She can use an infinite variety of factors to adjust the level of difficulty, but she is constrained by this need to appear a little more lenient than average.

In most cases, people end up teaching at schools that are less selective than the ones at which they got their own degrees. This is the case for Jane, so after the first midterm she finds that her best attempts to be perceived as a lenient grader are failing rather badly. Many of her students are very upset. Jane is alarmed. She has recently acquired a mortgage, and a terrier that everyone says looks like President Trump. If she doesn't get tenure, she and the dog will be out on the street. She makes further downward adjustments in her standards. Within a few semesters, she arrives at a set of standards that become the ones she considers normal for the rest of her career as a "normal, responsible, experienced instructor."

Jane gets tenure, and as time goes on, more tenure-track faculty members are hired in her department. These new folks go through the same process described above, arriving at standards that are in turn incrementally lower than the ones Jane uses. This trend continues over the years. By the time she retires, Jane owns a bulldog that everyone says looks like President Bannon, and she has become legendary as an extremely harsh grader compared to her younger colleagues.

For more on this topic, see:

Arum and Roksa, Academically Adrift, 2011

Valen Johnson, Grade Inflation: A Crisis in College Education, 2003


For years, I have taught engineering part-time at a typical agriculture-and-mining-type state college with a big engineering program. I have no opinion of my own as to how to curve a course, but because I wish my grading curves to be typical and unremarkable, I have over the years asked several tenured faculty in my department how they curve their courses, and have adjusted my own curves accordingly.

From the tenured faculty I have interviewed, I have learned three things:

  1. Tenured engineering faculty approach the question in various ways.
  2. However, on the whole (to my surprise) none of the tenured engineering faculty I have interviewed grades markedly more strictly or more leniently than the others do.
  3. I can extract from the interviews a heuristic to set a typical curve in my own courses.

Here is the resultant heuristic:

  • Target a course average of 70 plus three per year. That is, target 73 in a 100-level (freshman) course, 76 in a 200-level (sophomore) course, 79 in a 300-level (junior) course, and so on. I have never taught a 500-level graduate course, but at least one tenured faculty member who does teach such courses informs me that the trend continues through that level. At any rate, the general concept is that the weakest students progressively drop out before reaching the higher levels, so the average among the survivors progressively rises. (Since most dropouts are freshmen and sophomores, this does not explain specifically why the senior average is higher than the junior, but I gather that faculty just like to grade their seniors a little more leniently.)
  • If the course has a prerequisite on the same (or a higher) level, add one point. For instance, a 200-level course with a 200-level prerequisite can target an average of 77 rather than 76.
  • Many courses are taught principally during the fall semester or principally during the spring semester. Most students take the course during the principal semester, but at the 300 level and below, for various reasons, a few students take the course during the other, off-sequence semester, or during the summer. Experience finds the on-sequence students to be smarter and/or harder-working on average. Therefore, if the course is on sequence, add one point. For instance, the above-mentioned 200-level course can target an average of 78 rather than 77 if it is on sequence. (A few courses may have approximately equal enrollments fall and spring. Such a course has no on-sequence semester. For such a course, split the difference by adding half a point each semester. However, add zero for a summer course in any case. Most senior-level courses are offered only once a year, but I don't teach those courses, so I do not know how to adjust those. My guess would be: add a full point except during the summer; but I have not actually asked any faculty members about that.)
  • Target a standard deviation of 0.60*[100 - (targeted course average)]. For example, if targeting a course average of 78, target a standard deviation of 0.60*22 = 13.2.
  • When computing the course average across all students, disregard students who drop the course during the early or middle parts of the semester. Some colleges, however, allow students to withdraw late under restricted conditions. This is harder, because the late withdrawers are usually students who (a) actually took most of the course and (b) would have finished with Ds or Fs. I have discussed this problem with only three faculty members, but after discussing it, my practice is this: impute to late withdrawers a curved (not raw) average of 60 (in other words, place late withdrawers on the borderline between a curved D- and a curved F), then include them in the overall average.
  • Regardless of the curve, if a student achieves an unadjusted score of 93, award him or her a straight A; if 90, at least an A-; if 87, at least a B+; if 83, at least a straight B; if 80, at least a B-; if 77, at least a C+; if 73, at least a straight C; and so on. However, your exams/tests/quizzes should probably be hard enough that this rule seldom comes into effect. Most students should end up being graded on the curve.
  • After all the above, keep one extra point of leniency in your pocket (as it were) to push some students at discretion over the line into the next higher grade. Illustration: suppose that the curved standard to achieve a B- in your course turns out to be 76.0. Suppose that you have a tight cluster of students whose averages are 76.1, 75.9, 75.8, 75.6, 75.4, 75.2, 75.1, 74.9 and 74.8, followed by a gap, after which the next best student is a 72.0. Just award the whole tight cluster of students a B-. Do this as needed at various grade cutoffs to raise the overall curved course average by about one point. However, an average of one extra point of leniency suffices and you don't want to allow your grade cutoffs to become too sloppy, so try not to exceed one extra point of leniency on average.
  • Regardless of subjective factors and leniency, never give a student with a higher course average a lower grade. That is, if some student with a 74.8 gets a B-, then every student at or above a 74.8 gets at least a B-. That's only fair.
  • If the course enrollment is small (fewer than 20 students, say), then subjectively consider the overall quality of the class. Adjust your cutoffs accordingly, but don't overdo it, and be honest. Such subjective adjustments should sometimes go up and sometimes go down, so that overall averages over the school year are maintained. If the course enrollment is large, then do not trust your subjective opinion but rather trust the law of large numbers: in that case, pretty much just stick to the averages.
  • If the course enrollment is extremely small (fewer than 5 students, say), then I do not know what to do. I have not taught such a course.
  • If some unusual administrative condition applies to a particular course during a specific semester (for instance, if an honors section usually exists but that section got canceled this year, dumping the honors students into your section), then adjust accordingly.
  • (Speaking of honors sections: my department does not have any of those. However, if yours does and you are teaching one, then you'll presumably need to adjust, but you are probably tenured faculty in that case and have no need of my advice. If I had to guess, I would guess that an honors section would boost the targeted average by about ten points.)
  • If you are fairly sure that a particular student (or group of students) has not learned or achieved enough to earn even a D-, and if the student's raw average is below 60, then raise your curve's cutoff for a D- and give that student an F. Reason: you owe it to the public to block that student for the time being from graduating and entering professional practice. This is unpleasant, naturally, but fortunately it usually isn't an issue, at least not in my experience. If you curve as herein advised, then the student in question will usually get an F, anyway. Still, the issue does occasionally arise so I mention it.
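For readers who prefer to see the arithmetic spelled out, the numeric parts of the heuristic above can be sketched in Python. This is only an illustration under stated assumptions: the function names, parameter names, and the `RAW_FLOORS` table are hypothetical labels of mine, not anything the interviewed faculty use. The raw-score floors stop at a straight C because the list above stops there ("and so on"), and the sketch does not guess at the lower cutoffs. The discretionary extra point of leniency and the small-class adjustments are judgment calls and are deliberately left out.

```python
# Hypothetical sketch of the targeting arithmetic described in the list above.

def target_average(level, prereq_same_or_higher=False, on_sequence=False,
                   balanced_enrollment=False, summer=False):
    """Targeted curved course average, before the discretionary leniency point.

    level: course level in hundreds (100 = freshman, 200 = sophomore, ...).
    """
    year = level // 100
    avg = 70.0 + 3.0 * year            # 70 plus three per year
    if prereq_same_or_higher:
        avg += 1.0                     # prerequisite on the same or a higher level
    if not summer:                     # summer courses get no sequence bonus
        if balanced_enrollment:
            avg += 0.5                 # no on-sequence semester: split the difference
        elif on_sequence:
            avg += 1.0                 # taught during the principal semester
    return avg


def target_std_dev(avg):
    """Targeted standard deviation: 0.60 * (100 - targeted average)."""
    return 0.60 * (100.0 - avg)


# Raw (unadjusted) scores that guarantee at least the given grade,
# regardless of the curve. Only the cutoffs listed in the text appear here.
RAW_FLOORS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
              (80, "B-"), (77, "C+"), (73, "C")]


def raw_score_floor(raw):
    """Minimum grade guaranteed by the raw score, or None if none is listed."""
    for cutoff, grade in RAW_FLOORS:
        if raw >= cutoff:
            return grade
    return None


if __name__ == "__main__":
    avg = target_average(200, prereq_same_or_higher=True, on_sequence=True)
    print(avg, round(target_std_dev(avg), 1))
```

With the 200-level course used as the running example in the list (200-level prerequisite, on sequence), `target_average` gives 78 and `target_std_dev` gives about 13.2, matching the figures quoted above.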

Admittedly, the above is quasi-anecdotal. It is not a proper study (if it were, then I suppose that I would give it as a journal paper rather than as a StackExchange answer). On the other hand, it has little to do with my own opinion. Based on interviews, it reflects and approximately averages existing practices among tenured faculty in my engineering department, practices which—as far as I know (for I earned my own degrees far away in another region of the U.S.)—are typical of similar public colleges in other states.

The above is neither lenient nor strict as far as I know. It is typical, rather.

Of course, I have no documentary proof to show, so you must decide for yourself how credible I seem. I am merely part-time adjunct faculty at a state college, which in academia makes me an extremely unimportant person; yet I haven't seen this question asked and answered on StackExchange, and I happen to know something about it, and I moreover happen to have some years of minimally relevant experience to back my answer up, so there you have it. Back when I first started instructing, I would have liked to find such an answer on StackExchange, so today I write the answer. Make of it what you will.

EXAMPLE

Suppose that you were teaching a 300-level (junior) course. Suppose that this course has a 300-level prerequisite. Suppose that most students who take this course at your college take it during the spring semester, and suppose that you are teaching the course during the spring semester. Suppose that you mean to follow the above-described heuristic.

Suppose that initial enrollment is 110, but 10 students drop the course by the registrar's mid-semester drop deadline, leaving 100.

PROBLEM

According to the above-described heuristic, if you are the instructor, then how should you curve this course?

SOLUTION

Target an average of 81 with a standard deviation of 11.4. Consider only the 100 students; ignore the other 10. At discretion in specific cases (mainly where students with similar marks cluster together), bump some students up over the line into the next grade, effectively (indirectly) boosting the overall average to 82.
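Spelled out step by step, and assuming I have applied the heuristic's rules in the order they were listed, the figures in this solution come from the following arithmetic:

```latex
\begin{align*}
\text{base (300-level)} &= 70 + 3 \times 3 = 79 \\
\text{with same-level prerequisite} &= 79 + 1 = 80 \\
\text{on sequence (spring)} &= 80 + 1 = 81 \\
\text{standard deviation} &= 0.60 \times (100 - 81) = 11.4 \\
\text{with discretionary leniency} &\approx 81 + 1 = 82
\end{align*}
```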

Late-withdrawing students, if any, slightly complicate the solution as earlier explained. You should not ignore the late withdrawers, for that would be unfair to the students who complete the course. However, unless many students withdraw late, their effect on the averages will be fairly small. Nevertheless, their effect (even if small) poses a significant calculational hassle and you should allow adequate time before the grading deadline to adjust for it.
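One way to handle that hassle is sketched below in Python. It assumes my reading of the rule is right: the targeted average applies to the combined group, with each late withdrawer counted at a curved 60, so the students who finish the course must average slightly above the target. The function name `completer_target` and the count of four late withdrawers are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: what curved average must the completing students reach
# so that the overall average, with late withdrawers imputed at 60, hits target?

def completer_target(overall_target, n_completers, n_late_withdrawers,
                     imputed=60.0):
    """Curved average required among students who complete the course."""
    total = n_completers + n_late_withdrawers
    # overall_target * total = completers' sum + imputed * withdrawers
    return (overall_target * total - imputed * n_late_withdrawers) / n_completers


if __name__ == "__main__":
    # Suppose 4 of the 100 remaining students withdraw late (made-up number).
    print(completer_target(81.0, n_completers=96, n_late_withdrawers=4))
```

In this made-up case the 96 completers would need a curved average of 81.875, which shows how small the effect is when few students withdraw late.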

VARIANT PRACTICES

There exist many variant practices among divers professors, though all the practices my interviews have discovered achieve similar averages and, usually, similar standard deviations. For example, one professor I know awards a set percentage of his class a straight A, a set percentage an A-, and so on; but even this professor adjusts his percentages by class level to achieve distributions like the other professors achieve.

At least one professor I know, like perhaps many others, grades late withdrawers according to the withdrawers' incomplete course averages excluding the final exam. That's not what I do, but its effect is fairly similar and it should work well enough.

One professor I know ignores the late withdrawers as he ignores the early withdrawers. The consensus however seemed to be that that was probably not the best way to handle the matter. On the other hand, if late withdrawal is rare, then the difference probably does not amount to much.

Inexperienced assistant professors, who have yet to achieve tenure, might grade in unusual ways. Or not. I don't know anything about that. Interested only in experienced opinions, I have not interviewed any assistant professors.

CAUTION

Grades in the humanities and other nonquantitative fields of study seem to be distributed and assigned on a very different basis, presumably because faculty in those departments are much less likely to think statistically than we engineers are. Nothing written above applies to humanities courses as far as I know.

Undergraduates seem to evaluate an instructor more leniently when they expect lenient grades from the instructor. This is unpleasant but is also a fact of academic life. Undergraduates are often immature. That's just the way of it. (Fortunately, my department takes this leniency effect into adequate account when weighing students' evaluations of faculty, so the effect has posed little problem for me. I vaguely gather that, during the late 1960s, maintaining a B average in college saved a U.S. student from being drafted and sent to war in Vietnam. This led to rapid grade inflation among faculty who, understandably, felt unwilling to be the proximate reason a student of theirs got shot and killed; and there were knock-on effects, indirectly including the aforementioned problem with students' evaluations of faculty. Enough decades have passed since then—after all, your department chairman was probably in elementary school during the Vietnam War—that the problem in question may have largely stabilized. At any rate, as I said, it doesn't seem to be a big problem where I work.)