Why are academic institutions now looking to judge their students with multiple choice tests?

Because they can be graded quickly and easily, and there is no room for subjectivity.

The answer to question 1 is option b). If you marked b), you get the points; if you didn't, you are wrong. The grader doesn't even need to know what the question is about, just that the right answer is b). There is no middle ground, no partial credit, and no subjectivity involved.

Of course, there are a whole lot of problems with this system: poorly phrased questions, students showing only the final result rather than their process...


Multiple choice questions (MCQs) are nowadays very common at many of the world's institutions, particularly in the entrance tests they conduct. Why are these academic institutions now looking to judge their students with MCQ tests?

In my experience (graduate admission to mathematics departments in the U.S.), multiple choice tests are not particularly valued or respected. People use them because of a lack of better options, not because these tests are considered a suitable measure of student accomplishments.

The problem is that course content and grading standards vary substantially between institutions. This makes it difficult to compare applicants to graduate school: how does an A- in a beginning graduate course at the University of the Middle of Nowhere compare with a B+ in an advanced undergraduate course at Caltech? Is it better to be the 4th strongest math major in your class at Princeton, or the best student in 25 years at Middle of Nowhere? For applicants from prestigious universities, letters of recommendation and comparisons with applicants in past years can provide plenty of information, but this doesn't work so well for applicants from less prestigious universities. Standardized tests help provide an objective measure that can level the playing field. (Of course they don't level it completely, but an applicant from the middle of nowhere who gets outstanding GRE scores attracts a lot more attention than one with mediocre GRE scores.)

Keep in mind there are a lot of colleges and universities out there, roughly 2500 in the U.S. alone. On average, each department graduates its best student in 25 years once every 25 years, so in a typical year about a hundred students (2500/25) in the U.S. will be the best mathematics major to graduate from their department in the last 25 years. Some of these students are amazing, but in some cases it's not so impressive, since the department just doesn't get strong students. There has to be some measure beyond comparison with unknown classmates.

Multiple choice tests are used (as opposed to essay questions) because they are what's out there. If the GRE subject exam involved essay questions, that would be objectively better, but it would be a nightmare to grade. For comparison, grading the Putnam exam takes a tremendous amount of work, and I'd estimate that an essay-based GRE would take at least ten times as many person-hours to grade, maybe more. I don't think anyone cares enough to want to develop the test, organize the test administration and grading, and pay for everything (through some combination of high test fees and outside funding). It's just not clear that it would be worthwhile.


Benefits

  • Administration is efficient. Grading for large numbers of test takers can be automated, either through computer administration or with scannable answer sheets.
  • Once the scoring system is determined, there is no subjectivity in the application of the scoring system. This increases the reliability and predictive validity of the test. It also increases efficiency in that test takers are less likely to query the scoring.
  • There are tools to evaluate the internal properties of the test. These in turn allow for a principled assessment of reliability, as well as options for refining the test. Using classical test theory or item response theory, you have a range of tools for identifying problematic items or response options, and you can evaluate the overall reliability of the test. Simple steps include looking at the proportion of people answering an item correctly (good items fall roughly in the 50 to 70% correct region) and the correlation between answering the item correctly and the total score on the test (good items have item-total correlations above, say, .20); the first sketch after this list shows both checks.
  • You can administer more items. A general principle of reliability of measurement is that if you can get more observations of behaviour, you will get more reliable measurement. If you write concise items, people can often do one item every minute or so. This is much more than what you get with open response items.
  • You can maintain comparability across alternate forms. There are tools, particularly ones based on item response theory, that allow you to draw on a test bank. As long as the forms share a certain number of common items, you can change a test over time while still being able to compare scores across versions (e.g., from year to year). Alternatively, you may be concerned about test security, so different people are exposed to different items. Either way, it is easier to ensure equivalence of scores using multiple choice items; the second sketch after this list illustrates the basic idea.
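
To make the item analysis point concrete, here is a minimal sketch in Python (my own illustration, using made-up 0/1 response data rather than anything from a real test) of the two simple checks mentioned above: item difficulty as the proportion correct, and the corrected item-total correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 0/1 response matrix: 200 test takers x 10 items (1 = correct).
responses = rng.integers(0, 2, size=(200, 10))

# Item difficulty: proportion of test takers answering each item correctly.
difficulty = responses.mean(axis=0)

# Corrected item-total correlation: correlate each item with the total
# score over the *other* items, so an item is not correlated with itself.
totals = responses.sum(axis=1)
item_total = np.array([
    np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

# Flag items falling outside the rough guidelines quoted above.
for j, (p, r) in enumerate(zip(difficulty, item_total)):
    flag = "" if 0.5 <= p <= 0.7 and r >= 0.20 else "  <- worth reviewing"
    print(f"item {j + 1:2d}: proportion correct = {p:.2f}, item-total r = {r:.2f}{flag}")
```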
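
And here is a toy sketch of the comparability idea, under a deliberately strong simplifying assumption: two groups of similar ability take different forms that share a set of common (anchor) items, and a constant shift (mean equating) puts one form's scores on the other's scale. Operational testing programs use IRT-based linking, which is far more involved; all the numbers and the function name below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up totals: group 1 took Form X, group 2 took a harder Form Y,
# and both groups answered the same set of common (anchor) items.
form_x  = rng.normal(70, 8, size=500)   # Form X totals, group 1
form_y  = rng.normal(62, 8, size=500)   # Form Y totals, group 2
anchor1 = rng.normal(40, 5, size=500)   # anchor totals, group 1
anchor2 = rng.normal(40, 5, size=500)   # anchor totals, group 2

# The common items let us check that the two groups are comparable in
# ability. If they are, the gap between form means reflects a difference
# in form difficulty, and a constant shift equates the two score scales.
print(f"anchor means: {anchor1.mean():.1f} (group 1) vs {anchor2.mean():.1f} (group 2)")

shift = form_x.mean() - form_y.mean()   # estimated difficulty difference

def equate_to_x(form_y_score):
    """Map a Form Y total onto the Form X scale (mean equating)."""
    return form_y_score + shift

print(f"a raw 65 on Form Y is roughly {equate_to_x(65):.1f} on the Form X scale")
```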

Problems

Of course there are many problems with multiple choice tests.

  • They do a poor job of measuring certain skills and abilities. For example, they are poor at measuring written communication and interpersonal skills.
  • Writing a large number of good items takes time and skill. In particular, designing multiple choice questions that require deeper analysis rather than surface-level recall is often challenging.

Concluding thoughts

Given all of the above, it is understandable why universities might use multiple choice tests to make admission decisions. For certain skills and abilities, they are an efficient means of getting a pretty good predictor of academic performance, especially compared to the alternatives (e.g., grades from past institutions that mean different things).

That said, I'm aware of quite a few institutions, at least in Australia, that combine a multiple choice section with a written component in an attempt to get a more complete picture of academic ability.