Why do some tests have a (nonzero) minimum score?

According to the Encyclopedia of Research Design (page 629), it signals that these are interval variables, not ratio variables:

Standardized tests, including Intelligence Quotient (IQ), Scholastic Achievement Test (SAT), Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT), and Miller Analogies Test (MAT) are also examples of an interval scale. For example, in the IQ scale, the difference between 150 and 160 is the same as that between 80 and 90. Similarly, the distance in the GRE scores between 350 and 400 is the same as the distance between 500 and 550.

Standardized tests are not based on a "true zero" point that represents the lack of intelligence. These standardized tests do not even have a zero point. The lowest possible score for these standardized tests is not zero. Because of the lack of a "true zero" point, standardized tests cannot make statements about the ratio of their scores. Those who have an IQ score of 150 are not twice as intelligent as those who have an IQ score of 75. Similarly, such a ratio cannot apply to other standardized tests including SAT, GRE, GMAT, or MAT.

Salkind, Neil J., ed. Encyclopedia of Research Design. Vol. 1. Sage, 2010.

I might be able to help answer this from a background in psychometrics. Where I work, we produce many tests that are all standardised and then equated so that they can be put onto the same scale. The scales of two different tests, however, cannot be related to each other unless an equating study has been completed to determine the shift factor that transfers scores from, say, the scale of Test 1 to the scale of Test 2.

To construct a scale, we first analyse the test data, that is, the student response data and the item (question) data. We do the analysis using the Rasch model, which takes into account only two variables: the students' abilities and the items' difficulties. This allows us to construct a dataset that contains the logit values of the students' abilities and of the items' difficulties.
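As a rough illustration of how ability and difficulty interact on the logit scale, here is a minimal sketch of the response probability in the dichotomous Rasch model. The function name and the sample values are made up for illustration, and this is only the model's probability formula, not the estimation procedure that produces the logits from real response data.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values, chosen only to show the behaviour.
p_matched = rasch_probability(0.5, 0.5)   # ability equals difficulty
p_easy = rasch_probability(1.0, -1.0)     # student 2 logits above the item
p_hard = rasch_probability(-1.0, 1.0)     # student 2 logits below the item
```

When ability equals difficulty the probability is one half, and the two 2-logit cases are mirror images of each other, which is the sense in which the scale is equal-interval.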

Definition of Logit:

A logit is a unit of measurement used to report relative differences between candidate ability estimates and item difficulties. Logits are an equal-interval level of measurement, which means that the distance between each point on the scale is equal (the distance from 1 to 2 is the same as the distance from 99 to 100).

Once the logit tables have been created they can be used to create a scale by applying a simple linear transformation, such as:

scale score = 10 * logit difficulty + 250

In some of the work I do, scale scores can actually fall below 0; in most of it, however, the scale is constructed so that the minimum is around 200 or so. The construction of the scale is, for the most part, entirely arbitrary.
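To make the arbitrariness concrete, here is a small sketch that applies the example transformation (slope 10, intercept 250) to a few made-up logit values. The function name and the sample logits are assumptions for illustration only.

```python
def to_scale_score(logit, slope=10.0, intercept=250.0):
    """Linear transformation from a logit value to a reported scale score."""
    return slope * logit + intercept


# Hypothetical logit values, not taken from any real test.
logits = [-5.0, -1.0, 0.0, 1.0, 2.0]
scores = [to_scale_score(x) for x in logits]

# Differences survive the transformation (up to the slope) ...
gap_in_logits = logits[4] - logits[3]     # 1 logit
gap_in_scores = scores[4] - scores[3]     # 10 scale points

# ... but ratios do not, because the intercept moves the zero point.
ratio_in_logits = logits[4] / logits[3]   # 2.0
ratio_in_scores = scores[4] / scores[3]   # 270 / 260, about 1.04
```

With these numbers a logit of -5 lands on a scale score of 200, so a minimum "around 200" simply reflects the chosen intercept and the lowest logit the test can produce. A different slope or intercept would give a different minimum without changing what the test measures.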

If you wish to see how the logits of students and items are calculated, please read:

https://en.wikipedia.org/wiki/Rasch_model#The_mathematical_form_of_the_Rasch_model_for_dichotomous_data

As an extra note, there are other models for test analysis: the 2PL, which adds one parameter to the Rasch model (1PL), namely the item's discrimination; the 3PL, which adds a guessing parameter to the 2PL, creating a minimum probability of getting the item correct that depends on the guessing value; and the 4PL, which adds a slip parameter that creates a ceiling probability, below 1, of getting an item correct.
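The relationship between these models can be sketched with a single function in the common four-parameter logistic form, where the defaults reduce it to the Rasch model. The parameter names here are my own labels rather than any particular package's API, and the sample values are invented.

```python
import math


def irt_probability(ability, difficulty, discrimination=1.0,
                    guessing=0.0, ceiling=1.0):
    """Probability of a correct answer in a 4PL-style logistic model.

    discrimination=1, guessing=0, ceiling=1 -> Rasch model (1PL)
    free discrimination                     -> 2PL
    guessing > 0 (lower asymptote)          -> 3PL
    ceiling < 1 (upper asymptote, "slip")   -> 4PL
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (ceiling - guessing) * logistic


# Hypothetical item: four-option multiple choice with a small slip rate.
p_very_weak = irt_probability(-20.0, 0.0, guessing=0.25, ceiling=0.95)
p_very_strong = irt_probability(20.0, 0.0, guessing=0.25, ceiling=0.95)
p_rasch = irt_probability(0.0, 0.0)
```

A very weak student still answers correctly about a quarter of the time (the guessing floor), and a very strong student tops out near 0.95 (the slip ceiling).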

I hope this helps and provides some extra information that may be of use.

In addition to the reasons already mentioned, a nonzero minimum can come from wanting a more natural scale for the answers. Scores for an individual answer are sometimes given on a scale of 1-5 or 1-10 because that is more human-friendly than 0-4 or 0-9 (unless the human is a programmer). Adding the individual scores up then results in a nonzero minimum.
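The arithmetic is simple enough to show in a couple of lines. This is a sketch with a made-up 20-question test; the function name and the numbers are assumptions for illustration.

```python
def total_score_range(num_items, item_min=1, item_max=5):
    """Lowest and highest possible totals when every item is scored
    on the same item_min..item_max scale and the scores are summed."""
    return num_items * item_min, num_items * item_max


# Hypothetical test with 20 questions.
one_based = total_score_range(20)          # each item scored 1-5
zero_based = total_score_range(20, 0, 4)   # same items scored 0-4
```

Scored 1-5, the lowest possible total is 20; scored 0-4, the same test would bottom out at 0.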
