Are there 3 or 4 quartiles? 99 or 100 percentiles?

If you look in a non-mathematical dictionary, you will often find both definitions. For example, http://www.oxforddictionaries.com/us/definition/american_english/quartile defines quartile as

1 Each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.

1.1 Each of the three values of the random variable that divide a population into four groups.

It is possible to find some examples where the first definition is used. In Digest of Education Statistics 1999, edited by Thomas D. Snyder, Table 143 on page 157 has four columns under the heading "Socioeconomic status quartile", labeled Lowest, Second, Third, and Highest. Moreover, footnote 1 of Table 144 reads

The "Low" SES group is the lowest quartile; the "Middle" SES group is the middle two quartiles; and the "High" SES group is the upper quartile.

So a "quartile" in this context is a subset of the sample to which an individual belongs.

The Wikipedia article on quartile cites only one reference, the article "Sample quantiles in statistical packages", which, as the title suggests, is all about computing numbers to describe quantiles, in particular, the return value of the R function quantile(). The article therefore is mainly (exclusively?) concerned with the correct way to compute the numerical values that divide the data into quartiles (or other quantiles). But if you go to other sources such as the NIST/SEMATECH e-Handbook of Statistical Methods, you will find passages such as

The box plot uses the median and the lower and upper quartiles (defined as the 25th and 75th percentiles). If the lower quartile is Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is called the interquartile range or IQ.

Here, clearly each quartile is a number: the lower quartile is not bounded by Q1; it is Q1 in this context, which is a number that can be subtracted from another number.

My attempts to search for "quartile" on the Web seem to dredge up many more examples of the "number" usage than of the "subset" usage. I can guess a few reasons for this, though I have not found much other discussion of it:

  1. Unless the number of observations in your sample is divisible by $4$, you will not be able to separate the sample into four equal parts by rank.

  2. Much of statistics has the goal of describing data succinctly, for example by a mean and standard deviation. Listing the members of each of four equal (or nearly equal) subsets of a large sample is not a succinct description; it is essentially as verbose as the entire data set. On the other hand, it requires just three numbers to describe the boundaries between these subsets of the data, hence those three numbers appear frequently in the literature.

  3. There are several competing ways to compute the values that should serve as the "dividing lines" between the four (not necessarily exactly equal) ranked subsets of the data. This leads to a great deal written about "quartiles" using the "number" definition.
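To make the third point concrete, here is a minimal sketch using Python's standard `statistics.quantiles` function, which offers two of the competing conventions (`"exclusive"` and `"inclusive"`). The ten-element data set is made up for illustration; its size is deliberately not divisible by 4. Either way there are three cut points, but their values differ, and so does the interquartile range.

```python
import statistics

# Illustrative data only (an assumption, not from any source above).
# Ten observations cannot be split into four equal groups by rank.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two of the several conventions for locating the three cut points.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range Q3 - Q1 under each convention.
iqr_exclusive = q_exclusive[2] - q_exclusive[0]
iqr_inclusive = q_inclusive[2] - q_inclusive[0]

print(q_exclusive)  # [2.75, 5.5, 8.25]
print(q_inclusive)  # [3.25, 5.5, 7.75]
print(iqr_exclusive, iqr_inclusive)  # 5.5 4.5
```

Other packages implement further variants, so two tools can report different "quartiles" for the same data while both using the "number" definition.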

But notice that in the quoted passages from the Digest and Handbook, above, there is no ambiguity whatsoever about which meaning of "quartile" is intended. If a particular use of the word could possibly be ambiguous, one can first use the word in an unambiguous context to establish its meaning, or one can simply define it.


The word quartile refers to both the four partitions (or quarters) of the data set, and to the three points that mark these divisions. After all, we can't have one without the other.

When citing a value for a quartile, though, we are specifically referring to the three dividing points, else it'd be meaningless. Thus, the first, second, and third quartiles have a specific value in a data set. These points are often referred to as the lower, middle, and upper quartile.

On the other hand, we can say that there are multiple data points contained in the first, second, third, and fourth quartiles. In this context, we refer to the actual partition.
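The two senses can be shown side by side in a short sketch. It assumes a made-up ten-point data set, the `"inclusive"` method of Python's `statistics.quantiles` for the three dividing points, and the (arbitrary) convention that a value equal to a dividing point goes into the upper group. None of these choices is canonical.

```python
import bisect
import statistics

# Illustrative data only (an assumption for this sketch).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# "Number" sense: the three dividing points.
cuts = statistics.quantiles(data, n=4, method="inclusive")

# "Subset" sense: the four groups the dividing points carve out.
# bisect_right counts how many cut points are <= x, so a value equal
# to a cut point lands in the upper group (a convention chosen here).
groups = {1: [], 2: [], 3: [], 4: []}
for x in data:
    groups[bisect.bisect_right(cuts, x) + 1].append(x)

print(cuts)    # [3.25, 5.5, 7.75]
print(groups)  # {1: [1, 2, 3], 2: [4, 5], 3: [6, 7], 4: [8, 9, 10]}
```

With ten observations the four groups have sizes 3, 2, 2, and 3, so they are only approximately "quarters" of the data.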

It all depends on context. The word is malleable, but the intent ought to be clear when used properly.


Note: This is really just a long comment - but maybe it's helpful.

This seems to me to be purely a matter of context. I have never seen anything like this before, with only 3 quartiles - it's written into the word itself that there should be 4 (QUART-iles). That said, this kind of thing happens in mathematics relatively often - there will be multiple uses of the same word or piece of notation. Many times the overlap stems from two different situations with a similar defining characteristic, or with similar associated mental pictures/ideas. Likewise in this case, both meanings are highly similar.

My advice for when doing mathematics is to use whichever one you feel is most convenient or applicable, and if need be, leave a remark or something explaining the convention you chose.

As a prospective graduate student studying for the GRE (both the general test and the mathematics test), I can say that I have never seen an ambiguous practice question. Although I can't find any statement from the test-makers that fixes one definition, the questions I have seen are all of the form above. Even if they aren't always that way, I can assure you that there will never be ambiguity on the exam. That is to say, if you compute the answer under one definition and it is not among the choices, the answer under the other definition will be, and vice versa. Both will never be listed as correct answers without clarification.