Why does mathematical convention deal so ineptly with multisets?

This question reminded me of several notes by the influential computer scientist Edsger W. Dijkstra, who spent a lot of time thinking about how our notation can affect how we think and reason formally. (He preferred the term "bag" to "multiset", as in "a bag of positive integers.")

For example:

  • In writing about how computing science influenced mathematical style:

    I similarly prefer products to be defined on bags of factors rather than on sequences of factors. In the last case, one of the first things one has to do is to point out that as far as the value of the product is concerned, the order in which the factors occur in the sequence is irrelevant. Why then order them in the first place?

  • In discussing the notational conventions he uses, and why he uses them:

    Not making superfluous distinctions should always be encouraged; it is bad enough that we don’t have a canonical representation for the unordered pair.

Indeed, forget multisets in general: why don't we even have the unordered pair? In this note, Dijkstra observes how a lack of a standard notation for this object often prevents us from recognizing that two statements are the same. (We are often fooled by superficial differences into saying things like "it suffices" to prove one of A, B, when from a logical point of view there is no difference between A and B.)

For my part, I think that we unconsciously "avoid" the formalism of multisets because we are generally uncomfortable with unordered things as unordered things. It is far more congenial to the human mind to think in terms of ordered things whose order then might, or might not, matter.

For some reason, the unordered set concept really "caught on" and we have all of this standard terminology associated with it. I would regard this as exceptional. I would definitely not regard it as evidence that sets are truly "first-class citizens" in written mathematics:

  • Have you noticed how often it is possible to simplify the language and notation of a published argument by using "arbitrary" index sets, instead of initial segments of $\mathbb{N}$? (People introduce integer subscripts that play no role in organizing the ideas of an argument all the time.)

  • Have you noticed how often people explicitly rule out the empty set in situations when it only complicates a statement or proof?

We could also avoid "a lot of circumlocution" if we only made better use of firmly established things! But do we?

I think that many mathematicians simply "think in lists." I think this preference has its roots in how humans communicate. Outside of mathematics, it is almost impossible to communicate the elements of an unordered collection without choosing some arbitrary order. (For example: "Who is going to your party next weekend?" "What do we need from the grocery store?" "What are the nations of Europe?") We naturally communicate the elements of finite sets as lists, and then understand that they are sets. "De-listing" is so natural to us that we barely notice the "circumlocutions" it requires.

Another reason I think that multisets have not caught on is terminological.

  • The word "multiset" sounds too technical for what it is.

  • Dijkstra's "bag," on the other hand, doesn't sound technical enough. (To me, it only sounds OK for collections of "elementary" things, like integers.)

  • Neither "multiset" nor "bag" gives rise to a decent-sounding subobject name.

(Note, for example, how the OP unconsciously avoided the awful word "submultiset" through repeated use of the phrase "multiset containment".)


I agree with Qiaochu that historical inertia plays the largest rôle: short of independent invention, you can’t use what you’ve never seen, and if there’s a familiar alternative, you probably won’t use something that’s less familiar to you, especially if you expect it to be somewhat unfamiliar to your audience. I don’t think that I ever saw them singled out as a separate kind of object until I decided to use Scheinerman’s text for our elementary discrete math course not too many years ago.

The fact that they’re a bit awkward to formalize probably also has had some effect. Had they come into common use early enough, this probably wouldn’t have mattered: ordered pairs are also a little awkward to formalize, but in most contexts no one cares. But their widespread utility was recognized late enough that (a) formalization was more of a concern, and (b) a variety of work-arounds was already available and often in use.

Why the general utility of the concept wasn’t recognized earlier is probably an unanswerable question, though I suspect that the variety of guises in which it can appear, or perhaps better, the variety of equivalent formulations in terms of more ‘standard’ objects, has something to do with the matter. One could also point to the fact that many applications in which they play a more than incidental rôle seem to fall in the area between logic and computer science.


Here's another reason I think multisets have not entered common usage: they are very complicated! Blizard's multiset paper is just full of… stuff. Here are a couple of examples.

  1. There are at least three analogs of the 'subset' relation. Writing $x\in_n M$ for "$x$ occurs in the multiset $M$ exactly $n$ times", and $x\in M$ for "$x\in_n M$ for some positive $n$", one can define:

    • $A\Subset B$ if $x\in_n A$ implies $x\in_m B$ for some $m\ge n$.
    • $A\sqsubset B$ if $x\in_n A$ implies $x\in_n B$.
    • $A\subset{\llap\sqsubset } B$ if $A\Subset B$ and, in addition, $x\in B$ implies $x\in A$.

    See page 43 of Blizard.

  2. An infinite union of multisets may fail to be a multiset. For each positive integer $i$, let $M_i$ be the multiset that contains the single element $\ast$ with multiplicity $i$. Then $\bigcup M_i$ is not a multiset, because $\ast$ would need a multiplicity larger than every finite $i$; this holds for both the $\Cup$ and $\uplus$ operations I mentioned in the original question.
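
To make the two examples concrete, here is a minimal Python sketch that models finite multisets with `collections.Counter`. The function names (`contained`, `contained_exactly`, `contained_fully`, `partial_unions`) are hypothetical and are not Blizard's terminology. I am assuming that $n$ ranges over positive multiplicities in all three definitions, and I am using Counter's `|` (maximum of multiplicities) and `+` (sum of multiplicities) as stand-ins for the two union operations; which of these corresponds to $\Cup$ and which to $\uplus$ is an assumption on my part.

```python
from collections import Counter
from functools import reduce
import operator

# A Counter maps each element to its multiplicity; missing elements count as 0.
# All function names below are hypothetical, chosen only for this illustration.

def contained(a, b):
    """First relation: each element of a occurs in b at least as often."""
    return all(b[x] >= n for x, n in a.items() if n > 0)

def contained_exactly(a, b):
    """Second relation: each element of a occurs in b exactly as often."""
    return all(b[x] == n for x, n in a.items() if n > 0)

def contained_fully(a, b):
    """Third relation: the first relation holds and every element of b occurs in a."""
    return contained(a, b) and all(a[x] > 0 for x, m in b.items() if m > 0)

def partial_unions(k):
    """Multiplicity of '*' in the union of M_1, ..., M_k under max and under sum."""
    parts = [Counter({"*": i}) for i in range(1, k + 1)]
    by_max = reduce(operator.or_, parts)   # multiplicity is max(1, ..., k)
    by_sum = reduce(operator.add, parts)   # multiplicity is 1 + ... + k
    return by_max["*"], by_sum["*"]

B = Counter("aaabc")  # a occurs 3 times, b and c once each

for name in ("aab", "aaa", "abc"):
    A = Counter(name)
    print(name, contained(A, B), contained_exactly(A, B), contained_fully(A, B))

for k in (1, 4, 10):
    print(k, partial_unions(k))
```

The three sample multisets separate the relations: `aab` satisfies only the first, `aaa` satisfies the first and second, and `abc` satisfies the first and third. The partial unions show the multiplicity of `*` growing with `k` under both operations (for `k = 4` it is 4 under max and 10 under sum), which is why no finite multiplicity can be assigned in the limit.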