A monad is just a monoid in the category of endofunctors, what's the problem?

That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.

The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.

But, I'll take a stab. The original sentence is this:

All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.

X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.

And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:

A monoid is...

  • A set, S
  • An operation, • : S × S → S
  • An element of S, e : 1 → S

...satisfying these laws:

  • (a • b) • c = a • (b • c), for all a, b and c in S
  • e • a = a • e = a, for all a in S

A monad is...

  • An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
  • A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
  • A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)

...satisfying these laws:

  • μ ∘ Tμ = μ ∘ μT
  • μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)

With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.


First, the extensions and libraries that we're going to use:

{-# LANGUAGE RankNTypes, TypeOperators #-}

import Control.Monad (join)

Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.

Quoting Tom Crockett's excellent answer, we have:

A monad is...

  • An endofunctor, T : X -> X
  • A natural transformation, μ : T × T -> T, where × means functor composition
  • A natural transformation, η : I -> T, where I is the identity endofunctor on X

...satisfying these laws:

  • μ(μ(T × T) × T)) = μ(T × μ(T × T))
  • μ(η(T)) = T = μ(T(η))

How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:

-- | A natural transformations between two 'Functor' instances.  Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
    Natural { eta :: forall x. f x -> g x }

A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:

listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
    where go [] = Nothing
          go (x:_) = Just x

maybeToList :: Maybe :-> []
maybeToList = Natural go
    where go Nothing = []
          go (Just x) = [x]

reverse' :: [] :-> []
reverse' = Natural reverse

Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:

  • A functor is a way of operating on the content of something without touching the structure.
  • A natural transformation is a way of operating on the structure of something without touching or looking at the content.

Now, with that out of the way, let's tackle the clauses of the definition.

The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap f :: f a -> f b out of any f :: a -> b, and that these obey the functor laws.

Second clause: the Identity functor in Haskell (which comes with the Platform, so you can just import it) is defined this way:

newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
    fmap f (Identity a) = Identity (f a)

So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:

return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)

Third clause: The composition of two functors in Haskell can be defined this way (which also comes with the Platform):

newtype Compose f g a = Compose { getCompose :: f (g a) }

-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
    fmap f (Compose fga) = Compose (fmap (fmap f) fga)

So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:

join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)

The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:

  • Compose f (Compose g h) ~= Compose (Compose f g) h
  • Compose f Identity ~= f
  • Compose Identity g ~= g

These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:

Compose f Identity a
    ~= f (Identity a)                 -- newtype Compose f g a = Compose (f (g a))
    ~= f a                            -- newtype Identity a = Identity a
Q.E.D.

The answers here do an excellent job in defining both monoids and monads, however, they still don't seem to answer the question:

And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?

The crux of the matter that is missing here, is the different notion of "monoid", the so-called categorification more precisely -- the one of monoid in a monoidal category. Sadly Mac Lane's book itself makes it very confusing:

All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.

Main confusion

Why is this confusing? Because it does not define what is "monoid in the category of endofunctors" of X. Instead, this sentence suggests taking a monoid inside the set of all endofunctors together with the functor composition as binary operation and the identity functor as a monoidal unit. Which works perfectly fine and turns into a monoid any subset of endofunctors that contains the identity functor and is closed under functor composition.

Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f(f(...)) of f with itself, including k=0 that corresponds to the identity f^0 = id. And now the set S of all these powers for all k>=0 is indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".

And yet:

  • This monoid S can be defined for any functor f or even literally for any self-map of X. It is the monoid generated by f.
  • The monoidal structure of S given by the functor composition and the identity functor has nothing do with f being or not being a monad.

And to make things more confusing, the definition of "monoid in monoidal category" comes later in the book as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.

(Strict) monoidal categories

Going to Chapter VII on Monoids (which comes later than Chapter VI on Monads), we find the definition of the so-called strict monoidal category as triple (B, *, e), where B is a category, *: B x B-> B a bifunctor (functor with respect to each component with other component fixed) and e is a unit object in B, satisfying the associativity and unit laws:

(a * b) * c = a * (b * c)
a * e = e * a = a

for any objects a,b,c of B, and the same identities for any morphisms a,b,c with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * the functor composition and e the identity functor, all these laws are satisfied, as can be directly verified.

What comes after in the book is the definition of the "relaxed" monoidal category, where the laws only hold modulo some fixed natural transformations satisfying so-called coherence relations, which is however not important for our cases of the endofunctor categories.

Monoids in monoidal categories

Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:

A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)

mu: c * c -> c
nu: e -> c

making 3 diagrams commutative. Recall that in our case, these are morphisms in the category of endofunctors, which are natural transformations corresponding to precisely join and return for a monad. The connection becomes even clearer when we make the composition * more explicit, replacing c * c by c^2, where c is our monad.

Finally, notice that the 3 commutative diagrams (in the definition of a monoid in monoidal category) are written for general (non-strict) monoidal categories, while in our case all natural transformations arising as part of the monoidal category are actually identities. That will make the diagrams exactly the same as the ones in the definition of a monad, making the correspondence complete.

Conclusion

In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, where the monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Vice versa, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, e.g. natural transformations in our case, satisfying the same laws as a monad.

Finally, note the key difference between the (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are not anymore a binary operation and a unit in a set. Instead, you have one fixed endofunctor c. The functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.

Another approach would be to compare with the standard monoid C of all self-maps of a set A, where the binary operation is the composition, that can be seen to map the standard cartesian product C x C into C. Passing to the categorified monoid, we are replacing the cartesian product x with the functor composition *, and the binary operation gets replaced with the natural transformation mu from c * c to c, that is a collection of the join operators

join: c(c(T))->c(T)

for every object T (type in programming). And the identity elements in classical monoids, which can be identified with images of maps from a fixed one-point-set, get replaced with the collection of the return operators

return: T->c(T) 

But now there are no more cartesian products, so no pairs of elements and thus no binary operations.