What are the differences between mathcode and catcode, and how can I use mathcode?

In text, a character token has just two properties: its character code and its cat(egory) code. If + is seen in the file, it is (normally) given catcode 12 ("other", which covers punctuation and similar characters), and the character code comes from the file encoding, so it is 43 in this case.

In math mode, the atoms in a math list need more structure: each symbol comes from a particular font family and gets different spacing depending on its class (operator, binary infix, relation, etc.). In typical 1970s style, these properties are packed compactly into bit fields in a single integer called a mathcode, which is normally expressed in hex so you can easily pull apart the fields. The mathcode of + in plain TeX is set as

\mathcode`\+="202B

which means that it is of class 2 (binary infix), in \fam0 (the roman font), at character hex 2B, which is decimal 43, the character code of + in the encoding of the roman font.
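As a minimal plain TeX sketch of the field layout described above (class, family, character position), the following prints the mathcode of + and then reassigns the mathcode of * so that it is typeset as \cdot. The value "2201 is the one plain TeX uses for \cdot (class 2, \fam2, position hex 01); choosing * as the character to remap is only an illustration.

```latex
% Plain TeX sketch: inspect and change a mathcode.
\message{mathcode of + is \the\mathcode`\+}% \the reports decimal: 8235 = "202B

$a + b$               % + looked up via its mathcode: class 2, \fam0, char "2B
$a \mathchar"202B b$  % the same atom requested explicitly

\mathcode`\*="2201    % class 2 (binary), \fam2 (symbols), char "01 (\cdot)
$a * b$               % now typeset like $a \cdot b$
\bye
```

The assignment obeys the usual grouping rules, so it can be confined to a group or a single formula.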

As egreg noted in the comments, the mathcode is only consulted for normal character tokens, that is, those with catcode 11 or 12 (letters and "other" characters). Character tokens with special catcodes such as 4 (normally &) retain their special behaviour, and their mathcode is not consulted. However, if you generate a catcode 12 & from a macro or via \string&, then its mathcode will be consulted.
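A small plain TeX sketch of that last point. The mathcode value "3026 (class 3, \fam0, position hex 26) is a hypothetical choice made only so that the effect is visible in the spacing; it is not something plain TeX sets.

```latex
% Plain TeX sketch: a catcode-12 & has its mathcode consulted.
\mathcode`\&="3026    % assumed value: relation class, \fam0, char "26
$a \string& b$        % \string& yields & with catcode 12, spaced as a relation
\bye
```

An ordinary & (catcode 4) in the same position would still be treated as an alignment tab, regardless of the mathcode.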

\delcode is similar but packs in a few extra bits, as delimiters need more information. The delcode of ( in plain is

\delcode`\(="028300

which says that the small ( comes from position hex 28 in \fam0, but then you need to switch to character hex 0 in \fam3 to get big brackets. (The font metrics specify chains of glyphs to use to build larger characters if needed, but TeX needs to know where to start.)
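To see the delcode at work, here is a plain TeX sketch in which the same delimiters are requested once through the delcodes of ( and ) and once explicitly with \delimiter. The \delimiter number is the delcode with a leading class digit (4 for opening, 5 for closing); the delcode "029301 for ) is the plain TeX value.

```latex
% Plain TeX sketch: \left and \right look up the delcode of the next character.
$$ \left( {a \over b} \right) $$

% The same delimiters given explicitly: class digit followed by the delcode.
$$ \left\delimiter"4028300 {a \over b} \right\delimiter"5029301 $$
\bye
```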

\mathcode"8000 is a special code that is not looked up in the usual way. If a character has that mathcode, the definition of the active (catcode 13) token is used instead, even though the character itself is not active. This is used in plain and LaTeX to allow ' to work as a normal non-active apostrophe in text, while in math it has mathcode hex 8000, so the active definition is used, which in effect expands to ^{\prime}.
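The same mechanism can be used for other characters. As a plain TeX sketch (the choice of * and \times is purely illustrative), the following leaves * alone in text but makes it behave like an active character in math:

```latex
% Plain TeX sketch: a "math-active" character.
{\catcode`\*=\active \gdef*{\times}}  % define the active * inside a group
\mathcode`\*="8000                    % in math, use the active definition

Text: 2*3.      % * still has catcode 12 here, so a plain asterisk is printed
Math: $2*3$.    % mathcode "8000 triggers the active definition: \times
\bye
```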


I want to add one technical point that recently proved crucial to me: while \catcode changes are famously fragile in that they cannot take effect after tokenization (specifically, after the argument to a macro has been read), \mathcode changes can. For example, I had the following macro:

\def\genby#1{\langle#1\rangle}

to specify the generators of some thing in algebra, where #1 was to represent the generating set. If I had several sets of generators, of course I wanted to use the mathematically correct notation \genby{S \cup T} rather than the easier but less correct \genby{S, T}. Unfortunately, I kept forgetting, and rather than search-and-replace all the commas, I tried to redefine the command:

{\catcode`\,=\active \gdef,{\cup}}
\def\genby#{%
  \langle\bgroup \aftergroup\rangle
  \catcode`\,=\active
  \let\next=
}

(using this trick to avoid tokenization) and while this works okay in $\genby{S,T}$ (producing the equivalent of $\genby{S \cup T}$), it failed entirely in the amsmath construction

\begin{gather}
  \genby{S,T}
\end{gather}

where it had no effect! I immediately recognized this as an instance of "amsmath reads its argument twice" (which I first learned about in this question, which is not the only place it's come up on this site) and figured it was a tokenization issue, blocking the catcode change. So I tried \mathcode instead (also, see egreg's comment):

{\catcode`\,=\active \gdef,{\cup}}
\def\genby#1{%
  \langle\begingroup
  \mathcode`\,="8000
  #1
  \endgroup\rangle
}

or, without using a global definition for the active comma (which could conflict with other packages),

\newcommand{\genby}[1]{%
  \begingroup
  \begingroup\lccode`\~=`\,
  \lowercase{\endgroup\let~\cup}%
  \mathcode`\,=\string"8000
  \langle#1\rangle
  \endgroup}

and this worked again, even in gather.

This feature emphasizes an important theoretical point about math mode: when operating in it, TeX has an intermediate stage of interpretation, similar to tokenization, in which it builds a "math list" that is more than just the textual input ("code") but less than the typeset output ("horizontal list"). The interpretation of mathcodes, delcodes, and so on only occurs at the conversion from a tokenized input stream to a math list, and therefore changing one of them is valid after tokenization. (You still can't do anything once the math list is built, though.)
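The contrast can be seen in a minimal plain TeX sketch. The macro names \viacatcode and \viamathcode are hypothetical, introduced only for this comparison; both receive an argument that has already been tokenized with an ordinary (catcode 12) comma.

```latex
% Plain TeX sketch: catcode change vs. mathcode change after tokenization.
{\catcode`\,=\active \gdef,{\cup}}  % active comma means \cup

% Too late: #1 was tokenized before the catcode assignment is executed.
\def\viacatcode#1{\begingroup \catcode`\,=\active #1\endgroup}

% Still in time: the mathcode is consulted when the math list is built.
\def\viamathcode#1{\begingroup \mathcode`\,="8000 #1\endgroup}

$\viacatcode{S,T}$   % prints an ordinary comma
$\viamathcode{S,T}$  % prints S \cup T
\bye
```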