When is \catcode executed?

TeX absorbs one record (typically a line) at a time, but doesn’t tokenize it immediately. Instead, it normalizes the record: it removes the end-of-line character(s) used by the operating system, if any, and appends the character whose code is the current value of \endlinechar.

Then it reads through the line, tokenizing only as much of the input as it needs to determine what to do next.

For instance, if it finds \foo {xyz} and \foo is a macro taking one argument, it will ignore the space and tokenize the opening brace and whatever follows, until the matching closing brace is found (and tokenized). Going on with the example, if the expansion of \foo includes something like \catcode\endlinechar=12, the next end-of-line character that has not yet been tokenized will be interpreted as having category code 12. So catcode changes are done in the stomach, but they can and will influence input that has not yet entered the mouth, that is, input not yet tokenized.
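
A minimal plain TeX sketch of this behavior. It makes two assumptions for illustration: \foo is a hypothetical macro, and it changes the catcode of % rather than of the end-of-line character, simply because the effect is easier to see in the output.

% Hypothetical \foo: takes one argument and makes % an ordinary character (catcode 12).
\def\foo#1{\catcode`\%=12 [#1]}
% The argument {xyz} is tokenized before \foo is expanded; the rest of the line is not.
\foo {xyz} % this text is typeset, because % no longer starts a comment here
\bye

The argument is printed as [xyz], followed by the percent sign and the text after it: that part of the line was still untokenized when the \catcode assignment was executed.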

However, keep in mind that TeX executes no instruction and expands no macro while it’s absorbing the replacement text of a macro. This is mainly what Knuth means by characters that are already inside token lists.

A curious example:

x\obeyspaces x\bye

You may know that \obeyspaces is simply

\catcode`\ =\active

and that Plain TeX sets the active space character to expand to a catcode 10 space. You can check that the output is

[image: the typeset output “x x”, with a space between the two letters]

Wait! Doesn't TeX ignore spaces after control words? Yes and no. When tokenizing the input, TeX determines that the character after the final s of \obeyspaces is not a letter (that is, the internal catcode table doesn't assign it category code 11), so it stops scanning the control sequence name and the input scanner goes into the state “skipping blanks”; the space itself has not yet been tokenized. The token \obeyspaces is a parameterless macro, so it's expanded and the category code change is performed. Now TeX needs more tokens, so it tokenizes the next character, which happens to be a space, and assigns it category code 13 as instructed by the (just changed) catcode table. Since this character doesn't have catcode 10, it isn't skipped, and the state changes from “skipping blanks” to “middle of line”. Then TeX expands the active character, and a space appears in the output.
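
For contrast, here is a small sketch that puts the same input next to a macro that changes nothing. The name \noop is an assumption made up for this comparison; it is not part of plain TeX.

% Hypothetical do-nothing macro, used only for comparison.
\def\noop{}
% The space after \noop still has catcode 10 when it is read, so it is skipped: "xx".
x\noop x\par
% The space after \obeyspaces is read with catcode 13, so it is not skipped: "x x".
% The group keeps the catcode change from leaking into the rest of the file.
{x\obeyspaces x}\par
\bye

In both paragraphs the scanner is in the state “skipping blanks” after the control word; the only difference is the catcode the space has at the moment it is finally tokenized.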


According to the TeXbook, characters in a file are first converted to tokens with catcodes ("mouth") and then any nonexpandable commands are executed ("stomach").

You have to be careful here about what is meant by “and then”.

It is true that characters are converted into tokens by the “mouth”, and that these tokens are passed to the “stomach”. But if you interpreted it as saying that all characters in a file are first tokenized, and only then (after everything has been tokenized) the “stomach” comes into play — then that's not true. Instead, the two systems interact: the “mouth” may pass a command to the “stomach”, which takes some actions and then asks the “mouth” for more tokens, and so on. The actions taken in the “stomach” can influence the future workings of the “mouth”.

It may help to consider the other names of the “mouth” and “stomach”: they are called the “input processor”[+“expansion processor”] and “execution processor” in TeX by Topic, and “syntactic routines” and “semantic routines” by Knuth in the TeX program:

[image: excerpt from the TeX program]

To a first approximation, you can think of the main control loop of TeX as a hungry stomach, simply executing commands one after another, and repeatedly asking the mouth for tokens either after completing a previous command, or while executing a particular command. For example, suppose you had the following input file:

hi\hskip 10 pt\end

Then the stomach gets

  • the token h₁₁ (which it “executes” essentially as a command to typeset that character — puts that character in the appropriate list).
  • the token i₁₁ (which it “executes”, same as above).
  • the token \hskip — at this point, the stomach executes the hskip command, as part of which the syntactic routines (mouth) are invoked and asked for tokens, to scan the glue specification 10 pt.
  • the token \end (which it executes as a command).
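
One way to observe that the glue specification is fetched on demand, while \hskip is being executed, is to hide the unit inside a macro. This is only a sketch; \unit is a hypothetical helper, not something defined by plain TeX.

% Hypothetical helper macro holding the unit of the skip.
\def\unit{pt}
% While executing \hskip, TeX asks for more tokens, expands \unit, and finds "pt".
hi\hskip 10\unit there
\bye

This typesets “hi”, a 10pt skip, and “there”. The macro \unit is expanded only because the stomach, in the middle of executing \hskip, requested further tokens from the mouth.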

So when The TeXbook gives (on page 38) the example you mentioned, of {\hskip 36 pt} being converted into the sequence of tokens {₁, hskip, 3₁₂, 6₁₂, ␣₁₀, p₁₁, t₁₁, }₂, it is a bit misleading: although the characters do indeed get converted to those tokens at some point, this tokenization (of p and t for example) does not fully happen before the “stomach” sees the \hskip command; much of it happens after.

\catcode is a nonexpandable command, so it should be executed after character tokens have been assigned a catcode. […] So if a \catcode command is encountered in the "mouth", does TeX automatically execute it so that it will affect any tokens after it? Or is it still executed in the stomach, after tokens following it might have been assigned the wrong catcode already?

There is a lot of (understandable) confusion here, but the answer is: \catcode is executed in the stomach, after previous characters have been assigned catcodes and turned into tokens.

  • If \catcode is encountered in the mouth when the stomach is looking for a command to execute, then it is passed to the stomach, executed there, and affects future tokens.

  • If \catcode is encountered in the mouth when the stomach is simply collecting tokens (such as in the definition of a macro or a token-list assignment) then it is simply collected as yet another token (not executed), and future tokens (in the list being collected) will be scanned according to the catcodes when the collection started.

To illustrate, consider \catcode`S=3 which changes the category code of the letter S to 3 (namely math shift, like $).

An example of the first case:

hello \catcode`S=3 SxS 
\bye

Result:
hello x

An example of the second case:

\def\change{hello \catcode`S=3 SxS}

\change

now SyS

\bye

Result:
hello SxS
now y

(Here, first the definition of \change was collected as a token list in which there was an explicit “letter” token S, so when we used \change it expanded to a list containing that letter-S token which is what got typeset. But that expansion of \change also contained a \catcode command, which got executed this time and affected future tokens.)
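
The same point can be seen from the other direction. In the following sketch (the macro name \stored is an assumption chosen for illustration), the catcode is changed before the definition is collected and restored afterwards, so the S tokens stored in the macro remain math shifts even though later S characters are letters again.

% Make S a math shift, collect a definition, then make S a letter again.
\catcode`S=3
\def\stored{SxS}
\catcode`S=11
% The S tokens inside \stored keep catcode 3; the S characters typed here get catcode 11.
\stored\ and SyS
\bye

This typesets x in math mode followed by “and SyS”: each token keeps the catcode it was given when it was tokenized, regardless of later changes to the catcode table.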

At which "organ"/stage of TeX is \catcode executed?

Simple answer: In the stomach.