How many different concepts of "token equivalence" are there in TeX?

I think the best way to think about it is that the “basic equality” of tokens is this: character tokens are equal if they have the same character code and catcode; command tokens are equal if they have the same name.

Then delimited macro parsing requires equal tokens.
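As a minimal plain TeX sketch of that point (the names \upto and \test are made up for illustration): a delimiter in a parameter text only matches a token with the same character code and the same catcode.

```tex
% plain TeX; \upto and \test are hypothetical names
\def\upto#1.{[#1]}% the delimiter is '.' with catcode 12 (other)
\begingroup
  \catcode`\.=11 % inside this group '.' is tokenized with catcode 11
  \gdef\test{\upto ab.cd}% this '.' has catcode 11
\endgroup
% Only the final '.' (catcode 12) ends #1, so #1 is ab.cd:
\message{\test.}
\bye
```

The catcode-11 dot stored in \test is skipped as ordinary argument material, even though it looks identical to the delimiter when printed.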

\ifx tests whether the “definitions” of the two tokens are equal. For a macro, the definition is the list of tokens in its definition (first-level expansion); each primitive has a unique definition; and for a character token (and a command token \let to a character token), the definition encapsulates the character code and catcode.
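A small plain TeX sketch of these cases, with hypothetical names \one, \two, \three and \mystar. It also shows one detail not spelled out above: as far as I know, for macros the \long and \outer status and the parameter text are part of what \ifx compares.

```tex
% plain TeX; \one, \two, \three and \mystar are hypothetical names
\def\one{x}
\def\two{x}
\long\def\three{x}
\let\mystar=*
% equal: same parameter text, same replacement text, same \long status
\message{\ifx\one\two equal\else different\fi}
% different: \three is \long, \one is not
\message{\ifx\one\three equal\else different\fi}
% equal: both mean "character * with catcode 12"
\message{\ifx *\mystar equal\else different\fi}
% different: \relax and \par are two distinct primitives
\message{\ifx\relax\par equal\else different\fi}
\bye
```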

\if differs from \ifx in that it uses expansion to determine the tokens to be tested. Apart from that, it uses a modified form of equality: only the character code, not the catcode, is considered for character tokens, and all command tokens not \let to character tokens are considered equal.

\ifcat is the same as \if except that it compares the catcode, not the character code.
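To make the difference between \if, \ifcat and \ifx concrete, here is a short plain TeX sketch (\mystar is a made-up name for an implicit character):

```tex
% plain TeX; \mystar is a hypothetical name
\let\mystar=*
\message{\if *\mystar yes\else no\fi}%    yes: same character code
\message{\ifcat *\mystar yes\else no\fi}% yes: both have catcode 12
\message{\ifcat a\mystar yes\else no\fi}% no: catcode 11 versus 12
\message{\if\relax\par yes\else no\fi}%   yes: neither is \let to a character
\begingroup
  \catcode`\*=11 % an explicit * typed from here on has catcode 11
  \message{\if *\mystar yes\else no\fi}%  yes: \if ignores the catcode
  \message{\ifx *\mystar yes\else no\fi}% no: \ifx sees catcode 11 versus 12
\endgroup
\bye
```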


⟨token 1⟩ is the same as ⟨token 2⟩ if with
\long\def\tempa{⟨token 1⟩} and
\long\def\tempb{⟨token 2⟩} and
\long\def\firstoftwo#1#2{#1} and
\long\def\secondoftwo#1#2{#2}
the test
\ifx\tempa\tempb\expandafter\firstoftwo\else\expandafter\secondoftwo\fi
yields \firstoftwo.

Pitfalls/Problems:

  • This test is not expandable as temporary assignments (\tempa and \tempb) are required.
  • ⟨token 1⟩ and/or ⟨token 2⟩ being defined in terms of \outer is a problem.
  • ⟨token 1⟩ and/or ⟨token 2⟩ being an explicit character token of catcode 1 or 2 or 6 is a problem.
  • ⟨token 1⟩ and/or ⟨token 2⟩ being an implicit character token of catcode 6 is a problem.
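The test can be packaged as a macro. The following is only a sketch under the assumption that both tokens are passed as undelimited arguments; \sametoken and \mya are hypothetical names, and all pitfalls listed above still apply (in addition, explicit space tokens cannot be passed this way, because TeX skips them when looking for an undelimited argument).

```tex
% plain TeX; \sametoken, \tempa, \tempb and \mya are hypothetical names
\long\def\firstoftwo#1#2{#1}
\long\def\secondoftwo#1#2{#2}
\long\def\sametoken#1#2{%
  \long\def\tempa{#1}%
  \long\def\tempb{#2}%
  \ifx\tempa\tempb\expandafter\firstoftwo\else\expandafter\secondoftwo\fi
}
\let\mya=a
% The macro-based test sees two different tokens:
\sametoken a\mya{\message{same token}}{\message{different tokens}}
% A direct \ifx only compares meanings and reports equality:
\message{\ifx a\mya same meaning\else different meanings\fi}
\bye
```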


\ifx⟨token 1⟩⟨token 2⟩... tests if ⟨token 1⟩ and ⟨token 2⟩ have the same meaning.

\if⟨token 1⟩⟨token 2⟩... tests if ⟨token 1⟩ and ⟨token 2⟩ have the same character code. Expandable tokens are expanded while gathering ⟨token 1⟩ and ⟨token 2⟩. All unexpandable control sequence tokens that are not implicit character tokens are assumed to have the same character code, one which no character has. For implicit characters, be they control symbol tokens, control word tokens or active character tokens, the character code of the character token they are \let equal to is used. Active characters that are undefined or defined as macros are assumed to have the character code of their non-active counterparts if preceded by \noexpand.

\ifcat⟨token 1⟩⟨token 2⟩... tests if ⟨token 1⟩ and ⟨token 2⟩ have the same category code. Expandable tokens are expanded while gathering ⟨token 1⟩ and ⟨token 2⟩. All unexpandable control sequence tokens that are not implicit character tokens are assumed to have the same category code, one which no character has. For implicit characters, be they control symbol tokens, control word tokens or active character tokens, the category code of the character token they are \let equal to is used. Active characters that are undefined or defined as macros are assumed to have category code 13(active) if preceded by \noexpand.
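A short plain TeX sketch of the \noexpand case. Note the assumption: in plain TeX the active character ~ is defined as a macro rather than undefined; to my understanding \noexpand has the same effect on it, since it suppresses an expansion that would otherwise take place.

```tex
% plain TeX, where ~ is an active character defined as a macro
\message{\if\noexpand~\string~yes\else no\fi}%    yes: character code 126 twice
\message{\ifcat\noexpand~\string~yes\else no\fi}% no: catcode 13 versus 12
\message{\ifcat\noexpand~\relax yes\else no\fi}%  no: \relax has no character's catcode
\bye
```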


Control symbol tokens and control word tokens are equal if they have the same name, i.e., if applying \string yields the same sequence of character tokens.

Except for some edge cases:

The nameless control sequence token can be created in two ways:

  1. Via \csname\endcsname.
  2. Via a single character of category code 0 (escape), usually the backslash, at the end of a line of .tex input while the value of the integer parameter \endlinechar is not within the range of possible code points for characters.

Applying \string to the nameless control sequence yields a sequence of character-tokens:

⟨current escapechar-token⟩c12s12n12a12m12e12⟨current escapechar-token⟩e12n12d12c12s12n12a12m12e12

⟨current escapechar-token⟩ denotes no token at all if the integer parameter \escapechar is not within the range of possible code points for characters.

If the integer parameter \escapechar is within the range of possible code points for characters, ⟨current escapechar-token⟩ denotes a character token whose character code equals the value of \escapechar and whose category code is 10(space) if \escapechar has the value 32 and 12(other) otherwise.

Thus with \string alone you cannot distinguish the nameless control sequence from the control-word-token whose name is
csname⟨character denoted by current value of \escapechar⟩endcsname.

In the edge case of these two tokens having been assigned the same meaning, \ifx is not suitable for distinguishing them either.

You can distinguish them

  • by means of delimited arguments.
  • by defining temporary macros and \ifx-comparing them.

The following example

\expandafter\def\csname\endcsname{Some Definition.}

\begingroup
\def\firstofone#1{#1}
\catcode`\/=0
\catcode`\\=11
/firstofone{%
   /endgroup
   /def/csname\endcsname{Some Definition.}%
}%

\endlinechar=-1\relax

\message{^^J^^JStringification:^^J}
\message{^^J|\string\
|}
\message{^^J|\expandafter\string\csname csname\string\endcsname\endcsname|}
\message{^^JMeaning:^^J}
\message{^^J\meaning\
}
\message{^^J\expandafter\meaning\csname csname\string\endcsname\endcsname}
\message{^^JDirect \string\ifx-comparison:^^J}
\message{^^JTokens have %
\expandafter\ifx\csname csname\string\endcsname\endcsname\
equal meaning\else different meanings\fi.}
\message{^^J\string\ifx-comparison of temporary macros:}
\def\tempa{\
}%
\expandafter\def\expandafter\tempb\expandafter{\csname csname\string\endcsname\endcsname}%
\message{^^JTemporary macros defined from tokens have \ifx\tempa\tempb
equal meaning\else different meanings\fi.}
\message{^^J}
\csname stop\endcsname

\bye

yields the following messages on the terminal:

Stringification:
 
|\csname\endcsname| 
|\csname\endcsname|

Meaning:
 
macro:->Some Definition. 
macro:->Some Definition.

Direct \ifx-comparison:
 
Tokens have equal meaning.

\ifx-comparison of temporary macros:

Temporary macros defined from tokens have different meanings.

Character tokens are equal if they have the same category code and the same character code.

The crucial part is finding out whether a token is a control word token/control symbol token or an explicit character token.

This is crucial because there are edge cases.

E.g., you cannot distinguish an active character token let equal to one of its non-active counterparts from that non-active counterpart other than by either using an argument delimited by one of those tokens or defining temporary macros as shown above.

E.g., after

\begingroup
\catcode`\a=13
\@firstofone{\endgroup\let a=}a

distinguishing active-a from catcode-11(letter)-a is possible only by either using an argument delimited by one of those tokens or defining temporary macros as shown above.
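The delimited-argument route could look like the following sketch. All macro names are hypothetical, \isletterA assumes that its argument is a single token that is neither a brace nor a space, and the active a is created with the \lowercase trick instead of a catcode change.

```tex
% plain TeX; all macro names below are hypothetical
\long\def\firstoftwo#1#2{#1}
\long\def\secondoftwo#1#2{#2}
% \checkA is delimited by the explicit letter a (catcode 11).
\long\def\checkA#1a#2#3\stop{#2}
% If #1 is the letter a, the first delimiter match leaves #1 of \checkA empty.
\long\def\isletterA#1{\checkA#1\firstoftwo a\secondoftwo\stop}
% Create an active a, \let it equal to the letter a, store it in \activeA.
\begingroup
\lccode`\~=`\a
\lowercase{\endgroup
  \let~=a%
  \def\activeA{~}%
}
% \ifx only sees equal meanings:
\message{\expandafter\ifx\activeA a same\else different\fi}
% The delimited argument tells the two tokens apart:
\message{\isletterA a{letter}{not letter}}
\message{\expandafter\isletterA\activeA{letter}{not letter}}
\bye
```

Unlike the temporary-macro test, this check performs no assignments at the time of testing, so it also works in expansion-only contexts such as \message or \edef.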

The same applies to one-letter-control-words/symbols while \escapechar has a negative value.

E.g., with

\escapechar=-1\relax
\let\a=a

distinguishing \a from catcode-11(letter)-a is possible only by either using an argument delimited by one of those tokens or defining temporary macros as shown above. Alternatively, you could exploit the fact that, while a has the category code 11(letter), \a is a control word token rather than a control symbol token: writing \a unexpanded therefore yields a sequence with a space character following the a, whereas writing a unexpanded yields no such space. If the catcode of a is switched to something else in between, e.g. 12(other), this no longer works, because \a is then a control symbol token and TeX does not append a space character when writing a control symbol token unexpanded.
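A rough sketch of that observation, using \message instead of \write to keep it short (an assumption on my side that both display token lists the same way):

```tex
% plain TeX; \a is not defined by plain TeX and is used here only for illustration
\let\a=a
\begingroup
\escapechar=-1
% \string gives the same single character in both cases:
\message{[\string\a][\string a]}
% Written unexpanded, the control word \a is followed by a space, a is not:
\message{[\a][a]}
\endgroup
\bye
```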