Are ^ and _ the only commands in LaTeX not preceded by a backslash?

In TeX and LaTeX, it is vital to distinguish between "commands" or "control sequences" on the one hand and TeX-special characters on the other. Only the former begin, in general, with a backslash (\) character. (But see the final paragraph below for examples of commands which do not start with a backslash.)

^ and _ are but two examples of TeX-special characters. Others are \ (backslash), {, }, $ ("math shift"), &, #, and % (comment char.). What makes these characters "special" from the point of view of TeX and LaTeX is that each of them has a special category code, or catcode for short. Various rules apply if one needs to typeset one of the special-catcode characters. For instance, to typeset an ampersand, hash, or percent character, one must prefix a backslash to the corresponding symbol, i.e., input it as \&, \# and \%, respectively. To typeset the backslash symbol in text mode, one must input it as either "\textbackslash" or "\string\". (The latter method also requires the T1 font encoding; it doesn't work with OT1. Moreover, if the objective is to generate a single backslash character, \string\ must be followed by a space, and a likely undesirable side effect is that this space will be output after the backslash character. In contrast, inputting, say, \string\hello creates \hello, without a space.) In math mode, one outputs a backslash character by typing either \backslash or \setminus, depending on the intended meaning: the math status of \backslash is "ord" (ordinary), whereas that of \setminus is "bin" (binary).
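To make the text-mode rules above concrete, here is a minimal Python sketch that turns arbitrary text into LaTeX input that typesets every character literally. The helper name `latex_escape` is hypothetical (it is not from any package). Besides the inputs named above, it assumes the standard LaTeX text commands \$, \_, \{, \}, \textasciicircum and \textasciitilde; \~{} would also do for the tilde.

```python
# Hypothetical helper (not part of any LaTeX package): map each TeX-special
# character to a LaTeX text-mode input that typesets it literally.
TEXT_MODE_INPUT = {
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "$": r"\$",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "\\": r"\textbackslash{}",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}


def latex_escape(text: str) -> str:
    """Return `text` with every special character replaced by its text-mode input.

    Characters are translated one at a time, so the backslashes introduced by
    one replacement are never escaped a second time.
    """
    return "".join(TEXT_MODE_INPUT.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(latex_escape("50% of #1 & {x}_2"))
```

Translating character by character (rather than chaining `str.replace` calls) avoids the classic bug of escaping the backslash of an already-inserted `\&`.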

If there are "special" catcodes, there must also be "non-special" catcodes, right? This is indeed the case. The main non-special catcodes are reserved for letters (e.g., a to z and A to Z), space characters, and "other" characters (such as numerals, punctuation marks, apostrophes, ", @, (, ), - and /). Characters with non-special catcodes can be entered directly, i.e., they don't need to be prefixed with anything special. End-of-line, "ignored", and "invalid" characters are also identified by non-special catcodes.
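The grouping used in this answer can be written down as a small Python table. This is only a sketch: the catcode names follow The TeXbook, while `default_catcode` is a hypothetical, simplified model of the plain TeX/LaTeX document defaults (it ignores, e.g., \makeatletter and any catcode changes made by packages).

```python
# The 16 TeX category codes, grouped as in the answer above.
CATCODE_NAMES = {
    0: "escape", 1: "begin group", 2: "end group", 3: "math shift",
    4: "alignment tab", 5: "end of line", 6: "parameter", 7: "superscript",
    8: "subscript", 9: "ignored", 10: "space", 11: "letter",
    12: "other", 13: "active", 14: "comment", 15: "invalid",
}

SPECIAL = {0, 1, 2, 3, 4, 6, 7, 8, 14}
NON_SPECIAL = {5, 9, 10, 11, 12, 15}
ACTIVE = {13}

# Simplified defaults (assumption: a plain TeX or LaTeX document, no packages).
FIXED_CATCODES = {
    "\\": 0, "{": 1, "}": 2, "$": 3, "&": 4, "\r": 5, "#": 6, "^": 7,
    "_": 8, "\0": 9, " ": 10, "\t": 10, "~": 13, "%": 14, "\x7f": 15,
}


def default_catcode(ch: str) -> int:
    """Return the (simplified) default catcode of a single character."""
    if ch in FIXED_CATCODES:
        return FIXED_CATCODES[ch]
    if ch.isascii() and ch.isalpha():
        return 11  # letters a..z, A..Z
    return 12      # everything else counts as "other"


def classify(catcode: int) -> str:
    """Return 'special', 'non-special' or 'active' for a catcode 0..15."""
    if catcode in SPECIAL:
        return "special"
    if catcode in NON_SPECIAL:
        return "non-special"
    if catcode in ACTIVE:
        return "active"
    raise ValueError(f"catcode must be in 0..15, got {catcode}")
```

For example, `classify(default_catcode("^"))` gives `"special"`, while `"@"` and `"5"` both map to catcode 12 and are therefore non-special.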

Finally, there is a catcode for active characters. An "active" character is an example of a command which does not start with a backslash. In most TeX installations, the catcode of ~ ("tilde") is "active"; the main use of ~ is to insert an unbreakable space between two words. To typeset a tilde character directly, it is necessary to input it as \~{} in text mode and as \tilde{} in math mode. AFAICT, though, stand-alone tildes are rather uncommon in general typographic applications. (Typesetting URL strings is a specialized application, I suppose.) The tilde character occurs frequently as a diacritic or "accent symbol", e.g., \~n in Espa\~na and $\tilde{z}$. Another example of active characters: if the babel package is loaded with the option french, the catcodes of certain punctuation marks, such as : and ;, are changed from "other" to "active". This change facilitates the implementation of French typographic practices for these punctuation marks.
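Since an active character is in effect a one-character command, a rough Python model is a table of meanings consulted for each character. The meaning given for ~ reflects LaTeX, where ~ essentially expands to \nobreakspace{}; the macro names used for the French punctuation are hypothetical placeholders, not babel's actual internals.

```python
from __future__ import annotations


def make_active_table(french: bool = False) -> dict[str, str]:
    """Map each active character to its meaning (a simplified, hypothetical model)."""
    # In LaTeX, ~ essentially expands to \nobreakspace{}.
    table = {"~": r"\nobreakspace{}"}
    if french:
        # babel's french option makes : and ; (among others) active.
        # The macro names below are hypothetical placeholders.
        table[":"] = r"\FrenchColon{}"
        table[";"] = r"\FrenchSemicolon{}"
    return table


def expand_active(text: str, table: dict[str, str]) -> str:
    """Replace every active character in `text` by its meaning (one level only)."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    line = "voir p.~3; merci"
    print(expand_active(line, make_active_table()))
    print(expand_active(line, make_active_table(french=True)))
```

The same input line behaves differently once the table changes, which is exactly why loading babel with the french option can alter how ordinary-looking punctuation is processed.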


Here's an answer from the perspective of the internals of the TeX (to be specific, Knuth TeX) program. You do not need to know any of the following for simply using TeX; this is just if you're curious about how things work internally.

Stomach “commands”

The "main" part of the TeX program (after some initialization, etc.) is basically a main loop (called main_control in the source code). The structure of the code is somewhat like the following pseudocode:

def main_control():
    while True:
        get_x_token()  # The "mouth": sets the globals `cur_cmd` and `cur_chr`.
        # Now perform some action based on `mode` and `cur_cmd`...
        # ... this action may use the value of `cur_chr`.

This main control routine is what Knuth describes in the manual (The TeXbook) as the “stomach” of TeX: it hungrily waits for tokens (commands, and details thereof), which are delivered to it from the “mouth” of TeX (above, get_x_token).

So from the perspective of the guts of TeX, everything that reaches it is a command: for instance the letter “e” is delivered to the stomach as the command “letter” (with cur_chr = 101). For this “letter” command, the corresponding action (if TeX is in horizontal mode) is simply “append this character to the current hlist in the current font”.
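The dispatch described above can be mimicked with a toy Python loop. Only the command codes (10, 11, 12) come from tex.web; the token tuples, the Python list standing in for the hlist, and the "<glue>" placeholder are simplifications made for this sketch.

```python
# Toy model of TeX's "stomach" (main_control) in horizontal mode.
SPACER, LETTER, OTHER_CHAR = 10, 11, 12  # command codes as in tex.web


def main_control(tokens):
    """Consume (cur_cmd, cur_chr) pairs and build a simplified hlist."""
    hlist = []
    for cur_cmd, cur_chr in tokens:          # stands in for get_x_token()
        if cur_cmd in (LETTER, OTHER_CHAR):
            hlist.append(chr(cur_chr))       # "append this character in the current font"
        elif cur_cmd == SPACER:
            hlist.append("<glue>")           # real TeX consults space_factor here
        else:
            raise NotImplementedError(f"command {cur_cmd} is not modelled")
    return hlist


if __name__ == "__main__":
    # "e", "." and a space, as the mouth would deliver them with default catcodes
    print(main_control([(LETTER, 101), (OTHER_CHAR, 46), (SPACER, 32)]))
```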

Here are some common “commands” (with their—mostly irrelevant—internal codes), in what cases TeX's stomach gets those commands, and the corresponding action that is taken:

  • letter (11): TeX's stomach gets this command when you type a character regarded as a letter (by default: A..Z, a..z). The action (when in horizontal mode) is to append that character to the current hlist.
  • other_char (12): TeX's stomach gets this command when you type a character like (by default) !"'()*+,-./0123456789:;<=>?@[]| or the backtick. The action (when in horizontal mode) is to append that character to the current hlist.
  • spacer (10): TeX's stomach gets this command when you type a character like space (ASCII 32) or tab (ASCII 9). The action is to append the normal inter-word glue or a larger glue, depending on space_factor (etc).
  • hskip (26): TeX's stomach gets this command when you type \hskip or \hfil or something like that. The action is to scan (if necessary) and then append the corresponding glue.
  • assign_int (73): TeX's stomach gets this command when you type certain commands like \tolerance or \day. The action is to call an internal procedure called prefixed_command (which in this case will scan a value for the corresponding integer and assign that value).
  • def (97): TeX's stomach gets this command when you type \def or \edef or \xdef or \gdef. The action is to call an internal procedure called prefixed_command (which in this case will scan a control-sequence name, then parameters, then a definition token list, and assign that meaning).
  • left_brace (1) (begin group): TeX's stomach gets this command when you type (by default) {. The action is to start a new save level.
  • math_shift (3): TeX's stomach gets this command when you type (by default) $. The action is to enter or appropriately exit math mode.
  • sup_mark (7): TeX's stomach gets this command when you type (by default) ^. The action if not in math mode is to print an error and insert_dollar_sign (as error recovery), and if in math mode is to call sub_sup (which handles subscripts and superscripts).
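  • par_end (13): TeX's stomach gets this command at the end of each paragraph, i.e., when you type \par or leave a blank line (which the mouth converts into \par).
  • extension (59): reserved for extensions to TeX. TeX's stomach gets this command for \openout, \write, \closeout, \special, \immediate and \setlanguage, which Knuth implemented as extensions partly to illustrate how extensions can be added. The action is to call do_extension.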
  • out_param (5) (output a parameter): TeX's stomach gets this command when it encounters, in a token list, whatever it had put there when you typed #1 or ... or #9.
  • relax (0): TeX's stomach may get this command when you type \relax. The action is to do nothing.

There are about 100 such commands that can reach TeX's stomach (the largest code in tex.web is 100); TeX extensions like eTeX, pdfTeX, XeTeX or LuaTeX have more. Some of them, like “letter”, occur so frequently that TeX handles them in a specially optimized part of main_control called the “main loop”.
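For reference, the codes listed above can be collected in a Python IntEnum. The values are those of Knuth's tex.web as quoted in this answer; `is_prefixed` (a hypothetical helper name) encodes the fact that, in Knuth's TeX, the codes above `max_non_prefixed_command` (70) are the ones routed to prefixed_command. Engines such as eTeX or pdfTeX extend this list.

```python
from enum import IntEnum


class Cmd(IntEnum):
    """Command codes from tex.web for the commands listed above (Knuth TeX)."""
    RELAX = 0
    LEFT_BRACE = 1
    MATH_SHIFT = 3
    OUT_PARAM = 5
    SUP_MARK = 7
    SPACER = 10
    LETTER = 11
    OTHER_CHAR = 12
    PAR_END = 13
    HSKIP = 26
    EXTENSION = 59
    ASSIGN_INT = 73
    DEF = 97


MAX_NON_PREFIXED_COMMAND = 70  # largest code that cannot be prefixed by \global
MAX_COMMAND = 100              # largest command code in tex.web


def is_prefixed(cmd: int) -> bool:
    """True if main_control hands this command to prefixed_command."""
    return MAX_NON_PREFIXED_COMMAND < cmd <= MAX_COMMAND
```

This matches the descriptions above: both assign_int (73) and def (97) end up in prefixed_command, while hskip (26) has its own action.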

Mouth “commands”

All this was about TeX's “stomach” (the main control loop and semantic routines / action procedures). But TeX's “mouth” (scanning routines) is no less interesting. There's a lot that goes on there, before it delivers a command to the stomach.

Note that above I said that some of these commands are triggered by a single character of input, and I frequently added “by default”. This is because TeX has the concept of category codes. There are 16 of them; you can read the details on page 37 of The TeXbook or in many questions on this site, e.g. “What are category codes?”. I won't repeat all that, but here are some examples:

  • When TeX's mouth sees a character with catcode 11 (letter), it sends the command “letter” (also 11) to the stomach.

  • When TeX's mouth sees a character with catcode 12 (other character), it sends the command “other_char” (also 12) to the stomach.

  • When TeX's mouth sees a character with catcode 1 (beginning of group), it sends the command “left_brace” (also 1) to the stomach.

Similarly for catcode 2 (end of group/“right_brace”), catcode 3 (math shift/“math_shift”), catcode 4 (alignment tab/“tab_mark”), catcode 7 (superscript/“sup_mark”), catcode 8 (subscript/“sub_mark”) and catcode 10 (space/“spacer”). These include the ^ and _ mentioned in the question.
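The catcode-to-command correspondence just described can be sketched in Python as follows. `command_for` is a hypothetical helper, and the skipping of repeated or leading spaces is not modelled. One detail worth noting: TeX gives every space token character code 32, so a tab arrives at the stomach as an ordinary space.

```python
from __future__ import annotations

# Catcodes whose characters the mouth passes straight on: cur_cmd equals the catcode.
PASS_THROUGH = {
    1: "left_brace", 2: "right_brace", 3: "math_shift", 4: "tab_mark",
    6: "mac_param", 7: "sup_mark", 8: "sub_mark", 10: "spacer",
    11: "letter", 12: "other_char",
}


def command_for(catcode: int, char: str) -> tuple[int, int]:
    """Return (cur_cmd, cur_chr) for a character that needs no further scanning."""
    if catcode not in PASS_THROUGH:
        # 0, 13 and 14 need more work in the mouth; 5, 9 and 15 are special-cased too.
        raise ValueError(f"catcode {catcode} is not passed straight through")
    if catcode == 10:
        return 10, 32  # every space token gets character code 32, even a tab
    return catcode, ord(char)
```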

But!

  • When TeX's mouth sees a character with catcode 14 (comment, like % by default), it doesn't immediately send anything to the stomach but simply skips to the end of the line -- it will send a command when it encounters a real one.

  • When TeX's mouth sees a character with catcode 13 (active character, like ~ by default), it looks up the meaning of that active character and correspondingly does a macro call or expansion or whatever.

  • When TeX's mouth sees a character with catcode 0 (escape character, like the backslash by default), it scans for a control sequence (basically, either a single nonletter or a sequence of letters), then looks up that control sequence in a hash table to find out its meaning, then either says “undefined control sequence” or expands it or...
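A toy Python version of these three branches of the mouth (TeX's get_next) might look like this. It assumes a fixed catcode table and leaves out expansion, the end-of-line character, the skipping of spaces after control words, and ^^ notation; the token shapes are invented for the sketch.

```python
# Simplified, fixed catcode table (assumption); anything else is letter or other.
CATCODES = {"\\": 0, "{": 1, "}": 2, "$": 3, "^": 7, "_": 8, " ": 10,
            "~": 13, "%": 14}


def catcode(ch):
    if ch in CATCODES:
        return CATCODES[ch]
    return 11 if ch.isascii() and ch.isalpha() else 12


def tokenize(line):
    """Yield ('cs', name), ('active', ch) or ('char', catcode, ch) tokens."""
    i = 0
    while i < len(line):
        ch = line[i]
        cat = catcode(ch)
        if cat == 14:                      # comment: ignore the rest of the line
            return
        if cat == 0:                       # escape: scan a control-sequence name
            j = i + 1
            if j < len(line) and catcode(line[j]) == 11:
                while j < len(line) and catcode(line[j]) == 11:
                    j += 1                 # control word: a run of letters
            else:
                j += 1                     # control symbol: a single nonletter
            yield ("cs", line[i + 1:j])    # meaning would be looked up in the hash table
            i = j
        elif cat == 13:                    # active: meaning would be looked up next
            yield ("active", ch)
            i += 1
        else:
            yield ("char", cat, ch)        # passed on as the matching command
            i += 1
```

For instance, `\hskip 3pt~x^2 % comment` yields the control word `hskip`, a space, the characters 3, p, t, the active `~`, then x, ^, 2 and a space; everything from `%` on is dropped.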

A lot can happen in the mouth, e.g. conditionals like \if, or macro calls (anything that was defined with \def or similar), or opening a file with \input, or things like \expandafter. Only when all this “chewing” is done and there's a “real” command does one reach the stomach.

Summary

From a user perspective, TeX has

  • “commands” like \hskip or \llap,
  • active characters like ~ (which may also be called commands),
  • special-catcode characters like $ or ^ or _ (which are not usually called commands), and
  • common-catcode (“letter” or “other char”) characters like e or 5 (which are almost never called commands).

From an internal perspective,

  • characters have catcodes,
  • based on which the mouth may do further scanning or expansion (in the case of a backslash or active characters like ~),
  • and continue to do expansion based on what it got (e.g., \llap is a macro that expands to \hbox to\z@{\hss#1}),
  • and finally delivers commands to the stomach (e.g., \llap ultimately results in a make_box command being delivered, via \hbox),
  • these “commands” include not only things like “hskip” (which correspond to what you as a user may consider a command \hskip), but also things like “math_shift” and “sup_mark” and even “other_char” and “letter” (often made from a single character of input),
  • but these “commands” do not include things like \input or \if or \string that you may consider a command.

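LaTeX can print which ASCII punctuation characters have a non-literal meaning, i.e., a catcode other than 12 (“other”). Note that \makeatletter, which the code needs for \@gobble, also makes @ a letter, so @ is listed too: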

\makeatletter
\def\do#1{\ifnum\catcode\string`#1=12 \else
\message{\expandafter\@gobble\string#1}\fi}
\typeout{}\message{Chars:}%
\do\!\do\"\do\#\do\$\do\%\do\&\do\'\do\(\do\)\do\*\do\+\do\,%
\do\-\do\.\do\/\do\:\do\;\do\<\do\=\do\>\do\?\do\@\do\[\do\\%
\do\]\do\^\do\_\do\`\do\{\do\|\do\}\do\~%
\typeout{}%
\@@end

It prints:

Chars: # $ % & @ \ ^ _ { } ~

If we rerun it with

\documentclass{article}
\usepackage[french]{babel}
\begin{document}

prepended, it prints more:

Chars: ! # $ % & : ; ? @ \ ^ _ { } ~

I tried this (and it works) in TeX Live 2015 and 2018, with latex, pdflatex and lualatex.
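The same scan can be reproduced outside TeX with a hand-written catcode table, which also shows what the \do loop tests: a character is listed exactly when its catcode is not 12. The tables below are assumptions modelled on the two runs above, not values read back from TeX.

```python
import string

# The 32 printable ASCII punctuation characters, in the same (ASCII) order
# as the \do list above.
PUNCTUATION = string.punctuation

# Non-"other" catcodes in a LaTeX document after \makeatletter (modelled by hand).
BASE = {"#": 6, "$": 3, "%": 14, "&": 4, "@": 11,  # @ is a letter after \makeatletter
        "\\": 0, "^": 7, "_": 8, "{": 1, "}": 2, "~": 13}
# babel's french option additionally makes these active.
FRENCH_ACTIVE = {"!": 13, ":": 13, ";": 13, "?": 13}


def non_other_chars(table):
    """Mimic the TeX loop: list punctuation whose catcode is not 12."""
    return "Chars: " + " ".join(c for c in PUNCTUATION if table.get(c, 12) != 12)


if __name__ == "__main__":
    print(non_other_chars(BASE))
    print(non_other_chars({**BASE, **FRENCH_ACTIVE}))
```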