LaTeX Theory - How Symbols are Modeled Under the Hood

TeX knows thirteen kinds of atoms in math formulas and builds upon them, just as any formula in mathematics is built from atomic ones.

The atoms are Ord, Op, Rel, Bin, Open, Close, Punct, Inner, Over, Under, Acc, Rad and Vcent.

In fact only the first eight matter in the end, because the last five are converted to Ord atoms before spacing is computed.

Every atom has three fields: nucleus, subscript and superscript, each of which can in turn contain other atoms. Again the last five types are special in this respect, because only the nucleus makes real sense for them.

Ord is for “ordinary” symbols such as variables. Op is for “operators” such as \sum or \log. Rel and Bin are for “relation” and “binary operation” symbols (such as < and +, respectively). Open and Close refer to fences such as parentheses. Punct is for punctuation signs (the comma or semicolon).

An Inner atom is basically built from \left...\right (and contains a subformula). Over results from \overline and Under from \underline. Acc comes from the primitive \mathaccent, which is called by commands such as \bar or \tilde. Rad stems from the \radical primitive, used internally by \sqrt. Vcent is a special object built from \vcenter.

An Op atom can be followed by the commands \displaylimits, \limits or \nolimits; giving no specification is equivalent to adding \displaylimits: the subscript and superscript fields will be typeset below and above the operator when the formula itself is typeset in display style (from $$...$$ or, in LaTeX parlance, \[...\] or similar environments) and beside the symbol in the other styles. There are also rules for possibly choosing a bigger version of the symbol in display style.
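As a small illustration of the three modifiers, here is a minimal sketch using \sum (the particular sum and its limits are only an example):

```latex
\documentclass{article}
\begin{document}
% Text style: the default (\displaylimits) puts the limits beside the symbol.
$\sum_{i=1}^{n} a_i$ \quad
% \limits forces them below and above even in text style.
$\sum\limits_{i=1}^{n} a_i$
%
% Display style: the default puts the limits below and above;
% \nolimits forces them beside the symbol.
\[ \sum_{i=1}^{n} a_i \qquad \sum\nolimits_{i=1}^{n} a_i \]
\end{document}
```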

Any symbol or subformula can be made into an atom of a given type by passing it as the argument to \mathord, \mathop, \mathrel, \mathbin, \mathopen, \mathclose, \mathpunct or \mathinner. However, \mathord{...} is equivalent to the simpler {...}.
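To see the effect of these commands, here is a minimal sketch that gives the same letter R three different atom types, and strips the Bin type from + in two equivalent ways (the symbols chosen are arbitrary):

```latex
\documentclass{article}
\begin{document}
% R as a plain Ord, as a Rel (thick spaces) and as a Bin (medium spaces)
\[ x R y \qquad x \mathrel{R} y \qquad x \mathbin{R} y \]
% + as a Bin, then turned into an Ord with \mathord{...} and with {...}
\[ a + b \qquad a \mathord{+} b \qquad a {+} b \]
\end{document}
```

The last two formulas in the second display should come out identical, with no space around the plus sign.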

Your particular question is about \bar and \overline. Something like \bar{abc} becomes (temporarily) an Acc atom; the accent is placed above the whole subformula, but it has no wider version, so it ends up covering just the b. With \widetilde it is different, because the \mathaccent command points to a glyph that has wider variants (this information is encoded in the font). With \overline{abc}, instead, a rule is drawn above the whole subformula, making a single Over atom (which will later be treated as Ord as far as spacing is concerned).
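The difference is easiest to see by typesetting the three constructions side by side. A minimal sketch, reusing the subformula abc from above:

```latex
\documentclass{article}
\begin{document}
\[
  \bar{abc}       \qquad % Acc atom: a single accent glyph of fixed width
  \widetilde{abc} \qquad % Acc atom: the font provides wider variants
  \overline{abc}         % Over atom: a rule as wide as the subformula
\]
\end{document}
```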

The input is first processed by assigning atom types according to internal tables that declare \sum to be Op, = to be Rel and so on. The math list so obtained is then reprocessed in order to add the suitable math spacing, after transforming the Over, Under, Acc, Rad and Vcent atoms to Ord. Finally it is processed again in order to transform it into “boxes and glue”.

The whole of Appendix G in the TeXbook is devoted to the rules for such processing.
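If you want to look at the math list before it is converted, the \showlists primitive can be used inside the formula. The following is only a sketch: the limits of 100 are arbitrary choices, and the exact form of the output depends on the fonts in use, so none is reproduced here.

```latex
\documentclass{article}
\begin{document}
% Raise the display limits so that the list contents are actually shown.
\showboxbreadth=100
\showboxdepth=100
% Also print the listing on the terminal, not only in the log file.
\tracingonline=1
% \showlists writes the current (still unconverted) math list;
% look for entries such as \mathord, \mathbin, \mathrel and \mathop.
$a+b=\sum_i c_i \showlists$
\end{document}
```

In an interactive run TeX may pause after the listing, as it does for \show; pressing Return continues.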


The main concept that controls math spacing is the math class. Compare the two displayed expressions produced by the example document below.

In the first, every atom has math class 0 (\mathord), so it gets no special spacing.

In the second, operators are specified with \mathop, infix binary operators are marked with \mathbin and relations are marked with \mathrel, and you see the classic TeX spacing.

\documentclass{article}

\begin{document}

\[{X}_{0}^{n}{+}\mathrm{cos}{x}{=}0\]

\[\mathop{X}_{0}^{n}\mathbin{+}\mathop{\mathrm{cos}}{x}\mathrel{=}0\]


\end{document}

Of course, you do not normally have to classify symbols by hand like this. For example, = is declared in LaTeX by

\DeclareMathSymbol{=}{\mathrel}{operators}{"3D}

so by default it is a \mathrel.

Similarly,

\DeclareMathSymbol{+}{\mathbin}{operators}{"2B}

declares that + is a \mathbin by default,

and \cos is defined by

\def\cos{\mathop{\operator@font cos}\nolimits}

So if you use \cos rather than \mathrm{cos}, you get the extra operator spacing.
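The same pattern works for operator names that are not predefined. In the sketch below, \sech is a hypothetical example name, and \mathrm is used instead of \operator@font only to avoid the @ character in a user document:

```latex
\documentclass{article}
% \sech is an illustrative name, defined on the same pattern as \cos.
\newcommand{\sech}{\mathop{\mathrm{sech}}\nolimits}
\begin{document}
% Ord atoms only (no space), then two Op atoms (thin spaces around them)
\[ 2\mathrm{cos}x \qquad 2\cos x \qquad 2\sech x \]
\end{document}
```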