Parser for pure LaTeX

If you write a parser, you can define the subset of LaTeX that you support. (There isn't really a useful definition of "pure LaTeX with no primitives".)

For instance, MathJax has a parser for a subset of LaTeX math markup, written in JavaScript, and LaTeXML has a parser for almost all of TeX, written in Perl, which does not involve running TeX itself. As far as I understand the question, LaTeXML's parser is perhaps the closest to what you ask for: https://github.com/brucemiller/LaTeXML

Here is an example that only uses commands defined in core LaTeX. (The shortvrb package is part of the base LaTeX2e release, so it is as fundamental a part of LaTeX as, say, \section, which is defined in the article class from the same base release files.)

\documentclass{article}
\usepackage{shortvrb}


\begin{document}

\MakeShortVerb\*

 {\bfseries *}{* some text}

\DeleteShortVerb\*

 {\bfseries *}{* some text}

\end{document}

Note that it is not possible to statically assign any tokenisation to *}{*. In the first case * is a short verbatim delimiter, so the input produces the two character tokens }{. In the second case } and { are ordinary group delimiters again, and the input produces two character tokens ** (the first one being bold).
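A parser could try to track \MakeShortVerb and \DeleteShortVerb explicitly, but that only helps while the calls are visible in the source. As a small sketch of why it is not enough in general, the variant below hides the call inside a macro; \maybeverb is a hypothetical name invented for this illustration, and in a real document the definition could just as well sit in a class or package file the parser never sees.

```latex
\documentclass{article}
\usepackage{shortvrb}

% Hypothetical helper, invented for this illustration only:
% it hides the \MakeShortVerb call from anyone scanning the body.
\newcommand\maybeverb{\MakeShortVerb\*}

\begin{document}

\maybeverb

% Whether *}{* is verbatim text here depends on the expansion of
% \maybeverb, not on anything visible on this line.
 {\bfseries *}{* some text}

\end{document}
```

To tokenise the marked line the same way LaTeX does, a parser would have to expand \maybeverb, which is already a step towards implementing TeX's macro processor.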

It would be reasonable to produce a LaTeX parser for a subset of the language that did not include this kind of construct, but you need to define the subset. It isn't enough to say "not plain TeX or primitives": there are plain TeX constructs that can be easily parsed, and there are LaTeX constructs that cannot be parsed in general without access to a full TeX typesetting system.
