Basics of parsing

The accepted TeX solution has several problems. One is that the whole argument is read at each step, not just one token. Another is that the recursive loop in the accepted code builds nested \if...\fi constructions, and TeX's capacity for such nesting is very limited.

So I show here a general scanner written with TeX primitives that avoids the problems described above. Spaces are scanned too, but braces are not allowed (for simplicity).

\def\scan#1{\scanA#1\end}
\def\scanA{\futurelet\next\scanB}
\def\scanB{\expandafter\ifx\space\next \expandafter\scanC \else \expandafter\scanE \fi}
\def\scanC{\afterassignment\scanD \let\next= }
\def\scanD{\scanE{ }}
\def\scanE#1{\ifx\end#1\else
   (#1)% <- The processing over one token is here
   \expandafter \scanA \fi
}

\scan{abcdef ghijkl mno}

\bye

Edit: If you leave the default treatment of spaces unchanged (i.e. they are ignored), the code is much simpler:

\def\scan#1{\scanA#1\end}
\def\scanA#1{\ifx\end#1\else
   (#1)% <- The processing over one token is here
   \expandafter \scanA \fi
}

Scanning one token at a time requires at least distinguishing whether the scanned token is a space or a left brace. This is because you can't remove the scanned token with a one-parameter macro in those cases: an undelimited argument skips spaces and grabs a whole brace group at once.
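
To make the limitation concrete, here is a minimal plain TeX sketch. The macro name \grab is hypothetical and only serves the illustration; it takes one undelimited argument and typesets it in brackets.

```latex
% Plain TeX sketch; \grab is a hypothetical helper used only for illustration.
\def\grab#1{[#1]}

\grab abc                  % the argument is the single token a

\grab{ab}c                 % the argument is ab: the braces are stripped

\expandafter\grab\space c  % the space token is skipped, the argument is c

\bye
```

In the last two cases the macro never sees the left brace or the space as a token of its own, which is why a scanner has to detect them before grabbing anything.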

First of all, let's see what \futurelet does: your \futurelet\token\scanB tells TeX to look at the token that follows \scanB without removing it, perform the assignment \let\token=<scanned token>, and only then expand \scanB, which should make decisions based on the meaning of \token.
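
A small plain TeX sketch of this look-ahead, assuming the hypothetical names \peek, \report and \nexttok (none of them come from the question). It only reports what kind of token comes next; the token itself stays in the input and is processed normally afterwards.

```latex
% Plain TeX sketch; \peek, \report and \nexttok are hypothetical names.
\def\peek{\futurelet\nexttok\report}
\def\report{%
  \ifx\nexttok\bgroup
    (brace)%
  \else
    % \space is expanded first, so \ifx compares a real space token with \nexttok
    \expandafter\ifx\space\nexttok
      (space)%
    \else
      (other)%
    \fi
  \fi
}

\peek{a}                    % \nexttok is a left brace
\par
\expandafter\peek\space b   % \nexttok is a space token
\par
\peek c                     % \nexttok is the letter c

\bye
```

The \expandafter in front of \peek is needed in the second test because a space typed directly after a control word is dropped when the line is tokenized.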

For terminating the scanning, you have to place some special token at the end; this token is frequently a “quark”, say

\def\quark{\quark}

so \scanB can do \ifx\token\quark and, in this case, stop the recursion. Let's put together what we have so far:

\makeatletter
\def\scan@quark{\scan@quark}% if we find it in bad places, we'll know!
\newcommand\scan[1]{\futurelet\@let@token\scan@aux@i#1\scan@quark}
\def\scan@aux@i{%
  \ifx\@let@token\scan@quark
    \expandafter\@gobbletwo
  \else
    \expandafter\@firstofone
  \fi
  {\scan@aux@ii}%
}

The macro \scan@aux@ii should now go on with other tests. I used \@gobbletwo in the “true” case so as to gobble both \scan@aux@ii and \scan@quark.
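
One possible way to continue is sketched below. It is only an illustration under assumptions: the helper names \scan@next, \scan@space, \scan@space@i and \scan@token are hypothetical, the \futurelet has been moved into \scan@next so that the loop can restart, and brace groups are not detected (a group would be grabbed as a single item with its braces stripped).

```latex
\documentclass{article}
\makeatletter
\def\scan@quark{\scan@quark}
\newcommand\scan[1]{\scan@next#1\scan@quark}
% look at the next token without removing it (hypothetical helper)
\def\scan@next{\futurelet\@let@token\scan@aux@i}
\def\scan@aux@i{%
  \ifx\@let@token\scan@quark
    \expandafter\@gobbletwo
  \else
    \expandafter\@firstofone
  \fi
  {\scan@aux@ii}%
}
% decide how the token has to be removed
\def\scan@aux@ii{%
  \ifx\@let@token\@sptoken
    \expandafter\scan@space
  \else
    \expandafter\scan@token
  \fi
}
% a space cannot be grabbed as an argument, so it is removed by an assignment
\def\scan@space{\afterassignment\scan@space@i\let\@let@token= }
\def\scan@space@i{( )\scan@next}
% any other token is grabbed as an ordinary argument
\def\scan@token#1{(#1)\scan@next}
\makeatother

\begin{document}
\scan{ab cd}
\end{document}
```

The conditionals are closed with \expandafter before the next macro runs, so the recursion does not pile up nested \if...\fi levels.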

If instead you just want to split the input at a certain token, a better approach is to use delimited arguments; you can find several examples on the site. With expl3 it's quite easy, because there are built-in functions that do the job.
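
As a minimal sketch of the delimited-argument idea in plain TeX (the name \splitat is hypothetical, and \relax is used here only as an end marker):

```latex
% Plain TeX sketch; \splitat is a hypothetical name and \relax only marks the end.
% #1 ends at the first ^, #2 is everything up to \relax.
\def\splitat#1^#2\relax{[#1][#2]}

\splitat abc^def\relax   % #1 is abc, #2 is def

\bye
```

Any further separators would stay inside #2, so a complete solution has to call the macro again on the remainder.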

So, say you have an input such as \word{abc^def^ghi} that you want to print with alternating colors. Here's an implementation:

\documentclass{article}
\usepackage{xparse,xcolor}

\ExplSyntaxOn
\NewDocumentCommand{\word}{m}
 {
  \kormylo_word:n { #1 }
 }

\seq_new:N \l_kormylo_word_fragment_seq
\bool_new:N \l_kormylo_second_color_bool

\cs_new_protected:Npn \kormylo_word:n #1
 {
  \kormylo_change_color:
  \seq_set_split:Nnn \l_kormylo_word_fragment_seq { ^ } { #1 }
  \seq_use:Nn \l_kormylo_word_fragment_seq { \kormylo_change_color: }
 }

\cs_new_protected:Npn \kormylo_change_color:
 {
  \bool_if:NTF \l_kormylo_second_color_bool
   { \color{second} \bool_set_false:N \l_kormylo_second_color_bool }
   { \color{first} \bool_set_true:N \l_kormylo_second_color_bool }
 }
\ExplSyntaxOff

\colorlet{first}{black}
\colorlet{second}{red}

\begin{document}

\word{su^per^cal^i^frag^i^lis^tic^ex^pi^al^i^do^cious}

\end{document}

Note that you can put spaces around the separator token to make the input more readable; such spaces are disregarded.
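
The trimming is done by \seq_set_split:Nnn itself. A minimal sketch, assuming the hypothetical names \showparts and \l_demo_parts_seq, that prints each fragment in brackets:

```latex
\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
% hypothetical names, used only for this illustration
\seq_new:N \l_demo_parts_seq
\NewDocumentCommand{\showparts}{m}
 {
  % spaces surrounding each item are removed while splitting
  \seq_set_split:Nnn \l_demo_parts_seq { ^ } { #1 }
  \seq_map_inline:Nn \l_demo_parts_seq { [##1] }
 }
\ExplSyntaxOff

\begin{document}

\showparts{su ^ per ^ cal}

\end{document}
```

This should typeset [su][per][cal], the same as for the input su^per^cal.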


The macros could be extended to allow spaces in the input: just split at spaces and do a mapping.

\documentclass{article}
\usepackage{xparse,xcolor}

\ExplSyntaxOn
\NewDocumentCommand{\words}{m}
 {
  \kormylo_words:n { #1 }
 }

\seq_new:N \l_kormylo_word_seq
\seq_new:N \l_kormylo_word_fragment_seq
\bool_new:N \l_kormylo_second_color_bool

\cs_new_protected:Npn \kormylo_words:n #1
 {
  \seq_set_split:Nnn \l_kormylo_word_seq { ~ } { #1 }
  \seq_map_inline:Nn \l_kormylo_word_seq
   {
    \kormylo_word:n { ##1 }
    \c_space_tl
   }
 }

\cs_new_protected:Npn \kormylo_word:n #1
 {
  \kormylo_change_color:
  \seq_set_split:Nnn \l_kormylo_word_fragment_seq { ^ } { #1 }
  \seq_use:Nn \l_kormylo_word_fragment_seq { \kormylo_change_color: }
 }

\cs_new_protected:Npn \kormylo_change_color:
 {
  \bool_if:NTF \l_kormylo_second_color_bool
   { \color{second} \bool_set_false:N \l_kormylo_second_color_bool }
   { \color{first} \bool_set_true:N \l_kormylo_second_color_bool }
 }
\ExplSyntaxOff

\colorlet{first}{black}
\colorlet{second}{red}

\begin{document}

\words{su^per^cal^i^frag^i^lis^tic^ex^pi^al^i^do^cious syl^la^ble con^cate^na^tion}

\end{document}


For the first answers, I assume that you want to scan pure text, without any groups, commands, etc., and without any spaces.

This is the (mainly) TeX solution.

\documentclass{minimal}
\begin{document}
\def\END{}
\def\ENDEND{}
\newcommand*\scan[1]{\scani #1\END\ENDEND}
\def\scani#1#2\ENDEND{%
  \ifx\END#1%
  \else%
    (#1)%
    \scani#2\ENDEND%
  \fi
}
\scan{test}
\end{document}

And here is the LaTeX3 version.

\documentclass{minimal}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\newcommand*\scan[1]
  {
    \tl_map_inline:nn {#1} { (##1) }
  }
\ExplSyntaxOff
\scan{test}
\end{document}

Both will output

(t)(e)(s)(t)

Dealing with spaces is somewhat tricky. I have a TeX solution here (from Usenet times), but I do not understand it myself.
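
To see where the difficulty comes from, here is a small sketch (the command name \scanmap is hypothetical) that feeds a space and a brace group to the same \tl_map_inline:nn loop as above:

```latex
\documentclass{minimal}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% \scanmap is a hypothetical name; the loop is the same as in the version above
\newcommand*\scanmap[1]
  {
    \tl_map_inline:nn {#1} { (##1) }
  }
\ExplSyntaxOff
\scanmap{ab c{de}}
\end{document}
```

The mapping works on items, so the space is skipped and the group {de} is passed as one item without its braces: the result should be (a)(b)(c)(de).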

For LaTeX3, solutions that can cope with spaces are given in the question “LaTeX3: tl_map with spaces”.

Or you can use my version:

\documentclass{minimal}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\newcommand*\scan[1]
  {
    \__scanloop: #1 \q_recursion_stop
  }
\cs_new_protected:Nn \__scanloop:
  {
    % the peek functions are not expandable, hence "protected"
    \peek_meaning_remove:NTF \q_recursion_stop
      { }
      {
        \peek_charcode_remove:NTF \c_space_token
          {
            (~)
            \__scanloop:
          }
          {
            \__scanloop_aux:
          }
      }
  }
\cs_new_protected:Npn \__scanloop_aux: #1
  {
    ( #1 )
    \__scanloop:
  }
\ExplSyntaxOff
\scan{test with spaces}
\end{document}

which will output

(t)(e)(s)(t)( )(w)(i)(t)(h)( )(s)(p)(a)(c)(e)(s)
