Looping over strings

If I modify your test a bit to use a shorter argument for tracing:

\documentclass{article}

\def\test#1{{
\tracingonline=1
\tracingmacros=1
\markletters{#1}
}
\typeout{TYPEOUT: \markletters{#1}}
}
\begin{document}

\subsection*{Macro based on one by Tarass:}

\def\xloop<#1#2>{%
  \ifx\relax#1
    \else
      (#1)\xloop<#2>%
  \fi}  
\def\markletters#1{\xloop<#1\relax>}

\test{a bc}

\subsection*{Macro based on one by David Carlisle:}

\def\xloop#1{%
  \ifx\relax#1
    \else
      (#1)\expandafter\xloop%
  \fi}
\def\markletters#1{\xloop#1\relax}%

\test{a bc}


\subsection*{Macro based on one by Florent:}

\def\gobblechar{\let\xchar= }
\def\assignthencheck{\afterassignment\xloop\gobblechar}
\def\xloop{%
  \ifx\relax\xchar
      \let\next=\relax
    \else
      (\xchar)\let\next=\assignthencheck
  \fi
  \next}
\def\markletters#1{\assignthencheck#1\relax}

\test{a bc}


\end{document}

then the first test produces

\markletters #1->\xloop <#1\relax >
#1<-a bc

\xloop <#1#2>->\ifx \relax #1 \else (#1)\xloop <#2>\fi 
#1<-a
#2<- bc\relax 

\@nobreakfalse ->\global \let \if@nobreak \iffalse 

\xloop <#1#2>->\ifx \relax #1 \else (#1)\xloop <#2>\fi 
#1<-b
#2<-c\relax 

\xloop <#1#2>->\ifx \relax #1 \else (#1)\xloop <#2>\fi 
#1<-c
#2<-\relax 

\xloop <#1#2>->\ifx \relax #1 \else (#1)\xloop <#2>\fi 
#1<-\relax 
#2<-
TYPEOUT: (a)(b)(c) 

Here you see that the macro uses a delimited argument so the entire list is grabbed each time (all tokens up to > ) and the first token is handled, with the remaining tokens being re-inserted in the recursive call.

  1. As #1 works as a normal non-delimited argument, it always drops spaces.
  2. As the whole thing works by expansion, it works in expansion-only contexts such as \write, so you get TYPEOUT: (a)(b)(c)
  3. At each stage a \fi is left behind after the recursive call, so if you have 1000 entries there will be 1000 of these, and at some point you will overflow the input stack.
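
The space dropping in point 1 is a general property of undelimited arguments and can be seen in isolation. A minimal sketch, where \pair is a hypothetical helper that is not part of the code above:

```latex
\documentclass{article}
% \pair is a hypothetical two-argument macro, used only for illustration
\def\pair#1#2{[#1][#2]}
% TeX skips the space token while it looks for the undelimited argument #2
\typeout{\pair a b}% writes [a][b]
\begin{document}
\pair a b
\end{document}
```

The space between a and b never reaches the macro, which is the same effect that removes the space from "a bc" in the trace.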

The second block produces

\markletters #1->\xloop #1\relax 
#1<-a bc

\xloop #1->\ifx \relax #1 \else (#1)\expandafter \xloop \fi 
#1<-a

\@nobreakfalse ->\global \let \if@nobreak \iffalse 

\xloop #1->\ifx \relax #1 \else (#1)\expandafter \xloop \fi 
#1<-b

\xloop #1->\ifx \relax #1 \else (#1)\expandafter \xloop \fi 
#1<-c

\xloop #1->\ifx \relax #1 \else (#1)\expandafter \xloop \fi 
#1<-\relax 
TYPEOUT: (a)(b)(c) 

Here you can see that after the first macro the inner macro does not grab the whole list, but just the first token. Doing it this way avoids reparsing the list and overloading the input stack, but you need to expand the \fi (to nothing) before doing the recursive call, as you cannot put the \fi after the list as in the first version. Hence the \expandafter, which forces \fi to expand before \xloop.

The third version produces

\markletters #1->\assignthencheck #1\relax 
#1<-a bc

\assignthencheck ->\afterassignment \xloop \gobblechar 

\gobblechar ->\let \xchar = 

\xloop ->\ifx \relax \xchar \let \next =\relax \else (\xchar )\let \next =\assi
gnthencheck \fi \next 

\@nobreakfalse ->\global \let \if@nobreak \iffalse 

\next ->\afterassignment \xloop \gobblechar 

\gobblechar ->\let \xchar = 

\xloop ->\ifx \relax \xchar \let \next =\relax \else (\xchar )\let \next =\assi
gnthencheck \fi \next 

\next ->\afterassignment \xloop \gobblechar 

\gobblechar ->\let \xchar = 

\xloop ->\ifx \relax \xchar \let \next =\relax \else (\xchar )\let \next =\assi
gnthencheck \fi \next 

\next ->\afterassignment \xloop \gobblechar 

\gobblechar ->\let \xchar = 

\xloop ->\ifx \relax \xchar \let \next =\relax \else (\xchar )\let \next =\assi
gnthencheck \fi \next 

\next ->\afterassignment \xloop \gobblechar 

\gobblechar ->\let \xchar = 

\xloop ->\ifx \relax \xchar \let \next =\relax \else (\xchar )\let \next =\assi
gnthencheck \fi \next 
! Undefined control sequence.

Here the item is grabbed by a \let assignment. This has the advantage of seeing space tokens, but as it does not work by expansion it fails in the \typeout.
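
If the assignment-based version has to coexist with \typeout or \edef, one possible workaround (an assumption on my part, not something the original code does) is to define \markletters with the e-TeX prefix \protected. The macro is then left unexpanded in expansion-only contexts instead of raising an error:

```latex
\documentclass{article}
\def\gobblechar{\let\xchar= }
\def\assignthencheck{\afterassignment\xloop\gobblechar}
\def\xloop{%
  \ifx\relax\xchar
    \let\next=\relax
  \else
    (\xchar)\let\next=\assignthencheck
  \fi
  \next}
% \protected keeps \markletters from being expanded inside \edef and \write
\protected\def\markletters#1{\assignthencheck#1\relax}
\begin{document}
% the log shows the macro call itself, not (a)( )(b)(c)
\typeout{TYPEOUT: \markletters{a bc}}
% in normal typesetting the loop runs as before and still sees the space
\markletters{a bc}
\end{document}
```

This does not make the loop expandable: the \typeout simply records the unexpanded call, so it only helps where an error has to be avoided, not where the processed string is needed.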


There are two differences between the first solution and the other two: The second and third solution both use tail recursion, in contrast to the first one. And, related to this, the first solution has to copy the argument from iteration to iteration, while the other two solutions pass over the argument just once, processing character by character without copying the rest.

First solution. In a conventional programming language the if-then-else-fi would be read completely and only then the code of one of the branches would be executed. In TeX, the \fi remains in the input until expanded. So the first loop produces something like

\ifx\relax H\else(H)\xloop<ello...>\fi
(H)\ifx\relax e\else(e)\xloop<llo...>\fi\fi
(H)(e)\ifx\relax l\else(l)\xloop<lo...>\fi\fi\fi

Note how the \fis pile up at the end.
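
The pile-up is not forced by the delimited argument itself. As a sketch (a variation of my own, not one of the three solutions under discussion), the conditional can be closed before the recursive call by selecting \@gobble or \@firstofone from the LaTeX kernel:

```latex
\documentclass{article}
\makeatletter
\def\xloop<#1#2>{%
  \ifx\relax#1%
    \expandafter\@gobble      % end of list: discard the braced continuation
  \else
    \expandafter\@firstofone  % otherwise: unbrace it once the \fi is gone
  \fi
  {(#1)\xloop<#2>}}
\makeatother
\def\markletters#1{\xloop<#1\relax>}
\typeout{TYPEOUT: \markletters{a bc}}% writes TYPEOUT: (a)(b)(c)
\begin{document}
\markletters{a bc}
\end{document}
```

Each \expandafter removes the rest of the conditional before the chosen helper acts, so no \fi is left behind. The remaining list is still copied into #2 on every iteration, so the second difference described above stays.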

Second solution. This is the most compact one. The \expandafter changes the order of expansions: \fi is expanded first, so the \ifx gets finished before continuing with the next \xloop.

\xloop Hello...\relax
\ifx\relax H\else (H)\expandafter\xloop\fi ello...\relax
(H)\expandafter\xloop\fi ello...\relax
(H)\xloop ello...\relax
(H)\ifx\relax e\else (e)\expandafter\xloop\fi llo...\relax
(H)(e)\expandafter\xloop\fi llo...\relax
(H)(e)\xloop llo...\relax

Third solution. \let first assigns the next character to \xchar, and \afterassignment makes sure that \xloop tests it only once the assignment is done. Tail recursion is obtained by letting \next point to the code that is to be executed after the closing \fi.

\assignthencheck Hello...\relax
\xloop ello...\relax % \xchar is set to H
\ifx\relax\xchar\let\next=\relax\else(\xchar)\let\next=\assignthencheck\fi\next ello...\relax
(\xchar)\let\next=\assignthencheck\fi\next ello...\relax
(H)\next ello...\relax % \next points to the code of \assignthencheck
(H)\xloop llo...\relax % \xchar is set to e
(H)\ifx\relax\xchar\let\next=\relax\else(\xchar)\let\next=\assignthencheck\fi\next llo...\relax
(H)(\xchar)\let\next=\assignthencheck\fi\next llo...\relax
(H)(e)\next llo...\relax % \next points to the code of \assignthencheck

Edit: To illustrate what David and Enrico mean by the first two solutions being fully expandable and the third one not, try

\edef\resultexpanded{\markletters{Hello World!}}

\edef expands its replacement text until only non-expandable TeX primitives (or characters) remain and then uses the result to define \resultexpanded. For the first two versions of \markletters, \resultexpanded is defined as

(H)(e)(l)(l)(o)(W)(o)(r)(l)(d)(!)

whereas in the third case it contains

\afterassignment\let\next=\relax\next\let\xchar=Hello World!\relax

which is the non-expandable rest.
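
To check the expandable case yourself, here is a small test file built around the second version. It is a sketch: the % after \ifx\relax#1 is my addition (not in the definition quoted above) so that no stray space ends up in the result.

```latex
\documentclass{article}
\def\xloop#1{%
  \ifx\relax#1% the comment sign avoids a space token in the true branch
  \else
    (#1)\expandafter\xloop
  \fi}
\def\markletters#1{\xloop#1\relax}
% the whole loop runs inside \edef, because it consists of expansion only
\edef\resultexpanded{\markletters{Hello World!}}
% writes macro:->(H)(e)(l)(l)(o)(W)(o)(r)(l)(d)(!)
\typeout{\meaning\resultexpanded}
\begin{document}
\resultexpanded
\end{document}
```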


The solutions by Tarass and David Carlisle are not that different. The latter is more efficient, because it removes the conditional before going to the next cycle, whereas the former nests them and, at each cycle, a \fi is added to the stack: a (very) long string could cause memory overflow.

The other difference, that is, the delimiters <>, is irrelevant as they play no role whatsoever.

Ignoring spaces is the price to pay for expandability of the macro: this means that the result is “computed” directly by macro expansion.

The third solution uses a different approach: instead of absorbing the next item as the argument to a macro, which is the cause for space gobbling, the control sequence is let to the next item, which is then removed, until arriving at \relax.

Note that you can't use this for storing the processed token list, because all cycles use \xchar for printing the item.

A further solution with xparse and expl3:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\markletters}{om}
 {
  \IfNoValueTF{#1}
   {
    \kessels_markletters:nn { #2 } { \tl_use:N \l_kessels_marked_letters_tl }
   }
   {
    \kessels_markletters:nn { #2 } { \tl_set_eq:NN #1 \l_kessels_marked_letters_tl }
   }
 }

\tl_new:N \l_kessels_unmarked_letters_tl
\tl_new:N \l_kessels_marked_letters_tl

\cs_new_protected:Nn \kessels_markletters:nn
 {
  \tl_set:Nn \l_kessels_unmarked_letters_tl { #1 }
  \tl_replace_all:Nnn \l_kessels_unmarked_letters_tl { ~ } { \textvisiblespace }
  \tl_clear:N \l_kessels_marked_letters_tl
  \tl_map_inline:Nn \l_kessels_unmarked_letters_tl
   {
    \tl_put_right:Nn \l_kessels_marked_letters_tl { (##1) }
   }
  #2
 }
\ExplSyntaxOff

\begin{document}

\markletters{Hello World!}

\markletters[\foo]{Hello World!}

\texttt{\meaning\foo}

\end{document}

Spaces are first replaced with \textvisiblespace (you can use \nobreakspace if you prefer).

If the optional argument is missing, the processed string is printed, otherwise it is stored in the macro given as argument (no check for it not being defined is performed, but it can be easily added).
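
One way such a check might look is sketched below. The message name already-defined and its wording are hypothetical. The test uses \cs_if_free:NTF, so the result is stored only when the given name is still undefined:

```latex
\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
% hypothetical message; the module name follows the variable names used above
\msg_new:nnn { kessels } { already-defined }
 { The~command~#1~is~already~defined. }

\tl_new:N \l_kessels_unmarked_letters_tl
\tl_new:N \l_kessels_marked_letters_tl

\cs_new_protected:Nn \kessels_markletters:nn
 {
  \tl_set:Nn \l_kessels_unmarked_letters_tl { #1 }
  \tl_replace_all:Nnn \l_kessels_unmarked_letters_tl { ~ } { \textvisiblespace }
  \tl_clear:N \l_kessels_marked_letters_tl
  \tl_map_inline:Nn \l_kessels_unmarked_letters_tl
   { \tl_put_right:Nn \l_kessels_marked_letters_tl { (##1) } }
  #2
 }

\NewDocumentCommand{\markletters}{om}
 {
  \IfNoValueTF{#1}
   { \kessels_markletters:nn { #2 } { \tl_use:N \l_kessels_marked_letters_tl } }
   {
    % store the result only if the target name is still free
    \cs_if_free:NTF #1
     { \kessels_markletters:nn { #2 } { \tl_set_eq:NN #1 \l_kessels_marked_letters_tl } }
     { \msg_error:nnn { kessels } { already-defined } { #1 } }
   }
 }
\ExplSyntaxOff

\begin{document}

\markletters[\foo]{Hello World!}

\texttt{\meaning\foo}

% a second \markletters[\foo]{...} would now raise the error instead of overwriting \foo

\end{document}
```

Whether overwriting should be an error, a warning, or allowed silently is a design choice. The version above without the check corresponds to the last option.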


Here's a simplified version, just for printing the result. As you see, the code is much shorter and each command does a well-defined task. It does exactly the same as Florent's code.

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\markletters}{m}
 {
  % save the input in a variable
  \tl_set:Nn \l_kessels_unmarked_letters_tl { #1 }
  % replace spaces with \textvisiblespace
  \tl_replace_all:Nnn \l_kessels_unmarked_letters_tl { ~ } { \textvisiblespace }
  % map the input surrounding each item by parentheses
  \tl_map_inline:Nn \l_kessels_unmarked_letters_tl { (##1) }
 }
\tl_new:N \l_kessels_unmarked_letters_tl
\ExplSyntaxOff

\begin{document}

\markletters{Hello World!}

\end{document}