\tl_replace_all:Nnn recurse subgroups

While Manuel was commenting, I remembered \regex_replace_all:nnN, where the first argument contains the token to be replaced by the 2nd argument in the token variable given as 3rd. argument.

\documentclass{article}


\usepackage{l3regex}

\begin{document}

\ExplSyntaxOn

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }

Before:\space \l_tmpa_tl \par

\regex_replace_all:nnN  {b} {d} \l_tmpa_tl

After:\space \l_tmpa_tl

\ExplSyntaxOff

\end{document}

enter image description here


I don't know if this is what you mean now I've seen your discussion in comments, but it certainly works for the cases in the original question.

This code does not rely on any package or code explicitly designated as experimental by the L3 developers.

Note, however, that I have no idea what I am doing.

Caveat emptor ....

Counts are included to show that the grouping within the token lists is preserved e.g. that a{bcde}f is counted as 3 tokens and not 6 or 8 when the token list is reassembled. During processing, the string is obviously counted as having more tokens since this is necessary to search and replace within the groups.

The result of the replacement operation is stored in a globally set variable \g_henri_mod_tl.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{expl3}

\begin{document}

\ExplSyntaxOn
\str_new:N \l_henri_mod_str
\int_new:N \l_henri_tmpa_int
\int_new:N \l_henri_tmpb_int
\int_new:N \l_henri_tmpc_int
\tl_new:N \g_henri_mod_tl

\cs_new_protected:Npn \henri_replace_all:nnn #1 #2 #3
{
  \group_begin:
    \str_clear:N \l_henri_mod_str
    \int_zero:N \l_henri_tmpa_int
    \str_set:Nn \l_tmpa_str { #1 }
    \str_set:Nn \l_tmpb_str { #2 }
    \int_set:Nn \l_henri_tmpb_int { \str_count:N \l_tmpa_str }
    \int_set:Nn \l_henri_tmpc_int { \str_count:N \l_tmpb_str }
    \int_compare:nTF { \l_henri_tmpc_int = 1 }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq_x:nnTF { #2 } { \str_head:N \l_tmpa_str }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
        }
        \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
        \int_incr:N \l_henri_tmpa_int
      }
    }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq_x:nnTF { #2 } { \str_range:Nnn \l_tmpa_str { 1 } { \l_henri_tmpc_int } }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
          \int_set:Nn \l_tmpa_int { \str_count:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_range:Nnn \l_tmpa_str { 1 + \l_henri_tmpc_int } { \l_tmpa_int } }
          \int_add:Nn \l_henri_tmpa_int { \l_henri_tmpc_int }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
          \int_incr:N \l_henri_tmpa_int
        }
      }
    }
    \tl_gset_rescan:Nno \g_henri_mod_tl {} { \l_henri_mod_str }
  \group_end:
}

\cs_generate_variant:Nn \henri_replace_all:nnn { Vnn }

\henri_replace_all:nnn { abc{ab{abc}c} } { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\henri_replace_all:Vnn \l_tmpa_tl { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { gydihŵs } { y } { w }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { abc{ab{abc}c} } { bc } { doodle }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=[b}\,{[]} } { =[ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\ExplSyntaxOff

\end{document}

token list manipulation

EDITED to deal with searches for strings of more than one character. This can correctly substitute =[ with \sqsubseteq as mentioned in a comment.

EDIT

It is possible to define a further command sequence which obeys the target syntax. However, it should be noticed that this will not work in all cases. In particular, it fails to work correctly with gwdihŵs.

The idea is just to do the replacement and then spit out the global variable. I am not sure that it is correct to call the macro \tl_replace_allrecursive:nnn as this lacks any appropriate prefix, but if the macro is for purely personal use and you're not worried about future breakage, that's up to you. Personally, I'd call it something like \henri_replace_allrecursive:nnn and be safe since I don't see anything to be gained from violating the naming rules.

\cs_new_protected:Npn \tl_replace_allrecursive:nnn #1 #2 #3
{
  \henri_replace_all:nnn { #1 } { #2 } { #3 }
  \g_henri_mod_tl
}

Then we can say

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { b } { d } }
\l_tmpa_tl \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }  }
$\l_tmpa_tl$ \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { bc } { doodle } }
\l_tmpa_tl \par
%
\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=[b}\,{[]} } {  =[ } { \sqsubseteq } }
$\l_tmpa_tl$ \par

and, comparing with the original results, we can see that the replacements are as expected (less gwdihŵs, of course).

comparison

I take it the count of tokens here is irrelevant since everything is being expanded.

Complete code:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{expl3}

\begin{document}

\ExplSyntaxOn
\str_new:N \l_henri_mod_str
\int_new:N \l_henri_tmpa_int
\int_new:N \l_henri_tmpb_int
\int_new:N \l_henri_tmpc_int
\tl_new:N \g_henri_mod_tl

\cs_new_protected:Npn \henri_replace_all:nnn #1 #2 #3
{
  \group_begin:
    \str_clear:N \l_henri_mod_str
    \int_zero:N \l_henri_tmpa_int
    \str_set:Nn \l_tmpa_str { #1 }
    \str_set:Nn \l_tmpb_str { #2 }
    \int_set:Nn \l_henri_tmpb_int { \str_count:N \l_tmpa_str }
    \int_set:Nn \l_henri_tmpc_int { \str_count:N \l_tmpb_str }
    \int_compare:nTF { \l_henri_tmpc_int = 1 }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq_x:nnTF { #2 } { \str_head:N \l_tmpa_str }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
        }
        \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
        \int_incr:N \l_henri_tmpa_int
      }
    }
    {
      \int_do_until:nn { \l_henri_tmpb_int = \l_henri_tmpa_int }
      {
        \str_if_eq_x:nnTF { #2 } { \str_range:Nnn \l_tmpa_str { 1 } { \l_henri_tmpc_int } }
        {
          \str_put_right:Nx \l_henri_mod_str { #3 }
          \int_set:Nn \l_tmpa_int { \str_count:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_range:Nnn \l_tmpa_str { 1 + \l_henri_tmpc_int } { \l_tmpa_int } }
          \int_add:Nn \l_henri_tmpa_int { \l_henri_tmpc_int }
        }
        {
          \str_put_right:Nx \l_henri_mod_str { \str_head:N \l_tmpa_str }
          \str_set:Nx \l_tmpa_str { \str_tail:N \l_tmpa_str }
          \int_incr:N \l_henri_tmpa_int
        }
      }
    }
    \tl_gset_rescan:Nno \g_henri_mod_tl {} { \l_henri_mod_str }
  \group_end:
}

\cs_generate_variant:Nn \henri_replace_all:nnn { Vnn }

\cs_new_protected:Npn \tl_replace_allrecursive:nnn #1 #2 #3
{
  \henri_replace_all:nnn { #1 } { #2 } { #3 }
  \g_henri_mod_tl
}

\verb|\henri_replace_all:nnn {  } {  } {  } \g_henri_mod_tl|
\smallskip\par

\henri_replace_all:nnn { abc{ab{abc}c} } { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\tl_set:Nn \l_tmpa_tl { abc{ab{abc}c} }
\henri_replace_all:Vnn \l_tmpa_tl { b } { d }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { gydihŵs } { y } { w }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { abc{ab{abc}c} } { bc } { doodle }
\g_henri_mod_tl {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\henri_replace_all:nnn { {a=[b}\,{[]} } { =[ } { \sqsubseteq }
$\g_henri_mod_tl$ {} ~ has ~ \tl_count:N \g_henri_mod_tl {} ~ tokens.\par

\bigskip\par
\verb|\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { ... } { ... } { ... } }|
\smallskip\par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { b } { d } }
\l_tmpa_tl \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=b}\,{[]} } { [ } { \sqsubseteq }  }
$\l_tmpa_tl$ \par

\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { abc{ab{abc}c} } { bc } { doodle } }
\l_tmpa_tl \par
%
\tl_set:Nx \l_tmpa_tl { \tl_replace_allrecursive:nnn { {a=[b}\,{[]} } {  =[ } { \sqsubseteq } }
$\l_tmpa_tl$ \par


\ExplSyntaxOff

\end{document}

If you need full expandable solution of replacing text including text in groups, here is an idea:

\def\repl#1{\replA #1{\end}}
\def\replA#1#{\replB#1\end}
\def\replB#1{\ifx\end#1\expandafter\replC \else\replX{#1}\expandafter\replB\fi}
\def\replC#1{\ifx\end#1\empty\else{\repl{#1}}\expandafter\replA\fi}
\def\replX#1{\ifx#1bd\else#1\fi}

\message{....\repl{abc{aabc{bb}}cb}}

\bye

The \message command expands its argument and prints: ....adc{aadc{dd}}cd.

This code ignores spaces between tokens. The space handling was not specified in your task. If you need to keep spaces unchanged then the code needs to be a slight more complicated (about five more lines).

Edit My estimation was not exact. The code which accepts spaces needs only three more lines:

\def\repl#1{\replA #1{\end}}
\def\replA#1#{\replD#1 {\end} }
\def\replD#1 #2 {\replB#1\end
   \ifx\end#2\expandafter\replC\else\space\fihere\replD#2 \fi}
\def\replB#1{\ifx\end#1\else\replX{#1}\expandafter\replB\fi}
\def\replC#1{\ifx\end#1\empty\else{\repl{#1}}\expandafter\replA\fi}
\def\replX#1{\ifx#1bd\else#1\fi}
\def\fihere#1\fi{\fi#1}

\message{....\repl{ab c{ aa bc {bb}}cb}}

\bye