Trimming whitespace around text (like LTRIM, RTRIM and TRIM)

Trimming all of the explicit spaces around input is certainly doable. There are a number of approaches to this problem: I would go with the one Bruno Le Floch wrote for expl3 as \tl_trim_spaces:n. That can be used by doing

\usepackage{expl3}
\ExplSyntaxOn
\cs_new_eq:NN \trimspaces \tl_trim_spaces:n
\ExplSyntaxOff
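
As a quick check of this approach, here is a minimal sketch (the macro name \test and the bracketed output are only for illustration). Since \tl_trim_spaces:n works by expansion and leaves its result inside \exp_not:n, it can be used inside an \edef:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new_eq:NN \trimspaces \tl_trim_spaces:n
\ExplSyntaxOff
\begin{document}
\def\test{ foo }
\edef\test{\expandafter\trimspaces\expandafter{\test}}
[\test] % typesets [foo], with no space inside the brackets
\end{document}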

Alternatively, the implementation can be included directly in the source and thus avoid any dependency:

\documentclass{article}
\makeatletter
\long\def\trim@spaces#1{%
  \@@trim@spaces{\q@mark#1}%
}
\def\@tempa#1{%
  \long\def\@@trim@spaces##1{%
    \@@trim@spaces@i##1\q@nil\q@mark#1{}\q@mark
      \@@trim@spaces@ii
      \@@trim@spaces@iii
      #1\q@nil
      \@@trim@spaces@iv
      \q@stop
  }%
  \long\def\@@trim@spaces@i##1\q@mark#1##2\q@mark##3{%
    ##3%
    \@@trim@spaces@i
    \q@mark
    ##2%
    \q@mark#1{##1}%
  }%
  \long\def\@@trim@spaces@ii\@@trim@spaces@i\q@mark\q@mark##1{%
    \@@trim@spaces@iii
    ##1%
  }%
  \long\def\@@trim@spaces@iii##1#1\q@nil##2{%
    ##2%
    ##1\q@nil
    \@@trim@spaces@iii
  }%
  \long\def\@@trim@spaces@iv##1\q@nil##2\q@stop{%
    \unexpanded\expandafter{\@gobble##1}%
  }%  
}
\@tempa{ }
\def\test{ foo }
\edef\test{\expandafter\trim@spaces\expandafter{\test}}
\show\test

This will remove all of the spaces from the ends of the input, even if you do something tricky like \edef\test{ \space foo \space} to start with (so there are multiple spaces at both ends). (If you are happy to limit yourself to this case, then xparse offers the \TrimSpaces post-processor for arguments using this method.)
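
For reference, a minimal sketch of the xparse route (the command name \trimmedbox is made up for this example; on a recent LaTeX kernel \NewDocumentCommand should be available even without loading xparse):

\documentclass{article}
\usepackage{xparse}
% >{\TrimSpaces} is applied to the mandatory argument before it is used as #1
\NewDocumentCommand\trimmedbox{ >{\TrimSpaces} m }
  {\fbox{#1}}
\begin{document}
\trimmedbox{ foo }
\end{document}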

The way the above works is that there are two loops: one for spaces at the start of the input (\@@trim@spaces@i), a second for those at the end (\@@trim@spaces@iii). First, \@@trim@spaces sets things up such that the correct markers are in place.

In the 'leading' step, \@@trim@spaces@i matches an argument delimited by \q@mark followed by a space (the space itself is discarded). If there are more spaces then #1 and #3 will be empty and #2 will be the remaining input, meaning that \@@trim@spaces@i will be called again with the remaining input. On the other hand, if there are no spaces left in the input then #2 matches the empty input set up by \@@trim@spaces, #1 is the user input with all leading spaces removed and #3 is \@@trim@spaces@ii. The latter stops the loop and hands off to \@@trim@spaces@iii (a \q@mark is left on the front of the user input to prevent any loss of braces: see later).

In this second loop, any spaces at the end of the input will appear just before \q@nil. This pattern is matched by the argument to \@@trim@spaces@iii. If there was a trailing space in the input then #1 is the user input with the space removed (but still with a leading \q@mark) and #2 is \@@trim@spaces@iii, leading to a loop. However, when the trailing spaces are exhausted, #2 is \@@trim@spaces@iv and #1 is \q@mark <user input>\q@nil\@@trim@spaces@iii. The \q@nil\@@trim@spaces@iii is removed by the argument pattern for \@@trim@spaces@iv before the leading \q@mark is stripped off by \@gobble (with the \unexpanded preventing further expansion).
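
To make the two loops concrete, here is a hand-worked trace of \trim@spaces{ a } based on my reading of the definitions above. It is written as comments: ␣ stands for an explicit space token, ordinary blanks after control words are only there for readability, and each -> shows the input after the macro at the front has been expanded.

% \trim@spaces{␣a␣}
% -> \@@trim@spaces{\q@mark␣a␣}
% -> \@@trim@spaces@i\q@mark␣a␣\q@nil\q@mark␣{}\q@mark
%      \@@trim@spaces@ii\@@trim@spaces@iii␣\q@nil\@@trim@spaces@iv\q@stop
%    (leading space found: #1 = <empty>, #2 = a␣\q@nil, #3 = <empty>)
% -> \@@trim@spaces@i\q@mark a␣\q@nil\q@mark␣{}\q@mark
%      \@@trim@spaces@ii\@@trim@spaces@iii␣\q@nil\@@trim@spaces@iv\q@stop
%    (no leading space left: #1 = \q@mark a␣\q@nil, #2 = <empty>,
%     #3 = \@@trim@spaces@ii)
% -> \@@trim@spaces@ii\@@trim@spaces@i\q@mark\q@mark␣{\q@mark a␣\q@nil}
%      \@@trim@spaces@iii␣\q@nil\@@trim@spaces@iv\q@stop
% -> \@@trim@spaces@iii\q@mark a␣\q@nil
%      \@@trim@spaces@iii␣\q@nil\@@trim@spaces@iv\q@stop
%    (trailing space found: #1 = \q@mark a, #2 = \@@trim@spaces@iii)
% -> \@@trim@spaces@iii\q@mark a\q@nil
%      \@@trim@spaces@iii␣\q@nil\@@trim@spaces@iv\q@stop
%    (no trailing space left: #1 = \q@mark a\q@nil\@@trim@spaces@iii,
%     #2 = \@@trim@spaces@iv)
% -> \@@trim@spaces@iv\q@mark a\q@nil\@@trim@spaces@iii
%      \q@nil\@@trim@spaces@iii\q@stop
% -> \unexpanded\expandafter{\@gobble\q@mark a}
% -> \unexpanded{a}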

Note that the above uses e-TeX to allow it to prevent further expansion inside an \edef or similar. If the extensions are not available, change the last auxiliary to

  \long\def\@@trim@spaces@iv##1\q@nil##2\q@stop{%
    \@gobble##1%
  }%

with the proviso that you then have to be cautious about what is passed through, as the result will be expanded further inside an \edef.
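
To illustrate the kind of care needed, here is a small sketch (\name is a made-up macro, and the code assumes the definitions above have been loaded with \makeatletter still in force):

\def\name{Bruno}
\edef\test{\trim@spaces{ \name}}
% With the \unexpanded version, \test contains the single token \name.
% With the \@gobble version, the \edef carries on expanding once the
% trimming is done, so \test contains the characters Bruno instead.
\show\test

Anything that is not safe inside an \edef therefore needs protecting before it is passed in when the \@gobble version is used.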

A second thing to note is that there are some 'special' tokens in the above, for example \q@nil, that are used to match the macro argument patterns and so can't be in the input. That really should be okay with 'text', but you could use something even more obscure like \catcode`\Q=3 then Q (math shift catcode) if you wanted to.
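
A sketch of that idea, not taken from any package (\grabtomark and \demo are names invented here): the delimiter is a Q with category code 3, which cannot be confused with an ordinary letter Q in the input.

\begingroup
  \catcode`\Q=3 % within this group every Q read is a math-shift token
  \long\gdef\grabtomark#1Q{(#1)}% argument delimited by the special Q
  \long\gdef\demo#1{\grabtomark#1Q}% appends the special Q as end marker
\endgroup
% The two Q in the argument are read outside the group, so they are
% ordinary letters and do not end the argument early.
\demo{Quick Quiz}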

Removing the other items requested would mean searching for all of them separately. That sounds quite tricky in the case of \hspace/\hskip as presumably the spacing could be given in any valid units, even before we worry about things like

\def\foo{10 pt }
\hskip\foo

As you may know, dealing with group tokens is tricky at the best of times, so finding an empty group could also be hard. (I guess you'd need to use a loop: grab each token in the input, see if it's empty and if it's not add it to the 'keep' pile.)

Moreover, I think that this sort of material is pretty unlikely in real input. Trimming explicit spaces makes sense, but I am not convinced about the other items (unless there is some particular case here where there is a good chance of picking up the other items).


I definitely advise you to use Joseph's answer in practical use cases, even though it only removes explicit spaces, and not things like \ (control space) or \hskip.

Trimming such spaces from the right is straightforward (to some extent): \unskip, then repeat if the \lastskip is non-zero. This can however be fooled if there is a skip of size 0pt.

Trimming \hspace and friends from the left, including when they are hidden within macros, forces us to manually perform all the macro expansions. Even worse: since \hspace uses \@ifnextchar, we need to also perform assignments. See code below.

Note that \hspace* uses TeX's primitives \vrule and \penalty for which I have implemented no support. They will stop both \trimleft and \trimright. I see how to fix that for \trimleft (at a dire cost), but not for \trimright, since TeX has no \lastrule. LuaTeX could help.

\begingroup
  %
  % This plain TeX code uses the prefix "tsp", and defines
  % \trim, \trimleft, and \trimright.
  %
  \catcode`@=11
  \long\gdef\trim#1{\trimleft{\trimright{#1}}}
  %
  % Trimming spaces on the right is done by repeatedly calling \unskip
  % until \lastskip is zero.  We start with \hskip0pt\relax to stop
  % \trimright from trimming spaces _before_ #1 in case this only
  % contains spaces.
  %
  \long\gdef\trimright#1{\hskip0pt\relax #1\tsp@right}
  \gdef\tsp@right
    {\unskip\ifdim0pt=\lastskip\else\expandafter\tsp@right\fi}
  %
  % Trimming spaces on the left is done by repeatedly using \futurelet
  % to test the first token, and dispatching depending on what is found.
  % Expandable tokens are expanded; most assignments are performed;
  % spaces are ignored; groups are entered.  The loop ends when
  % encountering \tsp@left@end.
  %
  \long\gdef\trimleft#1{\tsp@left#1\tsp@left@end}
  \global\let\tsp@left@end\relax
  \gdef\tsp@left{\expandafter\tsp@left@look}
  \gdef\tsp@left@look{\futurelet\tsp@token\tsp@left@test}
  \gdef\tsp@left@test
    {%
      % Debugging aid: write the meaning of the token being examined
      % to the terminal and log.
      \typeout{\meaning\tsp@token}%
      \expandafter\ifx\noexpand\tsp@token\tsp@token
        \expandafter\@secondoftwo
      \else
        \expandafter\@firstoftwo
      \fi
      {% Expandable token => expand again.
        \let\tsp@next\tsp@left
      }%
      {%
        \ifcat\tsp@token\relax
          % Non-expandable primitive: build \tsp@<meaning>.
          % Note that primitives for which I haven't defined
          % \tsp@<meaning> just give \relax, which stops
          % trimming cleanly.
          \begingroup
            \escapechar-1%
            \global\expandafter\let\expandafter\tsp@next
              \csname tsp@\meaning\tsp@token\endcsname
          \endgroup
        \else
          % Character token.
          \ifcat\tsp@token\bgroup % Begin-group: do; continue trimming
            \bgroup\let\tsp@next\tsp@gobble@token
          \else
            \ifcat\tsp@token\egroup % End-group: do; continue trimming
              \egroup\let\tsp@next\tsp@gobble@token
            \else
              \ifcat\tsp@token\space % Space: remove; continue trimming
                \let\tsp@next\tsp@gobble@token
              \else % Anything else: stop trimming
                \let\tsp@next\relax
              \fi
            \fi
          \fi
        \fi
      }%
      \tsp@next
    }%
  \gdef\tsp@gobble@token{\afterassignment\tsp@left\let\tsp@token= }
  %
  % Helpers for defining primitives.
  %
  \long\gdef\tsp@swap#1{#1\tsp@gobble@token}
  \gdef\tsp@assignment{\afterassignment\tsp@left}
  %
  % Various primitives
  %
  \global \let \tsp@unskip     \tsp@gobble@token
  \global \expandafter \let \csname tsp@ \endcsname \tsp@gobble@token
  \global \let \tsp@begingroup \tsp@swap
  \global \let \tsp@endgroup   \tsp@swap
  \global \let \tsp@def        \tsp@assignment
  \global \let \tsp@edef       \tsp@assignment
  \global \let \tsp@gdef       \tsp@assignment
  \global \let \tsp@xdef       \tsp@assignment
  \global \let \tsp@let        \tsp@assignment
  \global \let \tsp@futurelet  \tsp@assignment
  \global \let \tsp@global     \tsp@assignment
  \global \let \tsp@long       \tsp@assignment
  \global \let \tsp@protected  \tsp@assignment
  \gdef\tsp@hskip#1{\begingroup\afterassignment\tsp@hskip@\skip0= }
  \gdef\tsp@hskip@{\endgroup\tsp@left}
  %
  % We must end when seeing \tsp@left@end (normally \relax)
  %
  \long\gdef\tsp@relax#1%
    {%
      \begingroup
        \def\tsp@left@end{\tsp@left@end}%
        \expandafter
      \endgroup
      \ifx#1\tsp@left@end
      \else
        \expandafter\tsp@left
      \fi
    }
\endgroup

\documentclass{article}
\begin{document}
Without \verb|\trim|:\par\medskip
\def\firstname{FirstName}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\def\firstname{FirstName}\def\lastname{}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\def\firstname{}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\bigskip

With \verb|\trim|:\par\medskip
\def\firstname{FirstName}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\def\firstname{FirstName}\def\lastname{}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\def\firstname{}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\end{document}
