Trying to understand keyval.sty (specifically \KV@@sp@def)

The \@tempa thing you see in that piece of code is just a trick to get a space token into the definitions. That is necessary because when TeX reads input it ignores spaces after control words, i.e. control sequences made of letters (like \hello and \x, but not control symbols like \$, nor active characters like ~, assuming “usual” catcodes), so all of:

\def\tmp#1{}
\def \tmp #1{}
\def      \tmp      #1{}

will do the same thing because the spaces after \def and \tmp are ignored. However, in the piece of code you show, keyval needs space tokens in places where TeX would ignore them. To get a space token there, a common trick is to define a temporary macro (here \@tempa) and write #1 wherever a space is wanted. Calling the temporary macro with a space as its argument then replaces every #1 (but not ##1, which becomes the #1 of the inner definitions) by the space token.
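The trick can be tried in isolation. The following is only a minimal sketch, assuming a LaTeX document with \makeatletter in effect; \demo@b is a hypothetical name and not part of keyval. It defines a macro whose parameter text starts with a space token and then calls it.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% \demo@b is a hypothetical macro used only for illustration.
% Inside \@tempa, #1 becomes a space token and ##1 becomes #1 of \demo@b,
% so the parameter text of \demo@b is <space>#1\@nil.
\def\@tempa#1{%
  \def\demo@b#1##1\@nil{(##1)}%
}
\@tempa{ }

% Show the definition; the parameter text starts with a space token.
\texttt{\meaning\demo@b}

% Typing "\demo@b abc\@nil" would not match the definition: the space after
% \demo@b is dropped while the file is read. \space supplies a space token.
\expandafter\demo@b\space abc\@nil
\makeatother
\end{document}
```

The \expandafter expands \space first, so \demo@b finds a real space token in front of abc, which is what its parameter text requires.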

To illustrate, compare the output of:

1: Token is \meaning !

\def\tmpa#1{Token is \meaning#1!}
2: \tmpa{ }


In 1 the space character after \meaning is ignored by TeX, so it effectively does \meaning! and prints “the character !”. In 2 the space token arrives through #1, so TeX does \meaning<space> and prints “blank space”, followed by the !.


keyval.dtx says:

\KV@@sp@def{⟨cmd⟩}{⟨token list⟩} is like \def, except that a space token at the beginning or end of ⟨token list⟩ is removed before making the assignment.
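To see the documented behaviour before reading the implementation, here is a small test document. It is only a sketch; \demo@result is a hypothetical macro name chosen for the illustration.

```latex
\documentclass{article}
\usepackage{keyval}
\begin{document}
\makeatletter
% \demo@result is a hypothetical macro name used only for this illustration.
% The braces contain <space>foo bar<space>.
\KV@@sp@def\demo@result{ foo bar }
% Prints the stored definition between brackets; the outer spaces are gone,
% the inner one is kept: macro:->foo bar
\texttt{[\meaning\demo@result]}
\makeatother
\end{document}
```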

Let's look at the code:

\def\@tempa#1{%
\long\def\KV@@sp@def##1##2{%
  \futurelet\KV@tempa\KV@@sp@d##2\@nil\@nil#1\@nil\relax##1}%
\def\KV@@sp@d{%
  \ifx\KV@tempa\@sptoken
    \expandafter\KV@@sp@b
  \else
    \expandafter\KV@@sp@b\expandafter#1%
 \fi}%
\long\def\KV@@sp@b#1##1 \@nil{\KV@@sp@c##1}%
  }
\@tempa{ }
[...]
\long\def\KV@@sp@c#1\@nil#2\relax#3{\KV@toks@{#1}\edef#3{\the\KV@toks@}}

In \@tempa{ } the space is nested in curly braces to ensure that the corresponding space character in the .tex input file is not skipped during tokenization but yields an explicit space token, i.e. an explicit character token of character code 32 and category code 10 (space). 32 is the code point of the space character in TeX's internal character encoding, which is ASCII with traditional engines and Unicode with XeTeX-based/LuaTeX-based engines. Carrying out \@tempa with this ⟨space token⟩ as argument yields:

\long\def\KV@@sp@def#1#2{%
  \futurelet\KV@tempa\KV@@sp@d#2\@nil\@nil⟨space token⟩\@nil\relax#1}%
\def\KV@@sp@d{%
  \ifx\KV@tempa\@sptoken
    \expandafter\KV@@sp@b
  \else
    \expandafter\KV@@sp@b\expandafter⟨space token⟩%
 \fi}%
\long\def\KV@@sp@b⟨space token⟩#1⟨space token⟩\@nil{\KV@@sp@c#1}%
[...]
\long\def\KV@@sp@c#1\@nil#2\relax#3{\KV@toks@{#1}\edef#3{\the\KV@toks@}}

This is how the macros which form the \KV@@sp@def-mechanism get defined.

How does the \KV@@sp@def-mechanism work?

\KV@@sp@def{⟨cmd⟩}{⟨token list⟩}

yields:

\futurelet\KV@tempa\KV@@sp@d⟨token list⟩\@nil\@nil⟨space token⟩\@nil\relax⟨cmd⟩

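The sequence \futurelet\KV@tempa, which comes from expanding \KV@@sp@def, makes the control word \KV@tempa equal in meaning to the first token of ⟨token list⟩ (i.e. of #2) without removing that token from the input. If ⟨token list⟩ is empty, that token is the first \@nil.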

Then \KV@@sp@d is carried out.

Basically \KV@@sp@d works as follows: If the meaning of \KV@tempa indicates that the first token of ⟨token list⟩ is a ⟨space token⟩, then call \KV@@sp@b. Otherwise call \KV@@sp@b⟨space token⟩. The \expandafters in \KV@@sp@d's definition are needed to make the \else or the \fi go away before \KV@@sp@b is carried out.

This way the next token behind the token \KV@@sp@b in any case is a ⟨space token⟩. Either it is the first token of ⟨token list⟩ or it is prepended to ⟨token list⟩ by \KV@@sp@d because ⟨token list⟩ does not have a leading ⟨space token⟩.

In other words: The cases of ⟨token list⟩ having a leading ⟨space token⟩ or not having a leading ⟨space token⟩ are settled by having \KV@@sp@d insert a leading ⟨space token⟩ in case ⟨token list⟩ does not have a leading ⟨space token⟩.
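The \futurelet/\ifx test for a leading space can be sketched on its own. This is not keyval's code, only an illustration of the same test; \demo@peek, \demo@next and \demo@report are hypothetical names.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% \demo@peek, \demo@next and \demo@report are hypothetical names.
% \futurelet makes \demo@next equal to the token following \demo@report,
% i.e. the first token of #1 (or \@nil if #1 is empty).
\def\demo@peek#1{\futurelet\demo@next\demo@report#1\@nil}
% Discard everything up to \@nil, then report what \futurelet saw.
\def\demo@report#1\@nil{%
  \ifx\demo@next\@sptoken leading space\else no leading space\fi}

\demo@peek{ abc}\par   % typesets: leading space
\demo@peek{abc}\par    % typesets: no leading space
\makeatother
\end{document}
```

Unlike \let, \futurelet does not skip a space while looking for the token to copy, which is why the test can see a leading ⟨space token⟩ at all.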

When \KV@@sp@d and its \ifx test are done, you have in either case something like this:

\KV@@sp@b⟨space token⟩⟨token list with a leading space token spliced off if at least one leading space token was present⟩\@nil\@nil⟨space token⟩\@nil\relax⟨cmd⟩

The ⟨parameter text⟩ of \KV@@sp@b begins with a ⟨space token⟩, so \KV@@sp@b must be followed by one. Thus the ⟨space token⟩ right behind \KV@@sp@b will be removed.

The argument of \KV@@sp@b is delimited by ⟨space token⟩\@nil.

Thus there are two cases:

Case 1:

If ⟨token list with a leading space token spliced off if at least one leading space token was present⟩ has a trailing ⟨space token⟩, the argument-delimiter will be formed by ⟨token list with a leading space token spliced off if at least one leading space token was present⟩'s trailing ⟨space token⟩ and the \@nil right behind ⟨token list with a leading space token spliced off if at least one leading space token was present⟩ and you get:

\KV@@sp@c⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩\@nil⟨space token⟩\@nil\relax⟨cmd⟩

Case 2:

If ⟨token list with a leading space token spliced off if at least one leading space token was present⟩ does not have a trailing ⟨space token⟩, the argument-delimiter will be formed by the ⟨space token⟩ before the third \@nil and that \@nil and you get:

\KV@@sp@c⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩\@nil\@nil\relax⟨cmd⟩

In both cases the first argument of \KV@@sp@c, which is delimited by \@nil, will be ⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩,
the second argument of \KV@@sp@c, which is delimited by \relax, will be the stuff between the first \@nil and the \relax, and the third argument of \KV@@sp@c, which is not delimited, will be formed by ⟨cmd⟩.

Thus in both cases carrying out \KV@@sp@c yields:

\KV@toks@{⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩}\edef⟨cmd⟩{\the\KV@toks@}
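The steps above can be followed for one concrete call. The trace below is a hand-written sketch, not engine output; it assumes a hypothetical ⟨cmd⟩ named \demo@x and a ⟨token list⟩ consisting of a space, the letter a and another space, and it writes <sp> for an explicit space token.

```latex
% Hypothetical call; <sp> denotes an explicit space token.
%    \KV@@sp@def\demo@x{ a }
% -> \futurelet\KV@tempa\KV@@sp@d<sp>a<sp>\@nil\@nil<sp>\@nil\relax\demo@x
%    (\KV@tempa now means <sp>, so the \ifx test is true)
% -> \KV@@sp@b<sp>a<sp>\@nil\@nil<sp>\@nil\relax\demo@x
%    (#1 of \KV@@sp@b is "a"; the delimiter <sp>\@nil is formed by the
%     trailing space of the token list and the first \@nil: Case 1)
% -> \KV@@sp@c a\@nil<sp>\@nil\relax\demo@x
%    (#1 = a, #2 = <sp>\@nil, #3 = \demo@x)
% -> \KV@toks@{a}\edef\demo@x{\the\KV@toks@}
```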

You might ask the question:

Why assignment to the token register \KV@toks@ and then \edef⟨cmd⟩{\the\KV@toks@} instead of
\def⟨cmd⟩{⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩}?

The reason is:

⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩ might contain hashes (#).
With \def these might erroneously be taken for something that in the ⟨balanced text⟩ of the definition of ⟨cmd⟩ denotes an argument while the ⟨parameter text⟩ of the definition of ⟨cmd⟩ is empty. This in turn would result in error-messages.
A subtlety of \edef is: When \edef gets the content of a token register via \the⟨token register⟩, then the tokens that form that content will not be expanded further. Besides this each explicit character token of category code 6(parameter), i.e., each hash (#), will be doubled and thus will not be taken for something that in the ⟨balanced text⟩ of the definition of ⟨cmd⟩ denotes an argument.

So on the one hand, the "token register-\edef-way" doubles the hashes within the ⟨balanced text⟩ of the definition of ⟨cmd⟩. On the other hand, when a macro such as ⟨cmd⟩ gets expanded, two consecutive explicit character tokens of category code 6 (parameter), i.e., two consecutive hashes (##), collapse into a single hash (#). (This is useful when it comes to nesting ⟨definition⟩s within the ⟨balanced text⟩s of ⟨definition⟩s.)

The "token register-\edef-way" ensures that expanding ⟨cmd⟩ yields exactly the same amount/constellation of explicit character tokens of category code 6(parameter)/hashes as is provided in ⟨token list⟩.
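The difference between the two ways can be checked with a few lines. This is a minimal sketch independent of keyval; \demo@toks and \demo@cmd are hypothetical names.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% \demo@toks and \demo@cmd are hypothetical names used for illustration.
\newtoks\demo@toks
\demo@toks{a#b}                   % a token register may hold a lone #
\edef\demo@cmd{\the\demo@toks}    % contents are copied without further expansion
\texttt{\meaning\demo@cmd}        % prints: macro:->a##b
% By contrast, \def\demo@cmd{a#b} would stop with
% "Illegal parameter number in definition of \demo@cmd".
\makeatother
\end{document}
```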

That's it.


The \KV@@sp@def-mechanism relies on ⟨token list⟩ not containing the token \@nil.

The \KV@@sp@def-mechanism removes exactly one leading ⟨space token⟩ from ⟨token list⟩ if present and exactly one trailing ⟨space token⟩ from ⟨token list⟩ if present before defining ⟨cmd⟩, even if several leading and/or trailing ⟨space token⟩s are present.

In case the set of tokens that forms ⟨token list with a leading space token spliced off if at least one leading space token was present⟩ is of pattern {⟨balanced text⟩}⟨space token⟩, the pair of curly braces that surrounds ⟨balanced text⟩ will be removed by \KV@@sp@b.

In case after carrying out \KV@@sp@b ⟨token list with a leading space token and/or a trailing space token spliced off if at least one leading/trailing space token was present⟩ is of pattern {⟨balanced text⟩}, the outermost pair of curly braces will be removed by \KV@@sp@c.

This means: Depending on the presence of a leading/trailing ⟨space token⟩, up to two levels of surrounding curly braces might get removed/stripped off. Whether such brace-removal is desired or not depends on the use-case.
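The brace removal can be observed directly. The following is a sketch only; \demo@a and \demo@b are hypothetical names, and in each call the outermost pair of braces merely delimits the second argument of \KV@@sp@def.

```latex
\documentclass{article}
\usepackage{keyval}
\begin{document}
\makeatletter
% \demo@a and \demo@b are hypothetical names used for illustration.
% <token list> is {x}<space>: \KV@@sp@b strips the braces around x.
\KV@@sp@def\demo@a{{x} }
\texttt{\meaning\demo@a}\par   % prints: macro:->x
% <token list> is {{x}}<space>: \KV@@sp@b strips one level, \KV@@sp@c the next.
\KV@@sp@def\demo@b{{{x}} }
\texttt{\meaning\demo@b}\par   % prints: macro:->x
\makeatother
\end{document}
```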


By the way:

More about the removal of leading ⟨space token⟩s and trailing ⟨space token⟩s from macro arguments can be found in the solutions to challenge 15 (Space removal) of Michael Downes' Around the Bend-challenges.