Extract the numerical and non-numerical portion from text

An approach using the LaTeX3 l3regex module

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{array,booktabs,expl3,l3regex}
\ExplSyntaxOn
\regex_new:N \l_extract_regex
\regex_set:Nn \l_extract_regex { ^\s*([+-]?\d*\.?\d*)\s*(.*) }
\seq_new:N \l_extract_seq
\tl_new:N \NumberValue
\tl_new:N \OtherValue
\cs_new_protected:Npn \extract_number:n #1
  {
    \regex_extract_once:NnN \l_extract_regex {#1} \l_extract_seq
    % Item 1 is the whole match; items 2 and 3 are the two capture groups.
    \tl_gset:Nx \NumberValue { \seq_item:Nn \l_extract_seq { 2 } }
    \tl_gset:Nx \OtherValue { \seq_item:Nn \l_extract_seq { 3 } }
  }
\cs_new_protected:Npn \Test #1
  {
    \extract_number:n {#1}
    & \detokenize{#1} & \NumberValue & \OtherValue
  }
\ExplSyntaxOff
\begin{document}
\begin{tabular}{l>{\ttfamily}r>{\ttfamily}r>{\ttfamily}r}
  \toprule
             & \multicolumn{1}{r}{Input} & 
               \multicolumn{1}{r}{Digit} & \multicolumn{1}{r}{Non-digit} \\
  \midrule
   Decimal:  \Test{ 1.01abc}               \\
             \Test{+2.01abc}               \\ 
             \Test{-3.01abc}               \\
  \midrule
   Integer:  \Test{  abc}                  \\
             \Test{ 5abc}                  \\ 
             \Test{+6abc}                  \\
             \Test{-7abc}                  \\
  \midrule
   Floating Point: \Test{ 5.34abc}         \\
                   \Test{+6.34abc}         \\
                   \Test{-7.34abc}         \\
  \midrule
   Number Only:    \Test{3}                \\
                   \Test{3.2}              \\ 
                   \Test{-5.1}             \\
                   \Test{+5.1}             \\
  \midrule
   No Digits:      \Test{abc}              \\
  \midrule
   Formatted Text: \Test{  8$abc_1$}       \\ 
                   \Test{-8.2$abc_1$}      \\ 
                   \Test{+$abc_1$}         \\
                   \Test{$abc_1$}          \\
  \bottomrule
\end{tabular}
\end{document}

At the moment this module is 'experimental', which is why it is loaded separately from expl3, but I'd expect it to move to the 'kernel' in the near-ish future (before the end of the year).

When a regular expression matches, the results are stored in a sequence: the complete match comes first, followed by each capturing group in order. Regex groups are conventionally numbered from 0 (the complete match), but \seq_item:Nn counts from 1, so item 2 is the first capture group (the numerical part) and item 3 is the second (the non-numerical part). Notice that I've also included \s* before each group to skip any leading spaces: if you leave that out, the spaces become part of the captured text.
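To make the item numbering concrete, here is a minimal sketch that prints all three items for one input. It assumes a recent LaTeX release in which expl3 and the regex functions are available without loading any package (on older systems, load expl3 and l3regex as in the example above). The names \l_demo_seq and \ShowParts are made up for this illustration.

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l_demo_seq % hypothetical name, used only in this sketch
\cs_new_protected:Npn \ShowParts #1
  {
    % Same pattern as above, given inline instead of via a regex variable
    \regex_extract_once:nnN
      { ^\s*([+-]?\d*\.?\d*)\s*(.*) } {#1} \l_demo_seq
    % Item 1: whole match (leading spaces included)
    % Item 2: first capture group (the number)
    % Item 3: second capture group (everything else)
    [ \seq_item:Nn \l_demo_seq { 1 } ] ~
    [ \seq_item:Nn \l_demo_seq { 2 } ] ~
    [ \seq_item:Nn \l_demo_seq { 3 } ]
  }
\ExplSyntaxOff
\begin{document}
\ttfamily
\ShowParts{ -8.2 abc}
\end{document}
```

The brackets are only there to make the boundaries of each item visible in the output.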

Also notice that the results here are detokenized, so if you want to have formatted text you'd need to \scantokens the results. (Something as simple as \scantokens\expandafter{\OtherValue} would do here.)
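As a sketch of that \scantokens step, the helper below re-reads the stored text so that characters such as $ and _ get their usual meaning again. The name \TypesetOtherValue is hypothetical, and the sketch assumes \OtherValue has already been set by \extract_number:n and that it is used where typesetting is allowed (not inside an \edef or similar).

```latex
% Hypothetical helper: typeset \OtherValue with normal category codes.
\newcommand*\TypesetOtherValue{%
  \begingroup
    % Suppress the end-of-line character so that \scantokens
    % does not add a trailing space after the re-read text.
    \endlinechar=-1
    \scantokens\expandafter{\OtherValue}%
  \endgroup
}
```

Using \TypesetOtherValue in place of \OtherValue in the last column of \Test should then render entries such as $abc_1$ as math rather than as literal characters.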


If you can use LuaTeX, you can use a proper parser (the code below is in ConTeXt, just because I don't know all the details of using LuaTeX in LaTeX).

\startluacode
  local P, R, S, V, match = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.match
  local Ct, C, Cs, Cc = lpeg.Ct, lpeg.C, lpeg.Cs, lpeg.Cc

  local format = string.format

  local digit    = R("09")
  local sign     = S('+-')
  local integer  = sign^-1 * digit^0 -- NOTE: I'd rather use digit^1, but
                                     -- the requirements want to capture a
                                     -- single sign as well
  local float    = sign^-1 * digit^0 * P('.') * digit^1
  local space    = P(" ")^0

  local number   = Cs(float + integer)
  local any      = Cs(P(1)^0)

  local number_value = Cc("\\global\\def\\NumberValue{%s}") * number / format
  local other_value  = Cc("\\global\\def\\OtherValue{%s}")  * any    / format
  local parser = Cs(space * number_value * other_value)

  function commands.extract_number(s)
      context(match(parser,s))
  end
\stopluacode

\unprotect
\def\extract#1%
    {\let\NumberValue\relax
     \let\OtherValue \relax
     \ctxcommand{extract_number(\!!bs\detokenize{#1}\!!es)}}
\protect

You can then use this as follows.

\def\Test#1%
    {\extract{#1}%
     #1 \NC \NumberValue \NC \OtherValue}

\starttext

\starttabulate[|l|r|r|r|]
  \HL
  \NC           \NC Input \NC Digit \NC Non-Digit \NC \NR
  \HL
  \NC Decimal:  \NC \Test{ 1.01abc}               \NC \NR
  \NC           \NC \Test{+2.01abc}               \NC \NR 
  \NC           \NC \Test{-3.01abc}               \NC \NR
  \HL
  \NC Integer:  \NC \Test{  abc}                  \NC \NR
  \NC           \NC \Test{ 5abc}                  \NC \NR 
  \NC           \NC \Test{+6abc}                  \NC \NR
  \NC           \NC \Test{-7abc}                  \NC \NR
  \HL
  \NC Floating Point: \NC \Test{ 5.34abc}         \NC \NR
  \NC                 \NC \Test{+6.34abc}         \NC \NR
  \NC                 \NC \Test{-7.34abc}         \NC \NR
  \HL
  \NC Number Only:    \NC \Test{3}                \NC \NR
  \NC                 \NC \Test{3.2}              \NC \NR 
  \NC                 \NC \Test{-5.1}             \NC \NR
  \NC                 \NC \Test{+5.1}             \NC \NR
  \HL
  \NC No Digits:      \NC \Test{abc}              \NC \NR
  \HL
  \NC Formatted Text: \NC \Test{  8$abc_1$}       \NC \NR 
  \NC                 \NC \Test{-8.2$abc_1$}      \NC \NR 
  \NC                 \NC \Test{+$abc_1$}         \NC \NR
  \NC                 \NC \Test{$abc_1$}          \NC \NR
  \HL
\stoptabulate
\stoptext

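For LaTeX users, a rough LuaLaTeX counterpart might look like the sketch below. It is not a port of the lpeg parser: it uses a plain Lua string pattern that mirrors the regular expression from the l3regex answer, so edge cases may be split differently. The function name extract_number and the macro \extract are chosen for illustration; \luastringN comes from the luacode package and token.set_macro from LuaTeX's token library (LuaTeX 1.0 or later is assumed).

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
-- Hypothetical helper: split s into a leading number and the remainder.
function extract_number(s)
  -- Optional sign, digits, optional dot, digits; then everything else.
  local number, other = s:match("^%s*([+-]?%d*%.?%d*)%s*(.*)$")
  -- Define the two result macros globally on the TeX side.
  token.set_macro("NumberValue", number, "global")
  token.set_macro("OtherValue", other, "global")
end
\end{luacode*}
% \luastringN passes the argument to Lua as a quoted, unexpanded string.
\newcommand*\extract[1]{%
  \directlua{extract_number(\luastringN{#1})}}
\begin{document}
\extract{-8.2$abc_1$}%
Number: \NumberValue, rest: \OtherValue
\end{document}
```

Because token.set_macro tokenizes its content under the category codes in force at that point, the stored remainder should typeset as ordinary LaTeX input, which may or may not be what you want for the table above.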


Here is a solution with xstring:

\documentclass[border=2pt]{standalone}
\usepackage{booktabs}
\usepackage{xstring}
\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
    \@xs@formatnumber{#1}\@xs@reserved@A
    \decimalpart\z@
    \afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
    \expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
    \ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}

\newcommand*\Test[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@
        {\def\temp@{#1&#1&}}
        {\def\temp@{#1&}%
        \StrBefore{#1}\@xs@afterdecimal[\temp@@]%
        \expandafter\g@addto@macro\expandafter\temp@\expandafter{\temp@@&}%
        \expandafter\g@addto@macro\expandafter\temp@\expandafter{\@xs@afterdecimal}%
        }%
    \temp@\\}
\makeatother
\begin{document}
\begin{tabular}{l r r r}
 &Input &Number &Non-Digits\\
\midrule
Decimal:
&\Test{ 1.01abc}
&\Test{+2.01abc}
&\Test{-3.01abc}

\midrule
Integer:
&\Test{  abc}
&\Test{ 5abc}
&\Test{+6abc}
&\Test{-7abc}

\midrule
Floating Point:
&\Test{ 5.34abc}
&\Test{+6.34abc}
&\Test{-7.34abc}

\midrule
Number Only:
&\Test{3}
&\Test{3.2}
&\Test{-5.1}
&\Test{+5.1}

\midrule
No Digits:
&\Test{abc}

\midrule
Formatted Text:
&\Test{  8$abc_1$}
&\Test{-8.2$abc_1$}
&\Test{+$abc_1$}
&\Test{$abc_1$}
\end{tabular}
\end{document}

EDIT: here is how to do it with separate \ExtractLeadingNumber and \ExtractTralingNonDigits macros:

\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
    \@xs@formatnumber{#1}\@xs@reserved@A
    \decimalpart\z@
    \afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
    \expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
    \ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}

\newcommand*\ExtractLeadingNumber[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@{#1}{\StrBefore{#1}\@xs@afterdecimal}%
}
\newcommand*\ExtractTralingNonDigits[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@{}\@xs@afterdecimal
}
\makeatother

\newcommand*\Test[1]{#1&\ExtractLeadingNumber{#1}&\ExtractTralingNonDigits{#1}\\}
