Extract first word in a string

You're almost there; just remove the trailing comma:

\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\newcommand\@firstword{}%
\newcommand\@removecomma{}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{#1}
\makeatother

\begin{document}

X\FirstWord{John, Paul, George and Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John, Paul}, George and Ringo}X

\end{document}
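
To see why this works, it can help to trace the expansion by hand. The following is a hand-written sketch of the steps (written as comments), not output produced by TeX:

```latex
% \FirstWord{John, Paul, George and Ringo}
%   -> \@firstword John, Paul, George and Ringo \@nil
%        #1 = "John,"   (everything up to the first space)
%        #2 = "Paul, George and Ringo "   (discarded)
%   -> \@removecomma John,,\@nil
%        #1 = "John"    (everything up to the first comma)
%        #2 = ","       (the extra comma added by \@firstword; discarded)
%   -> John
%
% \FirstWord{{John, Paul}, George and Ringo}
%   -> \@firstword {John, Paul}, George and Ringo \@nil
%        #1 = "{John, Paul},"   (the space inside the braces is not seen)
%   -> \@removecomma {John, Paul},,\@nil
%        #1 = "{John, Paul}"    (TeX then strips the outer braces)
%   -> John, Paul
```

The appended space and comma guarantee that each delimited macro finds its delimiter even when the input contains none, as in \FirstWord{John}.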


You can chain further macros to remove other trailing delimiters, such as periods and semicolons:

\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{\@removeperiod#1.\@nil}
\def\@removeperiod#1.#2\@nil{\@removesemicolon#1;\@nil}
\def\@removesemicolon#1;#2\@nil{#1}
\makeatother

\begin{document}

X\FirstWord{John; Paul; George; Ringo}X

X\FirstWord{John. Paul. George. Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John. Paul}. George. Ringo}X

\end{document}
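
Because these macros work purely by expansion, \FirstWord can also be used where TeX expands its input fully, for instance inside \edef. Here is a minimal check, assuming the comma-only definition from the first example; the name \first is chosen only for illustration:

```latex
\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{#1}
\makeatother

\begin{document}

% \edef expands \FirstWord completely, so \first holds only the text "John"
\edef\first{\FirstWord{John, Paul, George and Ringo}}

% typesets: macro:->John
\texttt{\meaning\first}

\end{document}
```

A command that performs assignments while it runs would not survive this kind of use, which is the trade-off to keep in mind when choosing between the approaches.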

If you don't need expandability, you can use l3regex:

\documentclass{article}
\usepackage{xparse,l3regex}

\ExplSyntaxOn
\NewDocumentCommand{\FirstWord}{m}
 {
  % split the argument at spaces
  \seq_set_split:Nnn \l_tmpa_seq { ~ } { #1 }
  % get the first item
  \tl_set:Nx \l_tmpa_tl { \seq_item:Nn \l_tmpa_seq { 1 } }
  % remove a trailing period, semicolon or comma (\Z matches the end)
  \regex_replace_once:nnN { [.;,]\Z } { } \l_tmpa_tl
  % output the result
  \tl_use:N \l_tmpa_tl
 }
\ExplSyntaxOff

\begin{document}

X\FirstWord{John, Paul, George and Ringo}X

X\FirstWord{John; Paul; George; Ringo}X

X\FirstWord{John. Paul. George. Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John, Paul}, George and Ringo}X

X\FirstWord{{John. Paul}. George. Ringo}X

\end{document}

Thanks to a comment by Mico, the pattern below is the suggested one to use. P.S. I am not a pattern-matching expert and do not play one on TV, but the nice thing about LuaLaTeX is that one can employ sophisticated pattern-matching procedures when they are needed.

\documentclass{article}
\usepackage{luacode} % for '\luaexec' and '\luastring' macros

\newcommand{\FirstWord}[1]{\luaexec{tex.print(string.match(\luastring{#1}, '\%w+\%-?\%w*'))}}

\begin{document}
\def\lst{John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}

\def\lst{-John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}

\def\lst{Marie-Claire, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}

\end{document}

This gives John, John, and Marie-Claire, respectively, as the first word.


Earlier version

LuaLaTeX solution

Updated with another variation of the call just for illustration.

\documentclass{article}
\usepackage{luacode}

\newcommand{\FirstWord}[1]{\luaexec{tex.print(string.match('#1', '([^,]+)'))}}

\begin{document}
\def\lst{John, Paul, George and Ringo}

\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}

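The above does not handle special cases such as {{John, Paul}, George and Ringo}: the pattern [^,]+ stops at the first comma, whether or not that comma is inside braces, so the braced group John, Paul is not returned as a unit.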


Original answer

\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
function FirstWord(arg)
tex.print(string.match(arg, '([^,]+)'))
end
\end{luacode*}
\newcommand{\FirstWord}[1]{\directlua{FirstWord("#1")}}

\begin{document}
\def\lst{John, Paul, George and Ringo}

   \textbf{\FirstWord{\lst}} is the first word in \{\lst\}

\end{document}

This gives John as the first word.


Admittedly a bit late to the game, but here's a second LuaLaTeX-based solution, which generalizes the earlier answer by @Nasser. This answer's pattern search algorithm satisfies the following criteria:

  • If the string to be searched starts with a substring that's delimited by matching curly braces, the entire substring is returned.

  • Otherwise, the first word is returned. Here, a "word" is taken to be either a run of alphanumeric characters (whatever Lua's %w matches) -- e.g., "John" or "Nicolò" -- or a hyphenated pair of words -- e.g., "Kröller-Müller" and "Rhys-Davies". (Put differently, a hyphenated word is taken to be two single words that are joined by exactly one instance of -. Because the pattern is %w+%-?%w+, an unhyphenated word must contain at least two characters; a single-character word is skipped.) Any non-alphanumeric characters that precede the "word" in the full string are automatically discarded. The Lua code is Unicode-aware, i.e., the words may contain non-ASCII alphabetic characters (such as ö, ü, and ò).


% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for 'luacode' environment and '\luastring' macro
%% Lua-side code: A Lua function that does most of the work
\begin{luacode}
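function fw ( s )
   local first  -- keep the result out of the global namespace
   if string.find ( s , '^%b{}' ) then
      -- leading brace group: return its contents without the outer braces
      first = string.sub ( string.match ( s , '%b{}' ), 2, -2 )
   else
      -- otherwise: the first word, optionally hyphenated
      first = unicode.utf8.match ( s , '%w+%-?%w+' )
   end
   tex.sprint ( first or '' )  -- print nothing if no word was found
end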
\end{luacode}
%% TeX-side code: A macro that invokes the Lua function
\newcommand{\FW}[1]{\directlua{fw(\luastring{#1})}}

\begin{document}

\def\lst{{John and Paul} but not George or Ringo}
\FW{\lst}

\def\lst{'{Bay- Day} Hay}
\FW{\lst}

\def\lst{Kröller-Müller and Schwassmann-Wassmann}
\FW{\lst}

\end{document}
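
The pattern '%w+%-?%w+' needs at least two alphanumeric characters, so a one-letter first word such as "A" is passed over in favour of the next longer word. If that matters, one possible variant is to try an anchored hyphenated match first and fall back to a plain word. This is only a sketch: the names fwsketch and \FWsketch are made up for illustration, and it assumes, as the code above does, that unicode.utf8.match is available in your LuaTeX.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for 'luacode*' environment and '\luastring' macro
\begin{luacode*}
-- Hypothetical variant of fw: also accepts one-letter first words
function fwsketch ( s )
   local first
   if string.find ( s , '^%b{}' ) then
      -- leading brace group: return its contents without the outer braces
      first = string.sub ( string.match ( s , '^%b{}' ), 2, -2 )
   else
      -- skip leading non-word characters, then prefer a hyphenated pair,
      -- falling back to a single word of any length
      first = unicode.utf8.match ( s , '^%W*(%w+%-%w+)' )
           or unicode.utf8.match ( s , '^%W*(%w+)' )
   end
   tex.sprint ( first or '' )  -- print nothing if no word was found
end
\end{luacode*}
\newcommand{\FWsketch}[1]{\directlua{fwsketch(\luastring{#1})}}

\begin{document}

% one-letter first word: the two-character pattern would return "Day" here
\FWsketch{A Day in the Life}

% hyphenated first word is still returned whole
\FWsketch{Kröller-Müller and Schwassmann-Wassmann}

\end{document}
```

Anchoring both patterns with ^%W* keeps the match at the first word of the string, so a hyphenated word later in the string cannot be picked up by mistake.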
