List parsing with special input/optional arguments

This answers (at least partially) the parsing problem. The final macros here rely on math mode to display the result, but the principle should be clear.

\documentclass{article}
\makeatletter
\def\mylist#1{%
  \@for\next:=#1\do{%
    \expandafter\@mylistaux\next[]\@nil
    \@mylist@separator}%
  }
\def\@mylistaux{\@ifnextchar(\@mylistauxi\@mylistauxii}
\def\@mylistauxi(#1)[]\@nil{\@mylist@parentheses{#1}}
\def\@mylistauxii{\@ifnextchar[\@mylistauxiii{\@mylistauxiii[]}}
\def\@mylistauxiii[#1]#2[#3]#4\@nil{\@mylist@brackets{#1}{#2}{#3}}

%% final macros
\def\@mylist@parentheses#1{%
  % do something for the case (xxx)
  $(#1)_{E}$}
\def\@mylist@brackets#1#2#3{%
  % do something for the case [a]xxx[b]
  % #1 is a, #2 is xxx, #3 is b
  $_{#1}#2_{#3}$}
\def\@mylist@separator{\quad$\bullet$\quad}
\makeatother

\begin{document}
\mylist{1,[a]2,3[b],[c]444[d],(5)}
\end{document}
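
To see how the appended `[]\@nil` sentinel lets a single delimited macro cover every bracket combination, here is a minimal tracing sketch. The `\tracelist` and `\@trace...` names are hypothetical, introduced only for this illustration; the parsing macros mirror the ones above, but the final macros just write the captured pieces to the log.

```latex
\documentclass{article}
\makeatletter
% Hypothetical tracing variant of the parser: same structure as above,
% but the final step only reports what was captured.
\def\tracelist#1{\@for\next:=#1\do{\expandafter\@traceaux\next[]\@nil}}
% Case (xxx): a leading parenthesis selects the parentheses branch.
\def\@traceaux{\@ifnextchar(\@traceauxi\@traceauxii}
\def\@traceauxi(#1)[]\@nil{\typeout{parentheses: #1}}
% No leading [ : insert an empty one so the pattern below always matches.
\def\@traceauxii{\@ifnextchar[\@traceauxiii{\@traceauxiii[]}}
% #4 swallows the sentinel [] when the item already ended with [b].
\def\@traceauxiii[#1]#2[#3]#4\@nil{%
  \typeout{brackets: pre=<#1> item=<#2> post=<#3>}}
\makeatother
\begin{document}
\tracelist{1,[a]2,3[b],[c]444[d],(5)}
% Lines written to the log:
%   brackets: pre=<> item=<1> post=<>
%   brackets: pre=<a> item=<2> post=<>
%   brackets: pre=<> item=<3> post=<b>
%   brackets: pre=<c> item=<444> post=<d>
%   parentheses: 5
\end{document}
```

For the item `3[b]`, the input seen by `\@traceauxiii` is `[]3[b][]\@nil`, so `#3` is `b` and the leftover sentinel `[]` ends up in the discarded `#4`.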


A LaTeX3 implementation that makes clearer how to use the gathered arguments:

\usepackage{xparse}

\ExplSyntaxOn
\seq_new:N \l_tobi_list_seq
\tl_new:N \l_tobi_last_item_tl
\NewDocumentCommand{\mylist}{m}
  {
   \seq_set_split:Nnn \l_tobi_list_seq { , }{ #1 }
   \seq_pop_right:NN \l_tobi_list_seq \l_tobi_last_item_tl
   \seq_map_inline:Nn \l_tobi_list_seq { \tobi_process:w ##1 [ ] \q_stop \tobi_separator: }
   \exp_after:wN \tobi_process:w \l_tobi_last_item_tl [ ] \q_stop
  }
\cs_new:Npn \tobi_process:w 
  {
   \peek_charcode:NTF ( { \tobi_process_aux_i:w } { \tobi_process_aux_ii:w }
  }
\cs_new:Npn \tobi_process_aux_i:w ( #1 ) [ ] \q_stop
  {
   \tobi_final_parentheses:n { #1 }
  }
\cs_new:Npn \tobi_process_aux_ii:w
  {
   \peek_charcode:NTF [ { \tobi_process_aux_iii:w } { \tobi_process_aux_iii:w [ ] }
  }
\cs_new:Npn \tobi_process_aux_iii:w [ #1 ] #2 [ #3 ] #4 \q_stop
  {
   \tobi_final_brackets:nnn { #1 } { #2 } { #3 }
  }
%% Final macros
\cs_new:Npn \tobi_final_parentheses:n #1
  {
   $(#1)\sb{E}$
  }
\cs_new:Npn \tobi_final_brackets:nnn #1 #2 #3
  {
   $\sb{#1}#2\sb{#3}$
  }
\cs_new:Npn \tobi_separator: { \quad\textbullet\quad }
\ExplSyntaxOff

One now has to suitably define \tobi_final_parentheses:n, \tobi_final_brackets:nnn and \tobi_separator:.

It's not necessary to use these names: just use what you prefer; the code might be

\usepackage{xparse}

\ExplSyntaxOn
\seq_new:N \l_tobi_list_seq
\tl_new:N \l_tobi_last_item_tl
\NewDocumentCommand{\mylist}{m}
  {
   \seq_set_split:Nnn \l_tobi_list_seq { , }{ #1 }
   \seq_pop_right:NN \l_tobi_list_seq \l_tobi_last_item_tl
   \seq_map_inline:Nn \l_tobi_list_seq { \tobi_process:w ##1 [ ] \q_stop \mylistseparator }
   \exp_after:wN \tobi_process:w \l_tobi_last_item_tl [ ] \q_stop
  }
\cs_new:Npn \tobi_process:w 
  {
   \peek_charcode:NTF ( { \tobi_process_aux_i:w } { \tobi_process_aux_ii:w }
  }
\cs_new:Npn \tobi_process_aux_i:w ( #1 ) [ ] \q_stop
  {
   \mylistelementinparentheses { #1 }
  }
\cs_new:Npn \tobi_process_aux_ii:w
  {
   \peek_charcode:NTF [ { \tobi_process_aux_iii:w } { \tobi_process_aux_iii:w [ ] }
  }
\cs_new:Npn \tobi_process_aux_iii:w [ #1 ] #2 [ #3 ] #4 \q_stop
  {
   \mylistelementinbrackets { #1 } { #2 } { #3 }
  }
\ExplSyntaxOff

%% Final macros
\newcommand\mylistelementinparentheses[1]{$(#1)\sb{E}$}
\newcommand\mylistelementinbrackets[3]{$\sb{#1}#2\sb{#3}$}
\newcommand\mylistseparator {\quad\textbullet\quad}

so the macros doing the hard work can be defined in the way you're accustomed to (the names can be changed, of course).

EDIT

Here is how to define \tobi_final_parentheses:n, \tobi_final_brackets:nnn and \tobi_separator: for the desired output. As you can see, LaTeX3 syntax allows for very easy management of empty arguments.

%% Final macros
% aaa -> \cite{aaa}
% [a]bbb -> \cite[][a]{bbb}
% ccc[b] -> \cite[b]{ccc}
% [c]ddd[d] -> \cite[c][d]{ddd}
% (eee) -> eee

\cs_new:Npn \tobi_final_parentheses:n #1 { #1 }
\cs_new:Npn \tobi_final_brackets:nnn #1 #2 #3
  {
   \tl_if_empty:nTF { #3 }
     {
      \tl_if_empty:nTF { #1 }
        { \cite{#2} }
        { \cite[][#1]{#2} }
     }
     {
      \tl_if_empty:nTF { #1 }
        { \cite[#3]{#2} }
        { \cite[#1][#3]{#2} }
     }
  }
\cs_new:Npn \tobi_separator: {, ~ }
\ExplSyntaxOff

If a LuaTeX solution is OK, then you can try an LPeg parser to achieve what you want.

First the Lua file (save it as listparsing.lua). Note that I use -2 as the first argument to make tex.sprint print things verbatim; change this value to -1 to use the current catcode regime. The parser accepts only words (that is, sequences of letters), but it can easily be extended to other kinds of token (e.g. control sequences). Another possible enhancement would be to check for balanced brackets (a bit more difficult, as it requires a real grammar).

lpeg = require('lpeg')

local P, R, S, C, Cs, V = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cs, lpeg.V
local match = lpeg.match

local space = S(' \n\t')
local lbracket, rbracket = P('['), P(']')
local lparen, rparen = P('('), P(')')
local comma = P(',') * space^0 / ', '

local letter = R('az') + R('AZ')
local word = letter^1

local digit = R('09')
local number = digit^1

local pretextpost = lbracket * C(word) * rbracket * C(word) * lbracket * C(word) * rbracket / function (a,b,c) return string.format('\\cite[%s][%s]{%s}',a,c,b) end
local pretext = lbracket * C(word) * rbracket * C(word) / function (a,b) return string.format('\\cite[][%s]{%s}',a,b) end
local posttext = C(word) * lbracket * C(word) * rbracket / function (a,b) return string.format('\\cite[%s]{%s}',b,a) end
local text = C(word) / function (a) return string.format('\\cite{%s}',a) end
local special = lparen * C(word) * rparen / function (a) return a  end
local pattern = pretextpost + pretext + posttext + text + special

local parser = Cs(pattern * (comma * pattern)^0)

function parse_and_texprint(s)
   return tex.sprint(-2,match(parser,s))
end

Then the tex file.

\documentclass{standalone}
\directlua{dofile('listparsing.lua')}
\def\mylist#1{%
  \directlua{%
    parse_and_texprint('#1')}}
\begin{document}
\texttt{\mylist{aaa,[a]bbb,ccc[b],[c]ddd[d],(eee)}}
\end{document}



This is not a final solution, as I think what you are asking for can get very complicated, but it indicates how to build a finite state machine. For demonstration purposes, I will use \@tfor to scan the list token by token. LaTeX3, I am sure (as egreg's solution shows), offers better possibilities.

\documentclass{article}
\begin{document}
\makeatletter
\newif\if@gather
\newif\if@store
\DeclareRobustCommand\temp{}
\let\@ex\expandafter
\edef\alist{[a]2[b],3[b],[c]4[d],(eee)}
\def\parse#1{%
\@ex\@tfor\@ex\next\@ex:\@ex=#1\do{%
    \@gathertrue\@storetrue
    \if\next,\@gathertrue\@storetrue\def\next{$\bullet$}\fi
    \if\next[\@gathertrue\@storefalse\fi
    \if\next]\@storefalse\@gathertrue\fi
    \if\next(\@gathertrue\@storefalse\fi 
    \if\next)\@storefalse\@gathertrue\fi
    \if@store\edef\temp{\temp\next}\fi
}}
\parse\alist
\temp
\end{document}

The advantage of this method, besides being more readable, is that you can capture every token and either store it in a macro or token list or do something else with it. In the example I just removed all the [] and () and replaced the commas with $\bullet$. It will need much more work to achieve what you are after, and to make it robust enough to check for errors and edge cases. A much better and more robust scanner can be built using \futurelet. In this respect you might be able to borrow code from the soul package, which has an excellent parser, or use its \SOUL@everytoken macro.
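
As a rough illustration of the \futurelet idea, the sketch below reproduces the behaviour of the example above (drop the brackets and parentheses, turn commas into bullets) by peeking at each token before consuming it. All `\scan...` names are hypothetical, and the sketch assumes the input contains no spaces or brace groups; handling those would need extra branches.

```latex
\documentclass{article}
\makeatletter
% All \scan... names are hypothetical, chosen for this sketch.
\def\scan@end{\scan@end}% sentinel; only compared and gobbled, never executed
\def\scan#1{\scan@peek#1\scan@end}
% Look at the next token without removing it, then decide what to do.
\def\scan@peek{\futurelet\scan@tok\scan@dispatch}
\def\scan@dispatch{%
  \ifx\scan@tok\scan@end
    \let\scan@next\@gobble        % sentinel reached: remove it and stop
  \else
    \let\scan@next\scan@keep      % default: typeset the token
    \ifx\scan@tok,\let\scan@next\scan@comma\fi
    \ifx\scan@tok[\let\scan@next\scan@drop\fi
    \ifx\scan@tok]\let\scan@next\scan@drop\fi
    \ifx\scan@tok(\let\scan@next\scan@drop\fi
    \ifx\scan@tok)\let\scan@next\scan@drop\fi
  \fi
  \scan@next}
% Each handler consumes the token that was peeked at and resumes scanning.
\def\scan@keep#1{#1\scan@peek}
\def\scan@drop#1{\scan@peek}
\def\scan@comma#1{\quad$\bullet$\quad\scan@peek}
\makeatother
\begin{document}
\scan{[a]2[b],3[b],[c]4[d],(eee)}
\end{document}
```

Because the decision is taken before the token is absorbed, each handler can choose how to read what follows (for instance, a handler for `[` could grab everything up to the matching `]` as a delimited argument), which is harder to arrange with the flag-based \@tfor loop.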