How to check if a string is a valid LaTeX rule?

On Linux at least (don't know about Windows), there is the latexdef script by Martin Scharrer, which looks up LaTeX definitions from the command line:

latexdef section 

will print

\section
\long macro:->\@startsection {section}{1}{\z@ }{-3.5ex \@plus -1ex \@minus -.2ex}{2.3ex \@plus .2ex}{\normalfont \Large \bfseries }

whereas

latexdef sausage 

will print

\sausage
undefined

We can invoke latexdef from Python like so:

import re
import subprocess

def latexdef(command_list, *args):
    '''
    call latexdef on a list of commands to be looked up
    *args can be used to pass options to latexdef
    '''
    p = subprocess.Popen(['latexdef'] + list(args) + list(command_list),
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT,
                         universal_newlines=True)  # decode output to str
    return p.communicate()[0].strip()

def are_commands(command_list, *args):
    '''
    look up multiple commands and return results in a dict
    '''
    result = latexdef(command_list, *args)
    # latexdef separates the entries with blank lines; each entry is the
    # command name on one line and its definition on the next
    frags = [f.splitlines() for f in re.split(r'\n{2,}', result)]
    # drop the leading backslash and any trailing colon from the name
    return {command[1:].rstrip(':'): defn != 'undefined' for command, defn in frags}

def is_command(command, *args):
    '''
    look up a single command
    '''
    return next(iter(are_commands([command], *args).values()))

if __name__ == '__main__':
    commands = "chapter section sausage".split()

    for command in commands:
        print(command, is_command(command))

    print("\nwith book class loaded")

    for command in commands:
        print(command, is_command(command, '-c', 'book'))

    print("\nall at once, with class book")
    print(are_commands(commands, '-c', 'book'))

This prints

chapter False
section True
sausage False

with book class loaded
chapter True
section True
sausage False

all at once, with class book
{'chapter': True, 'section': True, 'sausage': False}

Each single invocation of latexdef is rather slow, but time can be saved by looking up multiple commands in a single call. This is the purpose of are_commands, which returns the lookup result for each command in a dict.

Also note that latexdef is a Perl script, so depending on how important this is to you, it might make sense to translate the entire thing to Python, thus cutting out the middleman. But it is a longish script, and Perl is kind of hard on the eyes ...


This is not a real answer, but rather a longer comment. The given answer by Michael Palmer does work for most cases if those macros are defined by the core packages/classes.

However, there are some cases you might want to consider. By "LaTeX rule" you probably mean a command sequence. A typical LaTeX command sequence (I'll call it "cmd" in the following examples) can be described by the following ABNF:

cmd = "\" 1*ALPHA

But that's not sufficient. There are also internal macros, which you might want to include or exclude separately. For those you would have to check for something like

cmd = "\" 1*(ALPHA | "@")

for internal macros. Whether such a command sequence is valid at the point where it is used depends on context. This rule checks the validity of the command name itself, but such a command mostly has to be used between \makeatletter and \makeatother to be valid (if your check should involve context).

That your check should involve context is easily shown by a command like \frac, which is only a "valid LaTeX rule" when used in math mode, or by something like \meter, which is only valid within siunitx's commands.

Another case is expl3. Its commands are also valid in LaTeX if they are enclosed between \ExplSyntaxOn and \ExplSyntaxOff. They are built according to something like this:

cmd = "\" 1*(ALPHA | "_") ":" 0*ALPHA

which is actually not quite true as the characters after the colon are restricted, but it should suffice.

And it gets even worse if you want to check the validity of user-defined macros built with \csname ...\endcsname, as the user has many more options there.

Update: The most interesting part, after all, would be to also check whether the call is valid. For that you would have to check the function's signature and then the command's call. \frac, for instance, would only be valid if it is called from within math mode and has two mandatory arguments, as in $\frac{1}{2}$. That's the point where you probably want to compile a sample document, because a real parser would be very complex here.
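Compiling a sample document could look roughly like the following. This is only a sketch under a few assumptions: pdflatex is on the PATH, a non-zero exit status is treated as "invalid", and the helper names build_test_document and compiles are made up for this illustration.

```python
import os
import subprocess
import tempfile

def build_test_document(snippet, documentclass="article", packages=(), math_mode=False):
    """Wrap a snippet in a minimal document, optionally inside $...$."""
    body = "$" + snippet + "$" if math_mode else snippet
    lines = ["\\documentclass{%s}" % documentclass]
    lines += ["\\usepackage{%s}" % package for package in packages]
    lines += ["\\begin{document}", body, "\\end{document}", ""]
    return "\n".join(lines)

def compiles(snippet, **kwargs):
    """Return True if pdflatex exits with status 0 on the wrapped snippet."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.tex")
        with open(path, "w") as handle:
            handle.write(build_test_document(snippet, **kwargs))
        # -halt-on-error makes pdflatex stop at the first error
        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "test.tex"],
            cwd=tmp, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        return result.returncode == 0
```

With this, compiles(r"\frac{1}{2}", math_mode=True) tests the call in math mode, while leaving out math_mode tests it in plain text. Each check is a full LaTeX run, so it is a lot slower than a regex or a latexdef lookup.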

All those methods have one caveat: You will not only get LaTeX command sequences, but also TeX ones. If you specifically try to get LaTeX ones but want to exclude TeX ones, you'll have a problem.
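One partial way to narrow this down is to look at the definition text that latexdef reports, which is what TeX's \meaning gives for the command. A macro shows up as "macro:..." (possibly with prefixes such as \long), whereas a primitive shows up as its own name. A sketch under that assumption follows; the helper name kind_of is made up, and it still cannot tell a macro defined by the LaTeX kernel from one inherited from plain TeX.

```python
import re

# Optional \protected, \long, \outer prefixes followed by "macro:".
_MACRO = re.compile(r"^(?:\\(?:protected|long|outer) ?)*macro:")

def kind_of(name, meaning):
    """Classify one command from its definition text (hypothetical helper)."""
    meaning = meaning.strip()
    if meaning == "undefined":
        return "undefined"
    if _MACRO.match(meaning):
        return "macro"
    if meaning == "\\" + name:
        # a primitive's meaning is just its own name
        return "primitive"
    # e.g. \chardef or \mathchar tokens, or aliases created with \let
    return "other"
```

For example, kind_of("sausage", "undefined") gives "undefined", and a definition starting with \long macro: is reported as "macro".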

Update 2: As you were interested in an implementation for a test, here are some regular expressions you can use to match. Only on a full match do you actually have a valid sequence in front of you. For the context-sensitive part you may want to work with lookaheads and lookbehinds.

  • standard LaTeX: \\[A-Za-z]+
  • internal LaTeX: \\[A-Za-z@]+
  • expl syntax: \\[A-Za-z@_]+:[DNncVvoOxfTFpw]*
  • \csname commands: something like \\.*$
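The first three patterns can be tried out with Python's re.fullmatch. This is a minimal sketch: the names PATTERNS and classify are made up for this illustration, every pattern requires at least one name character (matching the 1* in the ABNF above), and the expl3 pattern follows the simplified rule rather than the real restrictions on argument specifiers.

```python
import re

# Hypothetical names for this sketch; the patterns mirror the list above.
PATTERNS = {
    "standard": re.compile(r"\\[A-Za-z]+"),
    "internal": re.compile(r"\\[A-Za-z@]+"),
    "expl3": re.compile(r"\\[A-Za-z@_]+:[DNncVvoOxfTFpw]*"),
}

def classify(token):
    """Return the names of all patterns that match the whole token."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(token)]

if __name__ == "__main__":
    for token in [r"\section", r"\@startsection", r"\tl_set:Nn", r"\sec2"]:
        print(token, classify(token))
```

Note that a plain command such as \section matches both the standard and the internal pattern, since the internal one is a superset. Whether it is actually defined is a separate question that still needs latexdef or a compile run.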