Lexer/parser tools

Updated 2015-01-05:

My original answer pointing to a now deleted question:

There are a bunch of good answers to this question already in "What parser generator do you recommend".

So, from the archive.org copy of that deleted question, here are the items that had at least 1 vote:

  • Packrat
  • Elkhound
  • ANTLR
  • Spirit, part of Boost (C++)
  • Lemon
  • GOLD Parser
  • DMS Software Reengineering Toolkit (not FOSS)

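I've done several flex/bison systems myself, but now I'd replace both with Lemon from SQLite: it's a single tool (you write the tokenizer by hand, as SQLite itself does), it's re-entrant and thread-safe, and it has a streaming, push-based model in which the tokenizer feeds tokens to the parser one at a time.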
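To make the "re-entrant" and "streaming" points a bit more concrete, here is a toy sketch in Python of a parser object that is fed one token at a time and keeps all of its state in the instance. This is not Lemon's API or generated code; the class `SumParser`, its `push` method, and the token names are made up for illustration, and the grammar is just NUMBER ('+' NUMBER)*.

```python
class SumParser:
    """Toy streaming parser for NUMBER ('+' NUMBER)* (hypothetical, not Lemon)."""

    def __init__(self):
        # All parser state lives on the instance: no globals, so several
        # parsers can run side by side (the re-entrancy point).
        self.total = 0
        self.expect_number = True
        self.done = False

    def push(self, kind, value=None):
        """Feed one token; kind is 'NUMBER', 'PLUS', or 'EOF'."""
        if self.done:
            raise SyntaxError("token after EOF")
        if kind == "NUMBER" and self.expect_number:
            self.total += value
            self.expect_number = False
        elif kind == "PLUS" and not self.expect_number:
            self.expect_number = True
        elif kind == "EOF" and not self.expect_number:
            self.done = True
        else:
            raise SyntaxError(f"unexpected token {kind}")


# Two independent parsers, fed in interleaved order.
a, b = SumParser(), SumParser()
a.push("NUMBER", 1)
b.push("NUMBER", 10)
a.push("PLUS")
b.push("PLUS")
a.push("NUMBER", 2)
b.push("NUMBER", 20)
a.push("EOF")
b.push("EOF")
```

Because the caller drives the loop, the tokens can come from a socket, a file read in chunks, or anything else, and the two parsers above never interfere with each other.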


The bad news is that most real computer languages aren't LALR(1), which means you have to resort to considerable hackery to make YACC parse real languages.

If you are designing a DSL, you can use any of the LALR parser generators without a lot of trouble, precisely because you can change the grammar of your DSL when the parser generator squawks. LL parser generators mostly work here too, for the same reason, but the lack of left recursion can be a real pain.
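As a small illustration of the left-recursion pain, take a toy subtraction grammar (invented here, not from any real language). The natural rule is `expr : expr '-' NUMBER | NUMBER`, which an LALR tool accepts as-is but which sends a naive LL/recursive-descent parser into infinite recursion. The usual workaround is to rewrite it as `expr : NUMBER ('-' NUMBER)*` and rebuild the left associativity by hand. A minimal sketch in Python, with hypothetical helper names `tokenize` and `parse_expr` and no real error handling:

```python
import re


def tokenize(text):
    """Split the input into integer and '-' tokens (whitespace is skipped)."""
    return re.findall(r"\d+|-", text)


def parse_expr(tokens):
    """Parse and evaluate  expr : NUMBER ('-' NUMBER)*.

    The loop replaces the left-recursive rule; folding into `value` on each
    iteration is what restores left associativity.
    """
    pos = 0
    value = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])
        pos += 2
    return value


result = parse_expr(tokenize("10 - 3 - 2"))  # (10 - 3) - 2
```

It works, but the grammar you end up writing no longer looks like the grammar you had in mind, and that gets old quickly once you have a dozen operator levels.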

If you are uncompromising about the way you like your syntax, GLR parsers are the hands-down winners. We use them in the DMS Software Reengineering Toolkit and have built production-quality parsers for some 30+ languages, including C++, which has a folk theorem saying it's nearly impossible to parse. The folk theorem was started by people using LL and LALR parsers to try to handle C++. GLR does it easily.