Slow ANTLR4 generated Parser in Python, but fast in Java

I faced a similar problem, so I decided to bump this old post with a possible solution. My grammar ran instantly with the TestRig but was incredibly slow with the Python 3 runtime.

In my case the fault was a non-greedy token I was using to match one-line comments ('//' in C/C++, '%' in my case). Replacing it with the greedy equivalent fixed the slowdown:

TKCOMM : '%' ~[\r\n]* -> skip ;
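For comparison, a non-greedy version of such a comment rule (a reconstruction for illustration, not necessarily the exact original) would look like this:

```antlr
// Non-greedy loop: at every character the lexer must check whether the
// subrule can stop, which is costly in the Python runtime.
TKCOMM : '%' .*? '\n' -> skip ;

// Greedy equivalent: consume everything that is not a line break in one pass.
TKCOMM : '%' ~[\r\n]* -> skip ;
```

The greedy form also avoids consuming the newline itself, leaving it for a separate whitespace rule.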

This is somewhat backed by a post from sharwell in this discussion:

When performance is a concern, avoid using non-greedy operators, especially in parser rules.

To test whether this is your problem, try removing non-greedy rules and tokens from your grammar and measuring again.

I can confirm that the Python 2 and Python 3 runtimes have performance issues. With a few patches, I got a 10x speedup on the Python 3 runtime (~5 seconds down to ~400 ms).

Posting here since it may be useful to people who find this thread.

Since this was posted, there have been several performance improvements to Antlr's Python target. That said, interpreted Python will remain intrinsically slower than Java and other compiled languages.

I've put together a Python accelerator code generator for Antlr's Python3 target. It wraps the Antlr C++ target as a Python extension: lexing and parsing are done exclusively in C++, and an auto-generated visitor then rebuilds the resulting parse tree in Python. Initial tests show a 5x-25x speedup depending on the grammar and input, and I have a few ideas on how to improve it further.
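The tree-rebuilding step can be sketched in plain Python. This is a toy illustration only: here, nested tuples stand in for the C++ extension's output, and all names (Node, rebuild) are hypothetical, since the real tool auto-generates this visitor per grammar.

```python
class Node:
    """Python-side parse-tree node rebuilt from the foreign tree."""
    def __init__(self, rule, children):
        self.rule = rule
        self.children = children

def rebuild(foreign):
    """Depth-first conversion of a foreign (rule, child, ...) tuple tree
    into Python Node objects; plain strings are terminal token texts."""
    if isinstance(foreign, str):
        return foreign
    rule, *kids = foreign
    return Node(rule, [rebuild(k) for k in kids])

# A hypothetical parse result for an input containing a comment and an identifier:
tree = ("file", ("comment", "%"), ("ident", "x"))
root = rebuild(tree)
print(root.rule, [c.rule for c in root.children])  # → file ['comment', 'ident']
```

The point of the single rebuild pass is that only one conversion walk crosses the C++/Python boundary, instead of paying the interpreter cost during lexing and parsing itself.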

Here is the code-generator tool:

And here is a fully-functional example:

Hope this is useful to those who prefer using Antlr in Python!