Why is this regular expression so slow in Java?

The extra + causes a lot of backtracking (in a naive regexp implementation) when the string cannot be matched. If the string can be matched, the answer is known in the first try. This explains why the case 2 is fast and only case 3 is slow.


Caveat: I don't really know much about regex internals, and this is really conjecture. And I can't answer why Java suffers from this, but not the others (also, it is substantially faster than your 12 seconds in jshell 11 when I run it, so it perhaps only affects certain versions).

"aaaaaaaaaaaaaaaaaaaaaaaaaaaabs".matches("(a+)+b")

There are lots of ways that lots of as could match:

(a)(a)(a)(a)
(aa)(a)(a)
(a)(aa)(a)
(aa)(aa)
(a)(aaa)
etc.

For the input string "aaaaaaaaaaaaaaaaaaaaaaaaaaaab", it will greedily match all of those as in a single pass, match the b, job done.

For "aaaaaaaaaaaaaaaaaaaaaaaaaaaabs", when it gets to the end and finds that the string doesn't match (because of the s), it's not correctly recognizing that the s means it can never match. So, having gone through and likely matched as

(aaaaaaaaaaaaaaaaaaaaaaaaaaaa)bs

it thinks "Oh, maybe it failed because of the way I grouped the as - and goes back and tries all the other combinations of the as.

(aaaaaaaaaaaaaaaaaaaaaaaaaaa)(a)bs  // Nope, still no match
(aaaaaaaaaaaaaaaaaaaaaaaaaa)(aa)bs  // ...
(aaaaaaaaaaaaaaaaaaaaaaaaa)(aaa)bs  // ...
...
(a)(aaaaaaaaaaaaaaaaaaaaaaaaaaa)bs  // ...
(aaaaaaaaaaaaaaaaaaaaaaaaaa(a)(a)bs  // ...
(aaaaaaaaaaaaaaaaaaaaaaaaa(aa)(a)bs  // ...
(aaaaaaaaaaaaaaaaaaaaaaaa(aaa)(a)bs  // ...
...

There are lots of these (I think there are something like 2^27 - that's 134,217,728 - combinations for 28 as, because each a can either be part of the previous group, or start its own group), so it takes a long time.


I don't know Perl too well but the Python version is not equivalent to the Java one. You are using search() but the Java version is using matches(). The equivalent method in Python would be fullmatch()

When I run your examples in Python (3.8.2) with search() I get quick results as you do. When I run it with fullmatch() I get poor (multi-second) execution time. Could it be that your Perl example is also not doing a full match?

BTW: if you want to try the Java version of search you would use:

Pattern.compile("(a+)+b").matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaabs").find();

There might be some slight difference in the semantics but it should be close enough for this purpose.