In which languages is it a security hole to use user-supplied regular expressions?

Generally, it is dynamic languages with an eval facility that can execute code from within regular expressions. In static languages (i.e. those requiring a separate compilation step) there is generally no way to execute code that wasn't compiled, so evaluating code from within a regex is impossible.

Without a way to embed code in a regex, the worst a user can do is write a regex that takes a long time to evaluate.


This is not true: you cannot execute code callbacks in Perl by sneaking them into a regex that is interpolated at run time. That is forbidden by default. You have to explicitly override it with a lexically scoped

use re "eval";

if you expect to have both interpolation and code escapes happening in the same pattern.

Watch:

% perl -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ die naughty })/ at -e line 1.
Exit 255

% perl -Mre=eval -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
naughty at (re_eval 1) line 1.
Exit 255

In most languages, allowing users to supply regular expressions means that you open yourself up to a denial-of-service attack.

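Some regular expressions are extremely CPU-intensive to execute: in a backtracking engine, a pattern with nested quantifiers can take time exponential in the input length on strings that almost match. So in general it's a bad idea to allow users to enter regular expressions that will be executed on a remote system.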
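To make the CPU cost concrete, here is a minimal Python sketch. It is not from the answers above. The pattern `(a+)+$` and the helper name `time_failed_match` are illustrative assumptions, and the exact timings will depend on your machine and interpreter. It only shows that the time to reject a near-miss input should grow very quickly as the input gets longer.

```python
import re
import time

# Illustrative pattern (an assumption, not taken from the thread): the nested
# quantifiers let a backtracking engine split a run of "a"s in exponentially
# many ways before it can conclude that the match fails.
EVIL_PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match n "a"s followed by a "b"."""
    subject = "a" * n + "b"  # the trailing "b" guarantees the match fails
    start = time.perf_counter()
    result = EVIL_PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the pattern can never match this subject
    return elapsed


if __name__ == "__main__":
    # Input sizes are kept small on purpose; each extra "a" roughly doubles
    # the work, so much larger values can hang the process.
    for n in (10, 14, 18, 20):
        print(f"n={n:2d}: {time_failed_match(n):.4f} s")
```

A user who controls the pattern (or, with a pattern like this one, merely the input) can therefore tie up a server thread with a few dozen characters.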

For more info, read this page: http://www.regular-expressions.info/catastrophic.html