What special characters must be escaped in regular expressions?

Which characters you must and which you mustn't escape indeed depends on the regex flavor you're working with.

For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes:

.^$*+?()[{\|

and these inside character classes:

^-]\

For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE):

.^$*+?()[{\|

Escaping any other characters is an error with POSIX ERE.

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.:

[]^-]

In POSIX basic regular expressions (BRE), these are metacharacters that you need to escape to suppress their meaning:

.^$*[\

Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as \? and +. Escaping a character other than .^$*(){} is normally an error with BREs.

Inside character classes, BREs follow the same rule as EREs.

If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.


Modern RegEx Flavors (PCRE)

Includes C, C++, Delphi, EditPad, Java, JavaScript, Perl, PHP (preg), PostgreSQL, PowerGREP, PowerShell, Python, REALbasic, Real Studio, Ruby, TCL, VB.Net, VBScript, wxWidgets, XML Schema, Xojo, XRegExp.
PCRE compatibility may vary

    Anywhere: . ^ $ * + - ? ( ) [ ] { } \ |


Legacy RegEx Flavors (BRE/ERE)

Includes awk, ed, egrep, emacs, GNUlib, grep, PHP (ereg), MySQL, Oracle, R, sed.
PCRE support may be enabled in later versions or by using extensions

ERE/awk/egrep/emacs

    Outside a character class: . ^ $ * + ? ( ) [ { } \ |
    Inside a character class: ^ - [ ]

BRE/ed/grep/sed

    Outside a character class: . ^ $ * [ \
    Inside a character class: ^ - [ ]
    For literals, don't escape: + ? ( ) { } |
    For standard regex behavior, escape: \+ \? \( \) \{ \} \|


Notes

  • If unsure about a specific character, it can be escaped like \xFF
  • Alphanumeric characters cannot be escaped with a backslash
  • Arbitrary symbols can be escaped with a backslash in PCRE, but not BRE/ERE (they must only be escaped when required). For PCRE ] - only need escaping within a character class, but I kept them in a single list for simplicity
  • Quoted expression strings must also have the surrounding quote characters escaped, and often with backslashes doubled-up (like "(\")(/)(\\.)" versus /(")(\/)(\.)/ in JavaScript)
  • Aside from escapes, different regex implementations may support different modifiers, character classes, anchors, quantifiers, and other features. For more details, check out regular-expressions.info, or use regex101.com to test your expressions live

Unfortunately there really isn't a set set of escape codes since it varies based on the language you are using.

However, keeping a page like the Regular Expression Tools Page or this Regular Expression Cheatsheet can go a long way to help you quickly filter things out.

Tags:

Regex