Regex for all PRINTABLE characters

Very late to the party, but this regexp works: /[ -~]/.

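How? It matches every character in the range from space (ASCII decimal 32, hex 0x20) to tilde (ASCII decimal 126, hex 0x7E), which is exactly the set of printable ASCII characters. Characters outside ASCII, such as accented letters, are not matched.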
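As a quick sanity check of that range, here is a sketch that assumes a JavaScript runtime such as Node.js or a browser console. It tests every 7-bit ASCII code with /[ -~]/ and counts the matches:

```javascript
// Test every character in the 7-bit ASCII range (codes 0..127)
// and keep the ones matched by the single-character class [ -~].
const printable = [];
for (let code = 0; code <= 127; code++) {
  const ch = String.fromCharCode(code);
  if (/[ -~]/.test(ch)) {
    printable.push(ch);
  }
}

console.log(printable.length); // 95 (codes 32 through 126)
console.log(printable[0] === ' ', printable[printable.length - 1] === '~'); // true true
```

The 33 codes left out (0 to 31 and 127, DEL) are the ASCII control characters.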

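If you want to strip everything that isn't printable ASCII (non-ASCII characters as well as ASCII control characters such as tab and newline), you could use something like: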

$someString = $someString.replace(/[^ -~]/g, '');

NOTE: this is JavaScript, not valid .NET code; it's an example of the regexp's usage for those who stumble upon this via search engines later.
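A runnable version of that snippet, as a sketch: `stripToPrintableAscii` is a hypothetical helper name, and the sample string is made up to show what gets removed.

```javascript
// Hypothetical helper: remove every character outside the printable-ASCII
// range [ -~]. This also drops tab, CR and LF (ASCII control characters)
// and every non-ASCII character such as "é".
function stripToPrintableAscii(input) {
  // String.prototype.replace returns a new string; the input is unchanged.
  return input.replace(/[^ -~]/g, '');
}

// Made-up sample: an accented letter, a tab, CRLF and a BEL (U+0007).
const raw = 'Caf\u00e9\tmenu\r\nline 2\u0007';
const cleaned = stripToPrintableAscii(raw);
console.log(JSON.stringify(cleaned)); // "Cafmenuline 2"
```

Note how "é", the tab and the line break disappear entirely; if you need to keep whitespace or accented letters, this class is too strict.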


If your regex flavor supports Unicode properties, this is probably the best way:

\P{Cc}

That matches any character that's not a control character, whether it's an ASCII control character ([\x00-\x1F\x7F]) or a Latin-1 control character ([\x80-\x9F], also known as the C1 control characters).
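In JavaScript (ES2018 and later), Unicode property escapes such as \P{Cc} are only recognized with the u flag (or the newer v flag). The sketch below assumes such a runtime; `stripControlChars` is a hypothetical helper name.

```javascript
// \P{Cc}: any character whose Unicode general category is NOT Cc (control).
// The "u" flag is required for \p{...} / \P{...} in JavaScript.
const notControl = /\P{Cc}/u;

console.log(notControl.test('\u00e9')); // true  ("é" is a letter, not Cc)
console.log(notControl.test('\u0085')); // false (NEL, a C1 control character)
console.log(notControl.test('\t'));     // false (TAB, an ASCII control character)

// Hypothetical helper: strip all Cc characters, ASCII and C1 alike.
function stripControlChars(input) {
  return input.replace(/\p{Cc}/gu, '');
}

console.log(JSON.stringify(stripControlChars('a\tb\u0085c\u00e9'))); // "abcé"
```

Keep in mind that \P{Cc} means "not a control character" rather than strictly "printable": it still matches, for example, format characters (category Cf) and unassigned code points.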

The problem with POSIX classes like [:print:] or \p{Print} is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform. In Java, they're strictly ASCII-oriented by default (unless the Pattern.UNICODE_CHARACTER_CLASS flag, available since Java 7, is set). That means \p{Print} matches only the ASCII printing characters, [\x20-\x7E], while \P{Cntrl} (note the capital 'P') matches everything that's not an ASCII control character, [^\x00-\x1F\x7F]. In other words, \P{Cntrl} matches any ASCII character that isn't a control character, plus any non-ASCII character, including the C1 control characters.
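JavaScript has no \p{Print} or \p{Cntrl} classes, so this sketch spells out the explicit ranges given for Java's default (ASCII-only) behavior and tests a few sample characters against each. It illustrates the gap described in this answer: non-ASCII characters, including the C1 control NEL (U+0085), pass the \P{Cntrl} equivalent but not the \p{Print} one.

```javascript
// Explicit ranges standing in for Java's default (ASCII-only) classes:
//   \p{Print}  ~ [\x20-\x7E]
//   \P{Cntrl}  ~ [^\x00-\x1F\x7F]
const asciiPrint = /^[\x20-\x7E]$/;
const notAsciiControl = /^[^\x00-\x1F\x7F]$/;

const samples = {
  'A': 'A',
  'TAB': '\t',
  'é': '\u00e9',
  'NEL (C1)': '\u0085',
};

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, asciiPrint.test(ch), notAsciiControl.test(ch));
}
// A true true
// TAB false false
// é false true
// NEL (C1) false true
```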

Tags: .NET, Regex