What does \\x80-\\xFF refer to?

Okay, all the answers given so far led me in the right direction and allowed me to find the following in the documentation.

After \x, up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, \x{...} is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater than 127.

So, as a summary :-

i) '\x' allows for a hexadecimal escape sequence, after which, up to two hexadecimal digits are read

ii) '\xhh' the two 'hh' letters can be in upper or lower case

iii) '\xhh' specifies a code-point in the range 0-FF

iv) '\x80-\xFF' refers to a character range outside ASCII
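To make point iv) concrete, here is a minimal sketch that counts the bytes in the \x80-\xFF range. The variable names and sample strings are my own, and it assumes the pattern runs without the /u modifier, so the engine works byte by byte.

```php
<?php
// Sample strings (illustrative only).
$ascii = 'hello';
$utf8  = "h\xC3\xA9llo"; // "héllo" in UTF-8: é is the two bytes C3 A9

// Single-quoted pattern, so \x80 and \xFF reach the regex engine untouched.
$asciiCount = preg_match_all('/[\x80-\xFF]/', $ascii);
$utf8Count  = preg_match_all('/[\x80-\xFF]/', $utf8);

var_dump($asciiCount); // int(0): pure ASCII has no byte above 0x7F
var_dump($utf8Count);  // int(2): both bytes of é fall in the range
```

A count above zero is a quick way to tell that a string contains something other than plain ASCII.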


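\x80-\xFF is the range of non-ASCII byte values. In Latin-1 most of them (0xA0-0xFF) are printable characters, while 0x80-0x9F are control codes. In UTF-8 they occur only as the bytes of multi-byte sequences that encode higher code points.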

Using \\x80 rather than \x80 is slightly more correct, because the backslash escapes itself in strings. That holds in single-quoted strings too, although it makes no practical difference there.

In double-quoted strings, however, a plain \x80 is interpreted by PHP itself, whereas \\x80 is passed through and interpreted by the regex engine.
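A quick way to see the difference is to look at what each string literal actually contains before it reaches the regex engine. This is only a sketch, and the variable names are made up for illustration.

```php
<?php
// Single quotes: PHP leaves \x alone, so both spellings are the 4 characters \x80.
$single       = '\x80';
$singleDouble = '\\x80';

// Double quotes: PHP itself turns \x80 into one raw byte.
$double       = "\x80";
// Doubling the backslash keeps the escape intact for the regex engine.
$doubleDouble = "\\x80";

var_dump(strlen($single));            // int(4)
var_dump($single === $singleDouble);  // bool(true)
var_dump(strlen($double));            // int(1)
var_dump($doubleDouble === $single);  // bool(true)
```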


You don't need a double backslash in a PHP pattern written as a single-quoted string. If you do use one, PHP collapses it to a single backslash before the regex engine sees it, so it is read as an ordinary escape.

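One exception: if you use the nowdoc syntax to enclose the pattern, no escape processing takes place, so a double backslash reaches the regex engine as two characters and is seen as a literal backslash. Heredoc, by contrast, follows the double-quoted rules and collapses it to one.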
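As far as I can tell from the manual, only nowdoc skips escape processing, while heredoc follows the double-quoted rules. The sketch below compares the two. The pattern and the RE label are arbitrary choices for illustration.

```php
<?php
// Nowdoc: nothing is processed, both backslashes survive.
$nowdoc = <<<'RE'
/a\\b/
RE;

// Heredoc: treated like a double-quoted string, \\ collapses to \.
$heredoc = <<<RE
/a\\b/
RE;

var_dump(strlen($nowdoc));  // int(6)
var_dump(strlen($heredoc)); // int(5)

// With the nowdoc pattern the engine sees \\, i.e. a literal backslash.
$matched = preg_match($nowdoc, 'a\b');
var_dump($matched); // int(1)
```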

Tags:

Php

Regex