FILTER_FLAG_STRIP_LOW vs FILTER_FLAG_STRIP_HIGH?

FILTER_FLAG_STRIP_LOW

Remove characters with ASCII value < 32

FILTER_FLAG_STRIP_HIGH

Remove characters with ASCII value > 127


The flags are explained in a different page of the documentation.

FILTER_FLAG_STRIP_LOW strips bytes in the input that have a numerical value <32, most notably null bytes and other control characters such as the ASCII bell. This is a good idea if you intend to pass the input to another application that uses null-terminated strings. In general, characters with a code point below 32 should not occur in user input, except for the tab (9) and the newline characters 10 and 13. Note that the flag removes those as well, so multi-line input loses its line breaks.
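As a rough illustration of the point about control characters, the sketch below runs a string containing a tab, a CR/LF pair, and a null byte through FILTER_FLAG_STRIP_LOW. It uses FILTER_UNSAFE_RAW so that nothing but the flag touches the string. The helper strip_control_keep_whitespace() is hypothetical (not part of PHP) and shows one possible way to keep tabs and newlines while still dropping the other control bytes.

```php
<?php
// Input containing a tab (9), CR (13), LF (10) and a null byte (0).
$input = "a\tb\r\nc\0";

// FILTER_UNSAFE_RAW changes nothing on its own; the flag does the stripping.
$stripped = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
var_dump($stripped); // string(3) "abc"

// Hypothetical helper: remove control bytes below 32 except \t, \n and \r.
function strip_control_keep_whitespace(string $s): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $s);
}

$kept = strip_control_keep_whitespace($input);
var_dump($kept === "a\tb\r\nc"); // bool(true)
```

Whether to keep line breaks depends on where the value ends up: a single-line field such as a username can usually lose them, while a comment body usually cannot.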

FILTER_FLAG_STRIP_HIGH strips bytes in the input that have a numerical value >127. In almost every encoding, those bytes are part of non-ASCII characters such as ä, ¿, etc. Passing this flag can be a band-aid for code that mishandles string encodings, and such mishandling can itself become a security vulnerability. However, non-ASCII characters are to be expected in virtually all user input, so stripping them will mangle legitimate text.
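To make the byte-level behaviour concrete, here is a small sketch that spells out "Käse" with explicit byte escapes, once as UTF-8 and once as ISO-8859-1, and passes both through FILTER_FLAG_STRIP_HIGH. FILTER_UNSAFE_RAW is used so that only the flag alters the string. The variable names are illustrative.

```php
<?php
// "Käse" in UTF-8: ä is the two bytes 0xC3 0xA4, both above 127.
$utf8 = "K\xC3\xA4se";
// "Käse" in ISO-8859-1: ä is the single byte 0xE4.
$latin1 = "K\xE4se";

$fromUtf8   = filter_var($utf8, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);
$fromLatin1 = filter_var($latin1, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

var_dump(strlen($utf8)); // int(5)
var_dump($fromUtf8);     // string(3) "Kse"
var_dump($fromLatin1);   // string(3) "Kse"
```

In both encodings the whole character disappears rather than being replaced, so the result is pure ASCII, but the original word is no longer recoverable.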

To summarize, using FILTER_UNSAFE_RAW so that only the flags change the string (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 and additionally strips tags and encodes quotes):

filter_var("\0aä\x80", FILTER_UNSAFE_RAW) === "\0aä\x80"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW) === "aä\x80"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH) === "\0a"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW,
           FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH) === "a"