Is "\n" a vertical whitespace, i.e., should "\v" match it?

perldoc perlrecharclass says that \v matches a "vertical whitespace character". This is further explained:

"\v" matches any character considered vertical whitespace; this includes the platform's carriage return and line feed characters (newline) plus several other characters, all listed in the table below. "\V" matches any character not considered vertical whitespace. They use the platform's native character set, and do not consider any locale that may otherwise be in use.

Specifically, \v matches the following characters in 5.16:

$ unichars -au '\v'           # From Unicode::Tussle
 ---- U+0000A LINE FEED
 ---- U+0000B LINE TABULATION
 ---- U+0000C FORM FEED
 ---- U+0000D CARRIAGE RETURN
 ---- U+00085 NEXT LINE
 ---- U+02028 LINE SEPARATOR
 ---- U+02029 PARAGRAPH SEPARATOR

In a regex flavor that lacks Perl's \v, you could get the same effect with a character class that lists these seven characters explicitly.

Of course this applies to Perl; I don't know whether it applies to Java.
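For what it's worth, the character-class idea can be sketched in Java, since that is what the question is about. This is only a sketch: the class name VerticalWhitespace and the helper isVerticalWhitespace are made up for illustration, and the class simply spells out the seven characters from the unichars listing above, so it does not depend on how any particular Java version treats \v.

```java
import java.util.regex.Pattern;

public class VerticalWhitespace {
    // Explicit character class spelling out the seven characters that
    // Perl 5.16's \v matches (LF, VT, FF, CR, NEL, LS, PS).
    static final Pattern VERTICAL_WS =
        Pattern.compile("[\\n\\x0B\\f\\r\\x85\\u2028\\u2029]");

    // Hypothetical helper: true if c is one of the seven characters.
    static boolean isVerticalWhitespace(char c) {
        return VERTICAL_WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVerticalWhitespace('\n')); // line feed is in the class
        System.out.println(isVerticalWhitespace(' '));  // space is horizontal, not in the class
    }
}
```

Because the class is written out by hand, it should behave the same wherever the individual escapes (\n, \xhh, \f, \r, \uhhhh) are supported.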


Java 7's Javadoc for java.util.regex.Pattern explicitly mentions \v in its "list of Perl constructs not supported by this class". So it's not that \n doesn't belong to Java's category of "vertical whitespace"; it's that Java 7 doesn't have a category of "vertical whitespace". Instead, Java 7 regexes have an undocumented feature whereby they interpret \v as referring to the vertical tab character, U+000B. (This is a traditional escape sequence from C/C++/Bash/etc., though Java string literals don't support it. The same goes for \a for alert/bell and \cX for control-character X.)

Edited to add: This has changed in newer versions of Java. According to Java 8's Javadoc for java.util.regex.Pattern, \v now means "A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]".
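To see which behavior a given JVM has, a small probe along these lines may help. It is a sketch, not an official check: the class name VEscapeProbe and the method countMatched are invented for illustration, and the seven characters are taken from the Java 8 Javadoc quoted above.

```java
import java.util.regex.Pattern;

public class VEscapeProbe {
    // The seven characters the Java 8 Javadoc lists for \v.
    static final char[] VERTICAL = {'\n', 0x0B, '\f', '\r', 0x85, 0x2028, 0x2029};

    // Hypothetical helper: counts how many of the seven characters
    // the running JVM's \v matches.
    static int countMatched() {
        Pattern v = Pattern.compile("\\v");
        int matched = 0;
        for (char c : VERTICAL) {
            if (v.matcher(String.valueOf(c)).matches()) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Per the Javadoc, 7 is expected on Java 8 or later; on Java 7,
        // where \v means only U+000B, the count would be 1.
        System.out.println(System.getProperty("java.version") + ": " + countMatched());
    }
}
```

If the code has to run on both Java 7 and Java 8, writing the character class out explicitly sidesteps the difference.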