Why ^*$ matches "127.0.0.1"

If you try

Regex.Match("127.0.0.1", "^*1$")

You'll see it also matches. The Match.Index property has a value of 8, meaning that it matched the last '1', not the first one. That makes sense: "^*" means zero or more occurrences of the start anchor, and zero occurrences are enough. So the pattern effectively reduces to "1$", and the only '1' followed by the end of the string is the last one.

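Compare this with "a*1$". It matches "127.0.0.1" even though there is no 'a' in front of "1$", because "a*" is satisfied by zero 'a' characters. In the same way, "a*$" matches the empty string at the end of the input, which is exactly what "^*$" does in your example.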
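
The "a*" analogy can be checked with a short sketch. Python's re module is only a convenient stand-in here. It is not the .NET engine from the question, and it rejects "^*" outright. So the sketch tests the "a*" patterns rather than the original one. The match positions it reports for "a*1$" are what one would expect from .NET as well, but treat that as an assumption:

```python
import re

text = "127.0.0.1"

# "a*" is satisfied by zero 'a' characters, so "a*1$" behaves like "1$":
# the search moves right until it finds a '1' that sits at the very end.
m = re.search(r"a*1$", text)
print(m.start(), repr(m.group()))  # 8 '1'

# "a*$" likewise reduces to "$": it matches the empty string at the end.
m = re.search(r"a*$", text)
print(m.start(), repr(m.group()))  # 9 ''

# Dropping the optional prefix entirely gives the same position as "a*1$".
print(re.search(r"1$", text).start())  # 8
```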

By the way, the MSDN docs never describe a bare '*' matching a literal '*'; only the escaped form '\*' does that. A '*' on its own throws an exception instead of matching '*'.
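
The same two points can be seen in Python's re module. This sketch illustrates Python, not .NET, and the exception type and message differ between engines. Python is stricter than the behaviour described above: it also refuses a quantifier on ^, so "^*$" does not compile there at all. The helper name compiles is just for illustration:

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern, False on a syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles("*"))    # False: a bare '*' has nothing to repeat
print(compiles("^*$"))  # False: Python will not repeat the ^ anchor either
print(compiles(r"\*"))  # True

# Only the escaped form matches a literal asterisk.
print(re.fullmatch(r"\*", "*") is not None)  # True
```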


Well, theoretically you are right: it should not match. But this depends on how the implementation works internally. Most regex implementations take your regex and strip the ^ from the front (noting that the match must start at the beginning of the string) and the $ from the end (noting that the match must extend to the end of the string). What is left over is just "*", and "*" on its own is a valid regex. The implementation you are using simply handles this case wrong. You could try replacing "^*$" with just "*"; I suspect it will also match everything. It seems the implementation treats a single asterisk like ".*".

According to the ISO/IEC 9945-2:1993 standard (POSIX.2), this behavior is broken. The standard says that in a basic regular expression (BRE), an asterisk that follows a leading ^ has no special meaning at all. That means "^*$" should match exactly one string, and that string is "*"!

To quote the standard:

The asterisk is special except when used:

  • in a bracket expression
  • as the first character of an entire BRE (after an initial ^, if any)
  • as the first character of a subexpression (after an initial ^, if any); see BREs Matching Multiple Characters.

So if the asterisk is the first character of the BRE (a leading ^ does not count as the first character), it has no special meaning. In this case it matches exactly one character, and that character is a literal asterisk.
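
To make the quoted rule concrete, here is a minimal Python sketch. The helper leading_star_as_literal is hypothetical and implements only the one case discussed: a '*' at the start of the BRE, after an optional initial '^', becomes an escaped literal. Everything else is passed through unchanged. It is not a full BRE-to-Python translator, and it ignores the bracket-expression and subexpression cases:

```python
import re

def leading_star_as_literal(bre: str) -> str:
    """Hypothetical helper: apply only the POSIX BRE leading-asterisk rule.

    A '*' that is the first character of the pattern (after an optional
    initial '^') is escaped so Python's re treats it as a literal '*'.
    All other characters are passed through untouched.
    """
    prefix = "^" if bre.startswith("^") else ""
    rest = bre[len(prefix):]
    if rest.startswith("*"):
        rest = r"\*" + rest[1:]
    return prefix + rest

pattern = leading_star_as_literal("^*$")
print(pattern)                                  # ^\*$
print(re.search(pattern, "127.0.0.1") is None)  # True: no match
print(re.search(pattern, "*") is not None)      # True: only "*" matches
```

Under this reading, "^*$" is simply an anchored literal asterisk, so "127.0.0.1" cannot match it.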


Update

Microsoft says

Microsoft .NET Framework regular expressions incorporate the most popular features of other regular expression implementations such as those in Perl and awk. Designed to be compatible with Perl 5 regular expressions, .NET Framework regular expressions include features not yet seen in other implementations, such as right-to-left matching and on-the-fly compilation.

Source: http://msdn.microsoft.com/en-us/library/hs600312.aspx

Okay, let's test this:

# echo -n 127.0.0.1 | perl -n -e 'print(($_ =~ m/(^.*$)/)[0], "\n");'
-> 127.0.0.1
# echo -n 127.0.0.1 | perl -n -e 'print(($_ =~ m/(^*$)/)[0], "\n");'
->

Nope, it is not. Perl behaves as expected: ^.*$ captures the whole string, while ^*$ prints nothing. So .NET's regex implementation is broken and, contrary to Microsoft's claim, does not work like Perl 5.


The asterisk (*) matches the preceding element ZERO OR MORE times. If you want one or more, use the + quantifier instead of *.

You are asking it to match an optional start-of-string marker followed by the end-of-string marker. In other words, if the optional start-of-string marker is left out, all you are looking for is the end-of-string marker... which will match any string!
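
A short Python check of both points follows. It uses "x" as a stand-in for the optional element, because Python's re rejects a quantifier on ^ itself. The sketch shows that an end anchor on its own matches any string, and that switching * to + makes the element mandatory:

```python
import re

# An end-of-string anchor on its own matches every input, even "".
for s in ["127.0.0.1", "", "*"]:
    print(repr(s), re.search(r"$", s) is not None)  # True each time

# With "*" the 'x' is optional, so only the end anchor really has to match...
print(re.search(r"x*$", "127.0.0.1") is not None)  # True
# ...with "+" at least one 'x' is required, so there is no match.
print(re.search(r"x+$", "127.0.0.1") is not None)  # False
```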

I don't really understand what you are trying to do. If you could give us more information then maybe I could tell you what you should have done :)

Tags:

C#

Regex