What Perl regex can match CamelCase words?

I think you want something like this, written with the /x flag to add comments and insignificant whitespace:

/
   \b      # word boundary so you don't start in the middle of a word

   (          # open grouping
      [A-Z]      # initial uppercase
      [a-z]*     # any number of lowercase letters
   )          # end grouping

   {2,}    # quantifier: at least 2 instances, unbounded max  

   \b      # word boundary
/x

If you want it without the fancy formatting, just remove the whitespace and comments:

/\b([A-Z][a-z]*){2,}\b/
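
Here is a minimal sketch of that pattern pulling words out of a string. The sample text is made up for illustration. One adjustment that is mine, not part of the pattern above: the inner group is non-capturing and the whole word is wrapped in a single capture. A quantified capturing group only keeps its last repetition, so a list-context /g match with the original grouping would return fragments rather than whole words.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to exercise the pattern.
my $text = 'Perl has CamelCase, FooBarBaz, FOO, and camelCase words.';

# Same logic as the pattern above: two or more "capital followed by
# lowercase letters" chunks between word boundaries. The outer
# parentheses capture the whole word; (?:...) groups without capturing.
my @found = $text =~ /\b((?:[A-Z][a-z]*){2,})\b/g;

print join(', ', @found), "\n";   # CamelCase, FooBarBaz, FOO
```

Note that FOO is reported, since each capital letter counts as one instance of the group, while Perl and camelCase are not.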

As j_random_hacker points out, this is a bit too simple, since it also matches a word made up entirely of capital letters (e.g. FOO). His solution, which I've expanded with /x to show some detail, requires a leading capital plus at least one more capital and at least one lowercase letter:

/
    \b          # start at word boundary
    [A-Z]       # start with upper
    [a-zA-Z]*   # followed by any alpha

    (?:  # non-capturing grouping for alternation precedence
       [a-z][a-zA-Z]*[A-Z]   # a lowercase letter, any letters, then an uppercase letter
          |                     # or
       [A-Z][a-zA-Z]*[a-z]   # an uppercase letter, any letters, then a lowercase letter
    )

    [a-zA-Z]*   # anything that's left
    \b          # end at word boundary
/x

If you want it without the fancy formatting, just remove the whitespace and comments:

/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/

I explain all of these features in Learning Perl.


Assuming you aren't using the regex to do extraction, and just matching...

[A-Z][a-zA-Z]*

Isn't the only real requirement that it's all letters and starts with a capital letter?
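
As a rough sketch of what that pattern does when used as written, with no \b or ^...$ anchors, the loop below records what it matches in a few sample words (the word list is made up for illustration). The capturing parentheses are added here only so the matched text can be printed.

```perl
use strict;
use warnings;

my %matched;
for my $word (qw(CamelCase Perl FOO camelCase lower)) {
    # No anchors, so the match may start anywhere inside the string.
    if ($word =~ /([A-Z][a-zA-Z]*)/) {
        $matched{$word} = $1;
        print "$word: matched '$1'\n";
    }
    else {
        print "$word: no match\n";
    }
}
```

CamelCase, Perl and FOO match in full, camelCase matches from its first capital onward (Case), and lower does not match at all.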


brian's and sharth's answers will also report words that consist entirely of uppercase letters (e.g. FOO). This may or may not be what you want. If you want to restrict to just camel-cased words that contain at least one lowercase letter, use:

/\b[A-Z][a-zA-Z]*[a-z][a-zA-Z]*\b/

If in addition you wish to exclude words that consist of a single uppercase letter followed by any number of lowercase letters (e.g. Perl), use:

/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/

(Basically, we require the string to start with a capital letter and to contain at least one additional capital letter and one lowercase letter; these latter two can appear in either order.)
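
To see how the two patterns differ, here is a small sketch that runs both against a few sample words. The word list and the variable names $has_lower and $mixed are just for illustration.

```perl
use strict;
use warnings;

# First pattern: leading capital and at least one lowercase letter.
my $has_lower = qr/\b[A-Z][a-zA-Z]*[a-z][a-zA-Z]*\b/;

# Second pattern: leading capital, plus one more capital and one
# lowercase letter, in either order.
my $mixed = qr/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/;

my %result;
for my $word (qw(CamelCase FOo Perl FOO camelCase)) {
    $result{$word} = [
        ($word =~ $has_lower ? 1 : 0),
        ($word =~ $mixed     ? 1 : 0),
    ];
    printf "%-10s has_lower=%d mixed=%d\n", $word, @{ $result{$word} };
}
```

CamelCase and FOo pass both patterns, Perl passes only the first, and FOO and camelCase pass neither.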

Tags:

Regex

Perl