Regex. Camel case to underscore. Ignore first occurrence

// (Preceded by a lowercase character or digit) (a capital) => The character prefixed with an underscore
var result = Regex.Replace(input, "(?<=[a-z0-9])[A-Z]", m => "_" + m.Value);
result = result.ToLowerInvariant();
  • This works for both PascalCase and camelCase.
  • It creates no leading or trailing underscores.
  • It leaves intact any sequences of non-word characters and underscores in the string, because they would seem intentional, e.g. __HiThere_Guys becomes __hi_there_guys.
  • Digit suffixes are (intentionally) considered part of the word, e.g. NewVersion3 becomes new_version3.
  • Digit prefixes follow the original casing, e.g. 3VersionsHere becomes 3_versions_here, but 3rdVersion becomes 3rd_version.
  • Unfortunately, capitalized two-letter acronyms (e.g. IDNumber, where ID would be considered a separate word), as suggested in Microsoft's Capitalization Conventions, are not supported, since they conflict with other cases. In general, I recommend resisting this guideline, as it is a seemingly arbitrary exception to the convention of not capitalizing acronyms. Stick with IdNumber.
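
To make the cases in the list above easy to check, here is a minimal sketch that wraps the two statements in a helper. The class name SnakeCaseDemo and the method name ToSnakeCase are hypothetical, chosen only for illustration; the pattern itself is the one shown above.

```csharp
using System.Text.RegularExpressions;

public static class SnakeCaseDemo
{
    // Hypothetical helper (name is an assumption, not from the original answer).
    // Inserts "_" before each capital that follows a lowercase letter or digit,
    // then lowercases the whole string.
    public static string ToSnakeCase(string input)
    {
        var result = Regex.Replace(input, "(?<=[a-z0-9])[A-Z]", m => "_" + m.Value);
        return result.ToLowerInvariant();
    }
}
```

With this helper, ToSnakeCase("3VersionsHere") should return 3_versions_here, while ToSnakeCase("IDNumber") should return idnumber, which shows the acronym limitation described in the last bullet.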

You can use a lookbehind to ensure that each match is preceded by at least one character:

System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
                      System.Text.RegularExpressions.RegexOptions.Compiled);

Lookaheads and lookbehinds allow you to make assertions about the text surrounding a match without including that text within the match.
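
As a rough illustration of that point, the sketch below lists what a lookbehind pattern actually matches, and shows the same underscore insertion done with a lookahead instead. The class and method names are hypothetical, and the lookahead variant is an alternative added for comparison, not something proposed in the answer above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookbehind: match a capital that has some character before it.
    // The preceding character is asserted but is not part of the match,
    // so each match value is just the capital letter itself.
    public static string[] CapitalsAfterFirst(string input) =>
        Regex.Matches(input, "(?<=.)[A-Z]")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Lookahead: match a lowercase letter that is followed by a capital.
    // The capital is asserted but not consumed, so only the lowercase
    // letter is replaced (by itself plus an underscore).
    public static string UnderscoreViaLookahead(string input) =>
        Regex.Replace(input, "[a-z](?=[A-Z])", m => m.Value + "_");
}
```

For the input ThisIsMySample, CapitalsAfterFirst should yield I, M and S (the leading T is skipped because nothing precedes it), and UnderscoreViaLookahead should yield This_Is_My_Sample.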


Non-Regex solution

// Requires "using System.Linq;". Prefixes every uppercase character except the first with "_".
string result = string.Concat(input.Select((x, i) => i > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));

It seems to be quite fast too. Over 1,000,000 iterations: Regex 2569 ms, plain C# (LINQ) 1489 ms.

Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
    string input = "ThisIsMySample";
    string result = System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
            System.Text.RegularExpressions.RegexOptions.Compiled);
}
stp.Stop();
MessageBox.Show(stp.ElapsedMilliseconds.ToString());
// Result: 2569ms

Stopwatch stp2 = new Stopwatch();
stp2.Start();
for (int i = 0; i < 1000000; i++)
{
    string input = "ThisIsMySample";
    string result = string.Concat(input.Select((x, j) => j > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
}
stp2.Stop();
MessageBox.Show(stp2.ElapsedMilliseconds.ToString());
// Result: 1489ms

Maybe something like this:

// Prefix every capital with an underscore, then drop the leading one if present.
var str = Regex.Replace(input, "([A-Z])", "_$0", RegexOptions.Compiled);
if (str.StartsWith("_"))
    str = str.Substring(1);
