How to convert PascalCase to pascal_case?

Try this on for size:

$tests = array(
  'simpleTest' => 'simple_test',
  'easy' => 'easy',
  'HTML' => 'html',
  'simpleXML' => 'simple_xml',
  'PDFLoad' => 'pdf_load',
  'startMIDDLELast' => 'start_middle_last',
  'AString' => 'a_string',
  'Some4Numbers234' => 'some4_numbers234',
  'TEST123String' => 'test123_string',
);

foreach ($tests as $test => $result) {
  $output = from_camel_case($test);
  if ($output === $result) {
    echo "Pass: $test => $result\n";
  } else {
    echo "Fail: $test => $result [$output]\n";
  }
}

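function from_camel_case($input) {
  // Split into words: either an all-caps run (an acronym such as "HTML" or "PDF") that ends
  // where the next word begins, or a letter followed by lowercase letters/digits ("simple", "Test").
  preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
  $ret = $matches[0];
  foreach ($ret as &$match) {
    // Acronyms are lowercased entirely; other words only need their first letter lowercased.
    $match = $match === strtoupper($match) ? strtolower($match) : lcfirst($match);
  }
  unset($match); // break the reference left over from the foreach
  return implode('_', $ret);
}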

Output:

Pass: simpleTest => simple_test
Pass: easy => easy
Pass: HTML => html
Pass: simpleXML => simple_xml
Pass: PDFLoad => pdf_load
Pass: startMIDDLELast => start_middle_last
Pass: AString => a_string
Pass: Some4Numbers234 => some4_numbers234
Pass: TEST123String => test123_string

This implements the following rules:

  1. A sequence beginning with a lowercase letter must be followed by one or more lowercase letters or digits;
  2. A sequence beginning with an uppercase letter can be followed by either:
    • zero or more uppercase letters and digits, provided that what comes next is either the end of the string or an uppercase letter followed by a lowercase letter or digit (i.e. the start of the next sequence); or
    • one or more lowercase letters or digits.
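To see these rules in action, here is a small sketch that prints the raw tokens preg_match_all extracts with the same pattern, before they are lowercased and joined. The helper name camel_tokens is hypothetical, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the words the from_camel_case() pattern extracts,
// before any lowercasing or joining.
function camel_tokens(string $input): array {
    preg_match_all('!([A-Z][A-Z0-9]*(?=$|[A-Z][a-z0-9])|[A-Za-z][a-z0-9]+)!', $input, $matches);
    return $matches[0];
}

foreach (['simpleXML', 'PDFLoad', 'AString', 'TEST123String'] as $input) {
    // implode() is only for display, e.g. "PDFLoad => PDF | Load"
    echo $input, ' => ', implode(' | ', camel_tokens($input)), "\n";
}
```

The first branch of rule 2 produces acronym tokens such as PDF and TEST123; the lookahead is what stops PDF from swallowing the L of Load. One consequence of rule 1: a single lowercase letter directly followed by a capital (the a in aBc) matches neither branch, so camel_tokens('aBc') returns only ['Bc'].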

A shorter solution, similar to the editor's but with a simplified regular expression and a fix for the "trailing-underscore" problem:

$output = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
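The (?<!^) negative lookbehind is what keeps an underscore from being added before a capital at the very start of the string. A minimal comparison with and without it (both function names are hypothetical):

```php
<?php
// Naive version: prefixes every capital with "_", including the first character.
function snake_naive(string $input): string {
    return strtolower(preg_replace('/[A-Z]/', '_$0', $input));
}

// Version from the answer: (?<!^) skips a capital at position 0.
function snake_lookbehind(string $input): string {
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
}

echo snake_naive('PascalCase'), "\n";      // _pascal_case
echo snake_lookbehind('PascalCase'), "\n"; // pascal_case
```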

Note that input such as SimpleXML is converted to simple_x_m_l by the solution above. That can be considered incorrect camel-case usage (the correct form would be SimpleXml) rather than a bug in the algorithm, because such cases are always ambiguous: even when consecutive uppercase characters are grouped into one word (simple_xml), the algorithm will still fail on other edge cases such as XMLHTMLConverter or one-letter words next to abbreviations. If you don't mind these (rather rare) edge cases and want to handle SimpleXML correctly, you can use a slightly more complex solution:

$output = ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
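A side-by-side sketch of the two one-liners on a few ambiguous inputs; the wrapper names snake_short and snake_grouped are hypothetical.

```php
<?php
// Short version: every capital except the first gets an underscore.
function snake_short(string $input): string {
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $input));
}

// Longer version: one match is a capital plus any following capitals that are not
// themselves followed by a lowercase letter, so an acronym stays in one piece;
// ltrim() removes the underscore added in front of a leading capital.
function snake_grouped(string $input): string {
    return ltrim(strtolower(preg_replace('/[A-Z]([A-Z](?![a-z]))*/', '_$0', $input)), '_');
}

foreach (['SimpleXML', 'PDFLoad', 'XMLHTMLConverter'] as $input) {
    printf("%-18s short: %-26s grouped: %s\n", $input, snake_short($input), snake_grouped($input));
}
```

XMLHTMLConverter illustrates the ambiguity described above: the grouped version yields xmlhtml_converter, because nothing in the string says where XML ends and HTML begins. Also note that ltrim() strips any underscores the input already started with, so _Foo becomes foo.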

A concise solution that can handle some tricky use cases:

function decamelize($string) {
    return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}
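When preg_replace receives an array of patterns, it applies them in order, each to the result of the previous one. A sketch that exposes the intermediate strings (the helper name decamelize_steps is hypothetical):

```php
<?php
// Hypothetical helper: applies the two decamelize() patterns one at a time
// and returns each intermediate result plus the final lowercased string.
function decamelize_steps(string $string): array {
    // Step 1: underscore between a lowercase letter/digit and a capital ("yH" -> "y_H").
    $step1 = preg_replace('/([a-z\d])([A-Z])/', '$1_$2', $string);
    // Step 2: underscore before a capital that starts a lowercase word, unless one is
    // already there ("LFi" -> "L_Fi"); this separates an acronym from the next word.
    $step2 = preg_replace('/([^_])([A-Z][a-z])/', '$1_$2', $step1);
    return [$step1, $step2, strtolower($step2)];
}

print_r(decamelize_steps('myHTMLFiLe'));
```

For myHTMLFiLe this gives my_HTMLFi_Le, then my_HTML_Fi_Le, then my_html_fi_le.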

It handles all of these cases:

simpleTest => simple_test
easy => easy
HTML => html
simpleXML => simple_xml
PDFLoad => pdf_load
startMIDDLELast => start_middle_last
AString => a_string
Some4Numbers234 => some4_numbers234
TEST123String => test123_string
hello_world => hello_world
hello__world => hello__world
_hello_world_ => _hello_world_
hello_World => hello_world
HelloWorld => hello_world
helloWorldFoo => hello_world_foo
hello-world => hello-world
myHTMLFiLe => my_html_fi_le
aBaBaB => a_ba_ba_b
BaBaBa => ba_ba_ba
libC => lib_c
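To check some of the trickier cases locally, a small pass/fail harness in the same style as the first answer's test loop (the $cases subset is chosen for illustration):

```php
<?php
function decamelize($string) {
    return strtolower(preg_replace(['/([a-z\d])([A-Z])/', '/([^_])([A-Z][a-z])/'], '$1_$2', $string));
}

// A subset of the cases listed above: input => expected output.
$cases = [
    'hello_World' => 'hello_world',
    '_hello_world_' => '_hello_world_',
    'myHTMLFiLe' => 'my_html_fi_le',
    'aBaBaB' => 'a_ba_ba_b',
    'BaBaBa' => 'ba_ba_ba',
    'libC' => 'lib_c',
];

foreach ($cases as $input => $expected) {
    $output = decamelize($input);
    echo ($output === $expected ? 'Pass' : 'Fail'), ": $input => $output\n";
}
```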

You can test this function here: http://syframework.alwaysdata.net/decamelize