Anyone know an elegant function to fix name cases?

Here is my attempt:

$names=array();
$names[]="sven-alex crumpet";
$names[]="RONALDO McDonalDO";
$names[]="Boopsie o'Brien";
$names[]="j.r. BOB DOBBS";
$names[]="francesca DE LOS gatOS";
$names[]="yungcheng LI";
$names[]="mr hankey";
$names[]="santas little helper";
$names[]="j.r.r. tolkien";

$splitters=array(' ','.',"'",'-'); //more to come
$fixedNames=array();

foreach($names as $name) {
    $fixed='';
    $blank=str_replace($splitters,'?',$name);
    $n=explode('?',$blank);
    foreach($n as $f) $fixed.=ucfirst(strtolower($f)).' ';
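    // put the original splitter characters back at their positions;
    // loop over $name, since $fixed carries one extra trailing space
    for ($i=0;$i<strlen($name);$i++) {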
        if ($fixed[$i]==' ') {
            if ($blank[$i]=='?') {
                $fixed[$i]=$name[$i];
            }
        }
    }
    $fixedNames[]=substr_replace($fixed,'', -1);
}

echo '<pre>';
print_r($fixedNames);
echo '</pre>';

This outputs:

Array
(
    [0] => Sven-Alex Crumpet
    [1] => Ronaldo Mcdonaldo
    [2] => Boopsie O'Brien
    [3] => J.R. Bob Dobbs
    [4] => Francesca De Los Gatos
    [5] => Yungcheng Li
    [6] => Mr Hankey
    [7] => Santas Little Helper
    [8] => J.R.R. Tolkien
)

It is impossible to "correct" a name like YungCheng without algorithms that account for regional and cultural conventions, plus a huge database of names to compare against.


This is a fairly old question now, but here is a function that covers most cases:

function titleCase($string, $delimiters = array(" ", "-", ".", "'", "O'", "Mc", "Mac"), $exceptions = array("and", "to", "of", "das", "dos", "de", "do", "da", "los", "von", "van", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X")) {
    /*
     * Exceptions listed in lower case (e.g. "de", "von") are kept in lower case.
     * Exceptions listed in upper case (e.g. "VIII") are converted to upper case
     *   instead of title case, e.g.:
     *   king henry viii or king henry Viii should be King Henry VIII
     */
    $string = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
    foreach ($delimiters as $dlnr => $delimiter) {
        $words = explode($delimiter, $string);
        $newwords = array();
        foreach ($words as $wordnr => $word) {
            if (in_array(mb_strtoupper($word, "UTF-8"), $exceptions)) {
                // check exceptions list for any words that should be in upper case
                $word = mb_strtoupper($word, "UTF-8");
            } else if (in_array(mb_strtolower($word, "UTF-8"), $exceptions)) {
                // check exceptions list for any words that should be in lower case
                $word = mb_strtolower($word, "UTF-8");
            } else if (!in_array($word, $exceptions)) {
                // capitalise the first letter (ucfirst is not multibyte-safe)
                $word = ucfirst($word);
            }
            array_push($newwords, $word);
        }
        $string = join($delimiter, $newwords);
    } //foreach
    return $string;
}

It won't work for YungCheng, but it handles most other names. The one known issue: if $string is ONLY a surname such as "do Carmo", it returns "Do Carmo", because the passes for the later delimiters capitalise the first letter of the whole string again. It is really built for full names, so $string = "frederick do carmo"; returns "Frederick do Carmo". Hope that is of some help.
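
Regarding the "non-utf8 only" remark on ucfirst: a multibyte-aware stand-in could look roughly like the sketch below. The helper name utf8Ucfirst is made up for illustration, and it assumes the mbstring extension (which the function above already relies on) and UTF-8 input. It would be swapped in at the `$word = ucfirst($word);` line.

```php
<?php
// Hypothetical multibyte-aware replacement for ucfirst().
// Assumes the mbstring extension is loaded and $word is valid UTF-8.
function utf8Ucfirst($word, $encoding = "UTF-8") {
    // take the first character (not the first byte) and the remainder
    $first = mb_substr($word, 0, 1, $encoding);
    $rest  = mb_substr($word, 1, null, $encoding);
    // upper-case only the first character, leave the rest untouched
    return mb_strtoupper($first, $encoding) . $rest;
}

echo utf8Ucfirst("élise"), "\n";  // Élise
echo utf8Ucfirst("brien"), "\n";  // Brien
```

The rest of the word is deliberately left alone, so a segment that was already lower-cased or title-cased earlier keeps its casing.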


This is simply impossible.

Spelling of names varies from country to country, as you show in your question. The easiest way to go is to pick the most common convention, which is to capitalise the first letter of every 'word', i.e. every string preceded by a space, hyphen, dot or apostrophe.
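
As a rough sketch of that rule: PHP's ucwords() accepts a second argument listing the delimiter characters (PHP 5.4.32 / 5.5.16 and later), so the whole thing fits in one line. The wrapper name capitaliseWords is made up here, and the sketch assumes ASCII-only names, since strtolower() and ucwords() do not handle multibyte letters.

```php
<?php
// Capitalise the first letter of the string and of every part that
// follows a space, hyphen, dot or apostrophe.
// Assumption: ASCII-only input (no multibyte letters such as "é").
function capitaliseWords($name) {
    return ucwords(strtolower($name), " -.'");
}

$names = array("sven-alex crumpet", "Boopsie o'Brien", "j.r. BOB DOBBS", "RONALDO McDonalDO");
foreach ($names as $name) {
    echo capitaliseWords($name), "\n";
}
// Sven-Alex Crumpet
// Boopsie O'Brien
// J.R. Bob Dobbs
// Ronaldo Mcdonaldo
```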

This doesn't fix all your problems (YungCheng, McDonaldo) and leaves you with other issues as well, but that's as close as you're gonna get.

Compare:

  • Alex Van Halen (US spelling)
  • Alex van Halen (correct Dutch spelling)

There's no algorithm fixing this.

This article illustrates the problem with Dutch names very well, and that's just one language. There's probably an article like this for every language in the world. ;)
