Converting Microsoft Word special characters with PHP

Great solution. I copied and pasted it and it worked with out a problem. On further study, I added a few characters that were not in the search and replace array. In order to find the ASCII character id numbers, I wrote a PHP function which shows what the ASCII character number is:

function stdump($s){

  for($i=0;$i<strlen($s);$i++){

    echo substr($s,$i,1) . "(" . ord(substr($s,$i,1)) . ")";

  }

  echo "<br/>";
}

The character is display and next to it the ascii number is show in parenthesis. Like this:

echo stdump("GPUs…");

produces:

G(71)P(80)U(85)s(115)â(226)€(128)¦(166)

Hope this helps.

--Keith


Hmm. I use this function for sanitizing text copied into an RTE. It may or may not work in this case. It converts to HTML entities, but you could tweak it to just convert to regular characters:

function convertFromCP1252($string)
{
    $search = array('&',
                    '<',
                    '>',
                    '"',
                    chr(212),
                    chr(213),
                    chr(210),
                    chr(211),
                    chr(209),
                    chr(208),
                    chr(201),
                    chr(145),
                    chr(146),
                    chr(147),
                    chr(148),
                    chr(151),
                    chr(150),
                    chr(133),
                    chr(194)
                );

     $replace = array(  '&amp;',
                        '&lt;',
                        '&gt;',
                        '&quot;',
                        '&#8216;',
                        '&#8217;',
                        '&#8220;',
                        '&#8221;',
                        '&#8211;',
                        '&#8212;',
                        '&#8230;',
                        '&#8216;',
                        '&#8217;',
                        '&#8220;',
                        '&#8221;',
                        '&#8211;',
                        '&#8212;',
                        '&#8230;',
                        ''
                    );

    return str_replace($search, $replace, $string);
}

For anyone getting the diamond question mark in PHP, this method of replacing UTF-8 characters worked better than using the chr function.

$search = [                 // www.fileformat.info/info/unicode/<NUM>/ <NUM> = 2018
                "\xC2\xAB",     // « (U+00AB) in UTF-8
                "\xC2\xBB",     // » (U+00BB) in UTF-8
                "\xE2\x80\x98", // ‘ (U+2018) in UTF-8
                "\xE2\x80\x99", // ’ (U+2019) in UTF-8
                "\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
                "\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
                "\xE2\x80\x9C", // “ (U+201C) in UTF-8
                "\xE2\x80\x9D", // ” (U+201D) in UTF-8
                "\xE2\x80\x9E", // „ (U+201E) in UTF-8
                "\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
                "\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
                "\xE2\x80\xBA", // › (U+203A) in UTF-8
                "\xE2\x80\x93", // – (U+2013) in UTF-8
                "\xE2\x80\x94", // — (U+2014) in UTF-8
                "\xE2\x80\xA6"  // … (U+2026) in UTF-8
    ];

    $replacements = [
                "<<", 
                ">>",
                "'",
                "'",
                "'",
                "'",
                '"',
                '"',
                '"',
                '"',
                "<",
                ">",
                "-",
                "-",
                "..."
    ];

    str_replace($search, $replacements, $string);