Encoding SQL_Latin1_General_CP1_CI_AS into UTF-8

I know this post is old, but the only thing that work for me was iconv("CP850", "UTF-8//TRANSLIT", $var); I had the same issues with SQL_Latin1_General_CP1_CI_AI, maybe it work for SQL_Latin1_General_CP1_CI_AS too.


For me, none of the above was the direct solution--though I did use parts of above solutions. This worked for me with the Vietnamese alphabet. If you come across this post and none of the above work for you, try:

    $req = "SELECT CAST(MY_COLUMN as VARBINARY(MAX)) as MY_COLUMN FROM MY_TABLE"; 
    $stmt = $conn->prepare($req);
    $stmt->execute();
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $str = pack("H*",$row['MY_COLUMN']);
        $str = mb_convert_encoding($z, 'HTML-ENTITIES','UCS-2LE');
        print_r($str);
    }

And a little bonus--I had to json_encode this data and was (duh) getting html code instead of the special characters. to fix just use html_entity_decode() on the strings before sending with json_encode.


You can try so:

header("Content-Type: text/html; charset=utf-8");
$dbhost   = "hostname";
$db       = "database";
$query = "SELECT *
    FROM Estado
    ORDER BY Nome";
$conn = new PDO( "sqlsrv:server=$dbhost ; Database = $db", "", "" );
$stmt = $conn->prepare( $query, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL, PDO::SQLSRV_ATTR_CURSOR_SCROLL_TYPE => PDO::SQLSRV_CURSOR_BUFFERED, PDO::SQLSRV_ENCODING_SYSTEM) );
$stmt->execute();
while ( $row = $stmt->fetch( PDO::FETCH_ASSOC ) )
{
// CP1252 == code page Latin1
print iconv("CP1252", "ISO-8859-1", "$row[Nome] <br>");
}

I found how to solve it, so hopefully this will be helpful to someone.

First, SQL_Latin1_General_CP1_CI_AS is a strange mix of CP-1252 and UTF-8. The basic characters are CP-1252, so this is why all I had to do was UTF-8 and everything worked. The asian and other UTF-8 characters are encoded on 2 bytes and the php pdo_mssql driver seems to hate varying length characters so it seems to do a CAST to varchar (instead of nvarchar) and then all the 2 byte characters become question marks ('?').

I fixed it by casting it to binary and then I rebuild the text with php:

SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;

In php:

//Binary to hexadecimal
$hex = bin2hex($bin);

//And then from hex to string
$str = "";
for ($i=0;$i<strlen($hex) -1;$i+=2)
{
    $str .= chr(hexdec($hex[$i].$hex[$i+1]));
}
//And then from UCS-2LE/SQL_Latin1_General_CP1_CI_AS (that's the column format in the DB) to UTF-8
$str = iconv('UCS-2LE', 'UTF-8', $str);