How to fix double-encoded UTF-8 characters (in a UTF-8 table)

The above answer worked for some of my data, but left a lot of NULL values after running it. My guess is that when the conversion fails, it returns NULL. To avoid that, I added a small check:

UPDATE
    tbl
SET
    col =
    CASE
        WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col
        ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)
    END;
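
Before committing to the UPDATE, a SELECT with the same expression makes it easy to see which rows would otherwise have been nulled out (a minimal sketch, reusing the tbl and col names from above):

-- Preview rows where the round-trip conversion fails and would return NULL
SELECT col
FROM tbl
WHERE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL;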

It is very important to use "utf8mb4" instead of "utf8", since MySQL will strip out all of the data after the first character it cannot represent. So the safer method is:

UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4);

Be careful about this.
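
One way to check whether the plain utf8 conversion would actually lose data is to compare the lengths of the two conversions before updating (a rough sketch, assuming the same tablename and field names as above):

-- Rows where the utf8 result is shorter than the utf8mb4 result,
-- i.e. data would be stripped after an unrepresentable character
SELECT field
FROM tablename
WHERE CHAR_LENGTH(CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8))
    < CHAR_LENGTH(CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4));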


The following MySQL expression will return the correct utf8 string after double-encoding:

CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)
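
To see why this works, take "é" as an example: after double-encoding it is stored as "Ã©". Converting that string to latin1 turns the two characters back into the raw bytes C3 A9, and re-reading those bytes as utf8 yields "é" again. A quick sanity check with a literal (a sketch; it assumes your connection character set is utf8, so the literal itself reaches the server intact):

SELECT CONVERT(CAST(CONVERT('Ã©' USING latin1) AS BINARY) USING utf8);
-- returns 'é'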

It can be used with an UPDATE statement to correct the fields:

UPDATE tablename SET
    field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

I met this issue too; here is a solution for Oracle:

update tablename t set t.colname = convert(t.colname, 'WE8ISO8859P1', 'UTF8') where t.colname like '%Ã%'
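
As with the MySQL version, it can be worth previewing the conversion before updating (a sketch, using the same hypothetical tablename and colname):

-- Preview the converted values for rows that look double-encoded
SELECT t.colname,
       CONVERT(t.colname, 'WE8ISO8859P1', 'UTF8') AS fixed_value
FROM tablename t
WHERE t.colname LIKE '%Ã%';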

And another one for Java:

import java.io.UnsupportedEncodingException;
import java.util.regex.Pattern;

public static String fixDoubleEncoded(String text) {
    // Heuristic: a literal 'Ã' followed by a non-alphanumeric character usually
    // indicates UTF-8 bytes that were decoded as ISO-8859-1
    final Pattern pattern = Pattern.compile("^.*Ã[^0-9a-zA-Z\\ \t].*$");
    try {
        // Re-encode as ISO-8859-1 and decode as UTF-8, stripping one layer
        // of double-encoding per iteration until the marker disappears
        while (pattern.matcher(text).matches())
            text = new String(text.getBytes("iso-8859-1"), "utf-8");
    }
    catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return text;
}