Adding UTF-8 support to JS/PHP script

Ø§Ù„Ù…Ø±Ø§ÙƒØ² is Mojibake, or possibly "double encoding", for المراكز. Run SELECT col, HEX(col) ... to see which of these the stored bytes look like:

Mojibake: D8A7D984D985D8B1D8A7D983D8B2
double encoding: C398C2A7C399E2809EC399E280A6C398C2B1C398C2A7C399C692C398C2B2
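
To see where these two hex strings come from, here is a minimal Node.js sketch. It assumes the misreading step uses Windows-1252 (the character set MySQL calls latin1); the helper names decodeCp1252 and utf8Hex are made up for illustration.

```javascript
// Windows-1252 differs from ISO-8859-1 only in 0x80-0x9F. The five undefined
// slots (0x81, 0x8D, 0x8F, 0x90, 0x9D) fall back to the matching control code.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Interpret each byte as one Windows-1252 character.
function decodeCp1252(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b >= 0x80 && b <= 0x9f ? CP1252_HIGH[b - 0x80] : b);
  }
  return out;
}

// Upper-case hex of the UTF-8 encoding, comparable to MySQL's HEX(col).
function utf8Hex(text) {
  return Buffer.from(text, "utf8").toString("hex").toUpperCase();
}

// The Arabic word from the question, written with escapes so that the
// encoding of this source file cannot affect the result.
const original = "\u0627\u0644\u0645\u0631\u0627\u0643\u0632";

const correctHex = utf8Hex(original);                          // encoded once
const misread = decodeCp1252(Buffer.from(original, "utf8"));  // bytes read as cp1252
const doubleHex = utf8Hex(misread);                            // encoded a second time

console.log(correctHex); // D8A7D984D985D8B1D8A7D983D8B2
console.log(misread);    // the garbled text seen on screen
console.log(doubleHex);  // C398C2A7C399E2809EC399E280A6C398C2B1C398C2A7C399C692C398C2B2
```

The first hex string is the plain UTF-8 encoding of the word; the second is what results when those bytes are misread as Windows-1252 and encoded to UTF-8 again.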

If Mojibake:

  • The bytes to be stored need to be UTF-8-encoded. Fix this.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
  • HTML should start with <meta charset=UTF-8>.
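
For the first point, one way to check whether a given byte sequence really is UTF-8 is a strict decode. This is only a sketch for Node.js; isValidUtf8 is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: returns true when the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input
    // instead of silently substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

const asUtf8 = Buffer.from("\u0631", "utf8");         // bytes D8 B1
const asLatin1 = Buffer.from("caf\u00e9", "latin1");  // bytes 63 61 66 E9

console.log(isValidUtf8(asUtf8));   // true
console.log(isValidUtf8(asLatin1)); // false: a lone E9 is not valid UTF-8
```

Note that passing this check does not rule out double encoding, since double-encoded text is itself well-formed UTF-8; the HEX() comparison is still needed for that.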

If double-encoding: This is caused by converting from latin1 (or whatever) to utf8, then treating those bytes as if they were latin1 and repeating the conversion.
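
The sequence of conversions can be traced on a single character. The sketch below uses é and Node's built-in "latin1" encoding, which is ISO-8859-1; MySQL's latin1 is really Windows-1252, but the two agree for the bytes involved here. The variable names are illustrative only.

```javascript
// Step 0: the text as a latin1 client holds it, a single byte E9.
const latin1Bytes = Buffer.from("\u00e9", "latin1");

// Step 1: the correct latin1 -> utf8 conversion gives C3 A9.
const utf8Once = Buffer.from(latin1Bytes.toString("latin1"), "utf8");

// Step 2: those utf8 bytes are mistaken for latin1 and converted again,
// giving C3 83 C2 A9 (four bytes for one character).
const utf8Twice = Buffer.from(utf8Once.toString("latin1"), "utf8");

// Undoing exactly one conversion recovers the single-encoded bytes.
const repaired = Buffer.from(utf8Twice.toString("utf8"), "latin1");

console.log(latin1Bytes.toString("hex")); // e9
console.log(utf8Once.toString("hex"));    // c3a9
console.log(utf8Twice.toString("hex"));   // c383c2a9
console.log(repaired.toString("hex"));    // c3a9
```

If reversing one step like this yields readable text, that is a good sign the data was double-encoded rather than merely displayed with the wrong charset.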

More discussion:

Trouble with UTF-8 characters; what I see is not what I stored

Do not use the mysql_* interface in PHP; switch to the mysqli_* or PDO interfaces. mysql_* was deprecated in PHP 5.5 and removed in PHP 7.0.


If your database is latin1, it stores a Unicode character as several single-byte characters, one per byte of its encoding. If it is utf-8 based, it still stores multiple bytes per character, but displays them in a more "sensible" manner.

If your ر character is represented as XYZ (3 bytes), then when you retrieve XYZ, the browser will reassemble them into a visible ر.

However, if your database is utf-8 and treats the incoming bytes as latin1, it will encode each byte again, so that you "reliably" see XYZ in the end. Say X is stored as x1,x2, Y as just y, and Z as z1,z2,z3: instead of ر, which was sent as XYZ, the database now holds x1x2yz1z2z3, which is displayed as XYZ.

Try converting your database to latin1 to at least confirm my theory. Thanks.

Edit:

There is no need to use a utf8 js library. Make sure your page's character encoding is utf8:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

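When you POST the data, you can encode it with encodeURIComponent before sending it with an XHR request. jQuery's $.ajax already does this encoding when data is passed as a plain object (it serializes the object with $.param, which uses encodeURIComponent); a string passed as data is sent as-is.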
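
As a rough illustration of the manual encoding, the sketch below builds a form-encoded body by hand. buildFormBody is a hypothetical helper, and the field name and the "save.php" URL in the comments are placeholders, not taken from the question.

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded body,
// percent-encoding every key and value as UTF-8.
function buildFormBody(fields) {
  return Object.entries(fields)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

const original = "\u0627\u0644\u0645\u0631\u0627\u0643\u0632";
const body = buildFormBody({ name: original });

console.log(body); // name=%D8%A7%D9%84%D9%85%D8%B1%D8%A7%D9%83%D8%B2

// In a browser, the body could then be sent like this:
// const xhr = new XMLHttpRequest();
// xhr.open("POST", "save.php");
// xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
// xhr.send(body);
```

The percent-encoded bytes are the same UTF-8 bytes that HEX() should show in the database once every layer is configured consistently.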