HTML5 Encoding & Cyrillic

If you want to be sure which charset will be used by browser you must have in your page head

 <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

otherwise you are at the mercy of local settings and browser automation.


The text data in the example is UTF-8 encoded text misinterpreted as window-1252 encoded. The reason is that the encoding has not been specified and browsers are forced to make a guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:

1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program.

2) Add the following tag into the head part:

<meta charset=utf-8>

There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.