Special character not displaying as expected

The non-ASCII character is displayed wrong in the browsers because the file was saved (most likely) with Windows-1252 encoding instead of UTF-8 encoding. The cause was missing knowledge about how UltraEdit detects UTF-8, and perhaps also about the appropriate UTF-8 configuration.

How UltraEdit v22.10, the latest version at the time of writing, detects UTF-8 encoding is explained in detail in the user-to-user forum topic UTF-8 not recognized, largish file. That topic also contains recommendations on how best to configure UltraEdit for HTML writers who use mainly UTF-8 encoding for all HTML files. The UTF-8 detection was greatly improved with UltraEdit v24.00, which also detects UTF-8 encoded characters in very large files when scrolling to a block containing a UTF-8 encoded character.

Unfortunately, the regular expression search used by UltraEdit v22.10 and previous versions to detect a UTF-8 HTML character set declaration does not work for the short HTML5 variant, as reported in the forum topic Short UTF-8 charset declaration in HTML5 header. The reason is the double quote character between charset= and utf-8. When the referenced topic was created, I reported this by email to IDM Computer Solutions, Inc. with the suggestion to make a small change to the regular expression so that it also detects the short HTML5 UTF-8 declaration. The developers of UltraEdit later updated the UTF-8 detection for UE v24.00 and UES v17.00, as a post in the referenced forum topic explains in detail.
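The quoting problem can be illustrated with a small Python sketch. Neither pattern below is the expression UltraEdit actually uses; both are hypothetical and only show why a pattern that expects utf-8 directly after charset= misses the short HTML5 form, and how allowing an optional quote fixes that.

```python
import re

# Hypothetical strict pattern: expects the name right after "charset=".
STRICT_RE = re.compile(rb"charset=utf-8", re.IGNORECASE)

# Hypothetical tolerant pattern: allows an optional ' or " after "charset=".
CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)

def declared_charset(head: bytes):
    """Return the declared HTML charset in lowercase, or None if not found."""
    match = CHARSET_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None

long_form = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
short_form = b'<meta charset="utf-8">'

# The strict pattern finds the long form but not the short HTML5 form.
strict_hits = (bool(STRICT_RE.search(long_form)), bool(STRICT_RE.search(short_form)))

# The tolerant pattern finds both.
tolerant_hits = (declared_charset(long_form), declared_charset(short_form))
```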

However, when an HTML5 file is declared as UTF-8 encoded but UltraEdit has loaded it as an ANSI file, the user can see the wrong loading in the status bar at the bottom of the main window. A small (less than 64 KB) UTF-8 encoded HTML file should result in getting

  • either U8- and line terminator type (DOS/UNIX/MAC) displayed for users of UE < v19.00 or when using basic status bar in later versions of UE
  • or UTF-8 selected in encoding selector in status bar for users of UE v19.00 or later versions not using basic status bar.

If this is not the case, the UltraEdit user can use

  • Save As from menu File, selecting UTF-8 - NO BOM for Encoding (Windows Vista or later) or for Format (Windows 2000/XP), to convert the file from ANSI to UTF-8 without byte order mark, or
  • ASCII to UTF-8 (Unicode editing) from submenu Conversions in menu File to convert the file from ASCII/ANSI to UTF-8 without an immediate save, or
  • Unicode - UTF-8 in the encoding selector in the status bar (UE v19.00 or later only), which also results in an immediate conversion from ASCII/ANSI to UTF-8 and enables Unicode editing.

For the last two options, the UTF-8 BOM settings at Advanced - Settings or Configuration - File Handling - Save determine whether the file is saved with or without a byte order mark on the next save.
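Outside of UltraEdit, the same ANSI to UTF-8 conversion can be sketched in a few lines of Python. This is a minimal sketch, assuming the source file really is Windows-1252; the function name and the with_bom parameter are made up for this illustration.

```python
from pathlib import Path

def convert_cp1252_to_utf8(src: Path, dst: Path, with_bom: bool = False) -> None:
    """Re-encode a Windows-1252 text file as UTF-8, optionally with a BOM."""
    # Assumption: the input is Windows-1252. Python raises UnicodeDecodeError
    # for the few byte values that code page leaves undefined (for example 0x81).
    text = src.read_bytes().decode("cp1252")
    # "utf-8-sig" prepends the byte order mark EF BB BF; "utf-8" does not.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    dst.write_bytes(text.encode(encoding))
```

Reading and writing bytes instead of text keeps the line terminators of the file untouched.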

Once the word méywe is saved into the file using UTF-8 encoding, the resulting byte stream is 6D C3 A9 79 77 65 (hexadecimal). These bytes would be displayed as mÃ©ywe if the UTF-8 encoded file were opened in ASCII/ANSI mode (an option in the File - Open dialog) using Windows-1252 as code page. On the next opening, UltraEdit automatically detects the file as UTF-8 encoded even though <meta charset="utf-8"> is not recognized, because there is now at least one UTF-8 encoded character in the first 64 KB of the file.
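The byte values can be checked with a short Python sketch. The helper has_utf8_character is hypothetical and only approximates the idea of "at least one UTF-8 encoded character in the first 64 KB"; it is not UltraEdit's actual detection code.

```python
import codecs

word = "méywe"
utf8_bytes = word.encode("utf-8")

hex_dump = utf8_bytes.hex(" ").upper()   # "6D C3 A9 79 77 65"
misread = utf8_bytes.decode("cp1252")    # "mÃ©ywe", the bytes read as Windows-1252

def has_utf8_character(data: bytes, window: int = 64 * 1024) -> bool:
    """Return True if the first `window` bytes are valid UTF-8 and contain
    at least one non-ASCII character (hypothetical heuristic)."""
    head = data[:window]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off at the window end.
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 0x7F for ch in text)
```

With this heuristic, the UTF-8 bytes of méywe are detected, while the Windows-1252 bytes 6D E9 79 77 65 and pure ASCII text are not.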

To answer the question:

What did I miss?

You forgot to save the file as a UTF-8 encoded file after having opened or created it as an ANSI file (or, more precisely, a text file encoded with a single byte per character using a code page) and having declared it as UTF-8 encoded. This is a common problem for many users who write into an HTML file

<meta charset="utf-8">

or

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

or

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

or into an XML file

<?xml version="1.0" encoding="UTF-8"?>

or

<?xml version="1.0" encoding='utf-8'?>

and other variations depending on usage of ' or " and writing either UTF-8 or utf-8 (and other spellings) without really knowing what this string means for the applications interpreting the bytes of the file.
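What the declaration means for an application can be sketched in Python: the bytes are decoded as UTF-8 because the file says so, regardless of how it was actually saved. Python's errors="replace" is used here only as a stand-in for how a browser substitutes invalid bytes; the sample page and the function name are invented for this illustration.

```python
# A page declared as UTF-8 but saved as Windows-1252: "é" is the single byte 0xE9.
page = b'<meta charset="utf-8"><p>m\xe9ywe</p>'

def decode_as_declared(data: bytes, declared: str = "utf-8") -> str:
    """Decode the bytes with the declared encoding, replacing invalid bytes
    with U+FFFD (the replacement character)."""
    return data.decode(declared, errors="replace")

shown = decode_as_declared(page)
# The lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD in the result.

# The same page saved as real UTF-8 decodes cleanly.
fixed = decode_as_declared('<meta charset="utf-8"><p>méywe</p>'.encode("utf-8"))
```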

The topic What's the best default new file format? contains lots of useful information, and links to web pages with useful information, about text encoding, which encoding to use for which file types, and how to configure UltraEdit accordingly.


Check whether the server is sending a charset in the Content-Type header. The encoding specified there takes precedence over what you specify with the meta element.
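One way to check this is a small Python sketch using only the standard library. The function names are made up for this example, and the URL passed to server_charset is whatever page you want to inspect.

```python
from email.message import Message
from urllib.request import urlopen

def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

def server_charset(url: str):
    """Fetch the URL and return the charset the server declares, or None."""
    with urlopen(url) as response:
        # response.headers is an email.message.Message subclass.
        return response.headers.get_content_charset()
```

If server_charset returns something other than utf-8, the server configuration rather than the meta element decides how the browser decodes the page.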


1 - Replace your

<meta charset="utf-8">

with

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

2 - Check that your HTML editor saves the file in UTF-8. Usually this option is found in the menus at the top of the program, as in Notepad++.

3 - If you are importing a font, check that your browser supports it. Or try adding a CSS rule that sets your fonts to a default, widely available one, like

body
{
    font-family: "Times New Roman", Times, serif;
}

Hope it helps :)
