Why does wprintf transliterate Russian text in Unicode into Latin on Linux?

Why does it transliterate Russian text in Unicode into Latin, as opposed to transcoding it into UTF-8 or using replacement characters?

Because your program starts in the default locale, the C locale, so the wide string is converted according to the C locale. The C locale handles neither UTF-8 nor any other Unicode encoding, so your standard library does its best to translate the wide characters into the basic character set used in the C locale.

If you change the locale to any UTF-8 locale, the program should output a UTF-8 string.

Note: in the implementations I know of, the encoding of a FILE stream is determined and saved at the time the stream's orientation (wide vs. byte) is chosen. Remember to set the locale before doing anything with stdout.

Because the conversion of wide characters is done according to the currently set locale. By default, a C program always starts in the "C" locale, which only supports ASCII characters.

You have to switch to a Russian or UTF-8 locale first, for example one of these:

setlocale(LC_ALL, "ru_RU.utf8"); // Russian, UTF-8 encoding
setlocale(LC_ALL, "en_US.utf8"); // US English, UTF-8 encoding

Or to the current system locale (which is likely what you need):

setlocale(LC_ALL, "");

The full program will be:

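#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
  // Must run before the first output to stdout.
  if (setlocale(LC_ALL, "ru_RU.utf8") == NULL) {
    fputs("locale ru_RU.utf8 is not available\n", stderr);
    return 1;
  }
  wprintf(L"Привет, мир!\n");
  return 0;
}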

As for your code working as-is on other machines: this is due to how libc operates there. Some implementations (like musl) do not support non-Unicode locales and thus can unconditionally translate wide characters to a UTF-8 sequence.
