Converting Unicode strings and vice versa

The solutions are platform-dependent. On Windows, use the MultiByteToWideChar and WideCharToMultiByte API functions. On Unix/Linux platforms, the iconv library is a popular choice.
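For the iconv route, here is a minimal sketch of what a UTF-8 to wchar_t conversion might look like. The helper name utf8_to_wide is made up for illustration, and the sketch assumes an iconv implementation that accepts the "WCHAR_T" encoding name (glibc and GNU libiconv do) and declares the input buffer parameter as char**; some older platforms use const char** instead, and some need -liconv at link time.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of iconv): UTF-8 bytes -> std::wstring.
// Assumes the iconv implementation knows the "WCHAR_T" encoding name.
std::wstring utf8_to_wide(const std::string& utf8) {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");  // (to, from)
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // One wchar_t per input byte is always enough room for the output.
    std::wstring out(utf8.size(), L'\0');

    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(&out[0]);
    std::size_t out_left = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        throw std::runtime_error("iconv failed (invalid or truncated input?)");
    }

    // Drop the part of the buffer that iconv did not fill.
    out.resize(out.size() - out_left / sizeof(wchar_t));
    return out;
}
```

Calling utf8_to_wide("Hello") should give L"Hello". Swapping the two arguments of iconv_open, and the buffer types, gives the opposite direction.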


Since C++11 (VS 2010 already supports it), this is possible in standard C++ (finally!), although std::wstring_convert and the <codecvt> header were deprecated in C++17:

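#include <codecvt>  // std::codecvt_utf8
#include <locale>   // std::wstring_convert
#include <string>

// Converts between wchar_t strings and UTF-8 encoded byte strings.
std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
const std::wstring wide_string = L"This is a string";
const std::string utf8_string = converter.to_bytes(wide_string);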
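The same converter type also goes the other way through from_bytes. A small sketch wrapping both directions, assuming a compiler that still ships <codecvt> (it is deprecated since C++17, so expect warnings); the two function names are made up for illustration:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrapper: wchar_t string -> UTF-8 bytes.
std::string wstring_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Hypothetical wrapper: UTF-8 bytes -> wchar_t string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wstring(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

For example, wstring_to_utf8(L"\u6c49") yields the three bytes E6 B1 89, and utf8_to_wstring turns them back into a single wide character.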

Standard C++ before C++11 doesn't offer this functionality by itself. You'll need a separate library, such as libiconv.


Conversion from ASCII to Unicode, and vice versa for text that is pure ASCII, is quite trivial. By design, the first 128 Unicode values are the same as ASCII (in fact, the first 256 are equal to ISO-8859-1).

So the following code works on systems where char is ASCII and wchar_t is Unicode:

#include <cstring>  // std::strlen
#include <string>   // std::wstring
const char* ASCII = "Hello, world";
std::wstring Unicode(ASCII, ASCII + std::strlen(ASCII));  // widens each char
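One caveat if you lean on the ISO-8859-1 equivalence rather than plain ASCII: where char is signed, bytes above 127 are negative and will not widen to the matching code point. A hedged sketch that goes through unsigned char first (latin1_to_wide is a made-up name, and the input is assumed to really be ISO-8859-1):

```cpp
#include <string>

// Hypothetical helper: ISO-8859-1 bytes -> wchar_t code points.
// The cast through unsigned char keeps bytes 0x80..0xFF in the 128..255 range
// even on platforms where plain char is signed.
std::wstring latin1_to_wide(const std::string& latin1) {
    std::wstring out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

With this, the single byte 0xE9 becomes U+00E9 (é) regardless of the signedness of char.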

You can't reverse it this simply: 汉 exists in Unicode but not in ASCII, so how would you "convert" it?
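There is no single right answer, so any reverse conversion has to pick a policy. One possible policy, shown here only as a sketch, is to substitute a placeholder for everything outside ASCII; to_ascii_lossy and its placeholder parameter are made up for illustration, and throwing an error or transliterating would be equally valid choices.

```cpp
#include <string>

// Hypothetical helper: wchar_t string -> ASCII, replacing anything that
// has no ASCII equivalent with a placeholder character (lossy!).
std::string to_ascii_lossy(const std::wstring& wide, char placeholder = '?') {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // Compare as unsigned so a signed wchar_t cannot slip through as negative.
        const bool is_ascii = static_cast<unsigned long>(wc) < 128UL;
        out.push_back(is_ascii ? static_cast<char>(wc) : placeholder);
    }
    return out;
}
```

Under this policy, a string made of 'a', 汉 and 'b' comes out as "a?b": the information about 汉 is simply lost.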

Tags:

C++

Unicode