Python, Unicode, and the Windows console

Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!


Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):

PrintFails - Python Wiki

Here's a code excerpt from that page:

$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line'
  UTF-8
  <type 'unicode'> 2
  Б
  Б

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line' | cat
  None
  <type 'unicode'> 2
  Б
  Б

There's some more information on that page, well worth a read.


Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.


I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.

The error means that Unicode characters that you are trying to print can't be represented using the current (chcp) console character encoding. The codepage is often 8-bit encoding such as cp437 that can represent only ~0x100 characters from ~1M Unicode characters:

>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to 

I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?

Windows console does accept Unicode characters and it can even display them (BMP only) if the corresponding font is configured. WriteConsoleW() API should be used as suggested in @Daira Hopwood's answer. It can be called transparently i.e., you don't need to and should not modify your scripts if you use win-unicode-console package:

T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py

See What's the deal with Python 3.4, Unicode, different languages and Windows?

Is there any way I can make Python automatically print a ? instead of failing in this situation?

If it is enough to replace all unencodable characters with ? in your case then you could set PYTHONIOENCODING envvar:

T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]

In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.

Tags:

Python

Unicode