Why does python-cgi fail on Unicode?

When running from the console, Python can detect the console's encoding and implicitly encodes Unicode printed to the console in that encoding. This can still fail if the encoding doesn't support the characters you are trying to print. UTF-8 supports all Unicode characters, but other common console encodings, such as cp437 on US Windows, don't.
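To make the difference concrete, here is a minimal sketch that encodes two Cyrillic letters by hand, once as UTF-8 and once as cp437. The variable names are just for illustration; the snippet uses only built-in string methods and should behave the same on Python 2.7 and Python 3.

```python
# Two Cyrillic letters (U+043D, U+0435).
text = u'\u043d\u0435'

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode('utf-8')

# cp437 has no Cyrillic letters, so the same call raises UnicodeEncodeError.
try:
    text.encode('cp437')
    cp437_failed = False
except UnicodeEncodeError:
    cp437_failed = True

print(repr(utf8_bytes))
print(cp437_failed)
```

This is the same error you see when print has to encode for a console that uses cp437.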

When stdout is not a console, Python 2.X can't determine a console encoding and defaults to ASCII. That's why in a web server you have to be explicit and encode your output yourself.

As an example, try running the following script from a console and from your web server:

import sys
print sys.stdout.encoding

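From the console you should get some encoding, but from the web server you should get None. Note that when the encoding cannot be determined, Python 2.X falls back to ascii, whereas Python 3.X always sets an encoding, typically taken from the locale settings (utf-8 on most modern systems).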
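Encoding the output yourself, as suggested above for the web server case, could look roughly like the sketch below. The helper name encode_response is hypothetical (it is not part of any CGI library); it just builds the header plus body and returns bytes, so nothing depends on what sys.stdout.encoding happens to be.

```python
def encode_response(body, charset='utf-8'):
    """Return header + body as bytes in the given charset (illustrative helper)."""
    header = u'Content-Type: text/html;charset=%s\n\n' % charset
    return (header + body).encode(charset)

payload = encode_response(u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!\n')

# In a real CGI script you would then write the bytes out unchanged:
#   Python 2: sys.stdout.write(payload)
#   Python 3: sys.stdout.buffer.write(payload)
print(repr(payload))
```

Since the script only ever writes bytes, it behaves the same from a console, under a web server, or with stdout redirected.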

The problem can also occur at a console when redirecting output. This script:

import sys
print >>sys.stderr,sys.stdout.encoding
print >>sys.stderr,sys.stderr.encoding

prints the following when run directly vs. with stdout redirected:

C:\>test
cp437
cp437

C:\>test >out.txt
None
cp437

Note stderr wasn't affected since it wasn't redirected.

The environment variable PYTHONIOENCODING (Python 2.6 and later) can also be used to override the default encoding of stdin, stdout and stderr.
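A quick way to check the effect of PYTHONIOENCODING is to start a child interpreter with stdout attached to a pipe, which is roughly the situation under a web server. This is only a sketch: it assumes sys.executable points at a working interpreter, and the names child_code and reported are made up for the example.

```python
import os
import subprocess
import sys

# Child script: report the encoding Python picked for its stdout.
child_code = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'

# Copy the current environment and set the override for the child only.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

# check_output captures stdout through a pipe, so the child has no console.
reported = subprocess.check_output([sys.executable, '-c', child_code], env=env)
reported = reported.decode('ascii').strip()
print(reported)
```

Under a real web server you would set the variable in the server's environment configuration rather than in Python code.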


Try wrapping stdin and stdout in UTF-8 codecs:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cgitb
import sys
import codecs

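# Wrap stdout so that unicode strings are encoded as UTF-8 on write
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# If you need input too, read from char_stream as you would sys.stdin
char_stream = codecs.getreader('utf-8')(sys.stdin)

# Show tracebacks in the browser while debugging
cgitb.enable()

# Header, followed by the blank line that ends the headers
print "Content-Type: text/html;charset=utf-8"
print
s = u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
# The wrapped stdout encodes this unicode string as UTF-8
print s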