Pythonic way to ensure unicode in Python 2 and 3

Using six.text_type should suffice virtually always, just like the accepted answer says.

As a side note, you could get yourself into trouble in Python 3 if you somehow feed it a bytes instance (although this should be really hard to do).

CONTEXT

six.text_type is basically an alias for str in Python 3:

>>> import six
>>> six.text_type
<class 'str'>

Using str to cast bytes instances gives somewhat unexpected results:

>>> six.text_type(b'bytestring')
"b'bytestring'"

Notice how our string just got mangled? Straight from str's docs:

Passing a bytes object to str() without the encoding or errors arguments falls under the first case of returning the informal string representation.

That is, str(...) will actually call the object's __str__ method, unless you pass an encoding:

>>> b'bytestring'.__str__()
"b'bytestring'"
>>> six.text_type(b'bytestring', encoding='utf-8')
'bytestring'

Sadly, if you do pass an encoding, "casting" regular str instances will no longer work:

>>> six.text_type('string', encoding='utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported

On a somewhat related note, casting None values can be troublesome as well:

>>> six.text_type(None)
'None'

You'll end up with a 'None' string, literally. Probably not what you wanted.
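The three pitfalls above (bytes, already-decoded str, and None) can be handled in one small helper. What follows is only a minimal sketch: to_text is a hypothetical name, returning None unchanged is an assumption about what you want, and text_type is defined here the same way six defines it so that the snippet runs without six installed.

```python
import sys

PY3 = sys.version_info[0] == 3
# Same alias six exposes as six.text_type (str on Python 3, unicode on Python 2).
text_type = str if PY3 else unicode  # noqa: F821 (unicode exists only on Python 2)


def to_text(value, encoding='utf-8', errors='strict'):
    """Hypothetical helper: return value as text without mangling it."""
    if value is None:
        # Assumption: callers prefer None over the literal string 'None'.
        return None
    if isinstance(value, text_type):
        # Already text; passing an encoding here would raise TypeError.
        return value
    if isinstance(value, bytes):
        # Decode explicitly instead of getting "b'...'" back.
        return value.decode(encoding, errors)
    # Anything else (int, float, custom objects) goes through its text conversion.
    return text_type(value)
```

With this sketch, to_text(b'bytestring') and to_text('bytestring') both give 'bytestring', and to_text(None) stays None. Recent six releases also ship six.ensure_text, which covers the bytes and str cases but raises TypeError for anything else.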

ALTERNATIVES

  1. Just use six.text_type. Really. There's nothing to worry about unless you interact with bytes on purpose. Make sure to check for None before casting, though.

  2. Use Django's force_text. Safest way out of this madness if you happen to be working on a project that's already using Django 1.x.x. (Later Django releases deprecate force_text in favour of force_str.)

  3. Copy-paste Django's force_text to your project. Here's a sample implementation.

For either of the Django alternatives, keep in mind that force_text allows you to specify strings_only=True to neatly preserve None values:

>>> force_text(None)
'None'
>>> type(force_text(None))
<class 'str'>

>>> force_text(None, strings_only=True)
>>> type(force_text(None, strings_only=True))
<class 'NoneType'>

Be careful, though: with strings_only=True it leaves several other primitive types uncast too:

>>> force_text(100)
'100'
>>> force_text(100, strings_only=True)
100
>>> force_text(True)
'True'
>>> force_text(True, strings_only=True)
True
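To make the strings_only behaviour concrete, here is a rough Python 3 sketch that reproduces the outputs shown above. It is not Django's actual code: force_text_sketch is a hypothetical name, and the tuple of "protected" types is an assumption chosen to match the examples.

```python
import datetime
from decimal import Decimal

# Assumption: the types returned untouched when strings_only=True.
# bool is a subclass of int, so True/False are covered as well.
_PROTECTED_TYPES = (
    type(None), int, float, Decimal,
    datetime.datetime, datetime.date, datetime.time,
)


def force_text_sketch(value, encoding='utf-8', strings_only=False,
                      errors='strict'):
    """Illustrative only: text conversion with an opt-out for primitives."""
    if isinstance(value, str):
        # Already text, nothing to do.
        return value
    if strings_only and isinstance(value, _PROTECTED_TYPES):
        # Leave None, numbers and dates exactly as they are.
        return value
    if isinstance(value, bytes):
        # Decode instead of producing "b'...'".
        return value.decode(encoding, errors)
    return str(value)
```

The trade-off is visible in the second branch: anything in the protected tuple comes back with its original type, so callers that need text must still handle those values themselves.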

Don't re-invent the compatibility layer wheel. Use the six compatibility layer, a small one-file project that can be included with your own:

Six supports every Python version since 2.6. It is contained in only one Python file, so it can be easily copied into your project. (The copyright and license notice must be retained.)

It includes a six.text_type() callable that does exactly this: it converts a value to Unicode text:

import six

unicode_x = six.text_type(x)

In the project source code this is defined as:

import sys

PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3
# ...

if PY3:
    # ...
    text_type = str
    # ...

else:
    # ...
    text_type = unicode
    # ...