Python pickle calls cPickle?

In Anaconda Python3.5 : one can access cPickle as

import _pickle as cPickle

credits to Mike McKerns


Actually, if you have pickled objects from python2.x, then generally can be read by python3.x. Also, if you have pickled objects from python3.x, they generally can be read by python2.x, but only if they were dumped with a protocol set to 2 or less.

Python 2.7.10 (default, Sep  2 2015, 17:36:25) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> x = [1,2,3,4,5]
>>> import math
>>> y = math.sin
>>>     
>>> import pickle 
>>> f = open('foo.pik', 'w') 
>>> pickle.dump(x, f)
>>> pickle.dump(y, f)
>>> f.close()
>>> 
dude@hilbert>$ python3.5
Python 3.5.0 (default, Sep 15 2015, 23:57:10) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('foo.pik', 'rb') as f:
...   x = pickle.load(f)
...   y = pickle.load(f)
... 
>>> x
[1, 2, 3, 4, 5]
>>> y
<built-in function sin>

Also, if you are looking for cPickle, it's now _pickle, not pickle.

>>> import _pickle
>>> _pickle
<module '_pickle' from '/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload/_pickle.cpython-35m-darwin.so'>
>>> 

You also asked how to stop pickle from using the built-in (C++) version. You can do this by using _dump and _load, or the _Pickler class if you like to work with the class objects. Confused? The old cPickle is now _pickle, however dump, load, dumps, and loads all point to _pickle… while _dump, _load, _dumps, and _loads point to the pure python version. For instance:

>>> import pickle
>>> # _dumps is a python function
>>> pickle._dumps
<function _dumps at 0x109c836a8>
>>> # dumps is a built-in (C++)
>>> pickle.dumps
<built-in function dumps>
>>> # the Pickler points to _pickle (C++)
>>> pickle.Pickler 
<class '_pickle.Pickler'>
>>> # the _Pickler points to pickle (pure python)
>>> pickle._Pickler
<class 'pickle._Pickler'>
>>> 

So if you don't want to use the built-in version, then you can use pickle._loads and the like.


It's looking like the pickled data that you're trying to load was generated by a version of the program that was running on Python 2.7. The data is what contains the references to cPickle.

The problem is that Pickle, as a serialization format, assumes that your standard library (and to a lesser extent your code) won't change layout between serialization and deserialization. Which it did -- a lot -- between Python 2 and 3. And when that happens, Pickle has no path for migration.

Do you have access to the program that generated mnist.pkl.gz? If so, port it to Python 3 and re-run it to regenerate a Python 3-compatible version of the file.

If not, you'll have to write a Python 2 program that loads that file and exports it to a format that can be loaded from Python 3 (depending on the shape of your data, JSON and CSV are popular choices), then write a Python 3 program that loads that format then dumps it as Python 3 pickle. You can then load that Pickle file from your original program.

Of course, what you should really do is stop at the point where you have ability to load the exported format from Python 3 -- and use the aforementioned format as your actual, long-term storage format.

Using Pickle for anything other than short-term serialization between trusted programs (loading Pickle is equivalent to running arbitrary code in your Python VM) is something you should actively avoid, among other things because of the exact case you find yourself in.