Pickle is a Python module for object serialization, also known as persistence, marshalling, or flattening.
It is often necessary to save and restore the contents of an object to a file. One approach to this problem is to write a pair of functions that read and write data from a file in a special format. A powerful alternative approach is to use Python's pickle module. Exploiting Python's ability for introspection, the pickle module recursively converts nearly arbitrary Python objects into a stream of bytes that can be written to a file.
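As a minimal sketch of that round trip, the following Python 3 snippet uses only the standard library. The record dictionary is made up for illustration, and an in-memory buffer is assumed in place of a real file.

```python
import io
import pickle

# Hypothetical nested data, used only for illustration.
record = {"country": "Mexico", "codes": [52, 484], "tags": ("a", "b")}

# An in-memory buffer stands in for a file opened in binary mode.
buffer = io.BytesIO()
pickle.dump(record, buffer)

# Rewind and rebuild the object from the byte stream.
buffer.seek(0)
restored = pickle.load(buffer)

print(restored == record)   # True: same contents
print(restored is record)   # False: a newly built object
```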
The Boost Python Library supports the pickle module through the interface as described in detail in the Python Library Reference for pickle. This interface involves the special methods __getinitargs__, __getstate__ and __setstate__ as described in the following. Note that Boost.Python is also fully compatible with Python's cPickle module.
At the user level, the Boost.Python pickle interface involves three special methods:
__getinitargs__
When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple (it is most convenient to use a boost::python::tuple). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor.
If __getinitargs__ is not defined, pickle.load will call the constructor (__init__) without arguments; i.e., the object must be default-constructible.
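The effect of __getinitargs__ can be tried out without compiling anything. The following is a minimal Python 3 sketch, not Boost.Python's implementation: World is a hypothetical pure-Python stand-in for a wrapped class, and its __reduce__ method is an assumption that imitates the hook an extension class would receive, forwarding the tuple from __getinitargs__ to the pickler.

```python
import pickle


class World:
    """Hypothetical pure-Python stand-in for a wrapped C++ class."""

    def __init__(self, country):
        self.country = country

    def __getinitargs__(self):
        # Tuple of constructor arguments needed to rebuild the instance.
        return (self.country,)

    def __reduce__(self):
        # Assumption: imitates the pickling hook of an extension class.
        # It tells pickle to call World(*initargs) when loading.
        return (type(self), self.__getinitargs__())


restored = pickle.loads(pickle.dumps(World("Mexico")))
print(restored.country)
```

Because no state is stored here, everything the restored object knows comes from the constructor call.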
__getstate__
When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.
__setstate__
When an instance of a Boost.Python extension class is restored by the unpickler (pickle.load), it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.
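The order of these steps can be replayed by hand. The sketch below is plain Python 3 and makes two assumptions: World is a hypothetical stand-in for a wrapped class, and restore is an illustrative helper that mirrors the description above rather than the unpickler's actual code.

```python
class World:
    """Hypothetical pure-Python stand-in for a wrapped C++ class."""

    def __init__(self, country):
        self.country = country
        self.secret_number = 0      # cannot be set through the constructor

    def __getinitargs__(self):
        return (self.country,)

    def __getstate__(self):
        return (self.secret_number,)

    def __setstate__(self, state):
        (self.secret_number,) = state


def restore(cls, initargs, state=None):
    """Replay the restore steps by hand (not pickle's actual code)."""
    instance = cls(*initargs)                    # construct from __getinitargs__
    setstate = getattr(instance, "__setstate__", None)
    if setstate is not None and state is not None:
        setstate(state)                          # pass on the __getstate__ result
    return instance


original = World("Mexico")
original.secret_number = 42

rebuilt = restore(World, original.__getinitargs__(), original.__getstate__())
print(rebuilt.country, rebuilt.secret_number)
```

Without the state argument, restore returns an object whose secret_number is still the constructor default, which is why the second step is needed for data the constructor cannot set.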
The three special methods described above may be .def()'ed individually by the user. However, Boost.Python provides an easy-to-use high-level interface via the boost::python::pickle_suite class that also enforces consistency: __getstate__ and __setstate__ must be defined as pairs. Use of this interface is demonstrated by the following examples.
There are three files in python/test that show how to provide pickle support.
pickle1.cpp
The C++ class in this example can be fully restored by passing the appropriate argument to the constructor. Therefore it is sufficient to define the pickle interface method __getinitargs__. This is done in the following way:
Definition of the C++ pickle function:
    struct world_pickle_suite : boost::python::pickle_suite
    {
      static boost::python::tuple getinitargs(world const& w)
      {
        return boost::python::make_tuple(w.get_country());
      }
    };
Establishing the Python binding:
    class_<world>("world", init<const std::string&>())
        // ...
        .def_pickle(world_pickle_suite())
        // ...
pickle2.cpp
The C++ class in this example contains member data that cannot be restored by any of the constructors. Therefore it is necessary to provide the __getstate__/__setstate__ pair of pickle interface methods:
Definition of the C++ pickle functions:
    struct world_pickle_suite : boost::python::pickle_suite
    {
      static boost::python::tuple getinitargs(const world& w)
      {
        // ...
      }

      static boost::python::tuple getstate(const world& w)
      {
        // ...
      }

      static void setstate(world& w, boost::python::tuple state)
      {
        // ...
      }
    };
Establishing the Python bindings for the entire suite:
    class_<world>("world", init<const std::string&>())
        // ...
        .def_pickle(world_pickle_suite())
        // ...
For simplicity, the __dict__ is not included in the result of __getstate__. This is not generally recommended, but a valid approach if it is anticipated that the object's __dict__ will always be empty. Note that the safety guard described below will catch the cases where this assumption is violated.
pickle3.cpp
This example is similar to pickle2.cpp. However, the object's __dict__ is included in the result of __getstate__. This requires a little more code but is unavoidable if the object's __dict__ is not always empty.
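A pure-Python 3 sketch of the same idea follows. It is an illustration, not the contents of pickle3.cpp: World is a hypothetical stand-in, its __slots__ imitate data held by the C++ object, its __reduce__ is an assumed imitation of the hook an extension class would receive, and the error message is made up.

```python
import pickle


class World:
    """Hypothetical pure-Python stand-in for a wrapped C++ class."""

    # Slots imitate members stored in the C++ object; "__dict__" keeps the
    # ordinary per-instance dictionary for attributes added from Python.
    __slots__ = ("_country", "_secret_number", "__dict__")

    def __init__(self, country):
        self._country = country
        self._secret_number = 0

    def set_secret_number(self, number):
        self._secret_number = number

    def __getinitargs__(self):
        return (self._country,)

    def __getstate__(self):
        # The state carries both the instance __dict__ and the hidden member.
        return (dict(self.__dict__), self._secret_number)

    def __setstate__(self, state):
        if len(state) != 2:
            raise ValueError("expected a 2-item tuple as state")
        attributes, self._secret_number = state
        self.__dict__.update(attributes)

    def __reduce__(self):
        # Assumption: imitates the pickling hook of an extension class.
        return (type(self), self.__getinitargs__(), self.__getstate__())


w = World("Mexico")
w.set_secret_number(42)
w.visited = True                    # lands in the instance __dict__

restored = pickle.loads(pickle.dumps(w))
print(restored.visited, restored._secret_number)
```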
The pickle protocol described above has an important pitfall that the end user of a Boost.Python extension module might not be aware of:
__getstate__ is defined and the instance's __dict__ is not empty.
The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:
* the class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly.
* the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.
To alert the user to this highly unobvious problem, a safety guard is provided.
If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined:
    RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
To resolve this problem, it should first be established that the __getstate__ and __setstate__ methods manage the instance's __dict__ correctly. Note that this can be done either at the C++ or the Python level. Finally, the safety guard should intentionally be overridden. E.g. in C++ (from pickle3.cpp):
    struct world_pickle_suite : boost::python::pickle_suite
    {
      // ...

      static bool getstate_manages_dict() { return true; }
    };
Alternatively in Python:
    import your_bpl_module

    class your_class(your_bpl_module.your_class):
        __getstate_manages_dict__ = 1

        def __getstate__(self):
            # your code here
            pass

        def __setstate__(self, state):
            # your code here
            pass
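The guard and its Python-level override can be imitated in plain Python 3. This is a sketch under stated assumptions: WrappedWorld is a hypothetical stand-in for an extension class, and its __reduce__ only mimics the check described above; Boost.Python's real implementation is not shown here. ManagedWorld is likewise a made-up derived class.

```python
import pickle


class WrappedWorld:
    """Hypothetical stand-in for an extension class with __getstate__."""

    __slots__ = ("_country", "_secret", "__dict__")

    def __init__(self, country):
        self._country = country
        self._secret = 0

    def __getinitargs__(self):
        return (self._country,)

    def __getstate__(self):
        return (self._secret,)          # the instance __dict__ is ignored

    def __setstate__(self, state):
        (self._secret,) = state

    def __reduce__(self):
        # Assumed imitation of the safety guard: refuse to pickle when
        # __dict__ holds data that __getstate__ is not known to manage.
        manages = getattr(type(self), "__getstate_manages_dict__", False)
        if self.__dict__ and not manages:
            raise RuntimeError(
                "Incomplete pickle support (__getstate_manages_dict__ not set)")
        return (type(self), self.__getinitargs__(), self.__getstate__())


class ManagedWorld(WrappedWorld):
    """Derived class that manages __dict__ itself and says so."""

    __getstate_manages_dict__ = 1

    def __getstate__(self):
        return (super().__getstate__(), dict(self.__dict__))

    def __setstate__(self, state):
        base_state, attributes = state
        super().__setstate__(base_state)
        self.__dict__.update(attributes)


plain = WrappedWorld("Mexico")
plain.note = "added from Python"
try:
    pickle.dumps(plain)
    guard_message = None
except RuntimeError as exc:
    guard_message = str(exc)

managed = ManagedWorld("Mexico")
managed.note = "added from Python"
managed._secret = 7
restored = pickle.loads(pickle.dumps(managed))
print(guard_message)
print(restored.note, restored._secret)
```

Setting the flag alone would silence the guard without saving anything, so the derived class also extends __getstate__ and __setstate__ to carry the dictionary.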
* In Boost.Python extension modules with many extension classes, providing complete pickle support for all classes would be a significant overhead. In general, complete pickle support should only be implemented for extension classes that will eventually be pickled.
* Avoid using __getstate__ if the instance can also be reconstructed by way of __getinitargs__. This automatically avoids the pitfall described above.
* If __getstate__ is required, include the instance's __dict__ in the Python object that is returned.
The pickle4.cpp example demonstrates an alternative technique for implementing pickle support. First we direct Boost.Python via the class_::enable_pickling() member function to define only the basic attributes required for pickling:
    class_<world>("world", init<const std::string&>())
        // ...
        .enable_pickling()
        // ...
This enables the standard Python pickle interface as described in the Python documentation. By "injecting" a __getinitargs__ method into the definition of the wrapped class we make all instances pickleable:
    # import the wrapped world class
    from pickle4_ext import world

    # definition of __getinitargs__
    def world_getinitargs(self):
      return (self.get_country(),)

    # now inject __getinitargs__ (Python is a dynamic language!)
    world.__getinitargs__ = world_getinitargs
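The injection technique can be tried without building pickle4_ext. In the Python 3 sketch below, World is a hypothetical pure-Python stand-in for the wrapped class, and its __reduce__ is an assumption that imitates the basic pickling attributes: it uses __getinitargs__ when one is present and otherwise passes no constructor arguments.

```python
import pickle


class World:
    """Hypothetical pure-Python stand-in for the wrapped world class."""

    def __init__(self, country):
        self._country = country

    def get_country(self):
        return self._country

    def __reduce__(self):
        # Assumption: imitates the basic pickling attributes. Uses
        # __getinitargs__ if present, otherwise no constructor arguments.
        getinitargs = getattr(self, "__getinitargs__", None)
        return (type(self), getinitargs() if getinitargs else ())


# Before the injection, loading fails: World() needs one argument.
data = pickle.dumps(World("Mexico"))
try:
    pickle.loads(data)
    failed_without_initargs = False
except TypeError:
    failed_without_initargs = True


# definition of __getinitargs__
def world_getinitargs(self):
    return (self.get_country(),)


# inject __getinitargs__ into the existing class
World.__getinitargs__ = world_getinitargs

restored = pickle.loads(pickle.dumps(World("Mexico")))
print(failed_without_initargs, restored.get_country())
```

The failed first attempt illustrates the rule stated earlier: without __getinitargs__ the constructor is called without arguments, so the class would have to be default-constructible.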
See also the tutorial section on injecting additional methods from Python.