PrevUpHomeNext

Pickle support

Introduction
The Pickle Interface
Example
Pitfall and Safety Guard
Practical Advice
Light-weight alternative: pickle support implemented in Python

Pickle is a Python module for object serialization, also known as persistence, marshalling, or flattening.

It is often necessary to save and restore the contents of an object to a file. One approach to this problem is to write a pair of functions that read and write data from a file in a special format. A powerful alternative approach is to use Python's pickle module. Exploiting Python's ability for introspection, the pickle module recursively converts nearly arbitrary Python objects into a stream of bytes that can be written to a file.

The Boost Python Library supports the pickle module through the interface as described in detail in the Python Library Reference for pickle. This interface involves the special methods __getinitargs__, __getstate__ and __setstate__ as described in the following. Note that Boost.Python is also fully compatible with Python's cPickle module.

At the user level, the Boost.Python pickle interface involves three special methods:

__getinitargs__

When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple (it is most convenient to use a boost::python::tuple). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor.

If __getinitargs__ is not defined, pickle.load will call the constructor (__init__) without arguments; i.e., the object must be default-constructible.

__getstate__

When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.

__setstate__

When an instance of a Boost.Python extension class is restored by the unpickler (pickle.load), it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.

The three special methods described above may be .def()'ed individually by the user. However, Boost.Python provides an easy to use high-level interface via the boost::python::pickle_suite class that also enforces consistency: __getstate__ and __setstate__ must be defined as pairs. Use of this interface is demonstrated by the following examples.

There are three files in python/test that show how to provide pickle support.

The C++ class in this example can be fully restored by passing the appropriate argument to the constructor. Therefore it is sufficient to define the pickle interface method __getinitargs__. This is done in the following way: Definition of the C++ pickle function:

struct world_pickle_suite : boost::python::pickle_suite
{
  static
  boost::python::tuple
  getinitargs(world const& w)
  {
      return boost::python::make_tuple(w.get_country());
  }
};

Establishing the Python binding:

class_<world>("world", args<const std::string&>())
      // ...
      .def_pickle(world_pickle_suite())
      // ...

The C++ class in this example contains member data that cannot be restored by any of the constructors. Therefore it is necessary to provide the __getstate__/__setstate__ pair of pickle interface methods:

Definition of the C++ pickle functions:

struct world_pickle_suite : boost::python::pickle_suite
  {
    static
    boost::python::tuple
    getinitargs(const world& w)
    {
      // ...
    }

    static
    boost::python::tuple
    getstate(const world& w)
    {
      // ...
    }

    static
    void
    setstate(world& w, boost::python::tuple state)
    {
      // ...
    }
  };

Establishing the Python bindings for the entire suite:

class_<world>("world", args<const std::string&>())
    // ...
    .def_pickle(world_pickle_suite())
    // ...

For simplicity, the __dict__ is not included in the result of __getstate__. This is not generally recommended, but a valid approach if it is anticipated that the object's __dict__ will always be empty. Note that the safety guard described below will catch the cases where this assumption is violated.

This example is similar to pickle2.cpp. However, the object's __dict__ is included in the result of __getstate__. This requires a little more code but is unavoidable if the object's __dict__ is not always empty.

The pickle protocol described above has an important pitfall that the end user of a Boost.Python extension module might not be aware of:

__getstate__ is defined and the instance's __dict__ is not empty.

The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that: * his class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly. * the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.

To alert the user to this highly unobvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined:

RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)

To resolve this problem, it should first be established that the __getstate__ and __setstate__ methods manage the instances's __dict__ correctly. Note that this can be done either at the C++ or the Python level. Finally, the safety guard should intentionally be overridden. E.g. in C++ (from pickle3.cpp):

struct world_pickle_suite : boost::python::pickle_suite
{
  // ...

  static bool getstate_manages_dict() { return true; }
};

Alternatively in Python:

import your_bpl_module
class your_class(your_bpl_module.your_class):
  __getstate_manages_dict__ = 1
  def __getstate__(self):
    # your code here
  def __setstate__(self, state):
    # your code here
  • In Boost.Python extension modules with many extension classes, providing complete pickle support for all classes would be a significant overhead. In general complete pickle support should only be implemented for extension classes that will eventually be pickled.
  • Avoid using __getstate__ if the instance can also be reconstructed by way of __getinitargs__. This automatically avoids the pitfall described above.
  • If __getstate__ is required, include the instance's __dict__ in the Python object that is returned.

The pickle4.cpp example demonstrates an alternative technique for implementing pickle support. First we direct Boost.Python via the class_::enable_pickling() member function to define only the basic attributes required for pickling:

class_<world>("world", args<const std::string&>())
    // ...
    .enable_pickling()
    // ...

This enables the standard Python pickle interface as described in the Python documentation. By "injecting" a __getinitargs__ method into the definition of the wrapped class we make all instances pickleable:

# import the wrapped world class
from pickle4_ext import world

# definition of __getinitargs__
def world_getinitargs(self):
  return (self.get_country(),)

# now inject __getinitargs__ (Python is a dynamic language!)
world.__getinitargs__ = world_getinitargs

See also the tutorial section on injecting additional methods from Python.


PrevUpHomeNext