DM Pybind11 Style Guide

This is the DM Pybind11 Coding Standard.

Changes to this document must be approved by the System Architect (RFC-24). To request changes to these standards, please file an RFC.

Introduction

This document lists pybind11 coding recommendations. The recommendations are based on conventions in upstream pybind11 and best practices developed within DM.

Recommendation Importance

In the guideline sections, the terms required, must, and should, amongst others, have special meaning. Refer to the Stringency Level reference. DM uses the spirit of the IETF organization’s RFC 2119 Reference definitions.

General

All rules from the DM C++ style guide SHALL also apply to pybind11 wrappers

Fundamentally pybind11 wrappers are just C++. Therefore the rules from the DM C++ Style Guide also apply, unless they are in direct conflict with one of the other rules in this guide.

All rules from the DM Python style guide SHALL also apply to pybind11 wrappers

The generated extension modules are used from Python and should closely resemble Python types. Therefore the rules from the DM Python Style Guide also apply, unless they are in direct conflict with one of the other rules in this guide.

This applies in particular to pure Python code sections, except in cases where deviation is explicitly allowed.

Two rules in the Python coding guide (inherited from PEP 8) are particularly relevant to pybind11-related wrappers:

  • from <module> import * should only be used in __init__.py modules that “lift” symbols to package level (and contain no other code).
  • __all__ should be defined by any module whose symbols will be thusly lifted.
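How these two rules interact can be illustrated in pure Python. The sketch below is only an illustration: the module name linear_transform_demo and the symbols inside it are hypothetical, and the module is built in memory (rather than imported from a file) so that the snippet is self-contained.

```python
import sys
import types

# Hypothetical stand-in for a wrapper module; every name here is illustrative.
wrapper = types.ModuleType("linear_transform_demo")
exec(
    "__all__ = ['LinearTransform']\n"
    "class LinearTransform:\n"
    "    pass\n"
    "def makeScratch():\n"
    "    pass\n"
    "def _helper():\n"
    "    pass\n",
    wrapper.__dict__,
)
sys.modules[wrapper.__name__] = wrapper

# This is what a package __init__.py does when it lifts symbols.
package_namespace = {}
exec("from linear_transform_demo import *", package_namespace)

# Only the names listed in __all__ reach the package namespace.
lifted = sorted(name for name in package_namespace if not name.startswith("__"))
print(lifted)
```

Because __all__ is defined, makeScratch is not lifted even though its name has no leading underscore.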

Modules and source files

Wrappers for a C++ header file SHOULD go in a Python module with a lowercased version of the header file name

For example, C++ code from LinearTransform.h would be wrapped in a module named linearTransform. If the wrappers are defined purely in C++, the source code would go in linearTransform.cc (see the following rule for the case where both C++ and Python code are present).

By wrapping different headers into separate modules (to be combined in __init__.py) we make builds more parallelizable, make it easier to avoid circular dependencies, and make partial rebuilds faster.

If a group of headers together provide functionality that cannot be used independently, they may be wrapped into a single module. The headers wrapped by such a module must be prominently listed in a comment near the top of the source file.

Wrappers that contain both C++ and Python code MUST define a subpackage

When the wrappers for a header (or group of closely-related headers) require both C++ and Python, both files MUST be moved to a new Python subpackage, with an __init__.py file that lifts all public symbols from both modules to package scope. The Python module need not export symbols also provided by the C++ module (frequently, it will simply modify them, by e.g. adding methods to classes using the lsst.utils.continueClass decorator). The C++ module name should still be the lowercased header file name, and the Python module name MUST be this with a “Continued” suffix.

For example, for header file LinearTransform.h, we would have:

linearTransform/linearTransform.cc:
    <C++ wrappers>

linearTransform/linearTransformContinued.py:
    <Python extensions to the wrappers>

linearTransform/__init__.py:
    from .linearTransform import *
    from .linearTransformContinued import *
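As a rough sketch of what a “Continued” module does, the following pure-Python snippet adds a method to an already-defined class. It does not use or reproduce lsst.utils.continueClass: the extend decorator is a hypothetical, simplified stand-in that takes its target explicitly, and LinearTransform here is an ordinary Python class standing in for one defined by the compiled module.

```python
class LinearTransform:
    """Stand-in for a class that would come from the compiled module."""

    def __init__(self, scale):
        self.scale = scale


# Attributes that belong to the class machinery and must not be copied.
_SKIPPED = ("__dict__", "__weakref__", "__module__", "__doc__", "__qualname__")


def extend(target):
    """Return a decorator that copies the decorated class body onto target."""

    def decorator(extension):
        for name, value in vars(extension).items():
            if name not in _SKIPPED:
                setattr(target, name, value)
        # Rebind the name to the original class, not the throwaway extension.
        return target

    return decorator


@extend(LinearTransform)
class LinearTransform:
    def scaled(self, factor):
        return LinearTransform(self.scale * factor)


result = LinearTransform(2.0).scaled(3.0)
print(result.scale)
```

The class keeps its identity and original constructor; only the new method is attached, which is why the Python module has nothing extra to export.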

Trivial extensions to wrappers SHOULD be implemented in C++

Simple extensions such as __repr__ or __reduce__ should be implemented via lambdas in compiled modules, utilizing the pybind11 Python C++ API (e.g. pybind11::object) as necessary.

Longer extensions that involve significant logic or language constructs difficult to implement using the C++ Python API (e.g. generators) should go in pure-Python files.

This rule applies regardless of whether a pure-Python extension module already exists; this prevents the correct code organization from becoming a function of history.

Using pure-Python modules only when necessary minimizes the number of source files and helps keep class definitions together.

Pybind11 headers SHOULD precede all other headers in the include ordering

pybind11.h includes Python.h and must hence be included before all other headers. To keep a reasonable grouping, all other pybind11 headers should be included in this same include block.

C++ wrapper modules SHOULD import the wrapper modules corresponding to the headers they include

This can be done with the pybind11::module::import() function. Note that it requires absolute module names, and doesn’t add any symbols to the compiled module (which is exactly what we want). For example, within the lsst.afw.geom.spherePoint module, which depends on the wrappers for Angle, we’d do:

PYBIND11_PLUGIN(spherePoint) {
    py::module::import("lsst.afw.geom.angle");
    py::module mod("spherePoint");
    ...
    return mod.ptr();  // PYBIND11_PLUGIN must return the module's PyObject *
}

When importing wrappers that are defined by a subpackage, the subpackage (not just the C++ wrapper module) should be imported. This insulates each module from changes in how its dependencies are wrapped.

Some elements of pybind11 wrappers will fail (at runtime) if the wrappers that contain related types (e.g. base classes and those used as function arguments or return values) have not yet been imported. Our convention that wrapper modules mirror headers means the appropriate modules to import can generally be guessed from the list of headers included by the header the wrappers correspond to.

It may be impossible to import modules for some types used in a wrapper due to circular dependencies. Such relationships are common in C++ (where they are typically handled with forward declarations), but circular relationships between Python modules are not allowed. In these cases we should attempt to ensure both modules are imported together at a parent package level.

Wrapper code shared across modules MUST be placed in a python.h file (or subdirectory) in the relative include path

For example, common code to wrap lsst::afw::table shall go either into:

include/lsst/afw/table/python.h

or:

include/lsst/afw/table/python/myname.h

When multiple headers are added to a python subdirectory, in general we SHOULD NOT add an aggregating python.h file; the presence of such a file encourages including more headers than are actually needed, leading to slower compilation times.

Wrapper code shared across packages SHOULD go into utils

More specifically it SHOULD go into include/lsst/utils/python/*.h in the utils package.

The only exception should be utility code that depends on other code that is not already in utils’ dependency tree.

Naming conventions

The pybind11 namespace MUST be aliased to py in source files

All pybind11 wrapper modules should include:

namespace py = pybind11;

This alias MUST NOT be defined at namespace scope in header files (see C++ rule 4-13), though it MAY be defined locally within functions in headers. For example:

#include "pybind11/pybind11.h"

namespace py = pybind11;  // required in .cc, not allowed in .h

namespace lsst { namespace afw { namespace geom { namespace {

void declareFunctions(py::module & mod) {
    namespace py = pybind11; // okay in .h, unnecessary in .cc
    ...
}

}}}} // namespace lsst::afw::geom::<anonymous>

Module object names MUST be “mod” or camel case prefixed with “mod”

If a wrapper only contains one module instance the name of the object shall be mod. Otherwise (e.g. if another module is imported into a local variable) it shall be camel case prefixed with mod as in modExample.

Class object names MUST be “cls” or camel case prefixed with “cls”

If a wrapper only contains one class the name of the object shall be cls. Otherwise it shall be camel case prefixed with cls as in clsExample.

When using a cls prefix, it is strongly encouraged to use the full class name for the remainder. However you MAY also use an abbreviated name.

Method chaining MAY be used to increase code readability

When a named class object is not needed, chaining methods can reduce boilerplate.

For example:

py::class_<Example>(mod, "Example")
    .def("foo", &Example::foo)
    .def("bar", &Example::bar);

This syntax is essentially always used with enum (see enum syntax).

Lambda arguments referring to the current object MUST be named “self”

For example:

clsExample.def("f", [](Example const & self, ... ) { ... });

Lambda arguments referring to the second object in a copy constructor or operator wrapper MUST be named “other”

For example:

clsExample.def("__eq__", [](Example const & self, Example const & other) { ... });

Names of generic class types SHOULD be called “Class”

It is sometimes desirable to give a class type a generic name (either as typename, typedef or using alias). In such cases prefer to call the type Class. This is especially common in declare functions.

Names of generic pybind11 class types SHOULD be called “PyClass”

When a generic type name or alias refers to a pybind11::class_<Ts...> object, prefer to call it PyClass. This is, again, especially common in declare functions.

Organization

Wrappers for templates SHALL be declared in functions prefixed with “declare”

The wrapper for the templated type Example<T> shall be added by a declare function:

namespace {
    template <typename T>
    void declareExample(py::module & mod, std::string const & suffix) {
        using Class = Example<T>;
        py::class_<Class, std::shared_ptr<Class>> cls(mod, ("Example" + suffix).c_str());

        cls.def("test", &Class::test);
        ...
    }
}

...

PYBIND11_PLUGIN(_Example) {
    py::module mod("_Example");
    declareExample<float>(mod, "F");
    declareExample<int>(mod, "I");
    ...
    return mod.ptr();
}

The return type may be non-void in case more functionality needs to be added later. The suffix argument may be omitted when not needed (e.g. when adding function overloads).

Separate declare functions MAY also be used to avoid code duplication and increase readability

In some cases it is useful to split up wrapping over multiple (non-templated) declare functions, for instance when multiple classes are defined in a single module or when classes share many related methods.

For example:

template <typename Class, typename PyClass>
void declareCommon(PyClass & cls) {
    cls.def("read", &Class::read);
}

void declareFoo(py::module & mod) {
    py::class_<Foo> cls(mod, "Foo");

    declareCommon<Foo>(cls);
}

void declareBar(py::module & mod) {
    py::class_<Bar> cls(mod, "Bar");

    declareCommon<Bar>(cls);
}

Wrapper code in source files MUST be placed in a nested anonymous namespace

For example:

namespace lsst {
namespace sphgeom {

namespace {

...  // declare functions...

}  // <anonymous>

PYBIND11_PLUGIN(...
   ...
}

}  // sphgeom
}  // lsst

Using anonymous namespaces ensures symbols that need not be public aren’t, avoiding name clashes, reducing the size of libraries, and improving link times.

Common wrapper code in headers MUST be placed in the nested python namespace

For example:

namespace lsst {
namespace sphgeom {
namespace python {

...  // declare functions...

}  // python
}  // sphgeom
}  // lsst

py::class_ instantiations MUST be declared only once

Because py::class_ objects take many template arguments (which may change), an instantiation for a C++ type must be declared in exactly one place. If this type must appear in places other than the declaration of the py::class_ instance, such as a declare function, a type alias or template type deduction should be used to avoid repeating the full py::class_ type.

When no template deduction is needed, a type alias is usually preferable:

using PyThing = py::class_<Thing>;

void declareCommon(PyThing & cls) {
    ...
}

PYBIND11_PLUGIN(_Thing) {
    PyThing cls(...);
    declareCommon(cls);
}

If template deduction is used, it should be used on the full type, not the template parameters for py::class_ itself:

template <typename PyClass>
void declareCommon(PyClass & cls) {
    ...
}

PYBIND11_PLUGIN(_Thing) {
    py::class_<Thing> cls(...);
    declareCommon(cls);
}

There should be no need to provide the template parameters explicitly when calling declareCommon here; they are inferred from the type passed to it.

Use of pybind11 features

overload_cast SHALL be used to disambiguate overloads

Example:

// overloaded function
mod.def("test", py::overload_cast<int>(test));
mod.def("test", py::overload_cast<double>(test));

// overloaded class member function
cls.def("computeSomething",
        py::overload_cast<int, double>(&MyClass::computeSomething, py::const_),
        "firstParam"_a, "anotherParam"_a);
cls.def("computeSomething",
        py::overload_cast<int, std::string>(&MyClass::computeSomething, py::const_),
        "firstParam"_a, "anotherParam"_a="foo");

Note that py::const_ is necessary for a const member function.

The shared_ptr holder type SHOULD be used for all non-trivial classes

If no holder type is specified explicitly, pybind11 uses unique_ptr, but it is hard to anticipate when wrapping a class whether any downstream code will later use it with shared_ptr. Moreover, C++ functions taking unique_ptr arguments can never be wrapped intuitively in Python (because Python has no output arguments or ownership transfer), so we do not need to worry about wrapped instances held by shared_ptr that must be converted to unique_ptr for a function call.

The only classes that should be wrapped with unique_ptr are non-polymorphic classes that are always passed by value or reference in C++ and are small enough that shared_ptr represents a significant overhead.

Note that this does not mean that shared_ptr must be used in C++ code in preference to other options; the C++ coding guidelines on when to use them still apply.

Keyword names MUST be provided for functions with multiple overloads or more than two arguments

Keyword arguments make Python code significantly more readable, especially when distinguishing between overloads or in long function signatures.

Keyword arguments MAY be provided for non-overloaded functions with two or fewer arguments, and are strongly encouraged if the meaning or order of the arguments is not apparent from the function name.

Literals MUST be used for all named arguments

The _a argument literal from pybind11::literals MUST be used for all named arguments (e.g. mod.def("f", f, "arg1"_a, "arg2"_a);). The py::arg() construct SHALL NOT be used.

Enum scoping SHALL follow usage in C++

  • Unscoped enums SHALL export their names into the class scope using .export_values():
py::enum_<Class::State>(cls, "State")
    .value("RED", Class::State::RED)
    .value("GREEN", Class::State::GREEN)
    .export_values();
  • Scoped enums (i.e. enum class in C++) SHALL NOT use .export_values().

Enums used as integers SHALL be wrapped as integer attributes

Regular (non-class) enums are frequently used in C++ to define a set of related integer constants rather than an actual enumeration. Enums whose values are defined to be distinct bits (e.g. 0x01, 0x02, 0x04) are almost certainly used only as integer constants.

These enums should be wrapped as simple integer class attributes rather than pybind11 enums, e.g.:

cls.attr("NAME1") = py::cast(int(Class::NAME1));
cls.attr("NAME2") = py::cast(int(Class::NAME2));

This avoids a need for casts in Python code to deal with the fact that pybind11 enumerations are not implicitly convertible to int (unlike C++). Anonymous enums or enums with explicit values that are usable in bitwise operations should almost always be wrapped as integer attributes.

All other enums (those that are not used as a collection of integer constants) SHOULD be wrapped with py::enum_.
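The practical difference for Python callers can be sketched with a pure-Python analogy. This is not pybind11 itself: the Flags class stands in for a wrapped class with integer attributes, and enum.Enum is used as an assumed stand-in for an enumeration type that is not implicitly convertible to int. All names are hypothetical.

```python
import enum


class Flags:
    """Stand-in for a wrapped class exposing plain integer attributes."""

    NAME1 = 0x01
    NAME2 = 0x02
    NAME3 = 0x04


class StrictFlags(enum.Enum):
    """Stand-in for an enumeration type that does not behave like an int."""

    NAME1 = 0x01
    NAME2 = 0x02


# Integer attributes combine and test with ordinary bitwise operators.
combined = Flags.NAME1 | Flags.NAME3
has_name2 = bool(combined & Flags.NAME2)

# A true enumeration type rejects the same expression without a cast.
try:
    StrictFlags.NAME1 | StrictFlags.NAME2
    enum_supports_or = True
except TypeError:
    enum_supports_or = False

print(combined, has_name2, enum_supports_or)
```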

Enums that have a natural ordering SHALL use py::arithmetic

If enums exposed to Python have a natural ordering, and hence can be expected to be used in comparisons, py::enum_<ExampleEnum>(..., py::arithmetic()) SHALL be used (instead of either not having comparison operators or wrapping them explicitly).

Derived-class overrides of virtual methods MUST be wrapped OR noted with a comment

Because C++ polymorphism ensures the right C++ implementation is always called, only the base class version of a virtual method strictly needs to be wrapped to get the right behavior. And in some cases not wrapping a derived-class override can represent a significant reduction in code duplication. But within a pybind11 file it is hard to identify which methods are virtual, and the absence of a method in wrappers is potentially confusing unless a comment indicates that the method is not wrapped because it is an override.

Default automatic conversions SHALL be used for all STL containers

The pybind11 header pybind11/stl.h provides automatic conversion support (to standard Python list, set, tuple and dict types) for most STL containers (i.e. std::vector, std::set, std::unordered_set, std::pair, std::tuple, std::list, std::map and std::unordered_map). These conversions shall always be used instead of manual wrapping.

Manual wrapping of a standard library type is not a local operation: defining such a wrapper can break code in other modules that use the same type but expect it to be returned to Python as a native Python container.

Where copying of STL containers is undesirable an ndarray type SHOULD be used instead

The ndarray C++ types can share storage with NumPy arrays. This may sometimes require changes to the C++ API.

Default operator support SHALL NOT be used

Support from the pybind11/operators.h header cannot be applied consistently and SHALL NOT be used.

Instead all operators are to be wrapped either directly as any other function:

clsExample.def("__eq__", &Example::operator==, py::is_operator());

or using a lambda function:

clsExample.def("__eq__", [](Example const & self, Example const & other) {
    return self == other;
}, py::is_operator());

Please prefer only one style within a given module for readability.

Note

py::is_operator() is necessary to get the correct NotImplemented return when called with unsupported types. It should not be used in wrapping in-place operators (e.g. __iadd__), however, as this can lead to confusing behavior.
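The behavior that a NotImplemented return enables can be seen in a pure-Python class. The sketch below is an analogy for what Python callers should observe from a correctly wrapped operator; the Example class and its value attribute are hypothetical and no pybind11 code is involved.

```python
class Example:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Example):
            # Let Python try the other operand, then fall back to identity.
            return NotImplemented
        return self.value == other.value

    def __add__(self, other):
        if not isinstance(other, Example):
            return NotImplemented
        return Example(self.value + other.value)


a = Example(1)
equal_to_same_type = a == Example(1)
equal_to_other_type = a == "1"

# For arithmetic, NotImplemented from both operands becomes a TypeError.
try:
    a + "1"
    add_raised = False
except TypeError:
    add_raised = True

print(equal_to_same_type, equal_to_other_type, add_raised)
```

Comparison with an unsupported type yields False rather than an exception, while unsupported arithmetic raises TypeError; both outcomes depend on the operator returning NotImplemented instead of raising on its own.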

Division MUST be wrapped as __truediv__ and possibly __floordiv__, not __div__

Wrapping __div__ allows old-style division to work, which should be disallowed in all LSST Python code. Not defining it turns subtle differences into easy-to-spot (and fix) exceptions.

The same rule applies for in-place operators: __itruediv__ and __ifloordiv__ may be defined, but __idiv__ should not.
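From the Python side, a class wrapped according to this rule behaves like the following pure-Python sketch, which defines only __truediv__ and __floordiv__. The Length class and its meters attribute are hypothetical stand-ins for a wrapped C++ type.

```python
class Length:
    def __init__(self, meters):
        self.meters = meters

    def __truediv__(self, divisor):
        # Used by the "/" operator.
        return Length(self.meters / divisor)

    def __floordiv__(self, divisor):
        # Used by the "//" operator.
        return Length(self.meters // divisor)


half = Length(7.0) / 2
floored = Length(7.0) // 2
print(half.meters, floored.meters)
```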

The reference_internal policy SHALL be used for functions (or properties) giving write access to internal data members

When a C++ method returns a non-const reference or (smart) pointer to a data member, it SHALL be wrapped with the py::return_value_policy::reference_internal call policy, even if there is an overload returning a const object of the same type.

When a C++ method returns a const reference or (smart) pointer to a data member (not a new object), and provides no non-const way to access that data member, that method SHALL be wrapped with the py::return_value_policy::automatic call policy (the default, so no need to specify), to prevent accidental modification of the internal data member (which is a much more serious offence in C++ than Python).

In rare cases, py::return_value_policy::reference_internal may be used if the expense of copying the object is large and the likelihood of accidental modification is low.

Module docstrings SHOULD be empty

Wrapper module docstrings are not visible to users (since all classes are lifted into the package namespace by __init__.py), and hence do not need to follow the usual requirements for module-level docstrings. Empty docstrings are preferable to trivial strings that just duplicate information implicit in the naming conventions (e.g. “The ‘thing’ module provides wrappers for thing.h”).

Classes SHOULD define __str__ to return a human readable string representation of the object

__str__ is intended to return a human readable string representation of the object. Typically this can be the output of operator<<:

cls.def("__str__", [](Class const& self) {
    std::ostringstream os;
    os << self;
    return os.str();
});

Classes SHOULD define __repr__ to return a minimal summary of the object including the fully-qualified name of the class

__repr__ is intended to return a minimal summary of the object. It MUST include the fully-qualified name of the class, but MAY be defined to include per-instance values or a summary thereof. For small objects, producing a string that can be passed to eval to reproduce the object is often a good guideline:

clsPoint2D.def("__repr__", [](Point2D const& self) {
    return py::str("lsst.afw.geom.Point2D({}, {})").format(self.getX(), self.getY());
});
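The eval guideline can be checked with a round trip. The pure-Python sketch below uses a hypothetical Point2D class and a hypothetical package name demo (provided through a SimpleNamespace so the snippet is self-contained); it is an illustration of the guideline, not the wrapped LSST class.

```python
import types


class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Fully-qualified name plus the values needed to rebuild the object.
        return "demo.Point2D({!r}, {!r})".format(self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


# Hypothetical package namespace so the qualified name resolves in eval.
demo = types.SimpleNamespace(Point2D=Point2D)

point = Point2D(1.5, -2.0)
text = repr(point)
clone = eval(text, {"demo": demo})
print(text)
```

Using the !r conversion for each value keeps the output valid as Python source, which is what makes the round trip possible.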