DM Python Style Guide

This is version 6.0 of the DM Python Coding Standard. The Introduction to DM’s Code Style Guides provides the overarching Coding Standards policy applicable to all DM code.

Note

Changes to this document must be approved by the System Architect (RFC-24). To request changes to these standards, please file an RFC.

1. PEP 8 is the Baseline Coding Style

Data Management’s Python coding style is based on the PEP 8 Style Guide for Python Code, with modifications specified in this document.

PEP 8 is used throughout the Python community and should feel familiar to Python developers. DM’s deviations from PEP 8 are primarily motivated by consistency with the DM C++ Style Guide. Additional guidelines are included in this document to address specific requirements of the Data Management System.

Exceptions to PEP 8

The following table summarizes all PEP 8 guidelines that are not followed by the DM Python Style Guide. These exceptions are phrased as error codes that may be ignored by the flake8 linter (see Code MAY be validated with flake8).

E133
Closing bracket is missing indentation. This pycodestyle error (via flake8) is not part of PEP 8.
E226
Missing whitespace around arithmetic operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
E228
Missing whitespace around modulo operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
E251
Unexpected spaces around keyword / parameter equals. See Keyword assignment operators SHOULD be surrounded by a space when statements appear on multiple lines.
N802
Function name should be lowercase. See 6. Naming Conventions.
N803
Argument name should be lowercase. See 6. Naming Conventions.
Maximum line length
See Line Length MUST be less than or equal to 110 columns.

Code MAY be validated with flake8

The flake8 tool may be used to validate Python source code against the portion of PEP 8 adopted by Data Management. In addition, flake8 statically checks Python for code errors. The separate pep8-naming plugin validates names according to the DM Python coding style.

Note

Flake8 only validates code against PEP 8 specifications. This style guide includes additional guidelines that are not automatically linted.

Flake8 installation

Linters are installable with pip:

pip install flake8
pip install pep8-naming

Flake8 command line invocation

flake8 --ignore=E133,E226,E228,N802,N803 --max-line-length=110 .

This command lints all Python files in the current directory. Alternatively, individual files can be specified in place of the . argument.

The ignored error codes are explained above, in Exceptions to PEP 8.

Flake8 configuration files

LSST DM packages may also include a setup.cfg file with PEP 8 exceptions:

[flake8]
max-line-length = 110
ignore = E133, E226, E228, N802, N803

flake8 can be invoked without arguments when this configuration is present.

Lines that intentionally deviate from DM’s PEP 8 MUST include a noqa comment

Lines of code may intentionally deviate from our application of PEP 8 (see above) because of limitations in flake8. In such cases, authors must append to the line a # noqa comment that includes the specific error code being ignored. See the flake8 documentation for details. This prevents the line from triggering false flake8 warnings for other developers, while still allowing unexpected errors on that line to be linted.

For example, to import a module without using it (to build a namespace, as in an __init__.py):

from .module import AClass  # noqa: F401

autopep8 MAY be used to fix PEP 8 compliance

Many PEP 8 issues in existing code can be fixed with autopep8:

autopep8 . --in-place --recursive \
    --ignore E133,E226,E228,E251,N802,N803 --max-line-length 110

The . specifies the current directory. Together with --recursive, the full tree of Python files will be processed by autopep8. Alternatively, a single file can be specified in place of the . argument.

autopep8’s changes must always be validated before committing.

Style changes must be encapsulated in a distinct commit (see Commits should represent discrete logical changes to the code in Workflow document).

Note

autopep8 only fixes PEP 8 issues and does not address other guidelines listed here.

2. Layout

Line Length MUST be less than or equal to 110 columns

Limit all lines to a maximum of 110 characters. This conforms to the DM C++ Style Guide (see 4-6).

This differs from the PEP 8 recommendation of 79 characters.

Python’s implied continuation inside parens, brackets and braces SHOULD be used for wrapped lines

The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces.

If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better. In this example, continuation is naturally implied within the __init__ method argument lists, while both \ and parentheses-based continuations are used in the if statements.

class Rectangle(Blob):
    """Documentation for Rectangle.
    """
    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):

        # Discouraged: continuation with '\'
        if width == 0 and height == 0 and \
               color == 'red' and emphasis == 'strong' or \
               highlight > 100:
            raise ValueError("sorry, you lose")

        # Preferred: continuation with parentheses
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so")

        Blob.__init__(self, width, height,
                      color, emphasis, highlight)

Be aware that the continued line must be distinguished from the following lines through indentation. For example, this will generate an E129 error:

if (width == 0 and
    height == 0):
    pass

Instead, the continued line should be indented:

if (width == 0 and
        height == 0):
    pass

Blank lines SHOULD NOT be added before or after a docstring

Do not use a blank line on either side of a docstring.
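As a minimal sketch of this layout, the hypothetical function below (scaleFlux is an invented name) places its docstring directly after the def line and starts the body directly after the docstring:

```python
def scaleFlux(flux, factor=2.0):
    """Return ``flux`` multiplied by ``factor``.
    """
    return flux*factor
```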

Consistency with the DM C++ Coding Guide namespaces SHOULD be followed

Python import forms should mirror the corresponding namespace usage in the LSST C++ Coding Standards.

Good:

  • from lsst.foo.bar import myFunction is analogous to using lsst::foo::bar::myFunction
  • import lsst.foo.bar as fooBar is analogous to namespace fooBar = lsst::foo::bar

Disallowed in both Coding Standards (except in __init__.py library initialization context):

  • from lsst.foo.bar import * is analogous to using namespace lsst::foo::bar

3. Whitespace

Follow the PEP 8 whitespace style guidelines, with the following adjustments.

The minimum number of parentheses needed for correctness and readability SHOULD be used

Yes:

a = b(self.config.nSigmaToGrow*sigma + 0.5)

Less readable:

a = b((self.config.nSigmaToGrow*sigma) + 0.5)

Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %]

Always surround these binary operators with a single space on either side; this helps the user see where one token ends and another begins:

  • assignment (=),
  • augmented assignment (+=, -=, etc.),
  • comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not),
  • Booleans (and, or, not).

Use spaces around these arithmetic operators:

  • addition (+),
  • subtraction (-).

Never surround these binary arithmetic operators with whitespace:

  • multiplication (*),
  • division (/),
  • exponentiation (**),
  • floor division (//),
  • modulus (%). Note that a single space must always surround % when used for string formatting.

For example:

i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a + b)*(a - b)
print('Hello %s' % 'world!')

This deviates from PEP 8, which allows whitespace around these arithmetic operators if they appear alone. Error codes: E226 and E228.

Keyword assignment operators SHOULD be surrounded by a space when statements appear on multiple lines

Surround keyword assignment operators with a single space when a call spans multiple lines. However, if keyword assignments occur on a single line, there should be no additional spaces.

Thus this:

# whitespace around multi-line assignment
funcA(
    karg1 = value1,
    karg2 = value2,
    karg3 = value3,
)

# no whitespace around single-line assigment
funcB(x, y, z, karg1=value1, karg2=value2, karg3=value3)

Not this:

funcA(
    karg1=value1,
    karg2=value2,
    karg3=value3,
)

funcB(x, y, z, karg1 = value1, karg2 = value2, karg3 = value3)

This opposes PEP 8. Error code: E251.

4. Comments

Source code comments should follow PEP 8’s recommendations with the following additional requirements.

Comments MUST always remain up-to-date with code changes

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Sentences in comments SHOULD NOT be separated by double spaces

Following PEP 8, comments should be complete sentences.

However, sentences should not be separated by two spaces; a single space is sufficient.

This differs from PEP 8.

Block comments SHOULD reference the code following them and SHOULD be indented to the same level

Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.
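The sketch below illustrates these block-comment rules. The function clippedMean and its arguments are hypothetical and exist only to give the comment some code to describe:

```python
def clippedMean(values, lower, upper):
    """Return the mean of the values that fall within [lower, upper].
    """
    # Keep only the values inside the closed interval. This block comment
    # describes the code that follows it and shares its indentation.
    #
    # A line containing a single # separates paragraphs within the block.
    kept = [v for v in values if lower <= v <= upper]
    if not kept:
        raise ValueError("no values within the requested range")
    return sum(kept)/float(len(kept))
```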

5. Documentation Strings (docstrings)

Use Numpydoc to format the content of all docstrings. The page Documenting Python APIs authoritatively describes this format. Its guidelines should be treated as an extension of this Python style guide.

See also the ReStructuredText Style Guide and the RestructuredText Formatting Conventions section in particular for guidelines on reStructuredText in general.

Docstrings SHOULD be written for all public modules, functions, classes, and methods

Write docstrings for all public modules, functions, classes, and methods. See Documenting Python APIs.

Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.
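A rough sketch of both cases follows. SkyRegion and its methods are hypothetical, and the Numpydoc section layout shown is only an approximation; Documenting Python APIs remains the authoritative reference for docstring content.

```python
class SkyRegion(object):
    """A circular region on the sky (hypothetical example class).

    Parameters
    ----------
    radius : `float`
        Radius of the region, in degrees.
    """
    def __init__(self, radius):
        self.radius = radius

    def getDiameter(self):
        """Return the diameter of the region, in degrees.

        Returns
        -------
        diameter : `float`
            Twice the radius.
        """
        return self._double(self.radius)

    def _double(self, value):
        # Return twice the given value. This helper is non-public, so a
        # comment after the def line stands in for a docstring.
        return 2*value
```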

6. Naming Conventions

We follow PEP 8ʼs naming conventions, with exceptions listed here. The naming conventions for LSST Python and C++ source have been defined to be as similar as the respective languages allow.

In general:

  • class names are CamelCase with leading uppercase,
  • module variables used as module global constants are UPPERCASE_WITH_UNDERSCORES,
  • all other names are camelCase with leading lowercase.

Names may be decorated with leading and/or trailing underscores.
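The general rules above can be sketched as follows. Every name in this block is invented for illustration:

```python
# Module-level constant: UPPERCASE_WITH_UNDERSCORES.
MAX_ITERATIONS = 10


class SourceCatalog(object):
    """Class name: CamelCase with a leading uppercase letter.
    """
    def __init__(self, sourceList=None):
        # Non-public attribute decorated with a leading underscore.
        self._sourceList = list(sourceList) if sourceList is not None else []

    def getSourceCount(self):
        """Method name: camelCase with a leading lowercase letter.
        """
        return len(self._sourceList)


def countSources(catalog):
    """Function and argument names: camelCase with a leading lowercase letter.
    """
    return catalog.getSourceCount()
```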

User defined names SHOULD NOT shadow Python built-in functions

Names which shadow a Python built-in function may cause confusion for readers of the code. Creating a more specific identifier is suggested to avoid collisions. In the case of filter, filterName may be appropriate; for filter objects, something like filterObj might be appropriate.
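As an illustration, the hypothetical function below takes a filterName argument rather than one called filter, which leaves the built-in filter() usable inside the function. The dictionary layout of the exposures is an assumption made for this sketch.

```python
def selectByFilter(exposureList, filterName):
    """Return the exposures taken through the filter called ``filterName``.
    """
    def matches(exposure):
        return exposure["filter"] == filterName

    # Because the argument is not named filter, the built-in is still available.
    return list(filter(matches, exposureList))
```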

Modules which contain class definitions SHOULD be named after the class name

Modules which contain class definitions should be named after the class name (one module per class).

When a Python module wraps a C/C++ extension module, the C/C++ module SHOULD be named <module>Lib

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module should append Lib to the module’s name (e.g. socketLib).

Names l (lowercase: el), O (uppercase: oh), I (uppercase: eye) MUST be avoided

Never use these characters as single character variable names:

  • l (lowercase letter el),
  • O (uppercase letter oh), or
  • I (uppercase letter eye).

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use l, use L instead.

7. Source Files & Modules

A Python source file name SHOULD be camelCase-with-leading-lowercase and ending in ‘.py’

A module containing a single class should be a camelCase-with-leading-lowercase transliteration of the class’s name.

The name of a test case should be descriptive without the need for a trailing numeral to distinguish one test case from another.

ASCII Encoding MUST be used for new code

Always use ASCII for new Python code.

  • Do not include a coding comment (as described in PEP 263) for ASCII files.
  • Existing code using Latin-1 encoding (a.k.a. ISO-8859-1) is acceptable so long as it has a proper coding comment. All other code must be converted to ASCII or Latin-1 except for 3rd party packages used “as is.”

Standard code order SHOULD be followed

Within a module, follow the order:

  1. Shebang line, #! /usr/bin/env python (only for executable scripts)
  2. Module-level comments (such as the license statement)
  3. Module-level docstring
  4. Imports
  5. __all__ statement, if any
  6. Private module variables (names start with underscore)
  7. Private module functions and classes (names start with underscore)
  8. Public module variables
  9. Public functions and classes
  10. Optional test suites

8. Classes

See also

Designing for Inheritance in PEP 8 describes naming conventions related to public and private class APIs.

super SHOULD NOT be used unless the author really understands the implications (e.g. in a well-understood multiple inheritance hierarchy).

Python provides super() so that each parent class’s method is only called once.

To use super(), all parent classes in the chain (also called the Method Resolution Order) need to use super(); otherwise the chain gets interrupted. Other subtleties have been noted in an article by James Knight:

  • Never call super() with anything but the exact arguments you received, unless you really know what you’re doing.
  • When you use it on methods whose acceptable arguments can be altered on a subclass via addition of more optional arguments, always accept *args, **kw, and call super() like super(MyClass, self).currentmethod(alltheargsideclared, *args, **kwargs). If you don’t do this, forbid addition of optional arguments in subclasses.
  • Never use positional arguments in __init__ or __new__. Always use keyword args, and always call them as keywords, and always pass all keywords on to super().

For guidance on successfully using super(), see Raymond Hettinger’s article Super Considered Super!
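The following sketch shows one cooperative hierarchy in which every class calls super() and passes keyword arguments along the chain. The class names are hypothetical, and this is only one possible arrangement, not a recommended template.

```python
class Base(object):
    def __init__(self, **kwargs):
        # End of the chain: object.__init__ accepts no extra arguments.
        super(Base, self).__init__()
        self.callOrder = ["Base"]


class Left(Base):
    def __init__(self, left=None, **kwargs):
        # Consume our own keyword and forward the rest.
        super(Left, self).__init__(**kwargs)
        self.left = left
        self.callOrder.append("Left")


class Right(Base):
    def __init__(self, right=None, **kwargs):
        super(Right, self).__init__(**kwargs)
        self.right = right
        self.callOrder.append("Right")


class Both(Left, Right):
    def __init__(self, **kwargs):
        super(Both, self).__init__(**kwargs)
        self.callOrder.append("Both")


both = Both(left=1, right=2)
```

Here Base.__init__ runs exactly once even though both Left and Right inherit from it, because each class defers to the next one in the Method Resolution Order instead of naming its parent directly.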

9. Comparisons

is and is not SHOULD only be used for determining if two variables point to same object

Use is or is not only for the case that you need to know that two variables point to the exact same object.

To test equality in value, use == or != instead.
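A small sketch of the difference, using invented variable names:

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

sameValue = (first == second)     # True: the lists hold equal values
sameObject = (first is second)    # False: they are two distinct objects
aliasIsFirst = (alias is first)   # True: both names refer to one object
```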

is and is not SHOULD be used when comparing to None

There are two reasons:

  1. is None works with NumPy arrays, whereas == None does not;
  2. is None is idiomatic.

This is also consistent with PEP 8, which states:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.
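To illustrate the first reason without depending on NumPy, the sketch below uses a hypothetical ElementwiseVector class whose == operator compares element by element, loosely imitating how array types behave. It is a stand-in written for this example, not NumPy itself.

```python
class ElementwiseVector(object):
    """Hypothetical stand-in for an array type whose == compares elementwise.
    """
    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Return one result per element instead of a single bool.
        return [value == other for value in self.values]


vector = ElementwiseVector([1, None, 3])

elementwiseResult = (vector == None)  # noqa: E711
identityResult = (vector is None)
```

The == comparison yields a list of per-element results, which is not a usable answer to "is this object None?", whereas the is comparison yields a single bool.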

For sequences (str, list, tuple), use the fact that empty sequences are false.

Yes:

if not seq:
    pass

if seq:
    pass

No:

if len(seq):
    pass

if not len(seq):
    pass

10. Idiomatic Python

Strive to write idiomatic Python. Writing Python with accepted patterns makes your code easier for others to understand and often prevents bugs.

Fluent Python by Luciano Ramalho is an excellent guide to writing idiomatic Python.

Idiomatic Python also reduces technical debt, particularly by easing the migration from Python 2.7 to Python 3. Code should be written in a way that helps the futurize code converter produce more efficient code. For more information see the online book Supporting Python 3 by Lennart Regebro.

A mutable object MUST NOT be used as a keyword argument default

Never use a mutable object as default value for a keyword argument in a function or method.

When a mutable object is used as a default keyword argument, the default can change from one call to another, leading to unexpected behavior. This issue can be avoided by only using immutable types as defaults.

For example, rather than provide a default empty list:

def proclist(alist=[]):
    pass

this function should create a new list in its internal scope:

def proclist(alist=None):
    if alist is None:
        alist = []
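The sketch below makes the pitfall visible. The function names appendShared and appendFresh are invented for this illustration.

```python
def appendShared(item, itemList=[]):  # the default list is created only once
    itemList.append(item)
    return itemList


def appendFresh(item, itemList=None):
    if itemList is None:
        itemList = []  # a new list on every call
    itemList.append(item)
    return itemList


# Copies are taken so that each result records the state at call time.
sharedFirst = list(appendShared(1))
sharedSecond = list(appendShared(2))
freshFirst = appendFresh(1)
freshSecond = appendFresh(2)
```

The second call to appendShared still sees the item added by the first call, while each call to appendFresh starts from an empty list.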

Context managers (with) SHOULD be used for resource allocation

Use the with statement to simplify resource allocation.

For example, to be sure a file will be closed when you are done with it:

with open('/etc/passwd', 'r') as f:
    for line in f:
        pass

Use open instead of file

file is gone in Python 3.

lambda SHOULD NOT be used

Avoid the use of lambda. You can almost always write clearer code by using a named function or using the functools module to wrap a function.
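As a sketch of the alternatives, the block below replaces two typical lambda expressions. The function scale is hypothetical, and operator.itemgetter from the standard library is an addition made for this example; the guideline itself names only functools.

```python
import functools
import operator


def scale(value, factor):
    """Return ``value`` multiplied by ``factor``.
    """
    return value*factor


# functools.partial instead of: lambda value: scale(value, 2)
double = functools.partial(scale, factor=2)

# operator.itemgetter instead of: lambda pair: pair[1]
pairs = [("r", 3), ("g", 1), ("i", 2)]
sortedPairs = sorted(pairs, key=operator.itemgetter(1))
```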

The set type SHOULD be used for unordered collections

Use the set type for unordered collections of objects.
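For instance, a set removes duplicates, supports fast membership tests, and provides set arithmetic. The filter names below are example data only.

```python
observedFilters = ["r", "g", "r", "i", "g"]

# Duplicates collapse; order is not preserved.
uniqueFilters = set(observedFilters)

# Membership test.
hasRBand = "r" in uniqueFilters

# Set difference: which requested filters were never observed?
missingFilters = {"u", "g", "r"} - uniqueFilters
```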

The argparse module SHOULD be used for command-line scripts

Use the argparse module for command-line scripts.

Command line tasks for pipelines should use lsst.pipe.base.ArgumentParser instead.
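A minimal sketch of a stand-alone script built on argparse is shown below. The script's purpose and the names makeParser and main are assumptions for illustration; pipeline tasks would use lsst.pipe.base.ArgumentParser, which is not shown here.

```python
import argparse


def makeParser():
    """Build the command-line parser for a hypothetical script.
    """
    parser = argparse.ArgumentParser(description="Sum a list of integers.")
    parser.add_argument("values", type=int, nargs="+", help="integers to sum")
    parser.add_argument("--scale", type=int, default=1, help="factor applied to the sum")
    return parser


def main(argv=None):
    """Parse ``argv`` (or the process arguments if None) and return the result.
    """
    # With argv=None, argparse reads sys.argv[1:]; tests can pass a list instead.
    args = makeParser().parse_args(argv)
    return args.scale*sum(args.values)
```

Accepting argv as a parameter keeps the parsing logic testable without touching the real command line.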

Use from __future__ import division

This means / is floating-point division and // is floor division (rounding toward negative infinity), regardless of the type of numbers being divided. This gives more predictable behavior than the old operators, avoiding a common source of obscure bugs. It also makes the intent of the code more obvious.
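The values below sketch the resulting behavior. To keep the block runnable in any context, the import itself appears only as a comment; the variable names are invented for illustration.

```python
# Under Python 2.7 the module would begin with:
#     from __future__ import division
# Python 3 behaves this way by default.
trueQuotient = 7/2       # 3.5
floorQuotient = 7//2     # 3
negativeFloor = -7//2    # -4: // rounds toward negative infinity
floatFloor = 7.0//2      # 3.0: // also floors float operands
```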

Use from __future__ import absolute_import

In addition, import local modules using relative imports (e.g. from . import foo or from .foo import bar). This results in clearer code and avoids shadowing global modules with local modules.

Use as when catching an exception

For example, use except Exception as e or except (LookupError, TypeError) as e. The new syntax is clearer, especially when catching multiple exception classes, and required in Python 3.
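A short sketch of this syntax with a tuple of exception classes follows; the function lookup is hypothetical.

```python
def lookup(mapping, key):
    """Return ``mapping[key]``, or a message that describes the failure.
    """
    try:
        return mapping[key]
    except (LookupError, TypeError) as e:
        return "lookup failed: %s" % type(e).__name__
```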

Iterators and generators SHOULD be used to iterate over large data sets efficiently

Use iterators, generators (functions that yield values one at a time and act like iterators) and generator expressions (expressions that act like iterators) to iterate over large data sets efficiently.
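The sketch below shows a generator function and an equivalent generator expression. The name evenSquares is invented for this example.

```python
def evenSquares(limit):
    """Yield the squares of the even integers below ``limit``, one at a time.
    """
    for n in range(limit):
        if n%2 == 0:
            yield n*n


# Generator function: values are produced only as sum() requests them.
total = sum(evenSquares(10))

# Generator expression: the same computation without an intermediate list.
sameTotal = sum(n*n for n in range(10) if n%2 == 0)
```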

Use itervalues() and iteritems() instead of values() and items()

For iterating over dictionary values and items use the above idiom unless you truly need a list.

This pattern does not apply to code that has already been ported to Python 3 with futurize. For more information see http://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items.

Avoid dict.keys() and dict.iterkeys()

For iterating over keys, iterate over the dictionary itself, e.g.:

for x in myDict:
    pass

To test for inclusion use in:

if key in myDict:
    pass

This is preferred over keys() and iterkeys() and avoids the issues mentioned in the previous item.

Use from __future__ import print_function

The print() function is required in Python 3.

In general, DM code should use logging instead of print statements.
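As a rough sketch, the block below reports progress through the standard library logging module. The choice of that module, the logger name, and the function processVisit are assumptions made for this example; DM packages may rely on a different logging framework.

```python
import logging

log = logging.getLogger("lsst.example")  # hypothetical logger name


def processVisit(visitId):
    """Pretend to process a visit, reporting progress through logging.
    """
    log.info("Processing visit %d", visitId)
    if visitId < 0:
        log.warning("Visit %d has a negative identifier", visitId)
        return False
    return True
```

Unlike print, these messages carry a severity level and can be filtered or redirected by whoever configures the logger.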

Use next(myIter) instead of myIter.next()

The special method next has been renamed to __next__ in Python 3. The next() built-in function calls whichever of the two is appropriate, so it advances iterators in both Python 2.7 and Python 3.
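A brief sketch, using invented variable names:

```python
bandIter = iter(["u", "g", "r"])

firstBand = next(bandIter)
secondBand = next(bandIter)
thirdBand = next(bandIter)

# The iterator is now exhausted; supplying a default avoids StopIteration.
afterEnd = next(bandIter, None)
```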