DM Python Style Guide¶
This is version 6.0 of the DM Python Coding Standard. The Introduction to DM’s Code Style Guides provides the overarching Coding Standards policy applicable to all DM code.
Note
Changes to this document must be approved by the System Architect (RFC-24). To request changes to these standards, please file an RFC.
Contents
- DM Python Style Guide
- 0. Python Version
- 1. PEP 8 is the Baseline Coding Style
- 2. Layout
- 3. Whitespace
- 4. Comments
- 5. Documentation Strings (docstrings)
- 6. Naming Conventions
- 7. Source Files & Modules
- 8. Classes
- 9. Comparisons
- 10. Idiomatic Python
- Supporting Python 2.7 and 3.x simultaneously
- A mutable object MUST NOT be used as a keyword argument default
- Context managers (with) SHOULD be used for resource allocation
- Avoid dict.keys() when iterating over keys or checking membership
- The subprocess module SHOULD be used to spawn processes
- lambda SHOULD NOT be used
- The set type SHOULD be used for unordered collections
- The argparse module SHOULD be used for command-line scripts
- Iterators and generators SHOULD be used to iterate over large data sets efficiently
- if False: and if True: SHOULD NOT be used
- Properties SHOULD be used when they behave like regular instance attributes
0. Python Version¶
All DM Python code MUST work with Python 3¶
All the Python code written by LSST Data Management must be runnable using Python 3. Python 2 will cease to be supported before LSST is operational (PEP 373).
We are writing Python 3.6¶
The current baseline version is Python 3.6.
Python 3 MUST be used for all integration testing and services¶
From 2018 January 1 the baselined version of Python 3 must be used for all services, integration tests, end-to-end processing tests, and data challenges. Python 2 shall only be used when validating compatibility of code described in the next section.
DM Python library code with an external user base MUST support Python 2.7 and 3.x¶
During construction we expect external users to experiment with some of the DM code, and LSST DM library code is already being used in their applications.
This user community is currently transitioning from Python 2.7 to Python 3.x, and we should ensure that code works on both Python versions until our dependencies drop support for Python 2.7.
In particular, the Science Pipelines code (commonly referred to as lsst_distrib) must support 2.7 and 3.x.
Standalone applications, code providing services, and internal programs and modules (such as Qserv, SQuaSH and dax) do not have to support Python 2.
If code is currently supporting both 2.7 and 3.x, dropping support for Python 2.7 requires an RFC.
New code that has never supported Python 2.7, and that will be neither externally usable library code nor a dependency of a package that supports 2.7, does not require an RFC to omit 2.7 support.
1. PEP 8 is the Baseline Coding Style¶
Data Management’s Python Coding Style is based on the PEP 8 Style Guide for Python Code with modifications specified in this document.
PEP 8 is used throughout the Python community and should feel familiar to Python developers. DM’s deviations from PEP 8 are primarily motivated by consistency with the DM C++ Style Guide. Additional guidelines are included in this document to address specific requirements of the Data Management System.
Exceptions to PEP 8¶
The following table summarizes all PEP 8 guidelines that are not followed by the DM Python Style Guide. These exceptions are organized by error codes that may be ignored by the flake8 linter (see Code MAY be validated with flake8).
- E133: Closing bracket is missing indentation. This pycodestyle error (via flake8) is not part of PEP 8.
- E226: Missing whitespace around arithmetic operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E228: Missing whitespace around modulo operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E251: Unexpected spaces around keyword / parameter equals. See Keyword assignment operators SHOULD be surrounded by a space when statements appear on multiple lines.
- Maximum line length: See Line Length MUST be less than or equal to 110 columns.
Additionally, packages listed in Naming Conventions for Science Pipelines should disable the following rules:
- N802: Function name should be lowercase. See Naming Conventions for Science Pipelines.
- N803: Argument name should be lowercase. See Naming Conventions for Science Pipelines.
- N806: Variable in function should be lowercase. See Naming Conventions for Science Pipelines.
Code MAY be validated with flake8¶
The flake8 tool may be used to validate Python source code against the portion of PEP 8 adopted by Data Management. Additionally, flake8 statically checks Python for code errors. The separate pep8-naming plugin validates names according to the DM Python Style Guide.
Note
Flake8 only validates code against PEP 8 specifications. This style guide includes additional guidelines that are not automatically linted.
Flake8 command line invocation¶
flake8 --ignore=E133,E226,E228 --max-line-length=110 .
This command lints all Python files in the current directory.
Alternatively, individual files can be specified in place of the . argument.
The ignored error codes are explained above. N802, N803, and N806 can be added to this list for some packages.
Flake8 configuration files¶
flake8 can be invoked without arguments when a configuration file is present.
This configuration, included in a setup.cfg file at the root of code repositories, is consistent with the style guide:
[flake8]
max-line-length = 110
ignore = E133, E226, E228, N802, N803
exclude =
    bin,
    doc,
    **/*/__init__.py,
    **/*/version.py,
    tests/.tests
The exclude field lists paths that are not usefully linted by flake8 in DM Stack repositories.
Auto-generated Python should not be linted (including bin/ for Stack packages with bin.src/ directories).
We also discourage linting __init__.py modules due to the abundance of PEP 8 exceptions typically involved.
Lines that intentionally deviate from DM’s PEP 8 MUST include a noqa comment¶
Lines of code may intentionally deviate from our application of PEP 8 because of limitations in flake8.
In such cases, authors must append to the line a # noqa comment that includes the specific error code being ignored.
See the flake8 documentation for details.
This prevents the line from triggering false flake8 warnings for other developers, while still letting flake8 report any other, unexpected errors on that line.
For example, to import a module without using it (to build a namespace, as in an __init__.py):
from .module import AClass  # noqa: F401
autopep8 MAY be used to fix PEP 8 compliance¶
Many PEP 8 issues in existing code can be fixed with autopep8 version 1.2 or newer:
autopep8 . --in-place --recursive \
    --ignore E133,E226,E228,E251,N802,N803 --max-line-length 110
The . specifies the current directory.
Together with --recursive, the full tree of Python files will be processed by autopep8.
Alternatively, a single file can be specified in place of the . argument.
autopep8ʼs changes must always be validated before committing.
Style changes must be encapsulated in a distinct commit (see Commits should represent discrete logical changes to the code in DM Development Workflow with Git, GitHub, JIRA and Jenkins).
Note
autopep8 only fixes PEP 8 issues and does not address other guidelines listed here.
2. Layout¶
See also
Documenting Python APIs with Docstrings provides guidelines for the layout of docstrings.
Line Length MUST be less than or equal to 110 columns¶
Limit all lines to a maximum of 110 characters. This conforms to the DM C++ Style Guide (see 4-6).
This differs from the PEP 8 recommendation of 79 characters.
Python’s implied continuation inside parens, brackets and braces SHOULD be used for wrapped lines¶
The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces.
If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better.
In this example, continuation is naturally implied within the __init__ method argument lists, while both \ and parentheses-based continuations are used in the if statements.
class Rectangle(Blob):
    """Documentation for Rectangle.
    """
    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        # Discouraged: continuation with '\'
        if width == 0 and height == 0 and \
                color == 'red' and emphasis == 'strong' or \
                highlight > 100:
            raise ValueError("sorry, you lose")
        # Preferred: continuation with parentheses
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so")
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)
Be aware that the continued line must be distinguished from the following lines through indentation. For example, this will generate an E129 error:
if (width == 0 and
    height == 0):
    pass
Instead, the continued line should be indented further than the body of the if statement:
if (width == 0 and
        height == 0):
    pass
Consistency with the DM C++ Coding Guide namespaces SHOULD be followed¶
Python import statements have direct analogues in the namespace usage of the LSST C++ Coding Standards, and the two should be used consistently.
Good:
- from lsst.foo.bar import myFunction is analogous to using lsst::foo::bar::myFunction
- import lsst.foo.bar as fooBar is analogous to namespace fooBar = lsst::foo::bar
Disallowed in both Coding Standards (except in __init__.py library initialization contexts):
- from lsst.foo.bar import * is analogous to using namespace lsst::foo::bar
3. Whitespace¶
Follow the PEP 8 whitespace style guidelines, with the following adjustments.
The minimum number of parentheses needed for correctness and readability SHOULD be used¶
Yes:
a = b(self.config.nSigmaToGrow*sigma + 0.5)
Less readable:
a = b((self.config.nSigmaToGrow*sigma) + 0.5)
Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %]¶
Always surround these binary operators with a single space on either side; this helps the user see where one token ends and another begins:
- assignment (=),
- augmented assignment (+=, -=, etc.),
- comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not),
- Booleans (and, or, not).
Use spaces around these arithmetic operators:
- addition (+),
- subtraction (-).
Never surround these binary arithmetic operators with whitespace:
- multiplication (*),
- division (/),
- exponentiation (**),
- floor division (//),
- modulus (%). Note that a single space must always surround % when used for string formatting.
For example:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a + b)*(a - b)
print('Hello %s' % 'world!')
This deviates from PEP 8, which allows whitespace around these arithmetic operators if they appear alone. Error codes: E226 and E228.
Keyword assignment operators SHOULD be surrounded by a space when statements appear on multiple lines¶
However, if keyword assignments occur on a single line there should be no additional spaces.
Thus this:
# whitespace around multi-line assignment
funcA(
    karg1 = value1,
    karg2 = value2,
    karg3 = value3,
)
# no whitespace around single-line assignment
funcB(x, y, z, karg1=value1, karg2=value2, karg3=value3)
Not this:
funcA(
    karg1=value1,
    karg2=value2,
    karg3=value3,
)
funcB(x, y, z, karg1 = value1, karg2 = value2, karg3 = value3)
This deviates from PEP 8. Error code: E251.
4. Comments¶
Source code comments should follow PEP 8’s recommendations with the following additional requirements.
Comments MUST always remain up-to-date with code changes¶
Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!
Sentences in comments SHOULD NOT be separated by double spaces¶
Following PEP 8, comments should be complete sentences.
However, sentences should not be separated by two spaces; a single space is sufficient.
Block comments SHOULD reference the code following them and SHOULD be indented to the same level¶
Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code.
Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).
Paragraphs inside a block comment are separated by a line containing a single #.
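As an illustration of the block comment rules above, here is a minimal sketch. The function scaled_total and its arguments are hypothetical and exist only to give the comment something to describe.

```python
def scaled_total(values, scale):
    # Accumulate in floating point so that integer inputs do not
    # truncate the scaled result.
    #
    # The scale is applied once, after summation, rather than to each
    # element in turn.
    total = float(sum(values))
    return total*scale
```

The comment is indented to the level of the code it describes, and its two paragraphs are separated by a line holding a single #.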
To-do comments SHOULD include a Jira issue key¶
If the commented code is a workaround for a known issue, this rule makes it easier to find and remove the workaround once the issue has been resolved. If the commented code itself is the problem, this rule ensures the issue will be reported on Jira, making it more likely to be fixed in a timely manner.
# TODO: workaround for DM-6789
# TODO: DM-12345 is triggered by this line
5. Documentation Strings (docstrings)¶
Use Numpydoc to format the content of all docstrings. The page Documenting Python APIs with Docstrings authoritatively describes this format. Its guidelines should be treated as an extension of this Python Style Guide.
See also
The ReStructuredText Style Guide—and the RestructuredText Formatting Conventions section in particular—provide guidelines on reStructuredText in general.
Docstrings SHOULD be written for all public modules, functions, classes, and methods¶
Write docstrings for all public modules, functions, classes, and methods. See Documenting Python APIs with Docstrings.
Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does.
This comment should appear after the def line.
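A short sketch of how this might look in practice, assuming a hypothetical Counter class: the public class and method carry docstrings, while the non-public helper has only a comment placed after its def line.

```python
class Counter(object):
    """Count how many times an event has occurred.

    Parameters
    ----------
    start : int, optional
        Initial value of the count.
    """

    def __init__(self, start=0):
        self._count = start

    def increment(self):
        """Record one more event and return the updated count."""
        self._advance(1)
        return self._count

    def _advance(self, step):
        # Add step to the internal count. This method is non-public, so
        # a comment is used here in place of a docstring.
        self._count += step
```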
6. Naming Conventions¶
We follow PEP 8ʼs naming conventions, with exceptions listed here. C++ source code included within a Python package SHOULD follow the naming conventions of the Python package for APIs that are to be visible to Python users.
All LSST Python source code is consistent with PEP 8 naming in the following ways:
- class names are CamelCase with leading uppercase,
- module variables used as module global constants are UPPERCASE_WITH_UNDERSCORES.
Some packages, for historical reasons, do not fully adhere to PEP 8. These packages, and the associated naming conventions, are described in Naming Conventions for Science Pipelines. Naming style SHOULD be consistent within a top-level package built by Jenkins, or within a distinct service, and it is RECOMMENDED that PEP 8 naming convention be adopted, whilst understanding that it may be difficult to modify existing packages. Consistency within a package is mandatory. Within these stated constraints new packages SHOULD use PEP 8 naming conventions.
Names may be decorated with leading and/or trailing underscores.
Naming Conventions for Science Pipelines¶
For historical reasons, Science Pipelines code (nominally, all packages included in the lsst_apps metapackage, as well as meas_*, pipe_*, and obs_* and all dependencies) does not completely adhere to PEP 8 style.
PEP 8 style is used in the following cases:
- class names are CamelCase with leading uppercase,
- module variables used as module global constants are UPPERCASE_WITH_UNDERSCORES,
but all other names are camelCase with leading lowercase.
In particular:
- Class Attribute Names SHOULD be camelCase with leading lowercase (Error code: N803).
- Module methods (free functions) SHOULD be camelCase with leading lowercase (Error code: N802).
- Compound variable names SHOULD be camelCase with leading lowercase (Error code: N806).
Modules which contain class definitions SHOULD be named after the class name¶
Modules which contain class definitions should be named after the class name (one module per class).
User defined names SHOULD NOT shadow Python built-in functions¶
Names which shadow a Python built-in function may cause confusion for readers of the code.
Creating a more specific identifier is suggested to avoid collisions.
For example, in the case of filter, filter_name may be appropriate; for filter objects, something like filter_obj might be appropriate.
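The following sketch shows one practical benefit of the more specific name. The function select_bright and the fluxes dictionary are hypothetical; the point is that the built-in filter() remains usable because the argument is called filter_name.

```python
def select_bright(fluxes, filter_name, threshold):
    """Return the fluxes measured in filter_name that exceed threshold."""
    def is_bright(flux):
        return flux > threshold
    # The argument is named filter_name, so the built-in filter() is
    # still available in this scope.
    return list(filter(is_bright, fluxes[filter_name]))


fluxes = {"r": [1.0, 5.0, 9.0], "i": [2.0, 3.0]}
bright_r = select_bright(fluxes, "r", 4.0)
```

Had the argument been named filter, the call to the built-in inside the function would have failed because the name would refer to a string.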
Names l (lowercase: el), O (uppercase: oh), I (uppercase: eye) MUST be avoided¶
Never use these characters as single character variable names: l (lowercase letter el), O (uppercase letter oh), or I (uppercase letter eye).
In some fonts, these characters are indistinguishable from the numerals one and zero.
When tempted to use l, use L instead.
Note
This matches the PEP 8 standard but is repeated here for emphasis.
7. Source Files & Modules¶
A Python source file name SHOULD be camelCase-with-leading-lowercase and ending in ‘.py’¶
A module containing a single class should be a camelCase-with-leading-lowercase transliteration of the class’s name.
Test files must have the form test_{description}.py for compatibility with Pytest.
The name of a test case should be descriptive without the need for a trailing numeral to distinguish one test case from another.
ASCII Encoding MUST be used for new code¶
Always use ASCII for new Python code.
- Do not include a coding comment (as described in PEP 263) for ASCII files.
- Existing code using Latin-1 encoding (a.k.a. ISO-8859-1) is acceptable so long as it has a proper coding comment. All other code must be converted to ASCII or Latin-1 except for 3rd party packages used “as is.”
Standard code order SHOULD be followed¶
Within a module, follow the order:
- Shebang line, #! /usr/bin/env python (only for executable scripts)
- Module-level comments (such as the license statement)
- Module-level docstring
- from __future__ import absolute_import, division, print_function
- __all__ = [...] statement, if present
- Imports (except from __future__ import...)
- Private module variables (names start with underscore)
- Private module functions and classes (names start with underscore)
- Public module variables
- Public functions and classes
8. Classes¶
See also
Designing for Inheritance in PEP 8 describes naming conventions related to public and private class APIs.
super SHOULD NOT be used unless the author really understands the implications (e.g. in a well-understood multiple inheritance hierarchy).¶
Python provides super() so that each parent class’s method is only called once.
To use super(), all parent classes in the chain (also called the Method Resolution Order) need to use super(); otherwise the chain gets interrupted.
Other subtleties have been noted in an article by James Knight:
- Never call super() with anything but the exact arguments you received, unless you really know what you’re doing.
- When you use it on methods whose acceptable arguments can be altered on a subclass via addition of more optional arguments, always accept *args, **kwargs, and call super() like super(MyClass, self).currentmethod(alltheargsideclared, *args, **kwargs). If you don’t do this, forbid addition of optional arguments in subclasses.
- Never use positional arguments in __init__ or __new__. Always use keyword args, and always call them as keywords, and always pass all keywords on to super().
For guidance on successfully using super(), see Raymond Hettinger’s article Super Considered Super!
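To make the cooperative requirement concrete, here is a minimal sketch of a diamond-shaped hierarchy in which every class takes keyword arguments only and forwards the rest through super(). The class names and the calls list are hypothetical and serve only to record the call order.

```python
calls = []  # records the order in which the initializers run


class Root(object):
    def __init__(self, **kwargs):
        # End of the cooperative chain for this hierarchy.
        calls.append("Root")


class Left(Root):
    def __init__(self, left=None, **kwargs):
        calls.append("Left")
        self.left = left
        super(Left, self).__init__(**kwargs)


class Right(Root):
    def __init__(self, right=None, **kwargs):
        calls.append("Right")
        self.right = right
        super(Right, self).__init__(**kwargs)


class Both(Left, Right):
    def __init__(self, **kwargs):
        calls.append("Both")
        super(Both, self).__init__(**kwargs)


both = Both(left=1, right=2)
```

Each initializer runs exactly once, in Method Resolution Order. If Left called Root.__init__ directly instead of using super(), Right.__init__ would be skipped, which is the kind of interruption described above.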
9. Comparisons¶
is and is not SHOULD only be used for determining if two variables point to the same object¶
Use is or is not only for the case that you need to know that two variables point to the exact same object.
To test for equality in value, use == or != instead.
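A brief illustration of the difference, using two lists with equal contents (the variable names are arbitrary):

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

same_value = (first == second)      # equal contents
same_object = (first is second)     # two distinct list objects
alias_is_first = (alias is first)   # two names for one object
```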
is and is not SHOULD be used when comparing to None¶
There are two reasons:
- is None works with NumPy arrays, whereas == None does not;
- is None is idiomatic.
This is also consistent with PEP 8, which states:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
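The NumPy case arises because a class may override the == operator. The sketch below avoids a NumPy dependency by using a hypothetical class, AlwaysEqual, as a stand-in for any type with a custom __eq__; it is not how NumPy itself is implemented.

```python
class AlwaysEqual(object):
    """Hypothetical class whose == claims equality with everything."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


value = AlwaysEqual()
equal_to_none = (value == None)  # noqa: E711 (misleading: uses __eq__)
is_none = (value is None)        # identity check cannot be overridden
```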
For sequences (str, list, tuple), use the fact that empty sequences are false.
Yes:
if not seq:
    pass
if seq:
    pass
No:
if len(seq):
    pass
if not len(seq):
    pass
10. Idiomatic Python¶
Strive to write idiomatic Python. Writing Python with accepted patterns makes your code easier for others to understand and often prevents bugs.
Fluent Python by Luciano Ramalho is an excellent guide to writing idiomatic Python.
Idiomatic Python also reduces technical debt, particularly by easing the migration from Python 2.7 to Python 3. Code should be written in a way that helps the futurize code converter produce more efficient code. For more information see the online book Supporting Python 3 by Lennart Regebro.
Supporting Python 2.7 and 3.x simultaneously¶
The future package MUST be used to provide compatibility functionality with Python 2.7¶
We use the future package to provide a means for writing code using Python 3 idioms that will also work on Python 2.7. Details on the process for migrating a 2.7 codebase to support both versions can be found in SQR-014.
itervalues() and iteritems() CANNOT be used¶
Python 3 dictionaries do not provide the iter variants of these methods.
For more information on how to handle this efficiently in Python 2 see http://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items.
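A minimal sketch of the portable spelling, using a hypothetical exposure_times dictionary. The items() and values() methods exist on both Python 2.7 and Python 3; on Python 2.7 they build lists, which is the efficiency concern the linked page addresses.

```python
exposure_times = {"g": 15.0, "r": 30.0}

total_time = 0.0
for band, seconds in exposure_times.items():
    total_time += seconds

longest = max(exposure_times.values())
```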
A mutable object MUST NOT be used as a keyword argument default¶
Never use a mutable object as default value for a keyword argument in a function or method.
When a mutable object is used as a default keyword argument, the default can change from one call to another, leading to unexpected behavior. This issue can be avoided by only using immutable types as defaults.
For example, rather than provide an empty list as a default:
def proclist(alist=[]):
    pass
this function should create a new list in its internal scope:
def proclist(alist=None):
    if alist is None:
        alist = []
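The behavior described above can be observed directly. In this sketch the function names append_shared and append_fresh are hypothetical; the first reuses one list across calls, the second does not.

```python
def append_shared(item, items=[]):
    # Discouraged: the default list is created once, when the function
    # is defined, and is reused by every call that omits items.
    items.append(item)
    return items


def append_fresh(item, items=None):
    # Preferred: a new list is created whenever none is supplied.
    if items is None:
        items = []
    items.append(item)
    return items


shared_first = append_shared(1)
shared_second = append_shared(2)
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
```

After these calls, shared_first and shared_second are the same list holding both items, while fresh_first and fresh_second are independent single-item lists.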
Context managers (with) SHOULD be used for resource allocation¶
Use the with statement to simplify resource allocation.
For example, to be sure a file will be closed when you are done with it:
with open('/etc/passwd', 'r') as f:
    for line in f:
        pass
Avoid dict.keys() when iterating over keys or checking membership¶
For iterating over keys, iterate over the dictionary itself, e.g.:
for x in mydict:
    pass
To test for inclusion use in:
if key in mydict:
    pass
This is preferred over keys(). Use keys() when storing the keys for later access:
k = list(mydict.keys())
where list ensures that a view or iterator is not being retained.
The subprocess module SHOULD be used to spawn processes¶
Use the subprocess module to spawn processes.
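A minimal sketch, assuming only that a Python interpreter is available as sys.executable; the child command is arbitrary and chosen so the example runs anywhere.

```python
import subprocess
import sys

# Pass the command as a list so that no shell is involved in parsing it.
output = subprocess.check_output([sys.executable, "-c", "print('hello')"])
message = output.decode("ascii").strip()
```

subprocess.check_output raises CalledProcessError if the child exits with a non-zero status, so failures are not silently ignored.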
lambda SHOULD NOT be used¶
Avoid the use of lambda.
You can almost always write clearer code by using a named function or using the functools module to wrap a function.
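Both alternatives can be sketched briefly. The functions scale and second_item and the sample data are hypothetical.

```python
import functools


def scale(value, factor):
    return value*factor


def second_item(pair):
    return pair[1]


# Instead of: double = lambda value: scale(value, 2)
double = functools.partial(scale, factor=2)

# Instead of: sorted(pairs, key=lambda pair: pair[1])
pairs = [("a", 3), ("b", 1), ("c", 2)]
ordered = sorted(pairs, key=second_item)
```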
The set type SHOULD be used for unordered collections¶
Use the set type for unordered collections of objects.
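For instance, a set removes duplicates and supports membership tests and set algebra directly. The band names here are illustrative only.

```python
observed = ["g", "r", "g", "i", "r"]
required = {"g", "r", "z"}

unique_bands = set(observed)        # duplicates collapse
missing = required - unique_bands   # set difference
has_i = "i" in unique_bands         # membership test
```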
The argparse module SHOULD be used for command-line scripts¶
Use the argparse module for command-line scripts.
Command line tasks for pipelines should use lsst.pipe.base.ArgumentParser instead.
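A minimal sketch of a script built on argparse. The script's purpose, the build_parser and main functions, and the --scale option are all hypothetical; the example does not use lsst.pipe.base.ArgumentParser.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sum a list of numbers.")
    parser.add_argument("values", type=float, nargs="+",
                        help="Numbers to add.")
    parser.add_argument("--scale", type=float, default=1.0,
                        help="Factor applied to the sum.")
    return parser


def main(argv=None):
    # With argv=None, argparse reads the arguments from sys.argv.
    args = build_parser().parse_args(argv)
    return sum(args.values)*args.scale


result = main(["1", "2", "--scale", "2"])
```

Accepting argv as a parameter keeps main easy to call from tests without touching sys.argv.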
Iterators and generators SHOULD be used to iterate over large data sets efficiently¶
Use iterators, generators (functions that act like iterators) and generator expressions (expressions that act like iterators) to iterate over large data sets efficiently.
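As a sketch of both forms, the hypothetical generator function read_chunks yields one chunk at a time instead of building every chunk up front, and the generator expression sums squares without creating an intermediate list.

```python
def read_chunks(values, size):
    """Yield successive lists of at most size items from values."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


chunks = list(read_chunks(range(5), 2))
total = sum(x*x for x in range(1000))
```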
if False: and if True: SHOULD NOT be used¶
Code must not be placed inside if False: or if True: blocks, nor left commented out.
Instead, debugging code and alternative implementations must be placed inside a “named” if statement.
Such blocks should have a comment describing why they are disabled.
They may have a comment describing the conditions under which said code can be removed (like the completion of a ticket or a particular date).
For example, for code that will likely be removed in the future, once testing is completed:
# Delete old_thing() and the below "if" statement once all unittests are finished (DM-123456).
use_old_method = False
if use_old_method:
    old_thing()
else:
    new_thing()
It is often beneficial to lift such debugging flags into the method’s keyword arguments to allow users to decide which branch to run. For example:
def foo(x, debug_plots=False):
    do_thing()
    if debug_plots:
        plot_thing()
or, using lsstDebug, which can be controlled as part of a command line task:
import lsstDebug
def foo(x):
    do_thing()
    if lsstDebug.Info(__name__).debug_plots:
        plot_thing()
Properties SHOULD be used when they behave like regular instance attributes¶
Properties SHOULD be added to Python objects to provide syntactic sugar for a getter (and possibly setter) when all of the following conditions are true:
- The getter method must return the same type the setter method accepts, or the types must have very similar interfaces (e.g. because they are part of the same class hierarchy, or they share an important common interface, such as a Python Sequence).
- Either the returned object must be immutable or modifying it must modify the object on which the property is defined in the expected way. Note that it may be useful to have a getter return an immutable object (e.g. tuple instead of list) to meet this criterion. This prevents confusing behavior in which a.b.c = v could be a silent no-op.
- The getter (and setter, if it exists) must be computationally trivial; either the direct return of an internal object or an extremely simple calculation (e.g. the width of a bounding box from its starting and ending x coordinates). In general, getter methods that begin with something other than “get” should not have associated properties.
Some examples:
- Image.getBBox() SHOULD NOT have an associated property, because the returned object (Box2I) is mutable, but modifying it does not modify the bounding box of the Image.
- Psf.computeShape() SHOULD NOT have an associated property, because the getter is not computationally trivial, as suggested by the method name.
- Image.getArray() SHOULD have an associated property, because the returned object is a view that can be modified to modify the original image.
- Exposure.getWcs() SHOULD have an associated property, because the returned object is a data member of the Exposure that is returned via shared_ptr in C++, which allows modifications to the Wcs to automatically affect the Exposure.
Note that C++ getters that return STL container types cannot have properties in Python unless the usual pybind11 conversion (which typically yields list, dict, or set objects) is augmented with a conversion to an immutable type (such as tuple or frozenset), because these conversions otherwise always yield mutable objects that do not modify the parent.
The existing getters and setters MUST NOT be removed when defining a property.
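A minimal pure-Python sketch of these rules, using a hypothetical Interval class rather than any real DM type. The getters return immutable numbers, the width calculation is trivial, and the original getter and setter methods remain in place alongside the properties.

```python
class Interval(object):
    """Hypothetical interval with a trivially computed width."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop

    def getStart(self):
        return self._start

    def setStart(self, start):
        self._start = start

    def getWidth(self):
        return self._stop - self._start

    # The properties are added as syntactic sugar; getStart, setStart
    # and getWidth are kept.
    start = property(getStart, setStart)
    width = property(getWidth)


interval = Interval(2, 10)
initial_width = interval.width
interval.start = 4
```

After the assignment, interval.getStart() and interval.start agree, and interval.width reflects the new start.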