The Python components of Aurora are built using Pants.

Python Build Conventions

The Python code is laid out according to the following conventions:

  1. One BUILD file per third-level directory. To list the current third-level packages, run:

    % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
    while read dname; do echo $dname |\
        sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
    
  2. Each BUILD file exports one python_library, named the same as the directory it's in so that it can be referenced without a ':' character. That library provides a setup_py containing each python_binary in the BUILD file. The sources field in the python_library will almost always be rglobs('*.py').

  3. Other BUILD files may depend only on this single public python_library target. Any other target is considered a private implementation detail and should be prefixed with an underscore (_).

  4. python_binary targets are always named the same as the exported console script.

  5. python_binary targets must have identical dependencies to the python_library exported by the package and must use entry_point.

    This means a PEX file generated by Pants will contain exactly the same files that will be available on the PYTHONPATH after a pip install of the corresponding library target. This will help our migration off of Pants in the future.
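
The shell pipeline in item 1 can also be sketched in Python, which may be easier to reuse from a script. This is a minimal illustration, not part of the Aurora build: the function name third_level_packages is hypothetical, and the only thing taken from the text above is the src/main/python root and the "exactly three levels deep" rule.

```python
import os


def third_level_packages(root):
    """Return dotted names for every directory exactly three levels below root."""
    packages = []
    for dirpath, dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        if len(parts) == 3:
            packages.append('.'.join(parts))
            # Nothing deeper can match, so stop descending here.
            dirnames[:] = []
    return sorted(packages)


if __name__ == '__main__':
    # Run from the repository root, as with the find command above.
    for name in third_level_packages(os.path.join('src', 'main', 'python')):
        print(name)
```

Each printed name corresponds to a directory that, by convention 1, should hold exactly one BUILD file.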

Annotated example - apache.thermos.runner

% find src/main/python/apache/thermos/runner
src/main/python/apache/thermos/runner
src/main/python/apache/thermos/runner/__init__.py
src/main/python/apache/thermos/runner/thermos_runner.py
src/main/python/apache/thermos/runner/BUILD
% cat src/main/python/apache/thermos/runner/BUILD
# License boilerplate omitted
import os


# Private target so that a setup_py can exist without a circular dependency. Only targets within
# this file should depend on this.
python_library(
  name = '_runner',
  # The target covers every python file under this directory and subdirectories.
  sources = rglobs('*.py'),
  dependencies = [
    '3rdparty/python:twitter.common.app',
    '3rdparty/python:twitter.common.log',
    # Source dependencies are always referenced without a ':'.
    'src/main/python/apache/thermos/common',
    'src/main/python/apache/thermos/config',
    'src/main/python/apache/thermos/core',
  ],
)

# Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an
# argument to ./pants binary.
python_binary(
  name = 'thermos_runner',
  # Use entry_point, not source, so the files used here are the same ones tests see.
  entry_point = 'apache.thermos.runner.thermos_runner',
  dependencies = [
    # Notice that we depend only on the single private target from this BUILD file here.
    ':_runner',
  ],
)

# The public library that everyone importing the runner symbols uses.
# The test targets and any other dependent source code should depend on this.
python_library(
  name = 'runner',
  dependencies = [
    # Again, notice that we depend only on the single private target from this BUILD file here.
    ':_runner',
  ],
  # We always provide a setup_py. This will cause any dependee libraries to automatically
  # reference this library in their requirements.txt rather than copy the source files into their
  # sdist.
  provides = setup_py(
    # Conventionally named and versioned.
    name = 'apache.thermos.runner',
    version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(),
  ).with_binaries({
    # Every binary in this file should also be repeated here.
    # Always use the dict-form of .with_binaries so that commands with dashes in their names are
    # supported.
    # The console script name is always the same as the PEX with .pex stripped.
    'thermos_runner': ':thermos_runner',
  }),
)
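
The names in the BUILD file above all follow from the package directory. The sketch below makes that mapping explicit; conventional_names and its dictionary keys are hypothetical helpers for illustration only, and the source root is assumed to be src/main/python as in the listing.

```python
import posixpath


def conventional_names(package_dir, source_root='src/main/python'):
    """Derive the conventional target names for a third-level package directory."""
    rel = posixpath.relpath(package_dir, source_root)
    parts = rel.split('/')
    leaf = parts[-1]
    return {
        # Public python_library, named after the directory (convention 2).
        'public_target': leaf,
        # Private implementation target, prefixed with an underscore (convention 3).
        'private_target': '_' + leaf,
        # Name passed to setup_py in the example above.
        'setup_py_name': '.'.join(parts),
        # How other BUILD files refer to the public target, without a ':'.
        'dependency_spec': posixpath.join(source_root, rel),
    }


names = conventional_names('src/main/python/apache/thermos/runner')
print(names['setup_py_name'])
```

For the runner package this yields the same runner, _runner, and apache.thermos.runner names that appear in the annotated BUILD file.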

Thermos Test Resources

The Aurora source repository and distributions contain several binary files used to verify the backwards compatibility of thermos with checkpoint data. Since thermos persists state to disk (to be read by the thermos observer), it is important that we have tests that prevent regressions affecting the ability to parse previously written data.

The files included represent persisted checkpoints that exercise different features of thermos. The existing files should not be modified unless we are accepting backwards incompatibility, such as with a major release.

It is not practical to write source code to generate these files on the fly, as source would be vulnerable to drift (e.g. due to refactoring) in ways that would undermine the goal of ensuring backwards compatibility.

The most common reason to add a new checkpoint file is to provide coverage for new thermos features that alter the data format. This is accomplished by writing and running a job configuration that exercises the feature, then copying the checkpoint file from the sandbox directory. By default, this is /var/run/thermos/checkpoints/<aurora task id>.
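
The copy step can be sketched as a small helper. This is an illustration under stated assumptions rather than an Aurora tool: copy_checkpoint and its parameters are hypothetical, the destination directory is whatever test resource directory you choose, and the default path is assumed to name a regular file (adjust the source path if your layout nests the file inside a per-task directory). The helper refuses to overwrite, since existing checkpoint files should not be modified.

```python
import os
import shutil

# Default location quoted in the text above; the task id is appended to it.
DEFAULT_CHECKPOINT_ROOT = '/var/run/thermos/checkpoints'


def copy_checkpoint(task_id, dest_dir, dest_name=None,
                    checkpoint_root=DEFAULT_CHECKPOINT_ROOT):
    """Copy the checkpoint for task_id into dest_dir and return the new path."""
    src = os.path.join(checkpoint_root, task_id)
    if not os.path.isfile(src):
        raise FileNotFoundError('no checkpoint file at %s' % src)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, dest_name or task_id)
    if os.path.exists(dest):
        # Existing test resources must stay byte-for-byte unchanged.
        raise FileExistsError('refusing to overwrite %s' % dest)
    shutil.copyfile(src, dest)
    return dest
```

Giving the copy a descriptive dest_name that mentions the feature it exercises makes it easier to see later why the file was added.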