Pytesting your Python package

Caleb Scheidel

package development pytest unit testing

Most software developers understand the advantages of packaging up their code: it makes their functions testable, reliable and reusable, and it makes their future work much easier and more efficient. Here at Methods, we have realized the benefits of building R and Python packages to bundle up and test our code for collaboration both internally and with clients.
This post will help readers in the data science community see how easy it is to get started developing and testing their own Python package using the pytest framework and GitLab CI.

Python package setup

The basic directory structure of a Python package looks something like this:

mypackage/
|
|__ mypackage/
|     |
|     |__ mypackage.py
|     |__ __init__.py
|
|__ tests/
|     |__ test_mypackage.py
|
|__ setup.py

Functions and classes will go in one or multiple mypackage.py files. The __init__.py file is a special file designating that the files in that directory are part of a package. In the simplest case, __init__.py can be an empty file, but it can also execute initialization code for the package if specified.

Tests for the functions in the package will be contained in the tests directory, either entirely in one file or spread across several .py files. Name these files in the form test_*.py or *_test.py so that pytest can discover them.

The setup.py file contains details on the package, such as the description, author and license.

Unit Testing

Unit tests should be written for every function that is included in a software package. A unit test is an isolated test case that consists of the following components:

  • fixture: a function, class method, or data file
  • an action on the fixture: calling a function with a particular input
  • expected outcome: the expected return value of the function
  • actual outcome: the actual return value of the function
  • verification message: a report of whether the return value matches the expected or not

These tests can check the inputs and outputs of a function to make sure it works the way one would expect. The tests will be run every time modifications or improvements are made to the code within the functions, allowing you to be confident that the changes you made did not break any expected behavior.
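As a rough illustration of how those components map onto code, here is a minimal sketch. The function apply_deposit and the numbers are hypothetical, chosen only for this example; the comments mark which line plays which role.

```python
def apply_deposit(balance, amount):
    """Hypothetical function under test: return the balance after a deposit."""
    return balance + amount


def test_apply_deposit():
    starting_balance = 250                         # fixture: the data the test works with
    actual = apply_deposit(starting_balance, 120)  # action on the fixture -> actual outcome
    expected = 370                                 # expected outcome
    # verification: on a mismatch, assert raises AssertionError with this message
    assert actual == expected, "expected {}, got {}".format(expected, actual)
```

A test runner such as pytest would collect and call test_apply_deposit for you; calling it by hand works too, and it returns silently when the check passes.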

pytest

One of the most popular unit testing packages in Python is pytest. To install it, run:

pip install pytest

The pytest package differentiates itself from other Python testing packages such as unittest and nose through its straightforward syntax. It uses a single assert statement instead of the numerous assertSomething methods found in unittest. By default, pytest searches the current directory and its subdirectories for .py files named in the form test_*.py or *_test.py and runs the test functions it finds in them.
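To make the syntax difference concrete, the sketch below writes the same two checks in both styles. The function absolute_difference is a hypothetical stand-in, not part of any package discussed here.

```python
import unittest


def absolute_difference(a, b):
    """Hypothetical function under test."""
    return abs(a - b)


class TestAbsoluteDifference(unittest.TestCase):
    # unittest style: a method on a TestCase subclass, using assert* helper methods
    def test_difference(self):
        self.assertEqual(absolute_difference(3, 10), 7)
        self.assertGreaterEqual(absolute_difference(10, 3), 0)


# pytest style: a plain function and the built-in assert statement
def test_difference():
    assert absolute_difference(3, 10) == 7
    assert absolute_difference(10, 3) >= 0
```

pytest can also collect and run unittest.TestCase classes, so existing unittest suites do not have to be rewritten before switching runners.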

To run pytest, it is as simple as navigating to the top level of the package directory and running:

pytest

Example: Creating and testing a Python package

Let’s run through an example of creating a Python package and using pytest to test its functions via GitLab’s continuous integration services. The functions and examples used here were inspired by Jeff Knupp’s post on classes and object oriented programming and Kevin Ndung’u Gathuku’s post on pytest. The package will do some basic arithmetic using a class, Account, that simulates users depositing or withdrawing money from their checking account. The class has two methods: deposit and withdraw.

The first thing to do is set up the directory structure of the package. It should look something like this:

pybank/
|
|__ pybank/
|     |
|     |__ bank.py
|     |__ __init__.py
|
|__ tests/
|     |__ test_bank.py
|
|__ setup.py

The contents of setup.py:

from setuptools import setup, find_packages

setup(
    name='pybank',
    version='0.1',
    packages=find_packages(exclude=['tests*']),
    license='none',
    description='An example python package with functions simulating depositing and withdrawing cash from a bank account',
    long_description=open('README.md').read(),
    install_requires=[],
    url='REPOSITORY_URL',
    author='AUTHOR_NAME',
    author_email='AUTHOR_EMAIL'
)

Now we’ll add our code to bank.py. First create a class, InsufficientAmount, a custom exception that is raised when we try to withdraw an amount that is higher than the current balance in our account. Then create a class, Account, which stores the customer's name and an initial balance for the account and defines two methods: withdraw and deposit.

# bank.py

class InsufficientAmount(Exception):
    pass

class Account(object):
    """
    A checking account at a local bank.  An account has the following properties:

    Attributes:
        name: a string representing the customer's name
        balance: A float holding the current balance of the customer's account
    """

    def __init__(self, name, balance=0):
        """Return an Account object whose customer name is (name) and starting balance is (balance)"""
        self.name = name
        self.balance = balance

    def withdraw(self, amount):
        """Return the balance in the account remaining after withdrawing (amount) dollars"""
        if self.balance < amount:
            raise InsufficientAmount('Not enough cash available to withdraw {}'.format(amount))
        self.balance -= amount
        return self.balance

    def deposit(self, amount):
        """Return the balance remaining in the account after depositing (amount) dollars."""
        self.balance += amount
        return self.balance

Once the functions are specified, we can write unit tests for those functions. We’ll write tests to ensure we get the expected output from setting or not setting an initial balance or name, depositing cash into the account, withdrawing cash from the account, and ensuring we get the insufficient amount error when we try to withdraw more than what the current balance is.

We can write almost all of these tests using the assert statement. To test that the custom exception is raised, use the pytest.raises context manager. All of these tests will go in the test_bank.py file.

# test_bank.py

import pytest
from pybank.bank import Account, InsufficientAmount

def test_setting_name():
    jack_account = Account(name = "Jack")
    assert jack_account.name == "Jack"

def test_default_balance():
    jack_account = Account(name = "Jack")
    assert jack_account.balance == 0

def test_setting_balance():
    jill_account = Account(name = "Jill", balance = 250)
    assert jill_account.balance == 250

def test_account_deposit():
    jill_account = Account(name = "Jill", balance = 250)
    jill_account.deposit(120)
    assert jill_account.balance == 370

def test_account_withdraw():
    jill_account = Account(name = "Jill", balance = 250)
    jill_account.withdraw(10)
    assert jill_account.balance == 240

def test_account_withdraw_raises_exception_on_insufficient_amount():
    jack_account = Account(name = "Jack")
    with pytest.raises(InsufficientAmount):
        jack_account.withdraw(100)

At this point, all of these tests should pass if we run pytest from the command line at the top level of the package directory. There is a lot of repetition in these tests though. Let’s rewrite them using fixtures.

Fixtures are useful here for creating example account objects that can be passed in to each test, so we avoid initializing an object inside each test separately and reduce code repetition. A test function that needs a fixture accepts it as an argument with the same name as the fixture function. Fixtures are created with the @pytest.fixture decorator.

We can create a fixture for a jack_account with a balance of 0 and a jill_account with a balance of 250. It is good practice to add docstrings to describe what each fixture is.

# test_bank.py

import pytest
from pybank.bank import Account, InsufficientAmount

@pytest.fixture
def jack_account():
    '''Returns an Account instance with customer name Jack and a balance of zero'''
    return Account(name = "Jack")

@pytest.fixture
def jill_account():
    '''Returns an Account instance with customer name Jill and a balance of 250'''
    return Account(name = "Jill", balance = 250)

def test_setting_name(jack_account):
    assert jack_account.name == "Jack"

def test_default_balance(jack_account):
    assert jack_account.balance == 0

def test_setting_balance(jill_account):
    assert jill_account.balance == 250

def test_account_deposit(jill_account):
    jill_account.deposit(120)
    assert jill_account.balance == 370

def test_account_withdraw(jill_account):
    jill_account.withdraw(10)
    assert jill_account.balance == 240

def test_account_withdraw_raises_exception_on_insufficient_amount(jack_account):
    with pytest.raises(InsufficientAmount):
        jack_account.withdraw(100)

The above tests are all individual. We can also write tests with various combinations of the parameters: initial balance, amount to deposit, and amount to withdraw.

pytest provides the ability to write parameterized test functions, which lets you define multiple sets of arguments for a test function or class. You can do this by using the @pytest.mark.parametrize decorator.

# test_bank.py

import pytest
from pybank.bank import Account, InsufficientAmount

@pytest.fixture
def linus_account():
    '''Returns an Account instance with customer name Linus and balance of zero'''
    return Account(name = "Linus")

@pytest.mark.parametrize("deposit,withdrawal,expected", [
    (2500, 800, 1700),
    (950, 75, 875),
])
def test_transactions(linus_account, deposit, withdrawal, expected):
    linus_account.deposit(deposit)
    linus_account.withdraw(withdrawal)
    assert linus_account.balance == expected

The test_transactions function is run once for each set of parameters specified in the decorator, e.g. the test will run the first time with deposit = 2500, withdrawal = 800, and expected = 1700. The test function will then run through the second set of parameters.
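One way to picture what the decorator does is as a loop over the parameter sets that builds a fresh account for each one, much as pytest re-creates a default fixture for every test run. The sketch below is only a conceptual approximation, not how pytest is implemented; the Account class here is a trimmed stand-in so the snippet runs on its own, and run_cases is a hypothetical helper.

```python
class Account:
    """Trimmed stand-in for pybank.bank.Account so this sketch is self-contained."""

    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def withdraw(self, amount):
        self.balance -= amount
        return self.balance


cases = [
    (2500, 800, 1700),
    (950, 75, 875),
]


def run_cases(cases):
    """Run every (deposit, withdrawal, expected) case against a new account."""
    final_balances = []
    for deposit, withdrawal, expected in cases:
        account = Account(name="Linus")  # new object per case, so balances never carry over
        account.deposit(deposit)
        account.withdraw(withdrawal)
        assert account.balance == expected
        final_balances.append(account.balance)
    return final_balances
```

Unlike this loop, which stops at the first failing assert, pytest reports each parameter set as its own test, so one failing case does not hide the results of the others.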

Adding extra/test data to the package

In this example, it is not necessary to add a data file to test our functions with, although some packages may need to contain data to use for tests or examples. If this is the case for your package, you will also need to write a MANIFEST.in file at the top level of the project, and put the data files in their own directory under the package directory containing the __init__.py file. The structure would look like this:

pybank/
|
|__ pybank/
|     |
|     |__ data/
|     |     |__ example_data.csv
|     |
|     |__ bank.py
|     |__ __init__.py
|
|__ tests/
|     |__ test_bank.py
|
|__ setup.py
|
|__ MANIFEST.in

For each data file, include a line in MANIFEST.in that gives the path to the file you want to include. In this example, we would add this:

include pybank/data/example_data.csv

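The files listed in MANIFEST.in are included in the source distribution of the package. For setuptools to also install them alongside the code, the setup() call generally needs include_package_data=True.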
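Once a data file ships with the package, a test needs a way to locate and read it. The following is a minimal sketch, assuming the data/ layout shown above; the helper names example_data_path and read_rows are hypothetical, and the CSV columns are whatever the file's header row defines.

```python
import csv
from pathlib import Path


def example_data_path(package_dir):
    """Build the path to data/example_data.csv under the given package directory."""
    return Path(package_dir) / "data" / "example_data.csv"


def read_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row (values stay strings)."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Inside a test, the package directory could be derived from the imported package, for example Path(pybank.__file__).parent, so the path does not depend on where pytest is launched from.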

Sharing your package

The goal is to be able to share the package you create with others. We can do this by hosting the package on GitHub or GitLab. This example is hosted on GitLab, taking advantage of its built-in continuous integration services to automatically run the tests we have written every time new code is pushed. This is specified in the .gitlab-ci.yml file. A README.md file should also be included to describe the package and instruct users on how to install it.

The final directory structure of the package:

pybank/
|
|__ pybank/
|     |
|     |__ data/
|     |     |__ example_data.csv
|     |
|     |__ bank.py
|     |__ __init__.py
|
|__ tests/
|     |__ test_bank.py
|
|__ setup.py
|
|__ MANIFEST.in
|
|__ README.md
|
|__ .gitlab-ci.yml

The .gitlab-ci.yml file will orchestrate running the tests each time a code change is pushed to GitLab. It uses the python:latest Docker image to run the script we specify. For this example, all we need to do is install pytest in the Docker container and run pytest from the top-level directory of the package:

image: python:latest

test:
  stage: test
  script:
  - pip install pytest
  - python -m pytest -v

The -v or --verbose flag is specified to print more verbose output from the tests to the logs. This will help us see which individual tests are passing or failing.

You can see this full example on GitLab here. The output from pytest can be viewed in the continuous integration output on the CI/CD tab. A green check mark is visible, signaling that all of the tests we have written have passed!