Organizing code for a Python project

A well structured project is easy to navigate and make changes and improvements to. It’s also more likely to be used by other people – and that includes you a few weeks from now!

Organization basics

We want to write a Python program that draws triangles:

_images/triangle.png

We use the the Polygon class of the matplotlib library and write a script called draw_triangles.py to do this:

draw_triangles.py
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.set_xlabel('x')
ax.set_ylabel('y')

patch = plt.Polygon([
    (0.2, 0.2),
    (0.2, 0.6),
    (0.4, 0.4)
])

ax.add_patch(patch)

ax.text(0.2, 0.4, '(0.2, 0.4)')
ax.text(0.2, 0.6, '(0.2, 0.6)')
ax.text(0.2, 0.4, '(0.2, 0.4)')

patch = plt.Polygon([
    (0.6, 0.8),
    (0.8, 0.8),
    (0.5, 0.5)
])

ax.add_patch(patch)

ax.text(0.6, 0.8, '(0.6, 0.8)')
ax.text(0.8, 0.8, '(0.8, 0.8)')
ax.text(0.5, 0.5, '(0.5, 0.5)')

patch = plt.Polygon([
    (0.6, 0.1),
    (0.7, 0.3),
    (0.9, 0.2)
])

ax.add_patch(patch)

ax.text(0.6, 0.1, '(0.6, 0.1)')
ax.text(0.7, 0.3, '(0.7, 0.3)')
ax.text(0.9, 0.2, '(0.9, 0.2)')

plt.show()

Do you think this is a good way to organize the code? What do you think could be improved in the script draw_triangles.py?

Functions

Functions facilitate code reuse. Whenever you see yourself typing the same code twice in the same program or project, it is a clear indication that the code belongs in a function.

A good function:

  • has a descriptive name. draw_triangle is a better name than plot.
  • is small – no more than a couple of dozen lines – and does one thing. If a function is doing too much, then it should probably be broken into smaller functions.
  • can be easily tested – more on this soon.
  • is well documented – more on this later.

In the script draw_triangles.py above, it would be a good idea to define a function called draw_triangle that draws a single triangle, and re-use this function every time we need to draw a triangle:

draw_triangles.py
import matplotlib.pyplot as plt

def draw_triangle(points, ax=None):
    if ax is None:
        ax = plt.gca()
    else:
        fig, ax = plt.subplots()
        ax.set_xlabel('x')
        ax.set_ylabel('y')

    patch = plt.Polygon(points)
    ax.add_patch(patch)

    for pt in points:
        x, y = pt
        ax.text(x, y, '({}, {})'.format(x, y))

draw_triangle([
    (0.2, 0.2),
    (0.2, 0.6),
    (0.4, 0.4)
])

draw_triangle([
    (0.6, 0.8),
    (0.8, 0.8),
    (0.5, 0.5)
])

draw_triangle([
    (0.6, 0.1),
    (0.7, 0.3),
    (0.9, 0.2)
])

plt.show()

Python scripts and modules

A module is a file containing a collection of Python definitions and statements, typically named with a .py suffix.

A script is a module that is intended to be run by the Python interpreter. For example, the script draw_triangles.py can be run from the command-line using the command:

$ python draw_triangles.py

If you are using an Integrated Development Environment like Spyder or PyCharm, then the script can be run by opening it in the IDE and clicking on the “Run” button.

Modules, or specific functions from a module can be imported using the import statement:

import draw_triangles
from draw_triangles import draw_triangle

When a module is imported, all the statements in the module are executed by the Python interpreter. This happens only the first time the module is imported.

It is sometimes useful to have both importable functions as well as executable statements in a single module. When importing functions from this module, it is possible to avoid running other code by placing it under if __name__ == "__main__":

draw_triangles.py
import matplotlib.pyplot as plt

def draw_triangle(points, ax=None):
    if ax is None:
        ax = plt.gca()
    else:
        fig, ax = plt.subplots()
        ax.set_xlabel('x')
        ax.set_ylabel('y')

    patch = plt.Polygon(points)
    ax.add_patch(patch)

    for pt in points:
        x, y = pt
        ax.text(x, y, '({}, {})'.format(x, y))

if __name__ == "__main__":

    draw_triangle([
        (0.2, 0.2),
        (0.2, 0.6),
        (0.4, 0.4)
    ])

    draw_triangle([
        (0.6, 0.8),
        (0.8, 0.8),
        (0.5, 0.5)
    ])

    draw_triangle([
        (0.6, 0.1),
        (0.7, 0.3),
        (0.9, 0.2)
    ])

    plt.show()

When another module imports the module draw_triangles above, the code under if __name__ == "__main__" is not executed.

How to structure a Python project?

Let us now imagine we had a lot more code; for example, a collection of functions for:

  • plotting shapes (like draw_triangle above)
  • calculating areas
  • geometric transformations

What are the different ways to organize code for a Python project that is more than a handful of lines long?

A single module

geometry
└── draw_triangles.py

One way to organize your code is to put all of it in a single .py file (module) like draw_triangles.py above.

Multiple modules

For a small number of functions the approach above is fine, and even recommended, but as the size and/or scope of the project grows, it may be necessary to divide up code into different modules, each containing related data and functionality.

geometry
├── draw_triangles.py
└── graphics.py
graphics.py
import matplotlib.pyplot as plt

def draw_triangle(points, ax=None):
    if ax is None:
        ax = plt.gca()
    else:
        fig, ax = plt.subplots()
        ax.set_xlabel('x')
        ax.set_ylabel('y')

    patch = plt.Polygon(points)
    ax.add_patch(patch)

    for pt in points:
        x, y = pt
        ax.text(x, y, '({}, {})'.format(x, y))

Typically, the “top-level” executable code is put in a separate script which imports functions and data from other modules:

draw_triangles.py
import graphics

graphics.draw_triangle([
    (0.2, 0.2),
    (0.2, 0.6),
    (0.4, 0.4)
])

graphics.draw_triangle([
    (0.6, 0.8),
    (0.8, 0.8),
    (0.5, 0.5)
])

graphics.draw_triangle([
    (0.6, 0.1),
    (0.7, 0.3),
    (0.9, 0.2)
])

Packages

A Python package is a directory containing a file called __init__.py, which can be empty. Packages can contain modules as well as other packages (sometimes referred to as sub-packages).

For example, geometry below is a package, containing various modules:

draw_triangles.py
geometry
├── graphics.py
└── __init__.py

A module from the package can be imported using the “dot” notation:

import geometry.graphics
geometry.graphics.draw_triangle(args)

It’s also possible to import a specific function from the module:

from geometry.graphics import draw_triangle
draw_triangle(args)

Packages can themselves be imported, which really just imports the __init__.py module.

import geometry

If __init__.py is empty, there is “nothing” in the imported geometry package, and the following line gives an error:

geometry.graphics.draw_triangle(args)
AttributeError: module 'geometry' has no attribute 'graphics'

Importing from anywhere

sys.path

To improve their reusability, you typically want to be able to import your modules and packages from anywhere, i.e., from any directory on your computer.

One way to do this is to use sys.path:

import sys
sys.path.append('/path/to/geometry')

import graphics

sys.path is a list of directories that Python looks for modules and packages in when you import them.

Installable projects

A better way is to make your project “installable” using setuptools. To do this, you will need to include a setup.py with your project. Your project should be organized as follows:

draw_triangles.py
geometry
├── graphics.py
└── __init__.py
setup.py

A minimal setup.py can include the following

setup.py
from setuptools import setup

setup(name='geometry',
   version='0.1',
   author='Ashwin Srinath',
   packages=['geometry'])

You can install the package using pip with the following command (run from the same directory as setup.py):

$ pip install -e . --user

This installs the package in editable mode, creating a link to it in the user’s site-packages directory, which happens to already be in sys.path.

Once your project is installed, you don’t need to worry about adding it manually to sys.path each time you need to use it.

It’s also easy to uninstall a package; just run the following command from the same directory as setup.py:

$ pip uninstall .