Creating Python modules and applications

We have relied heavily on modules in the Python library. Additionally, we added several packages, including Pillow and BeautifulSoup. This raises a natural question: can we create our own module?

The answer is, of course, yes. A Python module is simply a file, and each of our example scripts has been a module. Let's look more closely at how to make our own modules of reusable code. When we look at Python programs, we observe three kinds of files:

  • Library modules that are purely definitional
  • Application modules that do the real work of our applications
  • Hybrid modules that are both applications and can be used as libraries

The essential ingredient of creating a Python module is separating the real work of the top-level script from the various definitions that support this real work. All our examples of definitions have been functions created with the def statement. The other important kind of definition is the class definition, which we'll discuss in the following sections.

Creating and using a module

To create a module of only definitions, we simply put all the function and class definitions into a file. We have to give the file a name that is an acceptable Python variable name. This means that filenames should look like Python variables: letters, digits, and _ are perfectly legal, but the name cannot begin with a digit and should not be a Python keyword. Characters that Python uses as operators (+, -, /, and so on) may be allowed by our OS for a filename, but these characters cannot be used to name a module file.
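The naming rule can be checked from Python itself. The following is a minimal sketch, assuming a hypothetical helper named is_valid_module_name (it is not part of any library); it relies on str.isidentifier() and keyword.iskeyword() from the standard library.

```python
import keyword

def is_valid_module_name(filename):
    """Return True if filename (such as 'stats.py') gives a usable module name."""
    if not filename.endswith(".py"):
        return False
    name = filename[:-3]  # strip the .py suffix; it is not part of the module name
    # The name must be a legal identifier and must not be a reserved word.
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_module_name("stats.py"))        # True
print(is_valid_module_name("basic-stats.py"))  # False: - is an operator
print(is_valid_module_name("2nd_try.py"))      # False: starts with a digit
print(is_valid_module_name("class.py"))        # False: class is a keyword
```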

The file name must end in .py. This is not part of the module name; it's for the benefit of the operating system.

We might collect our statistics functions into a file named stats.py. This file defines a module named stats.
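As a sketch of what such a file might contain, here is a small, purely definitional module. The function bodies are assumptions for illustration; the text names stats.py, mean, and median but does not show how they are implemented.

```python
"""stats.py: a hypothetical library module containing only definitions."""

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def median(values):
    """Return the middle value, averaging the two middle values if needed."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2
```

Nothing in this file does any work when it is imported; it only makes the two names available to whoever imports it.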

We can use this module in three ways: import every name it defines, import individual functions, or import the module as a whole. To import every name, use the following code:

>>> from stats import *

This imports all the public functions (and classes) defined in the stats module; names that begin with _ are skipped unless the module lists them in __all__. We can then simply use names such as mean( some_list ).

Suppose we use this instead:

>>> from stats import mean, median

This imports two specific functions from the stats module. Any other definitions that might be available in that module are not added to our namespace.

We can also use this:

>>> import stats

This will import the module, but it won't put any of the names into the global namespace that we usually work with. All the names in the stats module must be accessed with a qualified name, such as stats.mean( some_list ). In very complex scripts, the use of qualified names helps clarify where a particular function or class was defined.
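The difference between qualified and unqualified names can be seen in a short sketch. Because our own stats.py may not be on hand, the standard library's statistics module stands in for it here; that substitution is an assumption made only so the example runs anywhere.

```python
# The standard library's statistics module stands in for our own stats.py.
import statistics                      # whole module; names stay qualified
from statistics import mean, median   # two specific names, unqualified

data = [2, 4, 4, 5]   # made-up sample values

print(statistics.mean(data))     # qualified name shows where mean comes from
print(mean(data))                # same function, imported by name
print(mean is statistics.mean)   # True: both names refer to one function
```

The qualified form costs a few extra characters, but it leaves no doubt about which module supplied the function.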

Creating an application module

The simplest way to create an application with a command-line interface (CLI) is to write a file and run it from the command line. Consider the following example:

python3 basic_stats.py

When we enter this in the terminal window or command window, we use the OS python3 command and provide a filename. On Windows, the name python.exe is sometimes used for Python 3, so the command may be python basic_stats.py. On most other OSes, there will often be both a python3 command and a version-specific command such as python3.3. On Mac OS X, the python command may refer to the old Python 2.7 that is part of Mac OS X.

We can determine the difference by using the python -V command to see what version is bound to the name python.
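The same check can be made from inside a running interpreter. This is a small sketch using sys.version_info from the standard library; the variable name version_text is only illustrative.

```python
import sys

# version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
version_text = "Python {}.{}".format(major, minor)
print(version_text)

if major < 3:
    # The examples in this section are written for Python 3.
    print("This interpreter is older than the python3 command used here.")
```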

As noted previously, we want to separate our definitions into one file, and then put the real work in another file. When we look inside basic_stats.py, we might find this:

"""Chapter 5 example 2.

Import stats library functions from ch_5_ex_1 module.
Import data acquisition from ch_5_ex_1 module.
Compute some simple descriptive statistics.
"""
from ch_5_ex_1 import mean, mode, median
from ch_5_ex_1 import get_deaths, get_cheese

# Collect the (year, deaths) pairs, then split them into two lists.
year_deaths = list( get_deaths() )
years = list( year for year, death in year_deaths )
deaths = list( death for year, death in year_deaths )
print( "Year Range", min(years), "to", max(years) )
print( "Average Deaths {:.2f}".format( mean( deaths ) ) )

# Collect the (year, cheese) pairs and report the mean consumption.
year_cheese = get_cheese()

print( "Average Cheese Consumption",
    mean( [cheese for year, cheese in year_cheese] ) )

The file starts with a triple-quoted string that—like the docstring for a function—is the docstring for a module. We imported some functions from another module.

Then, we completed some processing using the functions that we imported. This is a common structure for a simple command-line module.

We can also run this via the command python3 -m basic_stats. This will use Python's internal search path to locate the module, and then run that module. Running a module is subtly different from running a file, but the net effect is the same; the file produces the output we designed via the print() function. For details on how the -m option works, consult the documentation for the runpy module.
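The two ways of running code can be approximated from within Python by using the runpy module directly. The sketch below writes a throwaway module named demo_target.py (a hypothetical name) into a temporary directory, runs it once by path and once by module name, and shows that both runs see __name__ set to "__main__". It only approximates what the command line does.

```python
import runpy
import sys
import tempfile
from pathlib import Path

SOURCE = 'GREETING = "hello"\nprint("running as", __name__)\n'

with tempfile.TemporaryDirectory() as folder:
    script = Path(folder, "demo_target.py")   # hypothetical module file
    script.write_text(SOURCE)

    # Roughly like: python3 demo_target.py (run a file by its path)
    by_path = runpy.run_path(str(script), run_name="__main__")

    # Roughly like: python3 -m demo_target (find the module on the search path)
    sys.path.insert(0, folder)
    try:
        by_name = runpy.run_module("demo_target", run_name="__main__")
    finally:
        sys.path.remove(folder)

# Both calls return the globals of the code they ran.
print(by_path["GREETING"], by_name["GREETING"])
```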

Creating a hybrid module

There are two significant improvements we can make to the basic_stats.py module shown previously:

  • The first is to put all the processing into a function definition, which we call analyze_cheese_deaths.
  • The second is to add an if statement that determines the context in which the module is being used.

Here's the more sophisticated version of basic_stats.py:

"""Chapter 5 example 3.

Import stats library functions from ch_5_ex_1 module.
Import data acquisition from ch_5_ex_1 module.
Compute some simple descriptive statistics.
"""
from ch_5_ex_1 import mean, mode, median
from ch_5_ex_1 import get_deaths, get_cheese

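def analyze_cheese_deaths():
    """Print summary statistics for the deaths and cheese data."""
    # Collect the (year, deaths) pairs, then split them into two lists.
    year_deaths = list( get_deaths() )
    years = list( year for year, death in year_deaths )
    deaths = list( death for year, death in year_deaths )
    print( "Year Range", min(years), "to", max(years) )
    print( "Average Deaths {:.2f}".format( mean( deaths ) ) )

    # Collect the (year, cheese) pairs and report the mean consumption.
    year_cheese = get_cheese()
    print( "Average Cheese Consumption",
        mean( [cheese for year, cheese in year_cheese] ) )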

if __name__ == "__main__":
    analyze_cheese_deaths()

Creating a function definition to encapsulate the real work gives us a way to extend or reuse this script. We can reuse a function definition (via import) more easily than we can reuse a top-level script.

The __name__ variable is a global that Python sets to show the processing context. The top-level module (the one named on the command line) has the __name__ variable set to the string "__main__". Every imported module has its __name__ variable set to the module's own name.
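A short sketch makes the two cases visible. The standard library's statistics module is used only because it is always available to import, and the helper running_as_script is a hypothetical name introduced for illustration.

```python
import statistics  # any imported module works; this one is always available

# An imported module carries its own name in __name__.
print(statistics.__name__)   # statistics

# The value seen here depends on how this code was started.
print(__name__)

def running_as_script(module_name):
    """Return True when module_name is the value given to the top-level script."""
    return module_name == "__main__"

print(running_as_script(statistics.__name__))   # False
```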

Yes, the global variable, __name__, has double-underscores before and after. This marks it as part of the machinery of Python. Similarly, the string value for the main module name, __main__, involves double underscores.

This technique allows us to create a module that can be run as a command and also be imported to provide definitions. The idea is to promote reusable programming. Each time we set out to solve a problem, we don't need to reinvent the wheel and other related technology. We should import prior work and build on that.
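The whole pattern fits in a few lines. The following is a minimal sketch of a hybrid module, assuming a hypothetical file named hybrid_demo.py with made-up sample data; it is not the chapter's own example. The reusable function returns its result rather than printing it, which makes it easier for other modules to build on.

```python
"""hybrid_demo.py: a hypothetical module usable as a library or a command."""

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def summarize(values):
    """Return a summary line; callers decide whether to print it."""
    return "Average {:.2f} over {} values".format(mean(values), len(values))

def main():
    sample = [2, 4, 4, 5]   # made-up data for illustration
    print(summarize(sample))

if __name__ == "__main__":
    # Runs only when this file is the top-level script, not on import.
    main()
```

Another module could then write from hybrid_demo import summarize and reuse the function without triggering the printed report.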