This chapter covers libraries used to manage or simplify the development and build process, system integration, server management, and performance optimization.
Nobody describes continuous integration better than Martin Fowler:1
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily—leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
The three most popular tools for CI right now are Travis-CI, Jenkins, and Buildbot—which are all listed in the following sections. They are frequently used with Tox, a Python tool to manage virtualenv and tests from the command line. Travis is for multiple Python interpreters on a single platform, and Jenkins (most popular) and Buildbot (written in Python) can manage builds on multiple machines. Many also use Buildout (discussed in “Buildout”) and Docker (discussed in “Docker”) to rapidly and repeatably build complex environments for their test battery.
Tox is an automation tool providing packaging, testing, and deployment of Python software right from the console or CI server. It is a generic virtualenv management and test command-line tool that provides the following features:
Checks that packages install correctly with different Python versions and interpreters
Runs tests in each of the environments, configuring your test tool of choice
Acts as a frontend to continuous integration servers, reducing boilerplate and merging CI and shell-based testing
Install it using pip:
$ pip install tox
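For a taste, here is a minimal sketch of a tox.ini; the environment list and the pytest commands are assumptions to adapt to your own project:

# tox.ini -- a minimal sketch; the envlist and test command
# are placeholders for your own project
[tox]
envlist = py27, py34

[testenv]
deps = pytest          # test dependencies installed into each virtualenv
commands = pytest      # the test command run in each environment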
Travis-CI is a distributed CI server that builds and tests open source projects for free. It provides multiple workers that run Python tests and seamlessly integrates with GitHub. You can even have it comment on your pull requests2 to say whether this particular set of changes breaks the build. So if you are hosting your code on GitHub, Travis-CI is a great and easy way to get started with continuous integration. Travis-CI can build your code on a virtual machine that is running Linux or OS X.
To get started, add a .travis.yml file to your repository with this example content:
language: python
python:
  - "2.6"
  - "2.7"
  - "3.3"
  - "3.4"
script: python tests/test_all_of_the_units.py
branches:
  only:
    - master
This will get your project tested on all the listed Python versions by
running the given script and will only build the master branch. There are a
lot more options you can enable, such as notifications and before-and-after steps. The Travis-CI docs
explain all of these options and are very thorough.
To use Tox with Travis-CI, add a Tox script to your repository,
and change the line with script: in it to become:
install:
  - pip install tox
script:
  - tox
In order to activate testing for your project, go to the Travis-CI site and log in with your GitHub account. Then activate your project in your profile settings and you’re ready to go. From now on, your project’s tests will be run on every push to GitHub.
Jenkins CI is an extensible continuous integration engine
and currently the most popular CI engine. It works on Windows,
Linux, and OS X and plugs in to “every Source Code Management (SCM) tool that exists.”
Jenkins is a Java servlet (the Java equivalent
of a Python WSGI application) that ships with its own servlet container, so you can run it directly
using java -jar jenkins.war.
For more information, refer to the
Jenkins installation instructions; the Ubuntu page has instructions for how to place Jenkins behind an Apache or Nginx reverse proxy.
You interact with Jenkins via a web-based dashboard, or its HTTP-based RESTful API3 (e.g., at http://myServer:8080/api), meaning we can use HTTP to communicate with the Jenkins server from remote machines. For examples, look at Apache’s Jenkins Dashboard or the Pylons project’s Jenkins Dashboard.
The most frequently used Python tool to interact with the Jenkins API is python-jenkins, created by the OpenStack4 infrastructure team. Most Python users configure Jenkins to run a Tox script as part of the build process. For more information, see the documentation for using Tox with Jenkins and this guide to set up Jenkins with multiple build machines.
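As a minimal sketch, python-jenkins lets you talk to that API from Python; the server URL, credentials, and job name below are hypothetical:

import jenkins

# Hypothetical server URL and credentials -- replace with your own.
server = jenkins.Jenkins('http://myServer:8080',
                         username='admin', password='secret')

# List the jobs the server knows about, then trigger one build.
for job in server.get_jobs():
    print(job['name'])
server.build_job('my-tox-job')  # 'my-tox-job' is a placeholder job name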
Buildbot is a Python system to automate the compile/test cycle to validate code changes. It works like Jenkins in that it polls your source control manager for changes, builds and tests your code on multiple computers according to your instructions (with built-in support for Tox), and then tells you what happened. It runs behind a Twisted web server. For an example of what the web interface looks like, here is Chromium’s public buildbot dashboard (Chromium powers the Chrome browser).
Because Buildbot is pure Python, it’s installed via pip:
$ pip install buildbot
The 0.9 version has a REST API,
but it is still in beta, so you won’t be able to use it unless you expressly specify
the version number (e.g., pip install buildbot==0.9.0rc1).
Buildbot has a reputation for being the most powerful, but also the
most complex of the continuous integration tools. To get started,
follow their excellent tutorial.
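If you are curious what its configuration style looks like, here is a minimal sketch of a master.cfg written against the 0.9 API; the repository URL and the worker name and password are placeholders:

# master.cfg -- a minimal sketch using the Buildbot 0.9 API;
# the repo URL and worker credentials are placeholders.
from buildbot.plugins import schedulers, steps, util, worker

c = BuildmasterConfig = {}
c['workers'] = [worker.Worker('worker1', 'password')]

factory = util.BuildFactory()
factory.addStep(steps.Git(repourl='https://example.com/project.git'))
factory.addStep(steps.ShellCommand(command=['tox']))  # run the Tox tests

c['builders'] = [util.BuilderConfig(name='runtests',
                                    workernames=['worker1'],
                                    factory=factory)]
c['schedulers'] = [schedulers.ForceScheduler(name='force',
                                             builderNames=['runtests'])]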
The tools in this section are for managing and monitoring systems: server automation, system monitoring, and workflow management.

Salt, Ansible, Puppet, Chef, and CFEngine are server automation tools that provide an elegant way for system administrators to manage their fleet of physical and virtual machines. They can all manage Linux, Unix-like systems, and Windows machines. We’re of course partial to Salt and Ansible, as they’re written in Python. But they’re newer, and the other options are more widely used. The following sections provide a quick summary of these options.
For the record, folks at Docker say that they expect system automation tools like Salt, Ansible, and the rest to be complemented by Docker, not replaced by it—see this post about how Docker fits into the rest of DevOps.
Salt calls its master node the master and its agent nodes minions, or minion hosts. Its main design goal is speed—networking by default is done using ZeroMQ, with TCP connections between the master and its “minions,” and members of the Salt team have even written their own (optional) transmission protocol, RAET, which is faster than TCP and not as lossy as UDP.
Salt supports Python versions 2.6 and 2.7 and can be installed via pip:
$ pip install salt  # No Python 3 yet ...
After configuring a master server and any number of minion hosts, we can run
arbitrary shell commands or use prebuilt modules of complex commands on our
minions.
The following command lists all available minion hosts by calling ping in Salt’s test module:
$ salt '*' test.ping
You can filter minion hosts by either matching the minion ID, or by using the grains system, which uses static host information like the operating system version or the CPU architecture to provide a host taxonomy for the Salt modules. For example, the following command uses the grains system to list only the available minions running CentOS:
$ salt -G 'os:CentOS' test.ping
Salt also provides a state system. States can be used to configure the minion hosts. For example, when a minion host is ordered to read the following state file, it will install and start the Apache server:
apache:
  pkg:
    - installed
  service:
    - running
    - enable: True
    - require:
      - pkg: apache
State files can be written using YAML, augmented by the Jinja2 template system, or can be pure Python modules. For more information, see the Salt documentation.
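Salt can also be driven from Python rather than the command line. Here is a minimal sketch using its LocalClient interface, which assumes it is run on the master with the default configuration:

import salt.client

# LocalClient talks to the master's interface; run this on the master.
local = salt.client.LocalClient()

# Equivalent to `salt '*' test.ping` on the command line.
result = local.cmd('*', 'test.ping')
print(result)  # e.g., {'minion1': True, 'minion2': True}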
The biggest advantage of Ansible over the other system automation tools
is that it does not require anything (except Python) to be permanently installed on client
machines.
All of the other options5
keep daemons running on the clients to poll the master.
Ansible’s configuration files are in the YAML format.
Playbooks are Ansible’s configuration, deployment, and
orchestration documents, and are written in YAML with Jinja2 for templating.
Ansible supports Python versions 2.6 and 2.7 and can be installed via pip:
$ pip install ansible  # No Python 3 yet...
Ansible requires an inventory file that describes the hosts to which it has access. The following example shows an inventory file and a playbook that will ping all the hosts in it. Here is the example inventory file (hosts.yml):
[server_name]
127.0.0.1
Here is an example playbook (ping.yml):
---
- hosts: all
  tasks:
    - name: ping
      action: ping
To run the playbook:
$ ansible-playbook ping.yml -i hosts.yml --ask-pass
The Ansible playbook will ping all of the servers in the hosts.yml file. You can also select groups of servers using Ansible. For more information about Ansible, read the Ansible documentation. The Servers for Hackers Ansible tutorial is also a great and detailed introduction.
Puppet is written in Ruby and provides its own language—PuppetScript—for configuration. It has a designated server, the Puppet Master, that’s responsible for orchestrating its Agent nodes. Modules are small, shareable units of code written to automate or define the state of a system. Puppet Forge is a repository for modules written by the community for Open Source Puppet and Puppet Enterprise.
Agent nodes send basic facts about the system (e.g., the operating system, kernel, architecture, IP address, and hostname) to the Puppet Master. The Puppet Master then compiles a catalog with information provided by the agents on how each node should be configured and sends it to the agent. The agent enforces the change as prescribed in the catalog and sends a report back to the Puppet Master.
Facter (yes, spelled with an “-er”) is an interesting tool that ships with Puppet and pulls basic facts about the system. These facts can be referenced as a variable while writing your Puppet modules:
$ facter kernel
Linux
$
$ facter operatingsystem
Ubuntu
Writing Modules in Puppet is pretty straightforward: Puppet Manifests (files with the extension *.pp) together form Puppet Modules. Here is an example of Hello World in Puppet:
notify { 'Hello World, this message is getting logged into the agent node':
    # As nothing is specified in the body, the resource title
    # is the notification message by default.
}
Here is another example, with system-based logic. To reference other facts,
prepend a $ sign to the variable name—for instance, $hostname, or in this
case, $operatingsystem:
notify { 'Mac Warning':
    message => $operatingsystem ? {
        'Darwin' => 'This seems to be a Mac.',
        default  => 'I am a PC.',
    },
}
There are several resource types in Puppet, but the package-file-service paradigm is all you need for the majority of the configuration management. The following Puppet code makes sure that the OpenSSH-Server package is installed in a system and that the sshd service (the SSH server daemon) is notified to restart every time the sshd configuration file is changed:
package { 'openssh-server':
    ensure => installed,
}

file { '/etc/ssh/sshd_config':
    source  => 'puppet:///modules/sshd/sshd_config',
    owner   => 'root',
    group   => 'root',
    mode    => '640',
    notify  => Service['sshd'],  # sshd will restart
                                 # whenever you edit this file
    require => Package['openssh-server'],
}

service { 'sshd':
    ensure     => running,
    enable     => true,
    hasstatus  => true,
    hasrestart => true,
}
For more information, refer to the Puppet Labs documentation.
If Chef is your choice for configuration management, you will primarily use Ruby to write your infrastructure code. Chef is similar to Puppet, but designed with the opposite philosophy: Puppet provides a framework that simplifies things at the expense of flexibility, while Chef provides nearly no framework—its goal is to be very extensible, and so it is more difficult to use.
Chef clients run on every node in your infrastructure and regularly check with your Chef server to ensure your system is always aligned with the desired state. Each individual Chef client configures itself. This distributed approach makes Chef a scalable automation platform.
Chef works by using custom recipes (configuration elements), implemented in cookbooks. Cookbooks, which are basically packages for infrastructure choices, are usually stored in your Chef server. Read DigitalOcean’s tutorial series on Chef to learn how to create a simple Chef server.
Use the knife command to create a simple cookbook:
$ knife cookbook create cookbook_name
Andy Gale’s “Getting started with Chef” is a good starting point for Chef beginners. Many community cookbooks can be found on the Chef Supermarket—they’re a good starting point for your own cookbooks. For more information, check out the full Chef documentation.
CFEngine has a tiny footprint because it’s written in C. Its main design goal is robustness to failure, accomplished via autonomous agents operating in a distributed network (as opposed to a master/client architecture) that communicate using Promise Theory. If you want a masterless architecture, try this system.
The following libraries all help system administrators monitor running jobs but have very different applications: Psutil provides information in Python that can be obtained by Unix utility functions, Fabric makes it easy to define and execute commands on a list of remote hosts via SSH, and Luigi makes it possible to schedule and monitor long-running batch processes like chained Hadoop commands.
Psutil is a cross-platform (including Windows)
interface to different
system information (e.g., CPU, memory, disks, network, users, and processes)—it makes accessible within Python information that many of us are accustomed to
obtaining via Unix commands
such as top, ps, df, and netstat.
Get it using pip:
$ pip install psutil
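To get a quick taste of the API before the longer example below, here is an interactive session; the numbers shown are illustrative:

>>> import psutil
>>> psutil.cpu_percent(interval=1)   # like the CPU column in top
7.3
>>> psutil.virtual_memory().percent  # like free
48.4
>>> psutil.disk_usage('/').percent   # like df
73.1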
Here is an example that monitors for server overload (if any of the tests—net, CPU—fail, it will send an email):
# Functions to get system values:
from psutil import cpu_percent, net_io_counters
# Functions to take a break:
from time import sleep
# Package for email services:
import smtplib

MAX_NET_USAGE = 400000
MAX_ATTACKS = 4
attack = 0
counter = 0
while attack <= MAX_ATTACKS:
    sleep(4)
    counter = counter + 1
    # Check the CPU usage
    if cpu_percent(interval=1) > 70:
        attack = attack + 1
    # Check the net usage
    neti1 = net_io_counters()[1]
    neto1 = net_io_counters()[0]
    sleep(1)
    neti2 = net_io_counters()[1]
    neto2 = net_io_counters()[0]
    # Calculate the bytes per second
    net = ((neti2 + neto2) - (neti1 + neto1)) / 2
    if net > MAX_NET_USAGE:
        attack = attack + 1
    if counter > 25:
        attack = 0
        counter = 0

# Write a very important email if attack is higher than 4
TO = "you@your_email.com"
FROM = "webmaster@your_domain.com"
SUBJECT = "Your domain is out of system resources!"
text = "Go and fix your server!"
BODY = "\r\n".join((
    "From: %s" % FROM,
    "To: %s" % TO,
    "Subject: %s" % SUBJECT,
    "",
    text,
))
server = smtplib.SMTP('127.0.0.1')
server.sendmail(FROM, [TO], BODY)
server.quit()
For a good example use of Psutil, see glances,
a full terminal application that behaves like a widely extended top (which lists
running processes by CPU use or a user-specified sort order),
with the abilities of a client-server monitoring tool.
Fabric is a library for simplifying system
administration tasks. It allows you to SSH to multiple hosts and execute
tasks on each one. This is convenient for system administration or
application deployment. Use pip to install Fabric:
$ pip install fabric
Here is a complete Python module defining two
Fabric tasks—memory_usage and deploy:
# fabfile.py
from fabric.api import cd, env, prefix, run, task

env.hosts = ['my_server1', 'my_server2']  # Where to SSH

@task
def memory_usage():
    run('free -m')

@task
def deploy():
    with cd('/var/www/project-env/project'):
        with prefix('. ../bin/activate'):
            run('git pull')
            run('touch app.wsgi')
The with statement nests the commands so that
in the end deploy() becomes this for each host:
$ ssh hostname cd /var/www/project-env/project && . ../bin/activate && git pull
$ ssh hostname cd /var/www/project-env/project && . ../bin/activate && \
> touch app.wsgi
With the previous code saved in a file named fabfile.py (the default module
name fab looks for), we can check memory usage with our new memory_usage task:
$ fab memory_usage
[my_server1] Executing task 'memory_usage'
[my_server1] run: free -m
[my_server1] out:              total   used   free  shared  buffers  cached
[my_server1] out: Mem:          6964   1897   5067       0      166     222
[my_server1] out: -/+ buffers/cache:   1509   5455
[my_server1] out: Swap:            0      0      0

[my_server2] Executing task 'memory_usage'
[my_server2] run: free -m
[my_server2] out:              total   used   free  shared  buffers  cached
[my_server2] out: Mem:          1666    902    764       0      180     572
[my_server2] out: -/+ buffers/cache:    148   1517
[my_server2] out: Swap:          895      1    894
and we can deploy with:
$ fab deploy
Additional features include parallel execution, interaction with remote programs, and host grouping. The examples in the Fabric documentation are easy to follow.
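For instance, parallel execution is a decorator away. Here is a minimal sketch; the hostnames are placeholders:

# fabfile.py -- a sketch of parallel execution across env.hosts
from fabric.api import env, parallel, run, task

env.hosts = ['my_server1', 'my_server2']  # placeholder hostnames

@task
@parallel
def uptime():
    # Runs on all hosts at the same time instead of serially.
    run('uptime')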
Luigi is a pipeline management tool
developed and released by Spotify. It helps developers manage the entire
pipeline of large, long-running batch jobs, stitching together things
such as Hive queries, database queries, Hadoop Java jobs, pySpark jobs, and
any tasks you want to write yourself. They don’t all have to be big
data applications—the API allows you to schedule anything. But Spotify made
it to run their jobs over Hadoop, so they provide all of these utilities
already in luigi.contrib.
Install it with pip:
$ pip install luigi
It includes a web interface, so users can filter for their tasks and view dependency graphs of the pipeline workflow and its progress. There are example Luigi tasks in their GitHub repository, or see the Luigi documentation.
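To see the shape of the API, here is a minimal sketch of a Luigi task; the input filename is hypothetical. Each task declares its output and its work, and Luigi uses the output's existence to decide what still needs to run:

import luigi

class CountLines(luigi.Task):
    """Count the lines of a (hypothetical) input file."""
    filename = luigi.Parameter()

    def output(self):
        # Luigi checks whether this target exists to know the task is done.
        return luigi.LocalTarget(self.filename + '.count')

    def run(self):
        with open(self.filename) as f:
            count = sum(1 for line in f)
        with self.output().open('w') as out:
            out.write(str(count))

if __name__ == '__main__':
    luigi.run()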
This chapter lists the Python community’s most common approaches to speed optimization. Table 8-1 shows your optimization options, once you’ve done the simple things like profiling your code and comparing options for code snippets to get all of the performance you can directly from Python.
You may have already heard of the global interpreter lock (GIL)—it is the mechanism the C implementation of Python uses to coordinate multiple threads. Python’s memory management isn’t entirely thread-safe, so the GIL is required to prevent multiple threads from running the same Python code at once.
The GIL is often cited as a limitation of Python, but it’s not really as big of a deal as it’s made out to be—it’s only a hindrance when processes are CPU bound (in which case, like with NumPy or the cryptography libraries discussed soon, the code is rewritten in C and exposed with Python bindings). For anything else (like network I/O or file I/O), the bottleneck is the code blocking in a single thread while waiting for the I/O. You can solve blocking problems using threads or event-driven programming.
We should also note that in Python 2, there were slower and faster versions of libraries—StringIO and cStringIO, ElementTree and cElementTree. The C implementations are faster, but had to be imported explicitly. Since Python 3.3, the regular versions import from the faster implementation whenever possible, and the C-prefixed libraries are deprecated.
| Option | License | Reasons to use |
|---|---|---|
| Threading | PSFL | Lets threads share state within one process; gives a performance gain when at least one thread is blocking on I/O |
| Multiprocessing/subprocess | PSFL | Bypasses the GIL by launching additional interpreters; processes communicate via pipes, queues, or shared memory |
| PyPy | MIT license | An alternative interpreter with a JIT compiler; unmodified pure-Python code just runs faster |
| Cython | Apache license | A superset of Python with C type declarations; compiles to C and can release the GIL via nogil |
| Numba | BSD license | A NumPy-aware JIT compiler that turns annotated Python into machine code via LLVM |
| Weave | BSD license | Lets you inline snippets of C in Python (Python 2 only, and now deprecated) |
| PyCUDA/gnumpy/TensorFlow/Theano/PyOpenCL | MIT/modified BSD/BSD/BSD/MIT | Fast, parallel matrix math on a GPU |
| Direct use of C/C++ libraries | — | Wrap existing, heavily optimized C/C++ code for speed-critical paths |
Jeff Knupp, author of Writing Idiomatic Python, wrote a blog post about getting around the GIL, citing David Beazley’s deep look6 into the subject.
Threading and the other optimization options in Table 8-1 are discussed in more detail in the following sections.
Python’s threading library allows you to create multiple threads. Because of the GIL (at least in CPython), only one thread will execute Python code at a time per interpreter, meaning there will only be a performance gain when at least one thread is blocking (e.g., on I/O). The other option for I/O is to use event handling. For that, see the paragraphs on asyncio in “Performance networking tools in Python’s Standard Library”.
When you have multiple threads in Python, the kernel notices that one thread is blocking on I/O, and it switches to allow the next thread to use the processor until it blocks or is finished. All of this happens automatically when you start your threads. There’s a good example use of threading on Stack Overflow, and the Python Module of the Week series has a great threading introduction. Or see the threading documentation in the Standard Library.
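As a minimal sketch of the pattern (the URLs are placeholders), each thread below spends most of its time blocked on the network, so the downloads overlap:

import threading
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

urls = ['http://example.com/a', 'http://example.com/b']  # placeholders

def fetch(url):
    # The GIL is released while this thread blocks on network I/O.
    print(url, len(urlopen(url).read()))

threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()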
The multiprocessing module
in Python’s Standard Library provides a way to bypass the GIL—by launching additional
Python interpreters. The separate processes can communicate
using a multiprocessing.Pipe or a multiprocessing.Queue,
or share memory via a multiprocessing.Array and a multiprocessing.Value,
which implement locking automatically to prevent simultaneous access by
different processes. Share data sparingly; the locking adds overhead.
Here’s an example to show that the speed gain from using a pool of worker processes isn’t always proportional to the number of workers used. There’s a trade-off between the computational time saved and the time it takes to launch another interpreter. The example uses the Monte Carlo method (of drawing random numbers) to estimate the value of Pi:7
>>> import multiprocessing
>>> import random
>>> import timeit
>>>
>>> def calculate_pi(iterations):
...     x = (random.random() for i in range(iterations))
...     y = (random.random() for i in range(iterations))
...     r_squared = [xi**2 + yi**2 for xi, yi in zip(x, y)]
...     percent_coverage = sum([r <= 1 for r in r_squared]) / len(r_squared)
...     return 4 * percent_coverage
...
>>> def run_pool(processes, total_iterations):
...     with multiprocessing.Pool(processes) as pool:
...         # Divide the total iterations among the processes.
...         iterations = [total_iterations // processes] * processes
...         result = pool.map(calculate_pi, iterations)
...     print("%0.4f" % (sum(result) / processes), end=',')
...
>>> ten_million = 10000000
>>> timeit.timeit(lambda: run_pool(1, ten_million), number=10)
3.141,3.142,3.142,3.141,3.141,3.142,3.141,3.141,3.142,3.142,
134.48382110201055
>>> timeit.timeit(lambda: run_pool(10, ten_million), number=10)
3.142,3.142,3.142,3.142,3.142,3.142,3.141,3.142,3.142,3.141,
74.38514468498761

1. Using the multiprocessing.Pool within a context manager reinforces that the pool should only be used by the process that creates it.
2. The total iterations will always be the same; they’ll just be divided between a different number of processes.
3. pool.map() creates the multiple processes—one per item in the iterations list, up to the maximum number stated when the pool was initialized (in multiprocessing.Pool(processes)).
4. There is only one process for the first timeit trial.
5. Ten repetitions of one single process running with 10 million iterations took 134 seconds.
6. There are 10 processes for the second timeit trial.
7. Ten repetitions of 10 processes each running with one million iterations took 74 seconds.
The point of all this is that there is overhead in making the multiple processes, but the tools for running multiple processes in Python are robust and mature. See the multiprocessing documentation in the Standard Library for more information, and check out Jeff Knupp’s blog post about getting around the GIL, because it has a few paragraphs about multiprocessing.
The subprocess library was introduced into the
Standard Library in Python 2.4 and defined in
PEP 324.
It launches an external command (like unzip or curl) as if called from the command line (by default,
without invoking the system shell),
with the developer selecting what to do with the subprocess’s input and output pipes.
We recommend Python 2 users get an updated version with some bugfixes from the
subprocess32 package. Install it using pip:
$ pip install subprocess32
There is a great subprocess tutorial on the Python Module of the Week blog.
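Here is a minimal sketch of typical use, capturing a command's output without invoking the shell:

import subprocess

# Pass an argument list, not a shell string -- the shell is not
# invoked by default.
output = subprocess.check_output(['ls', '-l'])
print(output)

# Return codes are how you detect failure.
retcode = subprocess.call(['ls', 'no_such_file'])
print(retcode)  # nonzero on failure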
PyPy is an alternative implementation of Python whose interpreter is written in RPython, a restricted subset of Python. It’s fast, and when it works, you don’t have to change your code at all; it just runs faster for free. You should try this option before anything else.
You can’t get it using pip, because it’s actually another implementation of
Python. Scroll through the PyPy downloads page
for your correct version of Python and your operating system.
Here is a slightly modified version of David Beazley’s CPU-bound test code, with an added loop for multiple tests. You can see the difference between PyPy and CPython. First, it’s run using CPython:
$ # CPython
$ ./python -V
Python 2.7.1
$
$ ./python measure2.py
1.06774401665
1.45412397385
1.51485204697
1.54693889618
1.60109114647
And here is the same script, and the only thing different is the Python interpreter—it’s running with PyPy:
$ # PyPy
$ ./pypy -V
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 18:47:10)
[PyPy 1.7.0 with GCC 4.4.3]
$
$ ./pypy measure2.py
0.0683999061584
0.0483210086823
0.0388588905334
0.0440690517426
0.0695300102234
So, just by downloading PyPy, it went from an average of about 1.4 seconds to around 0.05 seconds—more than 20 times faster. Sometimes your code won’t even double in speed, but other times you really do get a big boost, all with no effort beyond downloading the PyPy interpreter. If you want your C library to be compatible with PyPy, follow PyPy’s advice and use CFFI instead of ctypes from the Standard Library.
Unfortunately, PyPy doesn’t work with all libraries that use C extensions.
For those cases,
Cython (pronounced “PSI-thon”—not the same as CPython,
the standard C implementation of Python)
implements a superset of the Python language
that lets you write C and C++ modules for Python. Cython also
allows you to call functions from compiled C libraries, and provides a
context, nogil, that allows you to
release the GIL
around a section of code, provided it does not manipulate Python objects in any way.
Using Cython allows you to take advantage of Python’s strong typing8 of variables and operations.
Here’s an example of strong typing with Cython:
def primes(int kmax):
    """Calculation of prime numbers with additional Cython keywords"""
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
This implementation of an algorithm to find prime numbers has some additional keywords compared to the next one, which is implemented in pure Python:
def primes(kmax):
    """Calculation of prime numbers in standard Python syntax"""
    p = [0] * 1000
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
Notice that in the Cython version you declare integers and integer arrays to be compiled into C types while also creating a Python list:
# Cython version
def primes(int kmax):
    """Calculation of prime numbers with additional Cython keywords"""
    cdef int n, k, i
    cdef int p[1000]
    result = []

1. The argument’s type is declared to be an integer.
2. The upcoming variables n, k, and i are declared as integers.
3. And we then have preallocated a 1000-long array of integers for p.
What is the difference? In the Cython version, you can see the
declaration of the variable types and the integer array in a similar way as
in standard C. For example, the additional type declaration
(of integer) in cdef int n, k, i
allows the Cython compiler to generate more
efficient C code than it could without type hints.
Because the syntax is incompatible with standard Python,
it is not saved in *.py files—instead,
Cython code is saved in *.pyx files.
What’s the difference in speed? Let’s try it!
import time
# activate pyx compiler
import pyximport
pyximport.install()
# primes implemented with Cython
import primesCy
# primes implemented with Python
import primes

print("Cython:")
t1 = time.time()
primesCy.primes(500)
t2 = time.time()
print("Cython time: %s" % (t2 - t1))
print("")
print("Python")
t1 = time.time()
primes.primes(500)
t2 = time.time()
print("Python time: {}".format(t2 - t1))

1. The pyximport module allows you to import *.pyx files (e.g., primesCy.pyx) with the Cython-compiled version of the primes function.
2. The pyximport.install() command allows the Python interpreter to start the Cython compiler directly to generate C code, which is automatically compiled to a *.so C library. Cython is then able to import this library for you in your Python code, easily and efficiently.
3. With the time.time() function, you are able to compare the time between these two different calls to find 500 prime numbers. On a standard notebook (dual-core AMD E-450 1.6 GHz), the measured values are:
Cython time: 0.0054 seconds
Python time: 0.0566 seconds
And here is the output on an embedded ARM BeagleBone machine:
Cython time: 0.0196 seconds
Python time: 0.3302 seconds
Numba is a NumPy-aware Python compiler (just-in-time [JIT] specializing compiler) that compiles annotated Python (and NumPy) code to LLVM (Low-Level Virtual Machine) through special decorators. Briefly, Numba uses LLVM to compile Python down to machine code that can be natively executed at runtime.
If you use Anaconda,
install Numba with conda install numba; if not, install it by hand.
You must already have NumPy and LLVM installed before installing Numba.
Check the LLVM version you need (it’s on the
PyPI page for llvmlite), and download that
version from whichever place matches your OS:
For a discussion of how to build from source for other Unix systems, see “Building the Clang + LLVM compilers”.
On OS X, use brew install homebrew/versions/llvm37 (or whatever version number is now current).
Once you have LLVM and NumPy, install Numba using pip.
You may need to help the installer find the llvm-config file by
providing an environment variable LLVM_CONFIG with the appropriate
path, like this:
$ LLVM_CONFIG=/path/to/llvm-config-3.7 pip install numba
Then, to use it in your code, just decorate your functions:
from numba import jit, int32

@jit
def f(x):
    return x + 3

@jit(int32(int32, int32))
def g(x, y):
    return x + y

1. With no arguments, the @jit decorator does lazy compilation—deciding itself whether to optimize the function, and how.
2. For eager compilation, specify types. The function will be compiled with the given specialization, and no other will be allowed—the return value and the two arguments will all have type numba.int32.
There is a nogil flag that can allow code to ignore the Global Interpreter Lock,
and a module numba.pycc that can be used to compile the code ahead of time.
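As a brief sketch of both features (the function and module names here are hypothetical):

from numba import jit
from numba.pycc import CC

@jit(nogil=True)
def crunch(x):
    # Compiled code that does not touch Python objects can
    # run without holding the GIL.
    return x * 2 + 1

# Ahead-of-time compilation: builds an importable extension module.
cc = CC('precompiled')  # 'precompiled' is a hypothetical module name
@cc.export('crunch', 'i8(i8)')
def crunch_aot(x):
    return x * 2 + 1

if __name__ == '__main__':
    cc.compile()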
For more information, see Numba’s user manual.
Numba can optionally be built with the capacity to run on the computer’s graphics processing unit (GPU), a chip optimized for the fast, parallel computation used in modern video games. You’ll need an NVIDIA GPU with NVIDIA’s CUDA Toolkit installed. Then follow the documentation for using Numba’s CUDA JIT with the GPU.
Outside of Numba, the other popular library with GPU capability is TensorFlow, released by Google under the Apache v2.0 license. It provides tensors (multidimensional matrices) and a way to chain tensor operations together, for fast matrix math. Currently it can only use the GPU on Linux operating systems. For installation instructions, see the following pages:
For those not on Linux, Theano, from the University
of Montréal, was the de facto matrix-math-over-GPU library in Python until Google released TensorFlow.
Theano is still under active development.
It has a page dedicated to using the GPU.
Theano supports Windows, OS X, and Linux operating systems, and is available via pip:
$ pip install Theano
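Here is a minimal sketch of Theano's style: define symbolic variables, chain operations over them, then compile a callable function:

import theano
import theano.tensor as T

# Define symbolic inputs and an expression over them.
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

# Compile the expression into a callable function
# (executed on the GPU when Theano is configured to use one).
f = theano.function([x, y], z)
print(f([[1, 2]], [[3, 4]]))  # [[ 4.  6.]]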
For lower-level interaction with the GPU, you can try PyCUDA.
Finally, people without an NVIDIA GPU can use PyOpenCL, a wrapper for OpenCL, which is compatible with a number of different hardware sets.
The libraries described in the following sections are all very different: both CFFI and ctypes are Python libraries, F2PY is for Fortran, SWIG can make C objects available in multiple languages (not just Python), and Boost.Python is a C++ library that can expose C++ objects to Python and vice versa. Table 8-2 goes into a little more detail.
| Library | License | Reasons to use |
|---|---|---|
| CFFI | MIT license | Recommended by PyPy; interfaces with C from both CPython and PyPy, in ABI or API mode |
| ctypes | Python Software Foundation license | In the Standard Library; loads dynamic C libraries at runtime and gives fine control over type translation |
| F2PY | BSD license | Generates Python bindings for Fortran code; ships with NumPy |
| SWIG | GPL | Generates bindings from annotated C/C++ headers for many scripting languages, not just Python |
| Boost.Python | Boost Software license | A C++ library that exposes C++ objects to Python and Python objects to C++ |
The CFFI package provides a simple mechanism to interface with C from both CPython and PyPy. CFFI is recommended by PyPy for the best compatibility between CPython and PyPy. It supports two modes: the inline application binary interface (ABI) compatibility mode (see the following code example) allows you to dynamically load and run functions from executable modules (essentially exposing the same functionality as LoadLibrary or dlopen), and an API mode, which allows you to build C extension modules.9
Install it using pip:
$ pip install cffi
Here is an example with ABI interaction:
from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char*);")
clib = ffi.dlopen(None)
length = clib.strlen(b"String to be evaluated.")
print("{}".format(length))  # prints: 23
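API mode, by contrast, generates and compiles a real C extension module. Here is a minimal sketch; the module name _example is hypothetical:

# build_module.py -- API-mode sketch; "_example" is a hypothetical name.
from cffi import FFI

ffibuilder = FFI()
ffibuilder.cdef("size_t strlen(const char*);")
ffibuilder.set_source("_example", "#include <string.h>")

if __name__ == '__main__':
    ffibuilder.compile()  # emits and compiles the C extension

After running the script, the compiled module can be imported with from _example import ffi, lib.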
ctypes is the de facto library for interfacing with C/C++ from CPython, and it’s in the Standard Library. It provides full access to the native C interface of most major operating systems (e.g., kernel32 on Windows, or libc on *nix), plus support for loading and interfacing with dynamic libraries—shared objects (*.so) or DLLs—at runtime. It brings along a whole host of types for interacting with system APIs, lets you define your own complex types such as structs and unions, and allows you to modify things like padding and alignment if needed. It can be a bit crufty to use (because you have to type so many extra characters), but in conjunction with the Standard Library’s struct module, it gives you essentially full control over how your data types get translated into something usable by a pure C/C++ method.
For example, a C struct defined like this in a file named my_struct.h:
struct my_struct {
    int a;
    int b;
};
could be implemented as shown in a file named my_struct.py:
import ctypes

class my_struct(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int),
                ("b", ctypes.c_int)]
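Loading and calling a shared library is just as direct. A minimal sketch, assuming a Linux libc path:

import ctypes

# Load the C standard library (the path assumes Linux).
libc = ctypes.CDLL('libc.so.6')

# Call a C function directly; bytes map to const char*.
print(libc.strlen(b'Hello, ctypes!'))  # 14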
The Fortran-to-Python interface generator (F2PY)
is a part of NumPy, so to get it, install NumPy using pip:
$ pip install numpy
It provides a versatile command-line function, f2py, that can be used
three different ways, all documented in
the F2PY quickstart guide.
If you have control over the source code, you can add special
comments with instructions for F2PY that clarify the intent of each argument
(which items are return values and which are inputs), and then just
run F2PY like this:
$ f2py -c fortran_code.f -m python_module_name
When you can’t do that, F2PY can generate an intermediate file with extension *.pyf that you can modify, to then produce the same results. This would be three steps:
$ f2py fortran_code.f -m python_module_name -h interface_file.pyf
$ vim interface_file.pyf
$ f2py -c interface_file.pyf fortran_code.f
The Simplified Wrapper Interface Generator (SWIG) supports a large number of scripting languages, including Python. It’s a popular, widely used command-line tool that generates bindings for interpreted languages from annotated C/C++ header files. To use it, first use SWIG to autogenerate an intermediate file from the header (with a *.i suffix). Next, modify that file to reflect the actual interface you want, and then run the build tool to compile the code into a shared library. All of this is done step by step in the SWIG tutorial.
While it does have some limits (it currently seems to have issues with a small subset of newer C++ features, and getting template-heavy code to work can be a bit verbose), SWIG provides a great deal of power and exposes lots of features to Python with little effort. Additionally, you can easily extend the bindings SWIG creates (in the interface file) to overload operators and built-in methods, and effectively re-cast C++ exceptions to be catchable by Python.
Here is an example that shows how to overload __repr__. This
excerpt would be from a file named MyClass.h:
#include <string>

class MyClass {
private:
    std::string name;
public:
    std::string getName();
};
And here is myclass.i:
%include "string.i"

%module myclass
%{
#include <string>
#include "MyClass.h"
%}

%extend MyClass {
    std::string __repr__() {
        return $self->getName();
    }
}

%include "MyClass.h"
There are more Python examples in the
SWIG GitHub repository.
Install SWIG using your package manager, if it’s there (apt-get install swig, yum install swig.i386, or
brew install swig), or else
use this link to download SWIG,
then follow the
installation instructions
for your operating system. If you’re missing the Perl Compatible Regular Expressions (PCRE)
library in OS X, use Homebrew to install it:
$ brew install pcre
Boost.Python requires a bit more manual work to expose C++ object functionality, but it is capable of providing all the same features SWIG does and then some—for example, wrappers to access Python objects as PyObjects in C++, as well as the tools to expose C++ objects to Python. Unlike SWIG, Boost.Python is a library, not a command-line tool, and there is no need to create an intermediate file with different formatting—it’s all written directly in C++. Boost.Python has an extensive, detailed tutorial if you wish to go this route.
1 Fowler is an advocate for best practices in software design and development, and one of continuous integration’s most vocal proponents. The quote is excerpted from his blog post on continuous integration. He hosted a series of discussions about test-driven development (TDD) and its relationship to extreme programming with David Heinemeier Hansson (creator of Ruby on Rails) and Kent Beck (instigator of the extreme programming (XP) movement, with CI as one of its cornerstones).
2 On GitHub, other users submit pull requests to notify owners of another repository that they have changes they’d like to merge.
3 REST stands for “representational state transfer.” It’s not a standard or a protocol, just a set of design principles developed during the creation of the HTTP 1.1 standard. A list of relevant architectural constraints for REST is available on Wikipedia.
4 OpenStack provides free software for cloud networking, storage, and computation so that organizations can host private clouds for themselves or public clouds that third parties can pay to use.
5 Except for Salt-SSH, which is an alternative Salt architecture, probably created in response to users wanting an Ansible-like option from Salt.
6 David Beazley has a great guide (PDF) that describes how the GIL operates. He also covers the new GIL (PDF) in Python 3.2. His results show that maximizing performance in a Python application requires a strong understanding of the GIL, how it affects your specific application, how many cores you have, and where your application bottlenecks are.
7 Here is a full derivation of the method. Basically you’re throwing darts at a 2 x 2 square, with a circle that has radius = 1 inside. If the darts land with equal likelihood anywhere on the board, the percent that are in the circle is equal to Pi / 4. Which means 4 times the percent in the circle is equal to Pi.
8 It is possible for a language to both be strongly and dynamically typed, as described in this Stack Overflow discussion.
9 Special care must be taken when writing C extensions to make sure you register your threads with the interpreter.