This chapter covers libraries used to manage or simplify the development and build process, system integration, server management, and performance optimization.
Nobody describes continuous integration better than Martin Fowler:1
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily—leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
The three most popular tools for CI right now are Travis-CI, Jenkins, and Buildbot—which are all listed in the following sections. They are frequently used with Tox, a Python tool to manage virtualenv and tests from the command line. Travis is for multiple Python interpreters on a single platform, and Jenkins (most popular) and Buildbot (written in Python) can manage builds on multiple machines. Many also use Buildout (discussed in “Buildout”) and Docker (discussed in “Docker”) to rapidly and repeatably build complex environments for their test battery.
Tox is an automation tool providing packaging, testing, and deployment of Python software right from the console or CI server. It is a generic virtualenv management and test command-line tool that provides the following features:
Checks that packages install correctly with different Python versions and interpreters
Runs tests in each of the environments, configuring your test tool of choice
Acts as a frontend to continuous integration servers, reducing boilerplate and merging CI and shell-based testing
Install it using pip:
$ pip install tox
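For a taste, here is a minimal sketch of a tox.ini; the environment list and the pytest commands are assumptions to adapt to your own project:

# tox.ini -- a minimal sketch; the envlist and test command
# are placeholders for your own project
[tox]
envlist = py27, py34

[testenv]
deps = pytest          # test dependencies installed into each virtualenv
commands = pytest      # the test command run in each environment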
Travis-CI is a distributed CI server that builds and tests open source projects for free. It provides multiple workers that run Python tests and seamlessly integrates with GitHub. You can even have it comment on your pull requests2 to say whether this particular set of changes breaks the build. So if you are hosting your code on GitHub, Travis-CI is a great and easy way to get started with continuous integration. Travis-CI can build your code on a virtual machine that is running Linux or OS X.
To get started, add a .travis.yml file to your repository with this example content:
language: python
python:
  - "2.6"
  - "2.7"
  - "3.3"
  - "3.4"
script: python tests/test_all_of_the_units.py
branches:
  only:
    - master
This will get your project tested on all the listed Python versions by
running the given script and will only build the master branch. There are a
lot more options you can enable, such as notifications and before-and-after steps. The Travis-CI docs
explain all of these options and are very thorough.
To use Tox with Travis-CI, add a Tox script to your repository,
and change the line with script: in it to become:
install:
  - pip install tox
script:
  - tox
In order to activate testing for your project, go to the Travis-CI site and log in with your GitHub account. Then activate your project in your profile settings and you’re ready to go. From now on, your project’s tests will be run on every push to GitHub.
Jenkins CI is an extensible continuous integration engine
and currently the most popular CI engine. It works on Windows,
Linux, and OS X and plugs in to “every Source Code Management (SCM) tool that exists.”
Jenkins is a Java servlet (the Java equivalent
of a Python WSGI application) that ships with its own servlet container, so you can run it directly
using java -jar jenkins.war.
For more information, refer to the
Jenkins installation instructions; the Ubuntu page has instructions for how to place Jenkins behind an Apache or Nginx reverse proxy.
You interact with Jenkins via a web-based dashboard, or its HTTP-based RESTful API3 (e.g., at http://myServer:8080/api), meaning we can use HTTP to communicate with the Jenkins server from remote machines. For examples, look at Apache’s Jenkins Dashboard or the Pylons project’s Jenkins Dashboard.
The most frequently used Python tool to interact with the Jenkins API is python-jenkins, created by the OpenStack4 infrastructure team. Most Python users configure Jenkins to run a Tox script as part of the build process. For more information, see the documentation for using Tox with Jenkins and this guide to set up Jenkins with multiple build machines.
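As a minimal sketch, python-jenkins lets you talk to that API from Python; the server URL, credentials, and job name below are hypothetical:

import jenkins

# Hypothetical server URL and credentials -- replace with your own.
server = jenkins.Jenkins('http://myServer:8080',
                         username='admin', password='secret')

# List the jobs the server knows about, then trigger one build.
for job in server.get_jobs():
    print(job['name'])
server.build_job('my-tox-job')  # 'my-tox-job' is a placeholder job name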
Buildbot is a Python system to automate the compile/test cycle to validate code changes. It works like Jenkins in that it polls your source control manager for changes, builds and tests your code on multiple computers according to your instructions (with built-in support for Tox), and then tells you what happened. It runs behind a Twisted web server. For an example of what the web interface looks like, here is Chromium’s public buildbot dashboard (Chromium powers the Chrome browser).
Because Buildbot is pure Python, it’s installed via pip:
$ pip install buildbot
The 0.9 version has a REST API,
but it is still in beta, so you won’t be able to use it unless you expressly specify
the version number (e.g., pip install buildbot==0.9.0rc1).
Buildbot has a reputation for being the most powerful, but also the
most complex of the continuous integration tools. To get started,
follow their excellent tutorial.
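If you are curious what its configuration style looks like, here is a minimal sketch of a master.cfg written against the 0.9 API; the repository URL and the worker name and password are placeholders:

# master.cfg -- a minimal sketch using the Buildbot 0.9 API;
# the repo URL and worker credentials are placeholders.
from buildbot.plugins import schedulers, steps, util, worker

c = BuildmasterConfig = {}
c['workers'] = [worker.Worker('worker1', 'password')]

factory = util.BuildFactory()
factory.addStep(steps.Git(repourl='https://example.com/project.git'))
factory.addStep(steps.ShellCommand(command=['tox']))  # run the Tox tests

c['builders'] = [util.BuilderConfig(name='runtests',
                                    workernames=['worker1'],
                                    factory=factory)]
c['schedulers'] = [schedulers.ForceScheduler(name='force',
                                             builderNames=['runtests'])]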
The tools in this section are for managing and monitoring systems: server automation, system monitoring, and workflow management.

Salt, Ansible, Puppet, Chef, and CFEngine are server automation tools that provide an elegant way for system administrators to manage their fleet of physical and virtual machines. They can all manage Linux, Unix-like systems, and Windows machines. We’re of course partial to Salt and Ansible, as they’re written in Python. But they’re newer, and the other options are more widely used. The following sections provide a quick summary of these options.
For the record, folks at Docker say that they expect system automation tools like Salt, Ansible, and the rest to be complemented by Docker, not replaced by it—see this post about how Docker fits into the rest of DevOps.
Salt calls its master node the master and its agent nodes minions, or minion hosts. Its main design goal is speed—networking by default is done using ZeroMQ, with TCP connections between the master and its “minions,” and members of the Salt team have even written their own (optional) transmission protocol, RAET, which is faster than TCP and not as lossy as UDP.
Salt supports Python versions 2.6 and 2.7 and can be installed via pip:
$ pip install salt  # No Python 3 yet ...
After configuring a master server and any number of minion hosts, we can run
arbitrary shell commands or use prebuilt modules of complex commands on our
minions.
The following command lists all available minion hosts by calling ping in Salt’s test module:
$ salt '*' test.ping
You can filter minion hosts by either matching the minion ID, or by using the grains system, which uses static host information like the operating system version or the CPU architecture to provide a host taxonomy for the Salt modules. For example, the following command uses the grains system to list only the available minions running CentOS:
$ salt -G 'os:CentOS' test.ping
Salt also provides a state system. States can be used to configure the minion hosts. For example, when a minion host is ordered to read the following state file, it will install and start the Apache server:
apache:
  pkg:
    - installed
  service:
    - running
    - enable: True
    - require:
      - pkg: apache
State files can be written using YAML, augmented by the Jinja2 template system, or can be pure Python modules. For more information, see the Salt documentation.
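Salt can also be driven from Python rather than the command line. Here is a minimal sketch using its LocalClient interface, which assumes it is run on the master with the default configuration:

import salt.client

# LocalClient talks to the master's interface; run this on the master.
local = salt.client.LocalClient()

# Equivalent to `salt '*' test.ping` on the command line.
result = local.cmd('*', 'test.ping')
print(result)  # e.g., {'minion1': True, 'minion2': True}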
The biggest advantage of Ansible over the other system automation tools
is that it does not require anything (except Python) to be permanently installed on client
machines.
All of the other options5
keep daemons running on the clients to poll the master.
Ansible’s configuration files are in the YAML format.
Playbooks are Ansible’s configuration, deployment, and
orchestration documents, and are written in YAML with Jinja2 for templating.
Ansible supports Python versions 2.6 and 2.7 and can be installed via pip:
$ pip install ansible  # No Python 3 yet...
Ansible requires an inventory file that describes the hosts to which it has access. The following example shows an inventory file and a playbook that will ping all the hosts in it. Here is the example inventory file (hosts.yml):
[server_name]
127.0.0.1
Here is an example playbook (ping.yml):
---
- hosts: all
  tasks:
    - name: ping
      action: ping
To run the playbook:
$ ansible-playbook ping.yml -i hosts.yml --ask-pass
The Ansible playbook will ping all of the servers in the hosts.yml file. You can also select groups of servers using Ansible. For more information about Ansible, read the Ansible documentation. The Servers for Hackers Ansible tutorial is also a great and detailed introduction.
Puppet is written in Ruby and provides its own language—PuppetScript—for configuration. It has a designated server, the Puppet Master, that’s responsible for orchestrating its Agent nodes. Modules are small, shareable units of code written to automate or define the state of a system. Puppet Forge is a repository for modules written by the community for Open Source Puppet and Puppet Enterprise.
Agent nodes send basic facts about the system (e.g., the operating system, kernel, architecture, IP address, and hostname) to the Puppet Master. The Puppet Master then compiles a catalog with information provided by the agents on how each node should be configured and sends it to the agent. The agent enforces the change as prescribed in the catalog and sends a report back to the Puppet Master.
Facter (yes, spelled with an “-er”) is an interesting tool that ships with Puppet and pulls basic facts about the system. These facts can be referenced as a variable while writing your Puppet modules:
$ facter kernel
Linux
$
$ facter operatingsystem
Ubuntu
Writing Modules in Puppet is pretty straightforward: Puppet Manifests (files with the extension *.pp) together form Puppet Modules. Here is an example of Hello World in Puppet:
notify { 'Hello World, this message is getting logged into the agent node':
    # As nothing is specified in the body, the resource title
    # is the notification message by default.
}
Here is another example, with system-based logic. To reference other facts,
prepend a $ sign to the variable name—for instance, $hostname, or in this
case, $operatingsystem:
notify { 'Mac Warning':
    message => $operatingsystem ? {
        'Darwin' => 'This seems to be a Mac.',
        default  => 'I am a PC.',
    },
}
There are several resource types in Puppet, but the package-file-service paradigm is all you need for the majority of the configuration management. The following Puppet code makes sure that the OpenSSH-Server package is installed in a system and that the sshd service (the SSH server daemon) is notified to restart every time the sshd configuration file is changed:
package { 'openssh-server':
    ensure => installed,
}

file { '/etc/ssh/sshd_config':
    source  => 'puppet:///modules/sshd/sshd_config',
    owner   => 'root',
    group   => 'root',
    mode    => '640',
    notify  => Service['sshd'],  # sshd will restart
                                 # whenever you edit this file
    require => Package['openssh-server'],
}

service { 'sshd':
    ensure     => running,
    enable     => true,
    hasstatus  => true,
    hasrestart => true,
}
For more information, refer to the Puppet Labs documentation.
If Chef is your choice for configuration management, you will primarily use Ruby to write your infrastructure code. Chef is similar to Puppet, but designed with the opposite philosophy: Puppet provides a framework that simplifies things at the expense of flexibility, while Chef provides nearly no framework—its goal is to be very extensible, and so it is more difficult to use.
Chef clients run on every node in your infrastructure and regularly check with your Chef server to ensure your system is always aligned with the desired state. Each individual Chef client configures itself. This distributed approach makes Chef a scalable automation platform.
Chef works by using custom recipes (configuration elements), implemented in cookbooks. Cookbooks, which are basically packages for infrastructure choices, are usually stored in your Chef server. Read DigitalOcean’s tutorial series on Chef to learn how to create a simple Chef server.
Use the knife command to create a simple cookbook:
$ knife cookbook create cookbook_name
Andy Gale’s “Getting started with Chef” is a good starting point for Chef beginners. Many community cookbooks can be found on the Chef Supermarket—they’re a good starting point for your own cookbooks. For more information, check out the full Chef documentation.
CFEngine has a tiny footprint because it’s written in C. Its main design goal is robustness to failure, accomplished via autonomous agents operating in a distributed network (as opposed to a master/client architecture) that communicate using Promise Theory. If you want a masterless architecture, try this system.
The following libraries all help system administrators monitor running jobs but have very different applications: Psutil provides information in Python that can be obtained by Unix utility functions, Fabric makes it easy to define and execute commands on a list of remote hosts via SSH, and Luigi makes it possible to schedule and monitor long-running batch processes like chained Hadoop commands.
Psutil is a cross-platform (including Windows)
interface to different
system information (e.g., CPU, memory, disks, network, users, and processes)—it makes accessible within Python information that many of us are accustomed to
obtaining via Unix commands
such as top, ps, df, and netstat.
Get it using pip:
$ pip install psutil
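To get a quick taste of the API before the longer example below, here is an interactive session; the numbers shown are illustrative:

>>> import psutil
>>> psutil.cpu_percent(interval=1)   # like the CPU column in top
7.3
>>> psutil.virtual_memory().percent  # like free
48.4
>>> psutil.disk_usage('/').percent   # like df
73.1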
Here is an example that monitors for server overload (if any of the tests—net, CPU—fail, it will send an email):
# Functions to get system values:
from psutil import cpu_percent, net_io_counters
# Functions to take a break:
from time import sleep
# Package for email services:
import smtplib

MAX_NET_USAGE = 400000
MAX_ATTACKS = 4
attack = 0
counter = 0
while attack <= MAX_ATTACKS:
    sleep(4)
    counter = counter + 1
    # Check the CPU usage
    if cpu_percent(interval=1) > 70:
        attack = attack + 1
    # Check the net usage
    neti1 = net_io_counters()[1]
    neto1 = net_io_counters()[0]
    sleep(1)
    neti2 = net_io_counters()[1]
    neto2 = net_io_counters()[0]
    # Calculate the bytes per second
    net = ((neti2 + neto2) - (neti1 + neto1)) / 2
    if net > MAX_NET_USAGE:
        attack = attack + 1
    if counter > 25:
        attack = 0
        counter = 0

# Write a very important email if attack is higher than 4
TO = "you@your_email.com"
FROM = "webmaster@your_domain.com"
SUBJECT = "Your domain is out of system resources!"
text = "Go and fix your server!"
BODY = "\r\n".join((
    "From: %s" % FROM,
    "To: %s" % TO,
    "Subject: %s" % SUBJECT,
    "",
    text,
))
server = smtplib.SMTP('127.0.0.1')
server.sendmail(FROM, [TO], BODY)
server.quit()
For a good example use of Psutil, see glances,
a full terminal application that behaves like a widely extended top (which lists
running processes by CPU use or a user-specified sort order),
with the abilities of a client-server monitoring tool.
Fabric is a library for simplifying system
administration tasks. It allows you to SSH to multiple hosts and execute
tasks on each one. This is convenient for system administration or
application deployment. Use pip to install Fabric:
$ pip install fabric
Here is a complete Python module defining two
Fabric tasks—memory_usage and deploy:
# fabfile.py
from fabric.api import cd, env, prefix, run, task

env.hosts = ['my_server1', 'my_server2']  # Where to SSH

@task
def memory_usage():
    run('free -m')

@task
def deploy():
    with cd('/var/www/project-env/project'):
        with prefix('. ../bin/activate'):
            run('git pull')
            run('touch app.wsgi')
The with statement nests the commands so that
in the end deploy() becomes this for each host:
$ ssh hostname cd /var/www/project-env/project && . ../bin/activate && git pull
$ ssh hostname cd /var/www/project-env/project && . ../bin/activate && \
> touch app.wsgi
With the previous code saved in a file named fabfile.py (the default module
name fab looks for), we can check memory usage with our new memory_usage task:
$ fab memory_usage
[my_server1] Executing task 'memory_usage'
[my_server1] run: free -m
[my_server1] out:              total   used   free  shared  buffers  cached
[my_server1] out: Mem:          6964   1897   5067       0      166     222
[my_server1] out: -/+ buffers/cache:   1509   5455
[my_server1] out: Swap:            0      0      0

[my_server2] Executing task 'memory_usage'
[my_server2] run: free -m
[my_server2] out:              total   used   free  shared  buffers  cached
[my_server2] out: Mem:          1666    902    764       0      180     572
[my_server2] out: -/+ buffers/cache:    148   1517
[my_server2] out: Swap:          895      1    894
and we can deploy with:
$ fab deploy
Additional features include parallel execution, interaction with remote programs, and host grouping. The examples in the Fabric documentation are easy to follow.
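For instance, parallel execution is a decorator away. Here is a minimal sketch; the hostnames are placeholders:

# fabfile.py -- a sketch of parallel execution across env.hosts
from fabric.api import env, parallel, run, task

env.hosts = ['my_server1', 'my_server2']  # placeholder hostnames

@task
@parallel
def uptime():
    # Runs on all hosts at the same time instead of serially.
    run('uptime')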
Luigi is a pipeline management tool
developed and released by Spotify. It helps developers manage the entire
pipeline of large, long-running batch jobs, stitching together things
such as Hive queries, database queries, Hadoop Java jobs, pySpark jobs, and
any tasks you want to write yourself. They don’t all have to be big
data applications—the API allows you to schedule anything. But Spotify made
it to run their jobs over Hadoop, so they provide all of these utilities
already in luigi.contrib.
Install it with pip:
$ pip install luigi
It includes a web interface, so users can filter for their tasks and view dependency graphs of the pipeline workflow and its progress. There are example Luigi tasks in their GitHub repository, or see the Luigi documentation.
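To see the shape of the API, here is a minimal sketch of a Luigi task; the input filename is hypothetical. Each task declares its output and its work, and Luigi uses the output's existence to decide what still needs to run:

import luigi

class CountLines(luigi.Task):
    """Count the lines of a (hypothetical) input file."""
    filename = luigi.Parameter()

    def output(self):
        # Luigi checks whether this target exists to know the task is done.
        return luigi.LocalTarget(self.filename + '.count')

    def run(self):
        with open(self.filename) as f:
            count = sum(1 for line in f)
        with self.output().open('w') as out:
            out.write(str(count))

if __name__ == '__main__':
    luigi.run()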
This chapter lists the Python community’s most common approaches to speed optimization. Table 8-1 shows your optimization options, once you’ve done the simple things like profiling your code and comparing options for code snippets to get all of the performance you can directly from Python.
You may have already heard of the global interpreter lock (GIL)—it is the mechanism the C implementation of Python uses to coordinate multiple threads. Python’s memory management isn’t entirely thread-safe, so the GIL is required to prevent multiple threads from running the same Python code at once.
The GIL is often cited as a limitation of Python, but it’s not really as big of a deal as it’s made out to be—it’s only a hindrance when processes are CPU bound (in which case, like with NumPy or the cryptography libraries discussed soon, the code is rewritten in C and exposed with Python bindings). For anything else (like network I/O or file I/O), the bottleneck is the code blocking in a single thread while waiting for the I/O. You can solve blocking problems using threads or event-driven programming.
We should also note that in Python 2, there were slower and faster versions of libraries—StringIO and cStringIO, ElementTree and cElementTree. The C implementations are faster, but had to be imported explicitly. Since Python 3.3, the regular versions import from the faster implementation whenever possible, and the C-prefixed libraries are deprecated.
| Option | License | Reasons to use |
|---|---|---|
| Threading | PSFL | Lets threads share state within one process; gives a performance gain when at least one thread is blocking on I/O |
| Multiprocessing/subprocess | PSFL | Bypasses the GIL by launching additional interpreters; processes communicate via pipes, queues, or shared memory |
| PyPy | MIT license | An alternative interpreter with a JIT compiler; unmodified pure-Python code just runs faster |
| Cython | Apache license | A superset of Python with C type declarations; compiles to C and can release the GIL via nogil |
| Numba | BSD license | A NumPy-aware JIT compiler that turns annotated Python into machine code via LLVM |
| Weave | BSD license | Lets you inline snippets of C in Python (Python 2 only, and now deprecated) |
| PyCUDA/gnumpy/TensorFlow/Theano/PyOpenCL | MIT/modified BSD/BSD/BSD/MIT | Fast, parallel matrix math on a GPU |
| Direct use of C/C++ libraries | — | Wrap existing, heavily optimized C/C++ code for speed-critical paths |
Jeff Knupp, author of Writing Idiomatic Python, wrote a blog post about getting around the GIL, citing David Beazley’s deep look6 into the subject.
Threading and the other optimization options in Table 8-1 are discussed in more detail in the following sections.
Python’s threading library allows you to create multiple threads. Because of the GIL (at least in CPython), only one thread will execute Python code at a time per interpreter, meaning there will only be a performance gain when at least one thread is blocking (e.g., on I/O). The other option for I/O is to use event handling. For that, see the paragraphs on asyncio in “Performance networking tools in Python’s Standard Library”.
When you have multiple threads in Python, the kernel notices that one thread is blocking on I/O, and it switches to allow the next thread to use the processor until it blocks or is finished. All of this happens automatically when you start your threads. There’s a good example use of threading on Stack Overflow, and the Python Module of the Week series has a great threading introduction. Or see the threading documentation in the Standard Library.
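As a minimal sketch of the pattern (the URLs are placeholders), each thread below spends most of its time blocked on the network, so the downloads overlap:

import threading
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

urls = ['http://example.com/a', 'http://example.com/b']  # placeholders

def fetch(url):
    # The GIL is released while this thread blocks on network I/O.
    print(url, len(urlopen(url).read()))

threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()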
The multiprocessing module
in Python’s Standard Library provides a way to bypass the GIL—by launching additional
Python interpreters. The separate processes can communicate
using a multiprocessing.Pipe or a multiprocessing.Queue,
or share memory via a multiprocessing.Array and a multiprocessing.Value,
which implement locking automatically to prevent simultaneous access by
different processes. Share data sparingly; the locking adds overhead.
Here’s an example to show that the speed gain from using a pool of worker processes isn’t always proportional to the number of workers used. There’s a trade-off between the computational time saved and the time it takes to launch another interpreter. The example uses the Monte Carlo method (of drawing random numbers) to estimate the value of Pi:7
>>> import multiprocessing
>>> import random
>>> import timeit
>>>
>>> def calculate_pi(iterations):
...     x = (random.random() for i in range(iterations))
...     y = (random.random() for i in range(iterations))
...     r_squared = [xi**2 + yi**2 for xi, yi in zip(x, y)]
...     percent_coverage = sum([r <= 1 for r in r_squared]) / len(r_squared)
...     return 4 * percent_coverage
...
>>> def run_pool(processes, total_iterations):
...     with multiprocessing.Pool(processes) as pool:
...         # Divide the total iterations among the processes.
...         iterations = [total_iterations // processes] * processes
...         result = pool.map(calculate_pi, iterations)
...     print("%0.4f" % (sum(result) / processes), end=',')
...
>>> ten_million = 10000000
>>> timeit.timeit(lambda: run_pool(1, ten_million), number=10)
3.141,3.142,3.142,3.141,3.141,3.142,3.141,3.141,3.142,3.142,
134.48382110201055
>>> timeit.timeit(lambda: run_pool(10, ten_million), number=10)
3.142,3.142,3.142,3.142,3.142,3.142,3.141,3.142,3.142,3.141,
74.38514468498761

1. Using the multiprocessing.Pool within a context manager reinforces that the pool should only be used by the process that creates it.
2. The total iterations will always be the same; they’ll just be divided between a different number of processes.
3. pool.map() creates the multiple processes—one per item in the iterations list, up to the maximum number stated when the pool was initialized (in multiprocessing.Pool(processes)).
4. There is only one process for the first timeit trial.
5. Ten repetitions of one single process running with 10 million iterations took 134 seconds.
6. There are 10 processes for the second timeit trial.
7. Ten repetitions of 10 processes each running with one million iterations took 74 seconds.
The point of all this is that there is overhead in making the multiple processes, but the tools for running multiple processes in Python are robust and mature. See the multiprocessing documentation in the Standard Library for more information, and check out Jeff Knupp’s blog post about getting around the GIL, because it has a few paragraphs about multiprocessing.
The subprocess library was introduced into the
Standard Library in Python 2.4 and defined in
PEP 324.
It launches an external command (like unzip or curl) as if called from the command line (by default,
without invoking the system shell),
with the developer selecting what to do with the subprocess’s input and output pipes.
We recommend Python 2 users get an updated version with some bugfixes from the
subprocess32 package. Install it using pip:
$ pip install subprocess32
There is a great subprocess tutorial on the Python Module of the Week blog.
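Here is a minimal sketch of typical use, capturing a command's output without invoking the shell:

import subprocess

# Pass an argument list, not a shell string -- the shell is not
# invoked by default.
output = subprocess.check_output(['ls', '-l'])
print(output)

# Return codes are how you detect failure.
retcode = subprocess.call(['ls', 'no_such_file'])
print(retcode)  # nonzero on failure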
PyPy is an alternative implementation of Python whose interpreter is written in RPython, a restricted subset of Python. It’s fast, and when it works, you don’t have to change your code at all; it just runs faster for free. You should try this option before anything else.
You can’t get it using pip, because it’s actually another implementation of
Python. Scroll through the PyPy downloads page
for your correct version of Python and your operating system.
Here is a slightly modified version of David Beazley’s CPU-bound test code, with an added loop for multiple tests. You can see the difference between PyPy and CPython. First, it’s run using CPython:
$ # CPython
$ ./python -V
Python 2.7.1
$
$ ./python measure2.py
1.06774401665
1.45412397385
1.51485204697
1.54693889618
1.60109114647
And here is the same script, and the only thing different is the Python interpreter—it’s running with PyPy:
$ # PyPy
$ ./pypy -V
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 18:47:10)
[PyPy 1.7.0 with GCC 4.4.3]
$
$ ./pypy measure2.py
0.0683999061584
0.0483210086823
0.0388588905334
0.0440690517426
0.0695300102234
So, just by downloading PyPy, it went from an average of about 1.4 seconds to around 0.05 seconds—more than 20 times faster. Sometimes your code won’t even double in speed, but other times you really do get a big boost, all with no effort beyond downloading the PyPy interpreter. If you want your C library to be compatible with PyPy, follow PyPy’s advice and use CFFI instead of ctypes from the Standard Library.
Unfortunately, PyPy doesn’t work with all libraries that use C extensions.
For those cases,
Cython (pronounced “PSI-thon”—not the same as CPython,
the standard C implementation of Python)
implements a superset of the Python language
that lets you write C and C++ modules for Python. Cython also
allows you to call functions from compiled C libraries, and provides a
context, nogil, that allows you to
release the GIL
around a section of code, provided it does not manipulate Python objects in any way.
Using Cython allows you to take advantage of Python’s strong typing8 of variables and operations.
Here’s an example of strong typing with Cython:
def primes(int kmax):
    """Calculation of prime numbers with additional Cython keywords"""
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
This implementation of an algorithm to find prime numbers has some additional keywords compared to the next one, which is implemented in pure Python:
def primes(kmax):
    """Calculation of prime numbers in standard Python syntax"""
    p = [0] * 1000
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
Notice that in the Cython version you declare integers and integer arrays to be compiled into C types while also creating a Python list:
# Cython version
def primes(int kmax):
    """Calculation of prime numbers with additional Cython keywords"""
    cdef int n, k, i
    cdef int p[1000]
    result = []

1. The argument’s type is declared to be an integer.
2. The upcoming variables n, k, and i are declared as integers.
3. And we then have preallocated a 1000-long array of integers for p.
What is the difference? In the Cython version, you can see the
declaration of the variable types and the integer array in a similar way as
in standard C. For example, the additional type declaration
(of integer) in cdef int n, k, i
allows the Cython compiler to generate more
efficient C code than it could without type hints.
Because the syntax is incompatible with standard Python,
it is not saved in *.py files—instead,
Cython code is saved in *.pyx files.
What’s the difference in speed? Let’s try it!
import time
# activate pyx compiler
import pyximport
pyximport.install()
# primes implemented with Cython
import primesCy
# primes implemented with Python
import primes

print("Cython:")
t1 = time.time()
primesCy.primes(500)
t2 = time.time()
print("Cython time: %s" % (t2 - t1))
print("")
print("Python")
t1 = time.time()
primes.primes(500)
t2 = time.time()
print("Python time: {}".format(t2 - t1))

1. The pyximport module allows you to import *.pyx files (e.g., primesCy.pyx) with the Cython-compiled version of the primes function.
2. The pyximport.install() command allows the Python interpreter to start the Cython compiler directly to generate C code, which is automatically compiled to a *.so C library. Cython is then able to import this library for you in your Python code, easily and efficiently.
3. With the time.time() function, you are able to compare the time between these two different calls to find 500 prime numbers. On a standard notebook (dual-core AMD E-450 1.6 GHz), the measured values are:
Cython time: 0.0054 seconds
Python time: 0.0566 seconds
And here is the output on an embedded ARM BeagleBone machine:
Cython time: 0.0196 seconds
Python time: 0.3302 seconds
Numba is a NumPy-aware Python compiler (just-in-time [JIT] specializing compiler) that compiles annotated Python (and NumPy) code to LLVM (Low-Level Virtual Machine) through special decorators. Briefly, Numba uses LLVM to compile Python down to machine code that can be natively executed at runtime.
If you use Anaconda,
install Numba with conda install numba; if not, install it by hand.
You must already have NumPy and LLVM installed before installing Numba.
Check the LLVM version you need (it’s on the
PyPI page for llvmlite), and download that
version from whichever place matches your OS:
For a discussion of how to build from source for other Unix systems, see “Building the Clang + LLVM compilers”.
On OS X, use brew install homebrew/versions/llvm37 (or whatever version number is now current).
Once you have LLVM and NumPy, install Numba using pip.
You may need to help the installer find the llvm-config file by
providing an environment variable LLVM_CONFIG with the appropriate
path, like this:
$ LLVM_CONFIG=/path/to/llvm-config-3.7 pip install numba
Then, to use it in your code, just decorate your functions:
from numba import jit, int32

@jit
def f(x):
    return x + 3

@jit(int32(int32, int32))
def g(x, y):
    return x + y

1. With no arguments, the @jit decorator does lazy compilation—deciding itself whether to optimize the function, and how.
2. For eager compilation, specify types. The function will be compiled with the given specialization, and no other will be allowed—the return value and the two arguments will all have type numba.int32.
There is a nogil flag that can allow code to ignore the Global Interpreter Lock,
and a module numba.pycc that can be used to compile the code ahead of time.
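As a brief sketch of both features (the function and module names here are hypothetical):

from numba import jit
from numba.pycc import CC

@jit(nogil=True)
def crunch(x):
    # Compiled code that does not touch Python objects can
    # run without holding the GIL.
    return x * 2 + 1

# Ahead-of-time compilation: builds an importable extension module.
cc = CC('precompiled')  # 'precompiled' is a hypothetical module name
@cc.export('crunch', 'i8(i8)')
def crunch_aot(x):
    return x * 2 + 1

if __name__ == '__main__':
    cc.compile()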
For more information, see Numba’s user manual.
Numba can optionally be built with the capacity to run on the computer’s graphics processing unit (GPU), a chip optimized for the fast, parallel computation used in modern video games. You’ll need an NVIDIA GPU with NVIDIA’s CUDA Toolkit installed. Then follow the documentation for using Numba’s CUDA JIT with the GPU.
Outside of Numba, the other popular library with GPU capability is TensorFlow, released by Google under the Apache v2.0 license. It provides tensors (multidimensional matrices) and a way to chain tensor operations together, for fast matrix math. Currently it can only use the GPU on Linux operating systems. For installation instructions, see the following pages:
For those not on Linux, Theano, from the University
of Montréal, was the de facto matrix-math-over-GPU library in Python until Google released TensorFlow.
Theano is still under active development.
It has a page dedicated to using the GPU.
Theano supports Windows, OS X, and Linux operating systems, and is available via pip:
$ pip install Theano
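Here is a minimal sketch of Theano's style: define symbolic variables, chain operations over them, then compile a callable function:

import theano
import theano.tensor as T

# Define symbolic inputs and an expression over them.
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

# Compile the expression into a callable function
# (executed on the GPU when Theano is configured to use one).
f = theano.function([x, y], z)
print(f([[1, 2]], [[3, 4]]))  # [[ 4.  6.]]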
For lower-level interaction with the GPU, you can try PyCUDA.
Finally, people without an NVIDIA GPU can use PyOpenCL, a wrapper for OpenCL, which is compatible with a number of different hardware sets.
The libraries described in the following sections are all very different: both CFFI and ctypes are Python libraries, F2PY is for Fortran, SWIG can make C objects available in multiple languages (not just Python), and Boost.Python is a C++ library that can expose C++ objects to Python and vice versa. Table 8-2 goes into a little more detail.
| Library | License | Reasons to use |
|---|---|---|
| CFFI | MIT license | Recommended by PyPy; interfaces with C from both CPython and PyPy, in ABI or API mode |
| ctypes | Python Software Foundation license | In the Standard Library; loads dynamic C libraries at runtime and gives fine control over type translation |
| F2PY | BSD license | Generates Python bindings for Fortran code; ships with NumPy |
| SWIG | GPL | Generates bindings from annotated C/C++ headers for many scripting languages, not just Python |
| Boost.Python | Boost Software license | A C++ library that exposes C++ objects to Python and Python objects to C++ |
The CFFI package provides a simple mechanism to interface with C from both CPython and PyPy. CFFI is recommended by PyPy for the best compatibility between CPython and PyPy. It supports two modes: the inline application binary interface (ABI) compatibility mode (see the following code example) allows you to dynamically load and run functions from executable modules (essentially exposing the same functionality as LoadLibrary or dlopen), and an API mode, which allows you to build C extension modules.9
Install it using pip:
$ pip install cffi
Here is an example with ABI interaction:
from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char*);")
clib = ffi.dlopen(None)
length = clib.strlen(b"String to be evaluated.")
print("{}".format(length))  # prints: 23
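API mode, by contrast, generates and compiles a real C extension module. Here is a minimal sketch; the module name _example is hypothetical:

# build_module.py -- API-mode sketch; "_example" is a hypothetical name.
from cffi import FFI

ffibuilder = FFI()
ffibuilder.cdef("size_t strlen(const char*);")
ffibuilder.set_source("_example", "#include <string.h>")

if __name__ == '__main__':
    ffibuilder.compile()  # emits and compiles the C extension

After running the script, the compiled module can be imported with from _example import ffi, lib.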
ctypes is the de facto library for interfacing with C/C++ from CPython, and it’s in the Standard Library. It provides full access to the native C interface of most major operating systems (e.g., kernel32 on Windows, or libc on *nix), plus support for loading and interfacing with dynamic libraries—shared objects (*.so) or DLLs—at runtime. It brings along a whole host of types for interacting with system APIs, lets you define your own complex types such as structs and unions, and allows you to modify things like padding and alignment if needed. It can be a bit crufty to use (because you have to type so many extra characters), but in conjunction with the Standard Library’s struct module, it gives you essentially full control over how your data types get translated into something usable by a pure C/C++ method.
For example, a C struct defined like this in a file named my_struct.h:
struct my_struct {
    int a;
    int b;
};
could be implemented as shown in a file named my_struct.py:
import ctypes

class my_struct(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int),
                ("b", ctypes.c_int)]
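Loading and calling a shared library is just as direct. A minimal sketch, assuming a Linux libc path:

import ctypes

# Load the C standard library (the path assumes Linux).
libc = ctypes.CDLL('libc.so.6')

# Call a C function directly; bytes map to const char*.
print(libc.strlen(b'Hello, ctypes!'))  # 14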
The Fortran-to-Python interface generator (F2PY)
is a part of NumPy, so to get it, install NumPy using pip:
$ pip install numpy
It provides a versatile command-line function, f2py, that can be used
three different ways, all documented in
the F2PY quickstart guide.
If you have control over the source code, you can add special
comments with instructions for F2PY that clarify the intent of each argument
(which items are return values and which are inputs), and then just
run F2PY like this:
$ f2py -c fortran_code.f -m python_module_name
When you can’t do that, F2PY can generate an intermediate file with extension *.pyf that you can modify, to then produce the same results. This would be three steps:
$ f2py fortran_code.f -m python_module_name -h interface_file.pyf
$ vim interface_file.pyf
$ f2py -c interface_file.pyf fortran_code.f
The Simplified Wrapper Interface Generator (SWIG) supports a large number of scripting languages, including Python. It’s a popular, widely used command-line tool that generates bindings for interpreted languages from annotated C/C++ header files. To use it, first use SWIG to autogenerate an intermediate file from the header (with a *.i suffix). Next, modify that file to reflect the actual interface you want, and then run the build tool to compile the code into a shared library. All of this is done step by step in the SWIG tutorial.
While it does have some limits (it currently seems to have issues with a small subset of newer C++ features, and getting template-heavy code to work can be a bit verbose), SWIG provides a great deal of power and exposes lots of features to Python with little effort. Additionally, you can easily extend the bindings SWIG creates (in the interface file) to overload operators and built-in methods, and effectively re-cast C++ exceptions to be catchable by Python.
Here is an example that shows how to overload __repr__. This
excerpt would be from a file named MyClass.h:
#include <string>

class MyClass {
private:
    std::string name;
public:
    std::string getName();
};
And here is myclass.i:
%include "string.i"

%module myclass
%{
#include <string>
#include "MyClass.h"
%}

%extend MyClass {
    std::string __repr__() {
        return $self->getName();
    }
}

%include "MyClass.h"
There are more Python examples in the
SWIG GitHub repository.
Install SWIG using your package manager, if it’s there (apt-get install swig, yum install swig.i386, or
brew install swig), or else
use this link to download SWIG,
then follow the
installation instructions
for your operating system. If you’re missing the Perl Compatible Regular Expressions (PCRE)
library in OS X, use Homebrew to install it:
$ brew install pcre
Boost.Python requires a bit more manual work to expose C++ object functionality, but it is capable of providing all the same features SWIG does and then some—for example, wrappers to access Python objects as PyObjects in C++, as well as the tools to expose C++ objects to Python. Unlike SWIG, Boost.Python is a library, not a command-line tool, and there is no need to create an intermediate file with different formatting—it’s all written directly in C++. Boost.Python has an extensive, detailed tutorial if you wish to go this route.
1 Fowler is an advocate for best practices in software design and development, and one of continuous integration’s most vocal proponents. The quote is excerpted from his blog post on continuous integration. He hosted a series of discussions about test-driven development (TDD) and its relationship to extreme programming with David Heinemeier Hansson (creator of Ruby on Rails) and Kent Beck (instigator of the extreme programming (XP) movement, with CI as one of its cornerstones).
2 On GitHub, other users submit pull requests to notify owners of another repository that they have changes they’d like to merge.
3 REST stands for “representational state transfer.” It’s not a standard or a protocol, just a set of design principles developed during the creation of the HTTP 1.1 standard. A list of relevant architectural constraints for REST is available on Wikipedia.
4 OpenStack provides free software for cloud networking, storage, and computation so that organizations can host private clouds for themselves or public clouds that third parties can pay to use.
5 Except for Salt-SSH, which is an alternative Salt architecture, probably created in response to users wanting an Ansible-like option from Salt.
6 David Beazley has a great guide (PDF) that describes how the GIL operates. He also covers the new GIL (PDF) in Python 3.2. His results show that maximizing performance in a Python application requires a strong understanding of the GIL, how it affects your specific application, how many cores you have, and where your application bottlenecks are.
7 Here is a full derivation of the method. Basically you’re throwing darts at a 2 x 2 square, with a circle that has radius = 1 inside. If the darts land with equal likelihood anywhere on the board, the percent that are in the circle is equal to Pi / 4. Which means 4 times the percent in the circle is equal to Pi.
8 It is possible for a language to both be strongly and dynamically typed, as described in this Stack Overflow discussion.
9 Special care must be taken when writing C extensions to make sure you register your threads with the interpreter.