Chapter 11. Data Persistence

We mentioned ZIP compression and pickling already in “Data Serialization”, so there isn’t much left to cover besides databases in this chapter.

This chapter is mostly about Python libraries that interface with relational databases. These are the kinds of databases we normally think about—they contain structured data stored in tables and are accessed using SQL.1

Structured Files

We already mentioned tools for JSON, XML, and ZIP files in Chapter 9, and pickling and XDR when talking about serialization. We recommend PyYAML (get it via pip install pyyaml) to parse YAML. Python also has tools in its Standard Library for CSV, *.netrc used by some FTP clients, *.plist files used in OS X, and a dialect of the Windows INI format via configparser.2
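For instance, parsing INI-style configuration with configparser takes only a few lines. This is a minimal sketch; the section and option names here are invented:

```python
import configparser

config = configparser.ConfigParser()
# read_string() parses INI text directly; config.read('settings.ini')
# would do the same for a file on disk.
config.read_string("""
[shop]
name = Cheese Emporium
open = yes
""")

print(config['shop']['name'])             # sections act like dictionaries
print(config.getboolean('shop', 'open'))  # typed accessors parse the values
```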

Also, there’s a persistent key-value store available via the shelve module in Python’s Standard Library. Its backend is the best available variant of the database manager (dbm—a key-value database) on your computer:3

>>> import shelve
>>>
>>> with shelve.open('my_shelf') as s:
...     s['d'] = {'key': 'value'}
...
>>> s = shelve.open('my_shelf', 'r')
>>> s['d']
{'key': 'value'}

You can check which database backend you’re using like this:

>>> import dbm
>>> dbm.whichdb('my_shelf')
'dbm.gnu'

You can get the GNU implementation of dbm (gdbm) for Windows from a prebuilt distribution, or check your package manager (brew, apt, yum) first, then fall back to building it from the gdbm source code.
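Using dbm directly feels much like using a dictionary, except that keys and values come back as bytes. A minimal sketch (the filename is arbitrary):

```python
import dbm

# 'c' opens for reading and writing, creating the database if necessary.
with dbm.open('example_db', 'c') as db:
    db['cheese'] = 'cheddar'   # str keys and values are encoded to bytes

with dbm.open('example_db', 'r') as db:
    print(db[b'cheese'])       # b'cheddar'
```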

Database Libraries

The Python Database API (DB-API2) defines a standard interface for database access in Python. It’s documented in PEP 249 and in a more detailed introduction to Python’s DB-API. Nearly all Python database drivers conform to this interface, so when you just want to query a database in Python, choose any one that connects to the database that you are using: sqlite3 for the SQLite database, psycopg2 for Postgres, and MySQL-python for MySQL, for example.4
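Whichever driver you pick, the DB-API workflow is the same: connect, obtain a cursor, execute parameterized SQL, fetch, and commit. A minimal sketch using the built-in sqlite3 driver (the table and values are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # every DB-API driver has connect()
cur = conn.cursor()
cur.execute('CREATE TABLE cheese(name TEXT, price REAL)')
# Parameterized queries ('?' is sqlite3's paramstyle) avoid SQL injection.
cur.execute('INSERT INTO cheese VALUES (?, ?)', ('wensleydale', 2.2))
conn.commit()
cur.execute('SELECT name, price FROM cheese')
rows = cur.fetchall()
print(rows)                          # [('wensleydale', 2.2)]
conn.close()
```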

Code with lots of SQL strings and hardcoded columns and tables can quickly become messy, error-prone, and hard to debug. The libraries in Table 11-1 (except for sqlite3, the SQLite driver) provide a database abstraction layer (DAL) that abstracts away the structure, grammar, and data types of SQL to present an API.

Because Python is an object-oriented language, the database abstraction can also implement object-relational mapping (ORM) to provide a mapping between the Python objects and the underlying database, plus operators on attributes in those classes that represent an abstracted version of SQL in Python.

All of the libraries in Table 11-1 (with the exception of sqlite3 and Records) provide an ORM, and their implementations use one of two patterns:5 the Active Record pattern, where records simultaneously represent the abstracted data and interact with the database; and the Data Mapper pattern, where one layer interfaces with the database, another layer presents the data, and in between is a mapper function that performs the necessary logic to convert between the two (essentially performing the logic of a SQL view outside of the database).

When performing queries, both the Active Record and Data Mapper patterns behave about the same, but in the Data Mapper pattern, the user must explicitly state table names, add primary keys, and create helper tables to support many-to-many relationships (like on a receipt, where one transaction ID would be associated with multiple purchases) — all of that is done behind the scenes when using the Active Record pattern.

The most popular libraries are sqlite3, SQLAlchemy, and the Django ORM. Records is in a category of its own: more of a SQL client that provides many options for output formatting. The remaining libraries can be thought of as standalone, lighter-weight alternatives to the Django ORM (they all use the Active Record pattern), but with different implementations and very different APIs.

Table 11-1. Database libraries
Library License Reasons to use

sqlite3 (driver, not ORM)

PSFL

  • It’s in the Standard Library.

  • It’s good for sites with low or moderate traffic that only need the simpler data types and a few queries—it’s got low latency because there’s no network communication.

  • It’s good for learning SQL or Python’s DB-API, or prototyping a database application.

SQLAlchemy

MIT license

  • It provides a Data Mapper pattern with a two-layer API that has an ORM top layer resembling the API in other libraries, plus a low-level layer of tables directly attached to the database.

  • It gives you explicit control (via the lower level Classical Mappings API) over the structure and schemas in your database; this is useful, for example, if your database administrators are not the same people as your web developers.

  • Dialects: SQLite, PostgreSQL, MySQL, Oracle, MS-SQL Server, Firebird, or Sybase (or register your own).

Django ORM

BSD license

  • It provides the Active Record pattern that can generate the database infrastructure implicitly from the user-defined models in the application.

  • It’s tightly coupled with Django.

  • Dialects: SQLite, PostgreSQL, MySQL, or Oracle; alternatively, use a third-party library: SAP SQL Anywhere, IBM DB2, MS-SQL Server, Firebird, or ODBC.

peewee

MIT license

  • It provides an Active Record pattern, in the sense that the tables you define in the ORM are exactly the tables you see in the database (plus an index column).

  • Dialects: SQLite, MySQL, and Postgres (or add your own).

PonyORM

AGPLv3

  • It provides an Active Record pattern with an intuitive generator-based syntax.

  • There is also an online GUI Entity-Relationship diagram editor (to draw the data model that defines the tables in a database and their relationship to each other) that can be translated to SQL code that will create the tables.

  • Dialects: SQLite, MySQL, Postgres, and Oracle (or add your own).

SQLObject

LGPL

  • It was one of the first libraries to use the Active Record pattern in Python.

  • Dialects: SQLite, MySQL, Postgres, Firebird, Sybase, MAX DB, MS-SQL Server (or add your own).

Records (query interface, not ORM)

ISC license

  • It provides a simple way to query a database and generate a report document: SQL in, XLS (or JSON, YAML, CSV, or LaTeX) out.

  • Plus a command-line interface that can be used for interactive querying or one-line report generation.

  • It uses the powerful SQLAlchemy as its backend.

The following sections provide additional details about the libraries listed in Table 11-1.

sqlite3

SQLite is the C library that provides the database behind sqlite3. The database is stored as a single file, by convention with the extension *.db. SQLite's "when to use" page says it has been demonstrated to work as a database backend for websites with hundreds of thousands of hits per day. The same page lists the SQL commands that SQLite understands, and you can consult W3Schools' quick SQL reference for instructions on how to use them. Here's an example:

import sqlite3
db = sqlite3.connect('cheese_emporium.db')

db.execute('CREATE TABLE cheese(id INTEGER, name TEXT)')
db.executemany(
    'INSERT INTO cheese VALUES (?, ?)',
    [(1, 'red leicester'),
     (2, 'wensleydale'),
     (3, 'cheddar'),
    ]
)
db.commit()
db.close()

The allowable SQLite types are NULL, INTEGER, REAL, TEXT, and BLOB (bytes); the sqlite3 documentation also describes how to register new data types (e.g., it implements a datetime.datetime type that is stored as TEXT).
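For example, the built-in datetime handling mentioned above is switched on with detect_types. A minimal sketch (note that in Python 3.12+ the default date/datetime adapters are deprecated, so you may eventually need to register your own):

```python
import datetime
import sqlite3

# PARSE_DECLTYPES makes sqlite3 apply the converter registered for a
# column's declared type; TIMESTAMP columns are stored as TEXT and
# converted back to datetime.datetime on the way out.
db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
db.execute('CREATE TABLE tasting(cheese TEXT, tasted TIMESTAMP)')
db.execute('INSERT INTO tasting VALUES (?, ?)',
           ('cheddar', datetime.datetime(1971, 12, 18, 20, 30)))
row = db.execute('SELECT cheese, tasted FROM tasting').fetchone()
print(type(row[1]))   # <class 'datetime.datetime'>
db.close()
```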

SQLAlchemy

SQLAlchemy is a very popular database toolkit: Django comes with an option to switch from its own ORM to SQLAlchemy, it's the backend for the Flask mega-tutorial to build your own blog, and Pandas uses it as its SQL backend.

SQLAlchemy is the only library listed here to follow Martin Fowler’s Data Mapper pattern instead of the more frequently implemented Active Record pattern. Unlike the other libraries, SQLAlchemy not only provides an ORM layer but also a generalized API (called the Core layer) for writing database-agnostic code without SQL. The ORM layer is stacked on top of the Core layer, which uses Table objects that directly map to the underlying database. The mapping between these objects and the ORM must be done explicitly by the user, so it takes more code to get started, and can be frustrating for those who are new to relational databases. The benefit is a far greater control over the database—nothing is created unless you explicitly put it there.
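To make the two layers concrete, here is a minimal sketch of the Core layer on its own, with no ORM classes involved (assuming SQLAlchemy 1.4 or later; the table is invented):

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

engine = create_engine('sqlite://')    # in-memory SQLite
metadata = MetaData()

# A Core-layer Table maps one-to-one onto a database table; no classes.
cheese = Table('cheese', metadata,
               Column('id', Integer, primary_key=True),
               Column('kind', String, nullable=False))
metadata.create_all(engine)

with engine.begin() as conn:           # a transaction, committed on exit
    conn.execute(cheese.insert(), [{'kind': 'cheddar'}, {'kind': 'brie'}])
    result = conn.execute(select(cheese.c.kind).order_by(cheese.c.id))
    kinds = [row[0] for row in result]

print(kinds)                           # ['cheddar', 'brie']
```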

SQLAlchemy can run on Jython and PyPy, and supports Python 2.5 through the latest 3.x versions. The next few code snippets will show the work required to create a many-to-many object mapping. We’ll create three objects in the ORM layer: Customer, Cheese, and Purchase. There can be many purchases for one customer (a many-to-one relation), and the purchase can be for many types of cheese (a many-to-many relation). The reason we’re doing this in such detail is to show the unmapped table purchases_cheeses—it does not need to be present in the ORM because its only purpose is to provide a linkage between the types of cheese and the purchases. Other ORMs would create this table silently in the background—so this shows one of the big differences between SQLAlchemy and the other libraries:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Date, Integer, String, Table, ForeignKey
from sqlalchemy.orm import relationship

Base = declarative_base()  1

class Customer(Base):  2
    __tablename__ = 'customers'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    def __repr__(self):
       return "<Customer(name='%s')>" % (self.name)

purchases_cheeses = Table(  3
    'purchases_cheeses', Base.metadata,
    Column('purch_id', Integer, ForeignKey('purchases.id'), primary_key=True),
    Column('cheese_id', Integer, ForeignKey('cheeses.id'), primary_key=True)
)

class Cheese(Base):  4
    __tablename__ = 'cheeses'
    id = Column(Integer, primary_key=True)
    kind = Column(String, nullable=False)
    purchases = relationship(  5
        'Purchase', secondary='purchases_cheeses', back_populates='cheeses'  6
    )
    def __repr__(self):
       return "<Cheese(kind='%s')>" % (self.kind)

class Purchase(Base):
    __tablename__ = 'purchases'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.id'), nullable=False)
    purchase_date = Column(Date, nullable=False)
    customer = relationship('Customer')
    cheeses = relationship(  7
        'Cheese', secondary='purchases_cheeses', back_populates='purchases'
    )
    def __repr__(self):
       return ("<Purchase(customer='%s', dt='%s')>" %
                (self.customer.name, self.purchase_date))
1

The declarative base object is a metaclass6 that intercepts the creation of each mapped table in the ORM and defines a corresponding table in the Core layer.

2

Objects in the ORM layer inherit from the declarative base.

3

This is an unmapped table in the Core layer—it's not a class and not derived from the declarative base. It will correspond to the table purchases_cheeses in the database and exists to provide the many-to-many mapping between cheeses and purchase IDs.

4

Compare that with Cheese—a mapped table in the ORM layer. Under the hood, Cheese.__table__ is created in the Core layer. It will correspond to a table named cheeses in the database.

5

This relationship explicitly defines the relationship between the mapped classes Cheese and Purchase: they are related indirectly through the secondary table purchases_cheeses (as opposed to directly via a ForeignKey).

6

back_populates adds an event listener so that when a new Purchase object is added to Cheese.purchases, the Cheese object will also appear in Purchase.cheeses.

7

This is the other half of the plumbing for the many-to-many relationship.

Tables are explicitly created by the declarative base:

from sqlalchemy import create_engine
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

And now the interaction, using objects in the ORM layer, looks the same as in the other libraries with ORMs:

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sess = Session()

leicester = Cheese(kind='Red Leicester')
camembert = Cheese(kind='Camembert')
sess.add_all((camembert, leicester))
doug = Customer(name='Douglas')
cat = Customer(name='Cat')
sess.add_all((doug, cat))
sess.commit()  1

import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)

p = Purchase(purchase_date=d - 1 * day, customer=doug)
p.cheeses.append(camembert)  2
p.cheeses.append(leicester)
sess.add(p)

p = Purchase(purchase_date=d, customer=cat)
p.cheeses.append(camembert)
sess.add(p)
sess.commit()
1

You must explicitly commit() to push changes to the database.

2

Objects in the many-to-many relationship aren’t added during instantiation—they have to be appended after the fact.

Here are a few sample queries:

>>> for row in sess.query(Purchase,Cheese).filter(Purchase.cheeses):  1
...     print(row)
...
(<Purchase(customer='Douglas', dt='1971-12-17')>, <Cheese(kind='Camembert')>)
(<Purchase(customer='Douglas', dt='1971-12-17')>, <Cheese(kind='Red Leicester')>)
(<Purchase(customer='Cat', dt='1971-12-18')>, <Cheese(kind='Camembert')>)
>>>
>>> from sqlalchemy import func
>>> (sess.query(Purchase,Cheese)  2
...     .filter(Purchase.cheeses)
...     .from_self(Cheese.kind, func.count(Purchase.id))
...     .group_by(Cheese.kind)
... ).all()
[('Camembert', 2), ('Red Leicester', 1)]
1

This is how to do the many-to-many join across the purchases_cheeses table, which is not mapped to a top-level ORM object.

2

This query counts the number of purchases of each kind of cheese.

To learn more, see the SQLAlchemy documentation.

Django ORM

The Django ORM is the interface used by Django to provide database access. Its implementation of the Active Record pattern is probably the closest one in our list to the Ruby on Rails ActiveRecord library.

It is tightly integrated with Django, so usually you only use it because you’re making a Django web application. Try the Django ORM tutorial from Django Girls if you want to follow along while building a web application.7

If you want to try out Django's ORM without making a whole web app, copy this skeleton GitHub project that uses only Django's ORM, and follow its instructions. There may be some differences across versions of Django. Our settings.py looks like this:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'tmp.db',
    }
}
INSTALLED_APPS = ("orm_only",)
SECRET_KEY = "A secret key may also be required."

Every abstracted table in the Django ORM subclasses the Django Model object, like this:

from django.db import models

class Cheese(models.Model):
    type = models.CharField(max_length=30)

class Customer(models.Model):
    name = models.CharField(max_length=50)

class Purchase(models.Model):
    purchase_date = models.DateField()
    customer = models.ForeignKey(Customer)  1
    cheeses = models.ManyToManyField(Cheese)  2
1

The ForeignKey relationship denotes a many-to-one relationship—the customer can make many purchases, but a purchase is associated with a single customer. Use OneToOneField for a one-to-one relation.

2

And use ManyToManyField to denote a many-to-many relationship.

Next, we have to execute a command to build the tables. On the command line, with the virtual environment activated, and in the same directory as manage.py, type:

(venv)$ python manage.py migrate

With the tables created, here's how to add data to the database. Without a call to the instance.save() method, the new row will not make it to the database:

leicester = Cheese(type='Red Leicester')
camembert = Cheese(type='Camembert')
leicester.save()  1
camembert.save()

doug = Customer(name='Douglas')
doug.save()

# Add a time of purchase
import datetime
now = datetime.datetime(1971, 12, 18, 20)
day = datetime.timedelta(1)

p = Purchase(purchase_date=now - 1 * day, customer=doug)
p.save()
p.cheeses.add(camembert, leicester)  2
1

Objects must be saved before they exist in the database, and must be saved before other inserts can cross-reference them.

2

You must add objects in a many-to-many mapping separately.

Querying via the ORM looks like this in Django:

# Filter for all purchases that happened in the past 7 days:
queryset = Purchase.objects.filter(purchase_date__gt=now - 7 * day)  1

# Show who bought what cheeses in the query set:
for v in queryset.values('customer__name', 'cheeses__type'):  2
    print(v)

# Aggregate purchases by cheese type:
from django.db.models import Count
sales_counts = (  3
    queryset.values('cheeses__type')
    .annotate(total=Count('cheeses'))  4
    .order_by('cheeses__type')
)
for sc in sales_counts:
    print(sc)
1

In Django, the filtering operator (gt, greater than) is appended after a double underscore to the table’s attribute purchase_date—Django parses this under the hood.

2

Double underscores after a foreign key identifier will access the attribute in the corresponding table.

3

In case you haven’t seen the notation, you can put parentheses around a long statement and break it across lines for legibility.

4

The query set’s annotate clause adds extra fields to each result.

peewee

The primary goal of peewee is to be a lightweight way for people who know SQL to interact with a database. What you see is what you get: you don't manually build a top layer that abstracts the table structure behind the scenes, as in SQLAlchemy, nor does the library magically build a bottom layer underneath your tables, as in the Django ORM. Its goal is to fill a different niche than SQLAlchemy—doing a few things, but doing them quickly, simply, and Pythonically.

There is very little "magic," except to create primary keys for the tables if the user didn't define them. You'd create tables like this:

import peewee
database = peewee.SqliteDatabase('peewee.db')

class BaseModel(peewee.Model):
    class Meta:  1
        database = database  2

class Customer(BaseModel):
    name = peewee.TextField()  3

class Purchase(BaseModel):
    purchase_date = peewee.DateField()
    customer = peewee.ForeignKeyField(Customer, related_name='purchases')  4

class Cheese(BaseModel):
    kind = peewee.TextField()

class PurchaseCheese(BaseModel):
    """For the many-to-many relationship."""
    purchase = peewee.ForeignKeyField(Purchase)
    cheese = peewee.ForeignKeyField(Cheese)

database.create_tables((Customer, Purchase, Cheese, PurchaseCheese))
1

peewee keeps model configuration details in a namespace called Meta, an idea borrowed from Django.

2

Associate every Model with a database.

3

A primary key is added implicitly if you don’t explicitly add it.

4

This adds the attribute purchases to Customer records for easy access but doesn’t do anything to the tables.

Initialize data and add it to the database in one step with the create() method, or initialize it first, and add it later—there are configuration options to control autocommitting and utilities to do transactions. Here it’s done in one step:

leicester = Cheese.create(kind='Red Leicester')
camembert = Cheese.create(kind='Camembert')
doug = Customer.create(name='Douglas')
cat = Customer.create(name='Cat')

import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)

p = Purchase.create(purchase_date=d, customer=cat)  1
PurchaseCheese.create(purchase=p, cheese=camembert)  2
PurchaseCheese.create(purchase=p, cheese=leicester)

p = Purchase.create(purchase_date=d - 1 * day, customer=doug)
PurchaseCheese.create(purchase=p, cheese=camembert)
1

Directly add an object (like cat), and peewee will use its primary key.

2

There's no magic for the many-to-many mapping—just add new entries manually.

And query like this:

>>> for p in Purchase.select().where(Purchase.purchase_date >= d - 1 * day):
...     print(p.customer.name, p.purchase_date)
...
Cat 1971-12-18
Douglas 1971-12-17
>>>
>>> from peewee import fn
>>> q = (Cheese
...     .select(Cheese.kind, fn.COUNT(Purchase.id).alias('num_purchased'))
...     .join(PurchaseCheese)
...     .join(Purchase)
...     .group_by(Cheese.kind)
...  )
>>> for chz in q:
...     print(chz.kind, chz.num_purchased)
...
Camembert 2
Red Leicester 1

There is a collection of add-ons available that includes advanced transaction support8 and custom functions that can hook into data and execute prior to storage—for example, compression or hashing.

PonyORM

PonyORM takes a different approach to the query grammar: instead of writing an SQL-like language or boolean expressions, it uses Python’s generator syntax. There’s also a graphical schema editor that can generate PonyORM entities for you. It supports Python 2.6+ and Python 3.3+.

To accomplish its intuitive syntax, Pony requires that all relationships between tables be bidirectional—all related tables must explicitly refer to each other, like this:

import datetime
from pony import orm

db = orm.Database()
db.bind('sqlite', ':memory:')

class Cheese(db.Entity):  1
    type = orm.Required(str)  2
    purchases = orm.Set(lambda: Purchase)  3

class Customer(db.Entity):
    name = orm.Required(str)
    purchases = orm.Set(lambda: Purchase)  4

class Purchase(db.Entity):
    date = orm.Required(datetime.date)
    customer = orm.Required(Customer)  5
    cheeses = orm.Set(Cheese)  6

db.generate_mapping(create_tables=True)
1

A Pony database Entity stores an object's state in the database; each instance of an Entity corresponds to a row in its table.

2

Pony uses standard Python types to identify the type of the column—from str to datetime.datetime, in addition to the user-defined Entities like Purchase, Customer, and Cheese.

3

lambda: Purchase is used here because Purchase is not yet defined.

4

The orm.Set(lambda: Purchase) is the first half of the definition of the one-to-many Customer to Purchase relation.

5

The orm.Required(Customer) is the second half of the one-to-many Customer to Purchase relationship.

6

The orm.Set(Cheese) relationship, combined with the orm.Set(lambda: Purchase) in 3, defines a many-to-many relationship.

With the data entities defined, object instantiation looks like it does in the other libraries. Entities are created on the fly and committed with the call to orm.commit():

camembert = Cheese(type='Camembert')
leicester = Cheese(type='Red Leicester')
cat = Customer(name='Cat')
doug = Customer(name='Douglas')


d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)
Purchase(date=(d - 1 * day), customer=doug, cheeses={camembert, leicester})
Purchase(date=d, customer=cat, cheeses={camembert})
orm.commit()

And querying—Pony’s tour de force—really does look like it’s pure Python:

yesterday = d - 1.1 * day
for cheese in (
        orm.select(p.cheeses for p in Purchase if p.date > yesterday)  1
    ):
    print(cheese.type)

for cheese, purchase_count in (
        orm.left_join((c, orm.count(p))  2
        for c in Cheese
        for p in c.purchases)
    ):
    print(cheese.type, purchase_count)
1

This is what a query looks like using Python’s generator syntax.

2

The orm.count() function aggregates by counting.

SQLObject

SQLObject, first released in October 2002, is the oldest ORM in this list. Its implementation of the Active Record pattern—as well as its novel idea to overload the standard operators (like ==, <, <=, etc.) as a way of abstracting some of the SQL logic into Python, which is now implemented by almost all of the ORM libraries—made it extremely popular.

It supports a wide variety of databases (the common database systems MySQL, Postgres, and SQLite, plus more exotic systems like SAP DB, Sybase, and MS-SQL) but currently only supports Python 2.6 and Python 2.7. It's still actively maintained, but has become less prevalent with the adoption of SQLAlchemy.

Records

Records is a minimalist SQL library, designed for sending raw SQL queries to various databases. It's basically Tablib and SQLAlchemy bundled together with a nice API and a command-line application that acts like a SQL client that can output YAML, XLS, and the other Tablib formats. Records isn't by any means a replacement for ORM libraries; a typical use case would be to query a database and create a report (e.g., a monthly report saving the recent sales figures to a spreadsheet). Data can be used programmatically, or exported to a number of useful data formats:

>>> import records
>>> db = records.Database('sqlite:///mydb.db')
>>>
>>> rows = db.query('SELECT * FROM cheese')
>>> print(rows.dataset)
name         |price
-------------|-----
red leicester|1.0
wensleydale  |2.2
>>>
>>> print(rows.export('json'))
[{"name": "red leicester", "price": 1.0}, {"name": "wensleydale", "price": 2.2}]

Records also includes a command-line tool that exports data using SQL, like this:

$ records 'SELECT * FROM cheese' yaml --url=sqlite:///mydb.db
- {name: red leicester, price: 1.0}
- {name: wensleydale, price: 2.2}

$ records 'SELECT * FROM cheese' xlsx --url=sqlite:///mydb.db  > cheeses.xlsx

NoSQL database libraries

There is also an entire universe of "not only SQL" (NoSQL) databases: a catchall term for any database in use that's not a traditional relational database. If you look on PyPI, things can get confusing, with a few dozen similarly named Python packages. We recommend searching the main project's site for a Python-specific recommendation (i.e., run a Google search for "Python site:vendorname.com"). Most of these projects provide a Python API and quickstart tutorials for how to use it. Some examples:

MongoDB

MongoDB is a distributed document store. You can think of it like a giant Python dictionary that can live on a cluster, with its own filter and query language. For the Python API, see MongoDB’s getting started with Python page.

Cassandra

Cassandra is a distributed table store. It provides fast lookup and can tolerate wide tables but is not intended for joins—rather, the paradigm is to have multiple duplicate views of the data that are keyed on different columns. For more on the Python APIs, see the Planet Cassandra page.

HBase

HBase is a distributed column store (in this context, column store means data are stored like (row id, column name, value), allowing for very sparse arrays such as a dataset of “from” and “to” links for websites that make up the Web). It is built on top of Hadoop’s Distributed File System. For more information about Python APIs, see HBase’s “supporting projects” page.

Druid

Druid is a distributed column store intended to collect (and optionally aggregate before it stores) event data (in this context, column store means the columns can be ordered and sorted, and then storage may be compressed for faster I/O and smaller footprint). Here is a link to Druid’s Python API on GitHub.

Redis

Redis is a distributed in-memory key-value store—the point is to reduce latency by avoiding disk I/O. You could store frequent query results for faster web lookup, for example. Here is a list of Python clients for Redis that highlights redis-py as the preferred interface, and here is the redis-py page.

Couchbase

Couchbase is another distributed document store, with a more SQL-like API (as compared to MongoDB’s JavaScript-like API)—here is a link to Couchbase’s Python SDK.

Neo4j

Neo4j is a graph database, intended to store objects with graph-like relationships. Here is a link to Neo4j’s Python guide.

LMDB

LMDB, the Symas Lightning Memory-Mapped Database, is a key-value store backed by a memory-mapped file, meaning the file doesn't have to be read from the beginning to reach the data—so performance approaches the speed of an in-memory store. Python bindings are in the lmdb library.

1 Relational databases were introduced in 1970 by Edgar F. Codd, who, while working at IBM, wrote "A Relational Model of Data for Large Shared Data Banks." The idea was ignored until 1977, when Larry Ellison started a company—which would eventually become Oracle—based on its technology. Other competing ideas, like key-value stores and hierarchical database models, were largely ignored after the success of relational databases, until the recent not only SQL (NoSQL) movement revived nonrelational storage options in a cluster computing setting.

2 It’s ConfigParser in Python 2—see the configparser documentation for the precise dialect understood by the parser.

3 The dbm library stores key-value pairs in an on-disk hash table. The precise way this happens depends on whether it's using the gdbm, ndbm, or "dumb" backend. The "dumb" one is implemented in Python and described in the documentation; the other two are described in the gdbm manual. With ndbm there's an upper bound on the size of the values stored. The file is locked when opened for writing unless (with gdbm only) you open the database file with the flag 'ru' or 'wu', and even then, updates made in write mode may not be visible to the other connections.

4 Although the Structured Query Language (SQL) is an ISO standard, database vendors choose how much of the standard to implement, and can add their own features. This means a Python library that serves as a database driver must understand the dialect of SQL that is spoken by the database it interfaces with.

5 Defined in Martin Fowler’s Patterns of Enterprise Application Architecture. For more information on what goes into Python’s ORM designs, we recommend the SQLAlchemy entry in the “Architecture of Open Source Applications,” and this comprehensive list of links related to Python ORMs from FullStack Python.

6 There’s a great explanation of Python metaclasses on Stack Overflow.

7 Django Girls is a phenomenal charity organization of brilliant programmers dedicated to providing free Django training in a celebratory environment to women around the world.

8 Transaction contexts allow you to roll back executions if an error occurs in an intermediate step.