Table of Contents for
Using SQLite

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Using SQLite by Jay A. Kreibich Published by O'Reilly Media, Inc., 2010
  1. Cover
  2. Using SQLite
  3. O'Reilly Strata Conference
  4. Using SQLite
  5. Dedication
  6. A Note Regarding Supplemental Files
  7. Preface
  8. SQLite Versions
  9. Email Lists
  10. Example Code Download
  11. How We Got Here
  12. Conventions Used in This Book
  13. Using Code Examples
  14. Safari® Books Online
  15. How to Contact Us
  16. 1. What Is SQLite?
  17. Self-Contained, No Server Required
  18. Single File Database
  19. Zero Configuration
  20. Embedded Device Support
  21. Unique Features
  22. Compatible License
  23. Highly Reliable
  24. 2. Uses of SQLite
  25. Database Junior
  26. Application Files
  27. Application Cache
  28. Archives and Data Stores
  29. Client/Server Stand-in
  30. Teaching Tool
  31. Generic SQL Engine
  32. Not the Best Choice
  33. Big Name Users
  34. 3. Building and Installing SQLite
  35. SQLite Products
  36. Precompiled Distributions
  37. Documentation Distribution
  38. Source Distributions
  39. Building
  40. Build and Installation Options
  41. An sqlite3 Primer
  42. Summary
  43. 4. The SQL Language
  44. Learning SQL
  45. Brief Background
  46. General Syntax
  47. SQL Data Languages
  48. Data Definition Language
  49. Data Manipulation Language
  50. Transaction Control Language
  51. System Catalogs
  52. Wrap-up
  53. 5. The SELECT Command
  54. SQL Tables
  55. The SELECT Pipeline
  56. Advanced Techniques
  57. SELECT Examples
  58. What’s Next
  59. 6. Database Design
  60. Tables and Keys
  61. Common Structures and Relationships
  62. Normal Form
  63. Indexes
  64. Transferring Design Experience
  65. Closing
  66. 7. C Programming Interface
  67. API Overview
  68. Library Initialization
  69. Database Connections
  70. Prepared Statements
  71. Bound Parameters
  72. Convenience Functions
  73. Result Codes and Error Codes
  74. Utility Functions
  75. Summary
  76. 8. Additional Features and APIs
  77. Date and Time Features
  78. ICU Internationalization Extension
  79. Full-Text Search Module
  80. R*Trees and Spatial Indexing Module
  81. Scripting Languages and Other Interfaces
  82. Mobile and Embedded Development
  83. Additional Extensions
  84. 9. SQL Functions and Extensions
  85. Scalar Functions
  86. Aggregate Functions
  87. Collation Functions
  88. SQLite Extensions
  89. 10. Virtual Tables and Modules
  90. Introduction to Modules
  91. Module API
  92. Simple Example: dblist Module
  93. Advanced Example: weblog Module
  94. Best Index and Filter
  95. Wrap-Up
  96. A. SQLite Build Options
  97. Shell Directives
  98. ENABLE_READLINE
  99. Default Values
  100. SQLITE_DEFAULT_AUTOVACUUM
  101. SQLITE_DEFAULT_CACHE_SIZE
  102. SQLITE_DEFAULT_FILE_FORMAT
  103. SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT
  104. SQLITE_DEFAULT_MEMSTATUS
  105. SQLITE_DEFAULT_PAGE_SIZE
  106. SQLITE_DEFAULT_TEMP_CACHE_SIZE
  107. YYSTACKDEPTH
  108. Sizes and Limits
  109. SQLITE_MAX_ATTACHED
  110. SQLITE_MAX_COLUMN
  111. SQLITE_MAX_COMPOUND_SELECT
  112. SQLITE_MAX_DEFAULT_PAGE_SIZE
  113. SQLITE_MAX_EXPR_DEPTH
  114. SQLITE_MAX_FUNCTION_ARG
  115. SQLITE_MAX_LENGTH
  116. SQLITE_MAX_LIKE_PATTERN_LENGTH
  117. SQLITE_MAX_PAGE_COUNT
  118. SQLITE_MAX_PAGE_SIZE
  119. SQLITE_MAX_SQL_LENGTH
  120. SQLITE_MAX_TRIGGER_DEPTH
  121. SQLITE_MAX_VARIABLE_NUMBER
  122. Operation and Behavior
  123. SQLITE_CASE_SENSITIVE_LIKE
  124. SQLITE_HAVE_ISNAN
  125. SQLITE_OS_OTHER
  126. SQLITE_SECURE_DELETE
  127. SQLITE_THREADSAFE
  128. SQLITE_TEMP_STORE
  129. Debug Settings
  130. SQLITE_DEBUG
  131. SQLITE_MEMDEBUG
  132. Enable Extensions
  133. SQLITE_ENABLE_ATOMIC_WRITE
  134. SQLITE_ENABLE_COLUMN_METADATA
  135. SQLITE_ENABLE_FTS3
  136. SQLITE_ENABLE_FTS3_PARENTHESIS
  137. SQLITE_ENABLE_ICU
  138. SQLITE_ENABLE_IOTRACE
  139. SQLITE_ENABLE_LOCKING_STYLE
  140. SQLITE_ENABLE_MEMORY_MANAGEMENT
  141. SQLITE_ENABLE_MEMSYS3
  142. SQLITE_ENABLE_MEMSYS5
  143. SQLITE_ENABLE_RTREE
  144. SQLITE_ENABLE_STAT2
  145. SQLITE_ENABLE_UPDATE_DELETE_LIMIT
  146. SQLITE_ENABLE_UNLOCK_NOTIFY
  147. YYTRACKMAXSTACKDEPTH
  148. Limit Features
  149. SQLITE_DISABLE_LFS
  150. SQLITE_DISABLE_DIRSYNC
  151. SQLITE_ZERO_MALLOC
  152. Omit Core Features
  153. B. sqlite3 Command Reference
  154. Command-Line Options
  155. Interactive Dot-Commands
  156. .backup
  157. .bail
  158. .databases
  159. .dump
  160. .echo
  161. .exit
  162. .explain
  163. .headers
  164. .help
  165. .import
  166. .indices
  167. .iotrace
  168. .load
  169. .log
  170. .mode
  171. .nullvalue
  172. .output
  173. .prompt
  174. .quit
  175. .read
  176. .restore
  177. .schema
  178. .separator
  179. .show
  180. .tables
  181. .timeout
  182. .timer
  183. .width
  184. C. SQLite SQL Command Reference
  185. SQLite SQL Commands
  186. ALTER TABLE
  187. ANALYZE
  188. ATTACH DATABASE
  189. BEGIN TRANSACTION
  190. COMMIT TRANSACTION
  191. CREATE INDEX
  192. CREATE TABLE
  193. CREATE TRIGGER
  194. CREATE VIEW
  195. CREATE VIRTUAL TABLE
  196. DELETE
  197. DETACH DATABASE
  198. DROP INDEX
  199. DROP TABLE
  200. DROP TRIGGER
  201. DROP VIEW
  202. END TRANSACTION
  203. EXPLAIN
  204. INSERT
  205. PRAGMA
  206. REINDEX
  207. RELEASE SAVEPOINT
  208. REPLACE
  209. ROLLBACK TRANSACTION
  210. SAVEPOINT
  211. SELECT
  212. UPDATE
  213. VACUUM
  214. D. SQLite SQL Expression Reference
  215. Literal Expressions
  216. Logic Representations
  217. Unary Expressions
  218. Binary Expressions
  219. Function Calls
  220. Column Names
  221. General Expressions
  222. AND
  223. BETWEEN
  224. CASE
  225. CAST
  226. COLLATE
  227. EXISTS
  228. GLOB
  229. IN
  230. IS
  231. ISNULL
  232. LIKE
  233. MATCH
  234. NOTNULL
  235. OR
  236. RAISE
  237. REGEXP
  238. SELECT
  239. E. SQLite SQL Function Reference
  240. Scalar Functions
  241. abs()
  242. changes()
  243. coalesce()
  244. date()
  245. datetime()
  246. glob()
  247. ifnull()
  248. hex()
  249. julianday()
  250. last_insert_rowid()
  251. length()
  252. like()
  253. load_extension()
  254. lower()
  255. ltrim()
  256. match()
  257. max()
  258. min()
  259. nullif()
  260. quote()
  261. random()
  262. randomblob()
  263. regex()
  264. replace()
  265. round()
  266. rtrim()
  267. sqlite_compileoption_get()
  268. sqlite_compileoption_used()
  269. sqlite_source_id()
  270. sqlite_version()
  271. strftime()
  272. substr()
  273. time()
  274. total_changes()
  275. trim()
  276. typeof()
  277. upper()
  278. zeroblob()
  279. Aggregate Functions
  280. avg()
  281. count()
  282. group_concat()
  283. max()
  284. min()
  285. sum()
  286. total()
  287. F. SQLite SQL PRAGMA Reference
  288. SQLite PRAGMAs
  289. auto_vacuum
  290. cache_size
  291. case_sensitive_like
  292. collation_list
  293. count_changes
  294. database_list
  295. default_cache_size
  296. encoding
  297. foreign_keys
  298. foreign_key_list
  299. freelist_count
  300. full_column_names
  301. fullfsync
  302. ignore_check_constraints
  303. incremental_vacuum
  304. index_info
  305. index_list
  306. integrity_check
  307. journal_mode
  308. journal_size_limit
  309. legacy_file_format
  310. locking_mode
  311. lock_proxy_file
  312. lock_status
  313. max_page_count
  314. omit_readlock
  315. page_count
  316. page_size
  317. parser_trace
  318. quick_check
  319. read_uncommitted
  320. recursive_triggers
  321. reverse_unordered_selects
  322. schema_version
  323. secure_delete
  324. short_column_names
  325. sql_trace
  326. synchronous
  327. table_info
  328. temp_store
  329. temp_store_directory
  330. user_version
  331. vdbe_trace
  332. vdbe_listing
  333. writable_schema
  334. G. SQLite C API Reference
  335. API Datatypes
  336. sqlite3
  337. sqlite3_backup
  338. sqlite3_blob
  339. sqlite3_context
  340. sqlite3_int64, sqlite3_uint64, sqlite_int64, sqlite_uint64
  341. sqlite3_module
  342. sqlite3_mutex
  343. sqlite3_stmt
  344. sqlite3_value
  345. sqlite3_vfs
  346. API Functions
  347. sqlite3_aggregate_context()
  348. sqlite3_auto_extension()
  349. sqlite3_backup_finish()
  350. sqlite3_backup_init()
  351. sqlite3_backup_pagecount()
  352. sqlite3_backup_remaining()
  353. sqlite3_backup_step()
  354. sqlite3_bind_xxx()
  355. sqlite3_bind_parameter_count()
  356. sqlite3_bind_parameter_index()
  357. sqlite3_bind_parameter_name()
  358. sqlite3_blob_bytes()
  359. sqlite3_blob_close()
  360. sqlite3_blob_open()
  361. sqlite3_blob_read()
  362. sqlite3_blob_write()
  363. sqlite3_busy_handler()
  364. sqlite3_busy_timeout()
  365. sqlite3_changes()
  366. sqlite3_clear_bindings()
  367. sqlite3_close()
  368. sqlite3_collation_needed()
  369. sqlite3_column_xxx()
  370. sqlite3_column_bytes()
  371. sqlite3_column_count()
  372. sqlite3_column_database_name()
  373. sqlite3_column_decltype()
  374. sqlite3_column_name()
  375. sqlite3_column_origin_name()
  376. sqlite3_column_table_name()
  377. sqlite3_column_type()
  378. sqlite3_commit_hook()
  379. sqlite3_compileoption_get()
  380. sqlite3_compileoption_used()
  381. sqlite3_complete()
  382. sqlite3_config()
  383. sqlite3_context_db_handle()
  384. sqlite3_create_collation()
  385. sqlite3_create_function()
  386. sqlite3_create_module()
  387. sqlite3_data_count()
  388. sqlite3_db_config()
  389. sqlite3_db_handle()
  390. sqlite3_db_mutex()
  391. sqlite3_db_status()
  392. sqlite3_declare_vtab()
  393. sqlite3_enable_load_extension()
  394. sqlite3_enable_shared_cache()
  395. sqlite3_errcode()
  396. sqlite3_errmsg()
  397. sqlite3_exec()
  398. sqlite3_extended_errcode()
  399. sqlite3_extended_result_codes()
  400. sqlite3_file_control()
  401. sqlite3_finalize()
  402. sqlite3_free()
  403. sqlite3_free_table()
  404. sqlite3_get_autocommit()
  405. sqlite3_get_auxdata()
  406. sqlite3_get_table()
  407. sqlite3_initialize()
  408. sqlite3_interrupt()
  409. sqlite3_last_insert_rowid()
  410. sqlite3_libversion()
  411. sqlite3_libversion_number()
  412. sqlite3_limit()
  413. sqlite3_load_extension()
  414. sqlite3_log()
  415. sqlite3_malloc()
  416. sqlite3_memory_highwater()
  417. sqlite3_memory_used()
  418. sqlite3_mprintf()
  419. sqlite3_mutex_alloc()
  420. sqlite3_mutex_enter()
  421. sqlite3_mutex_free()
  422. sqlite3_mutex_held()
  423. sqlite3_mutex_leave()
  424. sqlite3_mutex_notheld()
  425. sqlite3_mutex_try()
  426. sqlite3_next_stmt()
  427. sqlite3_open()
  428. sqlite3_open_v2()
  429. sqlite3_overload_function()
  430. sqlite3_prepare_xxx()
  431. sqlite3_profile()
  432. sqlite3_progress_handler()
  433. sqlite3_randomness()
  434. sqlite3_realloc()
  435. sqlite3_release_memory()
  436. sqlite3_reset()
  437. sqlite3_reset_auto_extension()
  438. sqlite3_result_xxx()
  439. sqlite3_result_error_xxx()
  440. sqlite3_rollback_hook()
  441. sqlite3_set_authorizer()
  442. sqlite3_set_auxdata()
  443. sqlite3_shutdown()
  444. sqlite3_sleep()
  445. sqlite3_snprintf()
  446. sqlite3_soft_heap_limit()
  447. sqlite3_sourceid()
  448. sqlite3_sql()
  449. sqlite3_status()
  450. sqlite3_step()
  451. sqlite3_stmt_status()
  452. sqlite3_strnicmp()
  453. sqlite3_table_column_metadata()
  454. sqlite3_threadsafe()
  455. sqlite3_total_changes()
  456. sqlite3_trace()
  457. sqlite3_unlock_notify()
  458. sqlite3_update_hook()
  459. sqlite3_user_data()
  460. sqlite3_value_xxx()
  461. sqlite3_value_bytes()
  462. sqlite3_value_numeric_type()
  463. sqlite3_value_type()
  464. sqlite3_version[]
  465. sqlite3_vfs_find()
  466. sqlite3_vfs_register()
  467. sqlite3_vfs_unregister()
  468. sqlite3_vmprintf()
  469. Index
  470. About the Author
  471. Colophon
  472. Copyright

Common Structures and Relationships

Database design has a small number of design structures and relationships that act as basic building blocks. Once you master how to use them and how to manipulate them, you can use these building blocks to build much larger and more complex data representations.

One-to-One Relationships

The most basic kind of inter-table relationship is the one-to-one relationship. As you can guess, this type of relationship establishes a reference from a single row in one table to a single row in another table. Most commonly, one-to-one relationships are represented by having a foreign key in one table reference a primary key in another table. If the foreign key column is made unique, only one reference will be allowed. As Figure 6-3 illustrates, a unique foreign key creates a one-to-one relationship between the rows of the two tables.

In a one-to-one relationship, table B has a foreign key that references the primary key of table A. This associates every non-NULL foreign key in table B with some row in table A.
Figure 6-3. In a one-to-one relationship, table B has a foreign key that references the primary key of table A. This associates every non-NULL foreign key in table B with some row in table A.

In the strictest sense, a foreign key relationship is a one-to-(zero or one) relationship. When two tables are involved in a one-to-one relationship, there is nothing to enforce that every primary key has an incoming reference from a foreign key. For that matter, if the foreign key allows NULLs, there may be unassigned foreign keys as well.

One-to-one relationships are commonly used to create detail tables. As the name implies, a detail table typically holds details that are linked to the records in a more prominent table. Detail tables can be used to hold data that is only relevant to a small subsection of the database application. Breaking the detail data apart from the main tables allows different sections of the database design to evolve and grow much more independently.

Detail tables can also be helpful when extended data only applies to a limited number of records. For example, a website might have a sales_items table that lists common information (price, inventory, weight, etc.) for all available items. Type-specific data can then be held in detail tables, such as cd_info for CDs (artist name, album name, etc.) or dvd_info (directors, studio, etc.) for DVDs. Although the sales_items table would have a unique one-to-one relationship with every type-specific info table, each individual row in the sales_item table would be referenced by only one type of detail table.

One-to-one relationships can also be used to isolate very large data elements, such as BLOBs. Consider an employee database that contains an image of each employee. Due to the data storage and I/O overhead, it might be unwise to include a photo column directly in the employee table, but it is easy to create a photo table that references the employee table. Consider these two tables:

CREATE TABLE employee ( 
    employee_id  INTEGER   NOT NULL   PRIMARY KEY,
    name         TEXT   NOT NULL
    /* ...etc... */
);

CREATE TABLE employee_photo (
    employee_id  INTEGER   NOT NULL   PRIMARY KEY
                 REFERENCES employee,
    photo_data   BLOB
    /* ...etc... */
);

This example is a bit unique, because the employee_photo.employee_id column is both the primary key for the employee_photo table, as well as a foreign key to the employee table. Since we want a one-to-one relationship, it makes sense to just pair up primary keys. Because this foreign key does not allow NULL keys, every employee_photo row must be matched to a specific employee row. The database does not guarantee that every employee will have a matching employee_photo, however.

One-to-Many Relationships

One-to-many relationships establish a link between a single row in one table to multiple rows in another table. This is done by expanding a one-to-one relationship, allowing one side to contain duplicate keys. One-to-many relationships are often used to associate lists or arrays of items back to a single row. For example, multiple shipping addresses can be linked to a single account. Unless otherwise specified, “many” usually means “zero or more.”

The only difference between a one-to-one relationship and a one-to-many relationship is that the one-to-many relationship allows for duplicate foreign key values. This allows multiple rows in one table (the many table) to refer back to a single row in another table (the One table). In building a one-to-many relationship, the foreign key must always be on the many side.

Figure 6-4 illustrates the relationship in more detail.

When building a one-to-many relationship, the primary keys must be unique, but foreign key columns may contain duplicate values. This means a one-to-many relationship must have the foreign key on the many side. Here we see a foreign key value in Table B refer to the primary key in Table A.
Figure 6-4. When building a one-to-many relationship, the primary keys must be unique, but foreign key columns may contain duplicate values. This means a one-to-many relationship must have the foreign key on the many side. Here we see a foreign key value in Table B refer to the primary key in Table A.

Any time you find yourself wondering how to stuff an array or list into the column of a table, the solution is to separate the array elements out into their own table and establish a one-to-many relationship.

Note

If you need to represent a list or array, try using a one-to-many relationship.

The same is true any time you start to contemplate sets of columns, such as item0, item1, item2, etc., that are all designed to hold instances of the same type of value. Such designs have inherent limits, and the insert, update, and removal process becomes quite complex. It is much easier to just break the data out into its own table and establish a proper relationship.

One example of a one-to-many relationship is music albums, and the songs they contain. Each album has a list of songs associated with that album. For example:

CREATE TABLE albums (
        album_id INTEGER   NOT NULL   PRIMARY KEY,
        album_name TEXT );

CREATE TABLE tracks (
        track_id INTEGER   NOT NULL   PRIMARY KEY,
        track_name TEXT,
        track_number INTEGER,
        track_length INTEGER, -- in seconds
        album_id INTEGER   NOT NULL   REFERENCES albums );

Each album and track has a unique ID. Each track also has a foreign key reference back to its album. Consider:

INSERT INTO albums VALUES ( 1, "The Indigo Album" );
INSERT INTO tracks VALUES ( 1, "Metal Onion", 1, 137, 1 );
INSERT INTO tracks VALUES ( 2, "Smooth Snake", 2, 212, 1 );
INSERT INTO tracks VALUES ( 3, "Turn A", 3, 255, 1 );

INSERT INTO albums VALUES ( 2, "Morning Jazz" );
INSERT INTO tracks VALUES ( 4, "In the Bed", 1, 214, 2 );
INSERT INTO tracks VALUES ( 5, "Water All Around", 2, 194, 2 );
INSERT INTO tracks VALUES ( 6, "Time Soars", 3, 265, 2 );
INSERT INTO tracks VALUES ( 7, "Liquid Awareness", 4, 175, 2 );

To get a simple list of tracks and their associated album, we just join the tables back together. We can also sort by both album name and track number:

sqlite> SELECT album_name, track_name, track_number
   ...>   FROM albums JOIN tracks USING ( album_id )
   ...>   ORDER BY album_name, track_number;

album_name    track_name  track_number
------------  ----------  ------------
Morning Jazz  In the Bed  1           
Morning Jazz  Water All   2           
Morning Jazz  Time Soars  3           
Morning Jazz  Liquid Awa  4           
The Indigo A  Metal Onio  1           
The Indigo A  Smooth Sna  2           
The Indigo A  Turn A      3           

We can also manipulate the track groupings:

sqlite> SELECT album_name, sum( track_length ) AS runtime, count(*) AS tracks
   ...>   FROM albums JOIN tracks USING ( album_id )
   ...>   GROUP BY album_id;

album_name        runtime     tracks    
----------------  ----------  ----------
The Indigo Album  604         3         
Morning Jazz      848         4         

This query groups the tracks based off their album, and then aggregates the track data together.

Many-to-Many Relationships

The next step is the many-to-many relationship. A many-to-many relationship associates one row in the first table to many rows in the second table while simultaneously allowing individual rows in the second table to be linked to multiple rows in the first table. In a sense, a many-to-many relationship is really two one-to-many relationships built across each other.

Figure 6-5 shows the classic many-to-many example of people and groups. One person can belong to many different groups, while each group is made up of many different people. Common operations are to find all the groups a person belongs to, or to find all the people in a group.

A many-to-many relationship is like two one-to-many relationships built across each other. In this example, each individual person can be a member of one or more groups, while each group can contain one or more people.
Figure 6-5. A many-to-many relationship is like two one-to-many relationships built across each other. In this example, each individual person can be a member of one or more groups, while each group can contain one or more people.

Many-to-many relationships are a bit more complex than other relationships. Although the tables have a many-to-many relationship with each other, the entries in both tables must remain unique. We cannot duplicate either person rows or group rows for the purpose of matching keys. This is a problem, since each foreign key can only reference one row. This makes it impossible for a foreign key of one table (such as a group) to refer to multiple rows of another table (such as people).

To solve this, we go back to the advice from before: if you need to add a list to a row, break out that list into its own table and establish a one-to-many relationship with the new table. You cannot directly represent a many-to-many relationship with only two tables, but you can take a pair of one-to-many relationships and link them together. The link requires a small table, known as a link table, or bridge table, that sits between the two many tables. Each many-to-many relationship requires a unique bridge table.

In its most basic form, the bridge table consists of nothing but two foreign keys—one for each of the tables it is linking together. Each row of the bridge table links one row in the first many table to one row in the second many table. In our People-to-Groups example, the link table defines a membership of one person in one group. This is illustrated in Figure 6-6.

Implementing a many-to-many relationship requires a bridge table. In this example, each row of the bridge table represents a membership of one person in one group. Note that the primary key of the bridge table is a multicolumn key over (p_id, g_id). This keeps memberships unique.
Figure 6-6. Implementing a many-to-many relationship requires a bridge table. In this example, each row of the bridge table represents a membership of one person in one group. Note that the primary key of the bridge table is a multicolumn key over (p_id, g_id). This keeps memberships unique.

The logical representation of a many-to-many relationship is actually a one-to-many-to-one relationship, with the bridge table (acting as a record of membership) in the middle. Each person has a one-to-many relationship with memberships, just as groups have a one-to-many relationship with memberships. In addition to binding the two tables together, the bridge table can also hold additional information about the membership itself, such as a first-joined date or an expiration date.

Here are three tables. The people and groups table are obvious enough. The p_g_bridge table acts as the bridge table between the people table and groups table. The two columns are both foreign key references, one to people and one to groups. Establishing a primary key across both foreign key columns ensures the memberships remain unique:

CREATE TABLE people ( pid INTEGER   PRIMARY KEY, name TEXT, ... );
CREATE TABLE groups ( gid INTEGER   PRIMARY KEY, name TEXT, ... );
CREATE TABLE p_g_bridge( 
        pid INTEGER   NOT NULL   REFERENCES people,
        gid INTEGER   NOT NULL   REFERENCES groups,
        PRIMARY KEY ( pid, gid )
);

This query will list all the groups that a person belongs to:

SELECT groups.name AS group_name
  FROM people JOIN p_g_bridge USING ( pid ) JOIN groups USING ( gid )
  WHERE people.name = search_person_name;

The query simply links people to groups using the bridge table, and then filters out the appropriate rows.

We don’t always need all three tables. This query counts all the members of a group without utilizing the people table:

SELECT name AS group_name, count(*) AS members
  FROM groups JOIN p_g_bridge USING ( gid )
  GROUP BY gid;

There are many other queries we can do, like finding all the groups that have no members:

SELECT name AS group_name
  FROM groups LEFT OUTER JOIN p_g_bridge USING ( gid )
  WHERE pid IS NULL;

This query performs an outer join from the groups table to the p_g_bridge table. Any unmatched group rows will be padded out with a NULL in the p_g_bridge.pid column. Since this column is marked NOT NULL, we know the only possible way for a row to be NULL in that column is from the outer join, meaning the row is unmatched to any memberships. A very similar query could be used to find any people that have no memberships.

Hierarchies and Trees

Hierarchies and other tree-style data relationships are common and frequently show up in database design. Modeling one in a database can be a challenge because you tend to ask different types of questions when dealing with hierarchies.

Common tree operations include finding all the sub-nodes under a given node, or querying the depth of a given node. These operations have an inherent recursion in them—a concept that SQL doesn’t support. This can lead to clumsy queries or complex representations.

There are two common methods for representing a tree relation using database tables. The first is the adjacency model, which uses a simple representation that is easy to modify but complex to query. The other common representation is the nested set, which allows relatively simple queries, but at the cost of a more complex representation that can be expensive to modify.

Adjacency Model

A tree is basically a series of nodes that have a one-to-many relationship between the parents and the children. We already know how to define a one-to-many relationship: give each child a foreign key that points to the primary key of the parent. The only trick is that the same table is sitting on both sides of the relationship.

For example, here is a basic adjacency model table:

CREATE TABLE tree (
    node   INTEGER   NOT NULL   PRIMARY KEY,
    name   TEXT,
    parent INTEGER   REFERENCES tree );

Each node in the tree will have a unique node identifier. Each node will also have a reference to its parent node. The root of the tree can simply have a NULL parent reference. This allows multiple trees to be stored in the same table, as multiple nodes can be defined as a root node of different trees.

If we want to represent this tree:

A
  A.1
    A.1.a
  A.2
    A.2.a
    A.2.b
  A.3

We would use the following data:

INSERT INTO tree VALUES ( 1, 'A',     NULL );
INSERT INTO tree VALUES ( 2, 'A.1',   1 );
INSERT INTO tree VALUES ( 3, 'A.1.a', 2 );
INSERT INTO tree VALUES ( 4, 'A.2',   1 );
INSERT INTO tree VALUES ( 5, 'A.2.a', 4 );
INSERT INTO tree VALUES ( 6, 'A.2.b', 4 );
INSERT INTO tree VALUES ( 7, 'A.3',   1 );

The following query will give a list of nodes and parents by joining the tree table to itself:

sqlite> SELECT n.name AS node, p.name AS parent
   ...>   FROM tree AS n JOIN tree AS p ON n.parent = p.node;

node        parent    
----------  ----------
A.1         A         
A.1.a       A.1       
A.2         A         
A.2.a       A.2       
A.2.b       A.2       
A.3         A         

Inserting or removing nodes is fairly straightforward, as is moving subtrees around to different parents.

What isn’t easy are tree-centric operations, like counting all of the nodes in a subtree, or computing the depth of a specific node (often used for output formatting). You can do limited traversals (like finding a grand-parent) by joining the tree table to itself multiple times, but you cannot write a single query that can compute an answer to these types of questions on a tree of arbitrary size. The only choice is to write application routines that loop over different levels in the tree, computing the answers you seek.

Overall, the adjacency model is easy to understand, and the trees are easy to modify. The model is based on foreign keys, and can take full advantage of the database’s built-in referential integrity. The major disadvantage is that many types of common data queries require the application code to loop over several individual database queries and assist in calculating answers.

Nested set

As the name implies, the nested set representation depends on nesting groups of nodes inside other groups. Rather than representing some type of parent-child relationship, each node holds bounding data about the full subtree underneath it. With some clever math, this allows us to query all kinds of information about a node.

A nested set table might look like this:

CREATE TABLE nest (
    name  TEXT,
    lower INTEGER   NOT NULL   UNIQUE,
    upper INTEGER   NOT NULL   UNIQUE,
    CHECK ( lower < upper ) );

The nested set can be visualized by converting the tree structure into a parenthetical representation. We then count the parentheses and record the upper and lower bound of each nested set. The index numbers can also be calculated with a depth-first tree traversal, where the lower bound is a pre-visit count and the upper bound is a post-visit count.

A( A.1( A.1.a( ) ), A.2( A.2.a( ), A.2.b( )  ), A.3(  )  )
 1    2      3 4 5     6      7 8       9 10 11    12 13 14

Or, in SQL:

INSERT INTO nest VALUES ( 'A',      1, 14 );
INSERT INTO nest VALUES ( 'A.1',    2,  5 );
INSERT INTO nest VALUES ( 'A.1.a',  3,  4 );
INSERT INTO nest VALUES ( 'A.2',    6, 11 );
INSERT INTO nest VALUES ( 'A.2.a',  7,  8 );
INSERT INTO nest VALUES ( 'A.2.b',  9, 10 );
INSERT INTO nest VALUES ( 'A.3',   12, 13 );

This might not look that useful, but it allows a number of different queries. For example, if you want to find all the leaf nodes (nodes without children) just look for nodes that have an upper and lower value that are next to each other:

SELECT name FROM nest WHERE lower + 1 = upper;

You can find the depth of a node by counting all of its ancestors:

sqlite> SELECT n.name AS name, count(*) AS depth
   ...>   FROM nest AS n JOIN nest AS p
   ...>     ON p.lower <= n.lower AND p.upper >= n.upper
   ...>   GROUP BY n.name;

name        depth     
----------  ----------
A           1         
A.1         2         
A.1.a       3         
A.2         2         
A.2.a       3         
A.2.b       3         
A.3         2         

There are many other queries that match patterns or differences in the upper and lower bounds.

Nested sets are very efficient at calculating many types of queries, but they are expensive to change. For the math to work correctly, there can’t be any gaps in the numbering sequence. This means that any insert or delete requires renumbering a significant number of entries in the table. This isn’t bad for something with a few dozen nodes, but it can quickly prove impractical for a tree with hundreds of nodes.

Additionally, because nested sets aren’t based off any kind of key reference, the database can’t help enforce the correctness of the tree structure. This leaves database integrity and correctness in the hands of the application—something that is normally avoided.

More information

This is just a brief overview of how to represent tree relationships. If you need to implement a tree, I suggest you do a few web searches on adjacency model or nested set. Many of the larger SQL books mentioned in Wrap-up also have sections on tree and hierarchies.