Table of Contents for
Using SQLite

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Using SQLite by Jay A. Kreibich Published by O'Reilly Media, Inc., 2010
  1. Cover
  2. Using SQLite
  3. O'Reilly Strata Conference
  4. Using SQLite
  5. Dedication
  6. A Note Regarding Supplemental Files
  7. Preface
  8. SQLite Versions
  9. Email Lists
  10. Example Code Download
  11. How We Got Here
  12. Conventions Used in This Book
  13. Using Code Examples
  14. Safari® Books Online
  15. How to Contact Us
  16. 1. What Is SQLite?
  17. Self-Contained, No Server Required
  18. Single File Database
  19. Zero Configuration
  20. Embedded Device Support
  21. Unique Features
  22. Compatible License
  23. Highly Reliable
  24. 2. Uses of SQLite
  25. Database Junior
  26. Application Files
  27. Application Cache
  28. Archives and Data Stores
  29. Client/Server Stand-in
  30. Teaching Tool
  31. Generic SQL Engine
  32. Not the Best Choice
  33. Big Name Users
  34. 3. Building and Installing SQLite
  35. SQLite Products
  36. Precompiled Distributions
  37. Documentation Distribution
  38. Source Distributions
  39. Building
  40. Build and Installation Options
  41. An sqlite3 Primer
  42. Summary
  43. 4. The SQL Language
  44. Learning SQL
  45. Brief Background
  46. General Syntax
  47. SQL Data Languages
  48. Data Definition Language
  49. Data Manipulation Language
  50. Transaction Control Language
  51. System Catalogs
  52. Wrap-up
  53. 5. The SELECT Command
  54. SQL Tables
  55. The SELECT Pipeline
  56. Advanced Techniques
  57. SELECT Examples
  58. What’s Next
  59. 6. Database Design
  60. Tables and Keys
  61. Common Structures and Relationships
  62. Normal Form
  63. Indexes
  64. Transferring Design Experience
  65. Closing
  66. 7. C Programming Interface
  67. API Overview
  68. Library Initialization
  69. Database Connections
  70. Prepared Statements
  71. Bound Parameters
  72. Convenience Functions
  73. Result Codes and Error Codes
  74. Utility Functions
  75. Summary
  76. 8. Additional Features and APIs
  77. Date and Time Features
  78. ICU Internationalization Extension
  79. Full-Text Search Module
  80. R*Trees and Spatial Indexing Module
  81. Scripting Languages and Other Interfaces
  82. Mobile and Embedded Development
  83. Additional Extensions
  84. 9. SQL Functions and Extensions
  85. Scalar Functions
  86. Aggregate Functions
  87. Collation Functions
  88. SQLite Extensions
  89. 10. Virtual Tables and Modules
  90. Introduction to Modules
  91. Module API
  92. Simple Example: dblist Module
  93. Advanced Example: weblog Module
  94. Best Index and Filter
  95. Wrap-Up
  96. A. SQLite Build Options
  97. Shell Directives
  98. ENABLE_READLINE
  99. Default Values
  100. SQLITE_DEFAULT_AUTOVACUUM
  101. SQLITE_DEFAULT_CACHE_SIZE
  102. SQLITE_DEFAULT_FILE_FORMAT
  103. SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT
  104. SQLITE_DEFAULT_MEMSTATUS
  105. SQLITE_DEFAULT_PAGE_SIZE
  106. SQLITE_DEFAULT_TEMP_CACHE_SIZE
  107. YYSTACKDEPTH
  108. Sizes and Limits
  109. SQLITE_MAX_ATTACHED
  110. SQLITE_MAX_COLUMN
  111. SQLITE_MAX_COMPOUND_SELECT
  112. SQLITE_MAX_DEFAULT_PAGE_SIZE
  113. SQLITE_MAX_EXPR_DEPTH
  114. SQLITE_MAX_FUNCTION_ARG
  115. SQLITE_MAX_LENGTH
  116. SQLITE_MAX_LIKE_PATTERN_LENGTH
  117. SQLITE_MAX_PAGE_COUNT
  118. SQLITE_MAX_PAGE_SIZE
  119. SQLITE_MAX_SQL_LENGTH
  120. SQLITE_MAX_TRIGGER_DEPTH
  121. SQLITE_MAX_VARIABLE_NUMBER
  122. Operation and Behavior
  123. SQLITE_CASE_SENSITIVE_LIKE
  124. SQLITE_HAVE_ISNAN
  125. SQLITE_OS_OTHER
  126. SQLITE_SECURE_DELETE
  127. SQLITE_THREADSAFE
  128. SQLITE_TEMP_STORE
  129. Debug Settings
  130. SQLITE_DEBUG
  131. SQLITE_MEMDEBUG
  132. Enable Extensions
  133. SQLITE_ENABLE_ATOMIC_WRITE
  134. SQLITE_ENABLE_COLUMN_METADATA
  135. SQLITE_ENABLE_FTS3
  136. SQLITE_ENABLE_FTS3_PARENTHESIS
  137. SQLITE_ENABLE_ICU
  138. SQLITE_ENABLE_IOTRACE
  139. SQLITE_ENABLE_LOCKING_STYLE
  140. SQLITE_ENABLE_MEMORY_MANAGEMENT
  141. SQLITE_ENABLE_MEMSYS3
  142. SQLITE_ENABLE_MEMSYS5
  143. SQLITE_ENABLE_RTREE
  144. SQLITE_ENABLE_STAT2
  145. SQLITE_ENABLE_UPDATE_DELETE_LIMIT
  146. SQLITE_ENABLE_UNLOCK_NOTIFY
  147. YYTRACKMAXSTACKDEPTH
  148. Limit Features
  149. SQLITE_DISABLE_LFS
  150. SQLITE_DISABLE_DIRSYNC
  151. SQLITE_ZERO_MALLOC
  152. Omit Core Features
  153. B. sqlite3 Command Reference
  154. Command-Line Options
  155. Interactive Dot-Commands
  156. .backup
  157. .bail
  158. .databases
  159. .dump
  160. .echo
  161. .exit
  162. .explain
  163. .headers
  164. .help
  165. .import
  166. .indices
  167. .iotrace
  168. .load
  169. .log
  170. .mode
  171. .nullvalue
  172. .output
  173. .prompt
  174. .quit
  175. .read
  176. .restore
  177. .schema
  178. .separator
  179. .show
  180. .tables
  181. .timeout
  182. .timer
  183. .width
  184. C. SQLite SQL Command Reference
  185. SQLite SQL Commands
  186. ALTER TABLE
  187. ANALYZE
  188. ATTACH DATABASE
  189. BEGIN TRANSACTION
  190. COMMIT TRANSACTION
  191. CREATE INDEX
  192. CREATE TABLE
  193. CREATE TRIGGER
  194. CREATE VIEW
  195. CREATE VIRTUAL TABLE
  196. DELETE
  197. DETACH DATABASE
  198. DROP INDEX
  199. DROP TABLE
  200. DROP TRIGGER
  201. DROP VIEW
  202. END TRANSACTION
  203. EXPLAIN
  204. INSERT
  205. PRAGMA
  206. REINDEX
  207. RELEASE SAVEPOINT
  208. REPLACE
  209. ROLLBACK TRANSACTION
  210. SAVEPOINT
  211. SELECT
  212. UPDATE
  213. VACUUM
  214. D. SQLite SQL Expression Reference
  215. Literal Expressions
  216. Logic Representations
  217. Unary Expressions
  218. Binary Expressions
  219. Function Calls
  220. Column Names
  221. General Expressions
  222. AND
  223. BETWEEN
  224. CASE
  225. CAST
  226. COLLATE
  227. EXISTS
  228. GLOB
  229. IN
  230. IS
  231. ISNULL
  232. LIKE
  233. MATCH
  234. NOTNULL
  235. OR
  236. RAISE
  237. REGEXP
  238. SELECT
  239. E. SQLite SQL Function Reference
  240. Scalar Functions
  241. abs()
  242. changes()
  243. coalesce()
  244. date()
  245. datetime()
  246. glob()
  247. ifnull()
  248. hex()
  249. julianday()
  250. last_insert_rowid()
  251. length()
  252. like()
  253. load_extension()
  254. lower()
  255. ltrim()
  256. match()
  257. max()
  258. min()
  259. nullif()
  260. quote()
  261. random()
  262. randomblob()
  263. regex()
  264. replace()
  265. round()
  266. rtrim()
  267. sqlite_compileoption_get()
  268. sqlite_compileoption_used()
  269. sqlite_source_id()
  270. sqlite_version()
  271. strftime()
  272. substr()
  273. time()
  274. total_changes()
  275. trim()
  276. typeof()
  277. upper()
  278. zeroblob()
  279. Aggregate Functions
  280. avg()
  281. count()
  282. group_concat()
  283. max()
  284. min()
  285. sum()
  286. total()
  287. F. SQLite SQL PRAGMA Reference
  288. SQLite PRAGMAs
  289. auto_vacuum
  290. cache_size
  291. case_sensitive_like
  292. collation_list
  293. count_changes
  294. database_list
  295. default_cache_size
  296. encoding
  297. foreign_keys
  298. foreign_key_list
  299. freelist_count
  300. full_column_names
  301. fullfsync
  302. ignore_check_constraints
  303. incremental_vacuum
  304. index_info
  305. index_list
  306. integrity_check
  307. journal_mode
  308. journal_size_limit
  309. legacy_file_format
  310. locking_mode
  311. lock_proxy_file
  312. lock_status
  313. max_page_count
  314. omit_readlock
  315. page_count
  316. page_size
  317. parser_trace
  318. quick_check
  319. read_uncommitted
  320. recursive_triggers
  321. reverse_unordered_selects
  322. schema_version
  323. secure_delete
  324. short_column_names
  325. sql_trace
  326. synchronous
  327. table_info
  328. temp_store
  329. temp_store_directory
  330. user_version
  331. vdbe_trace
  332. vdbe_listing
  333. writable_schema
  334. G. SQLite C API Reference
  335. API Datatypes
  336. sqlite3
  337. sqlite3_backup
  338. sqlite3_blob
  339. sqlite3_context
  340. sqlite3_int64, sqlite3_uint64, sqlite_int64, sqlite_uint64
  341. sqlite3_module
  342. sqlite3_mutex
  343. sqlite3_stmt
  344. sqlite3_value
  345. sqlite3_vfs
  346. API Functions
  347. sqlite3_aggregate_context()
  348. sqlite3_auto_extension()
  349. sqlite3_backup_finish()
  350. sqlite3_backup_init()
  351. sqlite3_backup_pagecount()
  352. sqlite3_backup_remaining()
  353. sqlite3_backup_step()
  354. sqlite3_bind_xxx()
  355. sqlite3_bind_parameter_count()
  356. sqlite3_bind_parameter_index()
  357. sqlite3_bind_parameter_name()
  358. sqlite3_blob_bytes()
  359. sqlite3_blob_close()
  360. sqlite3_blob_open()
  361. sqlite3_blob_read()
  362. sqlite3_blob_write()
  363. sqlite3_busy_handler()
  364. sqlite3_busy_timeout()
  365. sqlite3_changes()
  366. sqlite3_clear_bindings()
  367. sqlite3_close()
  368. sqlite3_collation_needed()
  369. sqlite3_column_xxx()
  370. sqlite3_column_bytes()
  371. sqlite3_column_count()
  372. sqlite3_column_database_name()
  373. sqlite3_column_decltype()
  374. sqlite3_column_name()
  375. sqlite3_column_origin_name()
  376. sqlite3_column_table_name()
  377. sqlite3_column_type()
  378. sqlite3_commit_hook()
  379. sqlite3_compileoption_get()
  380. sqlite3_compileoption_used()
  381. sqlite3_complete()
  382. sqlite3_config()
  383. sqlite3_context_db_handle()
  384. sqlite3_create_collation()
  385. sqlite3_create_function()
  386. sqlite3_create_module()
  387. sqlite3_data_count()
  388. sqlite3_db_config()
  389. sqlite3_db_handle()
  390. sqlite3_db_mutex()
  391. sqlite3_db_status()
  392. sqlite3_declare_vtab()
  393. sqlite3_enable_load_extension()
  394. sqlite3_enable_shared_cache()
  395. sqlite3_errcode()
  396. sqlite3_errmsg()
  397. sqlite3_exec()
  398. sqlite3_extended_errcode()
  399. sqlite3_extended_result_codes()
  400. sqlite3_file_control()
  401. sqlite3_finalize()
  402. sqlite3_free()
  403. sqlite3_free_table()
  404. sqlite3_get_autocommit()
  405. sqlite3_get_auxdata()
  406. sqlite3_get_table()
  407. sqlite3_initialize()
  408. sqlite3_interrupt()
  409. sqlite3_last_insert_rowid()
  410. sqlite3_libversion()
  411. sqlite3_libversion_number()
  412. sqlite3_limit()
  413. sqlite3_load_extension()
  414. sqlite3_log()
  415. sqlite3_malloc()
  416. sqlite3_memory_highwater()
  417. sqlite3_memory_used()
  418. sqlite3_mprintf()
  419. sqlite3_mutex_alloc()
  420. sqlite3_mutex_enter()
  421. sqlite3_mutex_free()
  422. sqlite3_mutex_held()
  423. sqlite3_mutex_leave()
  424. sqlite3_mutex_notheld()
  425. sqlite3_mutex_try()
  426. sqlite3_next_stmt()
  427. sqlite3_open()
  428. sqlite3_open_v2()
  429. sqlite3_overload_function()
  430. sqlite3_prepare_xxx()
  431. sqlite3_profile()
  432. sqlite3_progress_handler()
  433. sqlite3_randomness()
  434. sqlite3_realloc()
  435. sqlite3_release_memory()
  436. sqlite3_reset()
  437. sqlite3_reset_auto_extension()
  438. sqlite3_result_xxx()
  439. sqlite3_result_error_xxx()
  440. sqlite3_rollback_hook()
  441. sqlite3_set_authorizer()
  442. sqlite3_set_auxdata()
  443. sqlite3_shutdown()
  444. sqlite3_sleep()
  445. sqlite3_snprintf()
  446. sqlite3_soft_heap_limit()
  447. sqlite3_sourceid()
  448. sqlite3_sql()
  449. sqlite3_status()
  450. sqlite3_step()
  451. sqlite3_stmt_status()
  452. sqlite3_strnicmp()
  453. sqlite3_table_column_metadata()
  454. sqlite3_threadsafe()
  455. sqlite3_total_changes()
  456. sqlite3_trace()
  457. sqlite3_unlock_notify()
  458. sqlite3_update_hook()
  459. sqlite3_user_data()
  460. sqlite3_value_xxx()
  461. sqlite3_value_bytes()
  462. sqlite3_value_numeric_type()
  463. sqlite3_value_type()
  464. sqlite3_version[]
  465. sqlite3_vfs_find()
  466. sqlite3_vfs_register()
  467. sqlite3_vfs_unregister()
  468. sqlite3_vmprintf()
  469. Index
  470. About the Author
  471. Colophon
  472. Copyright

Indexes

Indexes (or indices) are auxiliary data structures used by the database system to enforce unique constraints, speed up sort operations, and provide faster access to specific records. They are often created in hopes of making queries run faster by avoiding table scans and providing a more direct lookup method.

Understanding where and when to create indexes can be an important factor in achieving good performance, especially as datasets grow. Without proper indexes in place, the database system has no option but to do full table scans for every lookup. Table scans can be especially expensive when joining tables together.

Indexes are not without cost, however. Each index must maintain a one-to-one correspondence between index entries and table rows. If a new row is inserted, updated, or deleted from a table, that same modification must be made to all associated indexes. Each new index will add additional overhead to the INSERT, UPDATE, and DELETE commands. Indexes also consume space, both on disk as well as in the SQLite page cache. A proper, well-placed index is worth the cost, but it isn’t wise to just create random indexes, in hopes that one of them will prove useful. A poorly placed index still has all the costs, and can actually slow queries down. You can have too much of a good thing.

How They Work

Each index is associated with a specific table. Indexes can be either single column or multicolumn, but all the columns of a given index must belong to the same table. There is no limit to the number of indexes a table can have, nor is there a limit on the number of indexes a column can belong to. You cannot create an index on a view or on a virtual table.

Internally, the rows of a normal table are stored in an indexed structure. SQLite uses a B-Tree for this purpose, which is a specific type of multi-child, balanced tree. The details are unimportant, other than understanding that as rows are inserted into the tree, the rows are sorted, organized, and optimized, so that a row with a specific, known ROWID can be retrieved relatively directly and quickly.

When you create an index, the database system creates another tree structure to hold the index data. Rather than using the ROWID column as the sort key, the tree is sorted and organized using the column or columns you’ve specified in the index definition. The index entry consists of a copy of the values from each of the indexed columns, as well as a copy of the corresponding ROWID value. This allows an indexed entry to be found very quickly using the values from the indexed columns.

If we have a table like this:

CREATE TABLE tbl ( a, b, c, d );

And then create an index that looks like this:

CREATE INDEX idx_tbl_a_b ON tbl ( a, b );

The database generates an internal data structure that is conceptually similar to:

SELECT a, b, ROWID FROM tbl ORDER BY a, b;

If SQLite needs to quickly find all the rows where, for example, a = 45, it can use the sorted index to quickly jump to that range of values and extract the relevant index entries. If it’s looking for the value of b, it can simply extract that from the index and be done. If we need any other value in the row, it needs to fetch the full row. This is done by looking up the ROWID. The last value of any index entry is the ROWID of its corresponding table row. Once SQLite has a list of ROWID values for all rows where a = 45, it can efficiently look up those rows in the original table and retrieve the full row. If everything works correctly, the process of looking up a small set of index entries, and then using those to look up a small set of table rows, will be much faster and more efficient than doing a full table scan.

Must Be Diverse

To have a positive impact on query performance, indexes must be diverse. The cost of fetching a single row through an index is significantly higher than fetching a single row through a table scan. To reduce the overall cost, an index must overcome this overhead by eliminating the vast majority of row fetches. By significantly reducing the number of row lookups, the total query cost can be reduced, even if the individual row lookups are more expensive.

The break-even point for index performance is somewhere in the 10% to 20% range. Any query that fetches more rows from a table will do better with a table scan, while any query that fetches fewer rows will see an improvement by using an index. If an index cannot isolate rows to this level, there is no reason for it to be there. In fact, if the query optimizer mistakenly uses an index that returns a large percentage of rows, the index can reduce performance and slow things down.

This creates two concerns. First, if you’re trying to improve the performance of a query that fetches a moderate percentage of rows, adding an index is unlikely to help. An index can only improve performance if it is used to target a focused number of rows.

Second, even if the query is asking for a specific value, this can still lead to a higher percentage of fetched rows if the indexed columns are not reasonably unique. For example, if a column only has four unique values, a successful query will never fetch fewer than approximately 25% of the rows (assuming a reasonable distribution of values or queries). Adding an index to this type of column will not improve performance. Creating an index on a true/false column would be even worse.

When the query optimizer is trying to figure out if it should use an index or not, it typically has very little information to go on. For the most part, it assumes that if an index exists, it must be a good index, and it will tend to use it. The ANALYZE command can help mitigate this by providing statistical information to the query optimizer (see ANALYZE for more details), but keeping that updated and well maintained can be difficult in many environments.

To avoid problems, you should avoid creating indexes on columns that are not reasonably unique. Indexing such columns rarely provides any benefit, and it can confuse the query optimizer.

INTEGER PRIMARY KEYs

When you declare one or more columns to be a PRIMARY KEY, the database system automatically creates a unique index over those columns. The fundamental purpose of this index is to enforce the UNIQUE constraint that is implied with every PRIMARY KEY. It also happens that many database operations typically involve the primary key, such as natural joins, or conditional lookups in UPDATE or DELETE commands. Even if the index wasn’t required to enforce the UNIQUE constraint, chances are good you would want an index over those columns anyway.

There are further advantages to using an INTEGER PRIMARY KEY. As we’ve discussed, when an INTEGER PRIMARY KEY is declared, that column replaces the automatic ROWID column, and becomes the root column for the tree. In essence, the column becomes the index used to store the table itself, eliminating the need for a second, external index.

INTEGER PRIMARY KEY columns also provide better performance than standard indexes. INTEGER PRIMARY KEYs have direct access to the full set of row data. A normal index simply references the ROWID value, which is then looked up in the table root. INTEGER PRIMARY KEYs can provide indexed lookups, but skip this second lookup step, and directly access the table data.

If you’re using generic row ID values, it is worth the effort to define them as INTEGER PRIMARY KEY columns. Not only will this reduce the size of the database, it can make your queries faster and more efficient.

Order Matters

When creating a multicolumn index for performance purposes, the column order is very important. If an index has only one column, that column is used as the sort key. If additional columns are specified in the index definition, the additional columns will be used as secondary, tertiary, etc., sort keys. All of the data is sorted by the first column, then any groups of duplicate values are further sorted by the second column, and so on. This is very similar to a typical phonebook. Most phone books sort by last name. Any group of common last names is then sorted by first name, and then by middle name or initial.

In order to utilize a multicolumn index, a query must contain conditions that are able to utilize the sort keys in the same order they appear in the index definition. If the query does not contain a condition that keys off the first column, the index cannot be used.

Consider looking up a name in the phonebook. You can use a phonebook to quickly find the phone number for “Jennifer T. Smith.” First, you look up the last name “Smith.” Then you refine the search, looking for the first name, “Jennifer,” and finally the middle initial “T” (if required). This sequence should allow you to focus in on a specific entry very quickly.

A phonebook isn’t much use to quickly find all the people with the first name “Jennifer.” Even though the first name is part of the phonebook index, it isn’t the first column of the index. If our query cannot provide a specific condition for the first column of a multicolumn index, the index cannot be used, and our only option is to do a full scan. At that point it is usually more efficient to do a full scan on the table itself, rather than the index.

It is not necessary to utilize every column, only that you utilize the columns in order. If you’re looking up a very unique name in the phone book, you may not need to search off the first name or middle initial. Or, perhaps your query is to find every “Smith.” In that case, the conditions of the query are satisfied with just the first column.

In the end, you need to be very careful when considering the order of a multicolumn index. The additional columns should be viewed as a refinement on the first column, not a replacement. A single multicolumn index is very different from multiple single-column indexes. If you have different queries that use different columns as the main lookup condition, you may be better off with multiple small indexes, rather than one large multicolumn index.

One at a Time

Multicolumn indexes are very useful in some situations, but there are limitations on when they can be effectively used. In some cases, it is more appropriate to create multiple single-column indexes on the same table.

This too has limitations. You cannot create a single-column index on every column of a table and expect the query system to mix and match whatever indexes it needs. Once a subset of rows is extracted from a table (via index or full table scan), that set of rows conceptually becomes an independent temporary table that is no longer part of the original table. At that point, any index associated with the source table becomes unavailable.

In general, the query optimizer can only utilize one index per table-instance per query. Multiple indexes on a single table only make sense if you have a different queries that extract rows using different columns (or have multiple UNIQUE constraints). Different queries that key off different columns can pick the index that is most appropriate, but each individual query will only use one index per table-instance. If you want a single query to utilize multiple indexed columns, you need a multicolumn index.

The major exception is a series of OR conditions. If a WHERE clause has a chain of OR conditions on one or more columns of the same table, then there are cases when SQLite may go ahead and use multiple indexes for the same table in a single query.

Index Summary

For all their power, indexes are often a source of great frustration. The right index will result in a huge performance boost, while the wrong index can significantly slow things down. All indexes add cost to table modifications and add bulk to the database size. A well-placed index is often worth the overhead costs, but it can be difficult to understand what makes a well-placed index.

To complicate matters, you don’t get to tell the query optimizer when or how to use your indexes—the query planner makes its own decisions. Having everything be automatic means that queries will work no matter what, but it can also make it difficult to determine if an index is being used. It can be even more difficult to figure out why an index is being ignored.

This essentially leaves the database designer in the position of second-guessing the query optimizer. Changes and optimizations must be searched for on something of a trial-and-error basis. To make things worse, as your application changes and evolves, changes in the query structure or patterns can cause a change in index use. All in all, it can be a difficult situation to manage.

Traditionally, this is where the role of the DBA, or Database Administrator, comes in. This person is responsible for the care and feeding of a large database server. They do things like monitor query efficiency, adjust tuning parameters and indexes, and make sure all the statistical data is kept up to date. Unfortunately, that whole idea is somewhat contrary to what SQLite is all about.

So how do you maintain performance? For starters, have a solid design. A well-crafted and normalized database is going to go a long way toward defining your access patterns. Taking the time to correctly identify and define your primary keys will allow the database system to create appropriate indexes and give the query optimizer (and future developers) some insight into your design and intentions.

Also remember that (with the exception of UNIQUE constraints) a database will operate correctly without any indexes. It might be slow, but all the answers will be correct. This lets you worry about design first, and performance later. Once you have a working design, it is always possible to add indexes after the fact, without needing to alter your table structure or your query commands.

The queries of a fully normalized database will typically have quite a few JOIN operations. Nearly all of these joins are across primary and foreign keys. Primary keys are automatically indexed, but in most cases you need to manually create indexes on your foreign keys. With those in place, your database should already have the most critical indexes defined.

Start from here. Measure the performance of the different types of queries your application uses and find the problem areas. Looking at these queries, try to find columns where an index might boost performance. Look for any constraints and conditions that may benefit from an index, especially in join conditions that reference nonkey columns. Use EXPLAIN and EXPLAIN QUERY PLAN to understand how SQLite is accessing the data. These commands can also be used to verify if a query is using an index or not. For more information, see EXPLAIN in Appendix C. You can also use the sqlite3_stmt_status() function to get a more measured understanding of statement efficiency. See sqlite3_stmt_status() in Appendix G for more details.

All in all, you shouldn’t get too hung up on precise index placement. You may need a few well-placed indexes to deal with larger tables, but the stock PRIMARY KEY indexes do surprisingly well in most cases. Further index placement should be considered performance tuning and shouldn’t be worried about until an actual need is identified. Never forget that indexes have cost, so you shouldn’t create them unless you can identify a need.