The SELECT command is used to query the database
and return a result. The SELECT command is the only SQL command capable
of returning a user-generated result, be it a table query or a
simple expression. Most consider SELECT to be the most complex SQL command.
Although the basic format is fairly easy to understand, it does
take some experience to understand its full power.
All of Chapter 5 is devoted to the SELECT command.
The core SELECT command follows a simple pattern that can be roughly
described as SELECT
. The
output section describes the data that makes up the result
set, the input section describes what tables, views,
subqueries, etc., will act as data sources, and the filter
section sets up conditions on which rows are part of the
result set and which rows are filtered out.output FROM
input WHERE
filter
A SELECT can either be SELECT ALL (default) or
SELECT DISTINCT.
The ALL keyword returns
all rows in the result set, regardless of their
composition. The DISTINCT keyword will force the select
statement to eliminate duplicate results in the result
set. There is usually a considerable performance penalty
for calculating larger DISTINCT results.
The result set columns are
defined by a series of comma-separated expressions. Every
SELECT statement
must have at least one result expression. These
expressions often consist of only a source column name,
although they can be any general expression. The character
* means “return
all columns,” and will include all standard table columns
from all of the source tables. All the standard columns of
a specific source table can be returned with the format
.
In both cases, the table_name.*ROWID column will not be included,
although ROWID alias
columns will be included. Virtual tables can also mark
some columns as hidden. Like the ROWID column, hidden columns will not be
returned by default, but can be explicitly named as a
result column.
Result columns can be given
explicit names with an optional AS clause (the actual AS keyword is optional as
well). Unless an AS
clause is given, the name of the output column is at the
discretion of the database engine. If an application
depends on matching the names of specific output columns,
the columns should be given explicit names with an
AS clause.
The FROM clause defines where the data comes
from and how it is shuffled together. If no FROM clause is given, the
SELECT statement
will return only one row. Each source is joined together
with a comma or a JOIN
operation. The comma acts as an unconditional CROSS JOIN. Different
sources, including tables, subqueries, or other JOIN statements, can be
grouped together into a large transitory table, which is
fed through the rest of the SELECT statement, and ultimately used to
produce the result set. For more information on the
specific JOIN
operators, see FROM Clause.
Each data source, be it a
named table or a subquery, can be given an optional
AS clause.
Similar to result set columns, the actual AS keyword is optional.
The AS clause allows an
alias to be assigned to a given source. This is important
to disambiguate table instances (for example, in a
self-join).
The WHERE clause is used to filter rows. Conceptually, the
FROM clause,
complete with joins, is used to define a large table that
consists of every possible row combination. The WHERE clause is evaluated
against each of those rows, passing only those rows that
evaluate to true. The WHERE clause can also be used to define
join conditions, by effectively having the FROM clause produce the
Cartesian product of the two tables, and use the WHERE clause to filter out
only those rows that meet the join condition.
Beyond SELECT, FROM, and WHERE, the SELECT statement can do
additional processing with GROUP
BY, HAVING, ORDER
BY, and LIMIT.
The GROUP BY clause allows sets of rows in the result set to be
collapsed into single rows. Groups of rows that share
equivalent values in all of the expressions listed in the
GROUP BY clause
will be condensed to a single row. Normally, every source
column reference in the result set expressions should be a
column or expression included in the GROUP BY clause, or the
column should appear as a parameter of an aggregate
function. The value of any other source column is the
value of the last row in the group to be processed,
effectively making the result undefined. If a GROUP BY expression is a
literal integer, it is assumed to be a column index to the
result set. For example, the clause GROUP BY 2 would group the
result set using the values in the second result
column.
A HAVING clause can only be used in
conjunction with a GROUP
BY clause. Like the WHERE clause, a HAVING expression is used
as a row filter. The key difference is that the HAVING expression is
applied after any GROUP
BY manipulation. This sequence allows the
HAVING expression
to filter aggregate outputs. Be aware that the WHERE clause is usually
more efficient, since it can eliminate rows earlier in the
SELECT pipeline.
If possible, filtering should be done in the WHERE clause, saving the
HAVING clause to
filter aggregate results.
The ORDER BY clause sorts the result set into
a specific order. Typically, the output ordering is not
defined. Rows are returned as they become available, and
no attempt is made to return the results in any specific
order. The ORDER BY
clause can be used to enforce a specific output ordering.
Output is sorted by each expression in the clause, in
turn, from most specific to least specific. The fact that
the output of a SELECT
can be ordered is one of the key differences between an
SQL table and a result set. As with GROUP BY, if one of the
ORDER BY
expressions is a literal integer, it is assumed to be a
column index to the result set.
Finally, the LIMIT clause can be used
to control how many rows are returned, starting at a
specific offset. If no offset is provided, the LIMIT will start from the
beginning of the result set. Note that the two syntax
variations (comma or OFFSET) provide the parameters in the
opposite order.
Since the row order of a
result is undefined, a LIMIT is most frequently used in
conjunction with an ORDER
BY clause. Although it is not strictly
required, including an ORDER
BY brings some meaning to the limit and
offset values. There are very few cases when it makes
sense to use a LIMIT
without some type of imposed ordering.
Compound statements
allow one or more SELECT...FROM...WHERE...GROUP BY...HAVING
substatements to be brought together using set operations.
SQLite supports the UNION, UNION
ALL, INTERSECT, and EXCEPT compound operators. Each SELECT statement in a
compound SELECT must
return the same number of columns. The names of the result
set columns will be taken from the first SELECT statement.
The UNION operator returns the union of the
SELECT
statements. By default, the UNION operator is a proper set operator
and will only return distinct rows (including those from a
single table) . UNION
ALL, by contrast, will return the full set
of rows returned by each SELECT. The
UNION ALL operator is significantly less
expensive than the UNION operator, so the use of UNION ALL is encouraged,
when possible.
The INTERSECT command will return the set of
rows that appear in both SELECT statements. Like UNION, the INTERSECT operator is a
proper set operation and will only return one instance of
each unique row, no matter how many times that row appears
in both result sets of the individual SELECT statements.
The EXCEPT compound operator acts as a
set-wise subtraction operator. All unique rows in the
first SELECT that do
not appear in the second SELECT will be returned.