The SELECT command is
very complex, and it can be difficult to see how these different
clauses can be fit together into something useful. Some of this will become
more obvious in the next chapter, when we look at standard database design
practices, but to get you started, we’re going to look at several
examples.
All of these examples will use this data:
CREATE TABLE x ( a, b ); INSERT INTO x VALUES ( 1, 'Alice' ); INSERT INTO x VALUES ( 2, 'Bob' ); INSERT INTO x VALUES ( 3, 'Charlie' ); CREATE TABLE y ( c, d ); INSERT INTO y VALUES ( 1, 3.14159 ); INSERT INTO y VALUES ( 1, 2.71828 ); INSERT INTO y VALUES ( 2, 1.61803 ); CREATE TABLE z ( a, e ); INSERT INTO z VALUES ( 1, 100 ); INSERT INTO z VALUES ( 1, 150 ); INSERT INTO z VALUES ( 3, 300 ); INSERT INTO z VALUES ( 9, 900 );
These examples show the sqlite3 command-line tool. The following dot-commands were
issued to make the output easier to understand. The last command will cause
sqlite3 to print the string
[NULL] whenever a NULL is
encountered. Normally, a NULL will produce a blank output that is
indistinguishable from an empty string:
.headers on .mode column .nullvalue [NULL]
This dataset is available on the book’s download page on
the O’Reilly website, as both an SQL file and an SQLite database. I suggest
you sit down with a copy of sqlite3 and
try these commands out. Try experimenting with different variations.
If one of these examples doesn’t quite make sense, just
break the SELECT statement down into its
individual parts and step through them bit by bit.
Let’s start with a simple select that returns all
of the columns and rows in table x.
The SELECT * syntax returns all
columns by default:
sqlite> SELECT * FROM x;
a b
---------- ----------
1 Alice
2 Bob
3 Charlie We can also return expressions, rather than just columns:
sqlite> SELECT d, d*d AS dSquared FROM y;
d dSquared
---------- ------------
3.14159 9.8695877281
2.71828 7.3890461584
1.61803 2.6180210809Now some joins. By default, the bare keyword
JOIN indicates an
INNER JOIN. However, when no
additional condition is put on the
JOIN, it reverts to a
CROSS JOIN. As a result, all three
of these queries produce the same results. The last line uses the
implicit join notation.
sqlite>SELECT * FROM x JOIN y;sqlite>SELECT * FROM x CROSS JOIN y;sqlite>SELECT * FROM x, y;a b c d ---------- ---------- ---------- ---------- 1 Alice 1 3.14159 1 Alice 1 2.71828 1 Alice 2 1.61803 2 Bob 1 3.14159 2 Bob 1 2.71828 2 Bob 2 1.61803 3 Charlie 1 3.14159 3 Charlie 1 2.71828 3 Charlie 2 1.61803
In the case of a cross join, every row in table a is matched to every row in table
y. Since both tables had
three rows and two columns, the result set has nine rows (3·3) and
four columns (2+2).
Next, a fairly simple inner join using a basic
ON join condition:
sqlite> SELECT * FROM x JOIN y ON a = c;
a b c d
---------- ---------- ---------- ----------
1 Alice 1 3.14159
1 Alice 1 2.71828
2 Bob 2 1.61803
This query still generates four columns, but only those rows that fulfill the join condition are included in the result set.
The following statement requires the columns to be
qualified, since both table x and
table z have an a column. Notice that two different
a columns are returned, one
from each source table:
sqlite> SELECT * FROM x JOIN z ON x.a = z.a;
a b a e
---------- ---------- ---------- ----------
1 Alice 1 100
1 Alice 1 150
3 Charlie 3 300 If we use a
NATURAL JOIN or the USING syntax, the duplicate column
will be eliminated. Since both table x and table z have
only column a in common, both of
these statements produce the same output:
sqlite>SELECT * FROM x JOIN z USING ( a );sqlite>SELECT * FROM x NATURAL JOIN z;a b e ---------- ---------- ---------- 1 Alice 100 1 Alice 150 3 Charlie 300
A LEFT OUTER
JOIN
will return the same results as an INNER JOIN, but will also include rows
from table x (the left/first table)
that were not matched:
sqlite> SELECT * FROM x LEFT OUTER JOIN z USING ( a );
a b e
---------- ---------- ----------
1 Alice 100
1 Alice 150
2 Bob [NULL]
3 Charlie 300 In this case, the Bob row from
table x has no matching row in
table z. Those column values
normally provided by table z are
padded out with NULL, and the row is then included in the result
set.
It is also possible to JOIN multiple tables together. In this case we join
table x to table y, and then join the result to table
z:
sqlite> SELECT * FROM x JOIN y ON x.a = y.c LEFT OUTER JOIN z ON y.c = z.a;
a b c d a e
---------- ---------- ---------- ---------- ---------- ----------
1 Alice 1 3.14159 1 100
1 Alice 1 3.14159 1 150
1 Alice 1 2.71828 1 100
1 Alice 1 2.71828 1 150
2 Bob 2 1.61803 [NULL] [NULL] If you don’t see what is going on here, work through the joins one at
a time. First look at what FROM x JOIN y ON
x.a = y.c will produce (shown in one of the previous
examples). Then look at how this result set would combine with table
z using a LEFT OUTER JOIN.
Our last join example shows a self-join, where a table is joined against itself. This creates two unique instances of the same table and necessitates the use of table aliases:
sqlite> SELECT * FROM x AS x1 JOIN x AS x2 ON x1.a + 1 = x2.a;
a b a b
---------- ---------- ---------- ----------
1 Alice 2 Bob
2 Bob 3 CharlieAlso, notice that the join condition is a more arbitrary expression, rather than being a simple test of column references.
Moving on, the WHERE clause is
used to filter rows. We can pick out a specific
row:
sqlite> SELECT * FROM x WHERE b = 'Alice';
a b
---------- ----------
1 Alice Or a range of values:
sqlite> SELECT * FROM y WHERE d BETWEEN 1.0 AND 3.0;
c d
---------- ----------
1 2.71828
2 1.61803 In this case, the WHERE expression
references the output column by its assigned name:
sqlite> SELECT c, d, c+d AS sum FROM y WHERE sum < 4.0;
c d sum
---------- ---------- ----------
1 2.71828 3.71828
2 1.61803 3.61803 Now let’s look at a few GROUP BY
statements. Here we group table z by the a column.
Since there are three unique values in z.a, the output has three rows. Only the grouping
a=1 has more than one row,
however. We can see this in the count() values returned by the second column:
sqlite> SELECT a, count(a) AS count FROM z GROUP BY a;
a count
---------- ----------
1 2
3 1
9 1 This is a similar query, only now the second
output column represents the sum of all the z.e values in each group:
sqlite> SELECT a, sum(e) AS total FROM z GROUP BY a;
a total
---------- ----------
1 250
3 300
9 900 We can even compute our own average and compare
that to the
avg() aggregate:
sqlite>SELECT a, sum(e), count(e),...>sum(e)/count(e) AS expr, avg(e) AS agg...>FROM z GROUP BY a;a sum(e) count(e) expr agg ---------- ---------- ---------- ---------- ---------- 1 250 2 125 125.0 3 300 1 300 300.0 9 900 1 900 900.0
A HAVING clause
can be used to filter rows based off the results of the
sum() aggregation:
sqlite> SELECT a, sum(e) AS total FROM z GROUP BY a HAVING total > 500;
a total
---------- ----------
9 900 The output can also be sorted. Most of these
examples already look sorted, but that’s mostly by chance.
The
ORDER BY clause enforced a specific
order:
sqlite> SELECT * FROM y ORDER BY d;
c d
---------- ----------
2 1.61803
1 2.71828
1 3.14159 Limits and offsets can also be applied to pick out specific rows from an ordered result. Conceptually, these are fairly simple, however.
These tables and queries are available as part of the book download.
Feel free to load the data into sqlite3 and try out different queries. Don’t worry
about creating SELECT statements
that use every available clause. Start with simple queries where you
can understand all the steps, then start to combine clauses to build
larger and more complex queries.