Chapter 10. Joins Revisited

By now, you should be comfortable with the concept of the inner join, which I introduced in Chapter 5. This chapter focuses on other ways in which you can join tables, including the outer join and the cross join.

Outer Joins

In all the examples thus far that have included multiple tables, we haven’t been concerned that the join conditions might fail to find matches for all the rows in the tables. For example, when joining the account table to the customer table, I did not mention the possibility that a value in the cust_id column of the account table might not match a value in the cust_id column of the customer table. If that were the case, then some of the rows in one table or the other would be left out of the result set.

Just to be sure, let’s check the data in the tables. Here are the account_id and cust_id columns from the account table:

mysql> SELECT account_id, cust_id
    -> FROM account;
+------------+---------+
| account_id | cust_id |
+------------+---------+
|          1 |       1 |
|          2 |       1 |
|          3 |       1 |
|          4 |       2 |
|          5 |       2 |
|          7 |       3 |
|          8 |       3 |
|         10 |       4 |
|         11 |       4 |
|         12 |       4 |
|         13 |       5 |
|         14 |       6 |
|         15 |       6 |
|         17 |       7 |
|         18 |       8 |
|         19 |       8 |
|         21 |       9 |
|         22 |       9 |
|         23 |       9 |
|         24 |      10 |
|         25 |      10 |
|         27 |      11 |
|         28 |      12 |
|         29 |      13 |
+------------+---------+
24 rows in set (1.50 sec)

There are 24 accounts spanning 13 different customers, with customer IDs 1 through 13 having at least one account. Here’s the set of customer IDs from the customer table:

mysql> SELECT cust_id
    -> FROM customer;

+---------+
| cust_id |
+---------+
|       1 |
|       2 |
|       3 |
|       4 |
|       5 |
|       6 |
|       7 |
|       8 |
|       9 |
|      10 |
|      11 |
|      12 |
|      13 |
+---------+
13 rows in set (0.02 sec)

There are 13 rows in the customer table with IDs 1 through 13, so every customer ID is included at least once in the account table. When the two tables are joined on the cust_id column, therefore, you would expect all 24 rows to be included in the result set (barring any other filter conditions):

mysql> SELECT a.account_id, c.cust_id
    -> FROM account a INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id;
+------------+---------+
| account_id | cust_id |
+------------+---------+
|          1 |       1 |
|          2 |       1 |
|          3 |       1 |
|          4 |       2 |
|          5 |       2 |
|          7 |       3 |
|          8 |       3 |
|         10 |       4 |
|         11 |       4 |
|         12 |       4 |
|         13 |       5 |
|         14 |       6 |
|         15 |       6 |
|         17 |       7 |
|         18 |       8 |
|         19 |       8 |
|         21 |       9 |
|         22 |       9 |
|         23 |       9 |
|         24 |      10 |
|         25 |      10 |
|         27 |      11 |
|         28 |      12 |
|         29 |      13 |
+------------+---------+
24 rows in set (0.06 sec)

As expected, all 24 accounts are present in the result set. But what happens if you join the account table to one of the specialized customer tables, such as the business table?

mysql> SELECT a.account_id, b.cust_id, b.name
    -> FROM account a INNER JOIN business b
    ->   ON a.cust_id = b.cust_id;
+------------+---------+------------------------+
| account_id | cust_id | name                   |
+------------+---------+------------------------+
|         24 |      10 | Chilton Engineering    |
|         25 |      10 | Chilton Engineering    |
|         27 |      11 | Northeast Cooling Inc. |
|         28 |      12 | Superior Auto Body     |
|         29 |      13 | AAA Insurance Inc.     |
+------------+---------+------------------------+
5 rows in set (0.10 sec)

Instead of 24 rows in the result set, there are now only five. Let’s look in the business table to see why this is:

mysql> SELECT cust_id, name
    -> FROM business;
+---------+------------------------+
| cust_id | name                   |
+---------+------------------------+
|      10 | Chilton Engineering    |
|      11 | Northeast Cooling Inc. |
|      12 | Superior Auto Body     |
|      13 | AAA Insurance Inc.     |
+---------+------------------------+
4 rows in set (0.01 sec)

Of the 13 rows in the customer table, only four are business customers, and since one of the business customers has two accounts, a total of five rows in the account table are linked to business customers.

But what if you want your query to return all the accounts, but to include the business name only if the account is linked to a business customer? This is an example where you would need an outer join between the account and business tables, as in:

mysql> SELECT a.account_id, a.cust_id, b.name
    -> FROM account a LEFT OUTER JOIN business b
    ->   ON a.cust_id = b.cust_id;
+------------+---------+------------------------+
| account_id | cust_id | name                   |
+------------+---------+------------------------+
|          1 |       1 | NULL                   |
|          2 |       1 | NULL                   |
|          3 |       1 | NULL                   |
|          4 |       2 | NULL                   |
|          5 |       2 | NULL                   |
|          7 |       3 | NULL                   |
|          8 |       3 | NULL                   |
|         10 |       4 | NULL                   |
|         11 |       4 | NULL                   |
|         12 |       4 | NULL                   |
|         13 |       5 | NULL                   |
|         14 |       6 | NULL                   |
|         15 |       6 | NULL                   |
|         17 |       7 | NULL                   |
|         18 |       8 | NULL                   |
|         19 |       8 | NULL                   |
|         21 |       9 | NULL                   |
|         22 |       9 | NULL                   |
|         23 |       9 | NULL                   |
|         24 |      10 | Chilton Engineering    |
|         25 |      10 | Chilton Engineering    |
|         27 |      11 | Northeast Cooling Inc. |
|         28 |      12 |  Superior Auto Body     |
|         29 |      13 | AAA Insurance Inc.     |
+------------+---------+------------------------+
24 rows in set (0.04 sec)

An outer join includes all of the rows from one table and includes data from the second table only if matching rows are found. In this case, all rows from the account table are included, since I specified left outer join and the account table is on the left side of the join definition. The name column is null for all rows except for the four business customers (cust_ids 10, 11, 12, and 13). Here’s a similar query with an outer join to the individual table instead of the business table:

mysql> SELECT a.account_id, a.cust_id, i.fname, i.lname
    -> FROM account a LEFT OUTER JOIN individual i
    ->   ON a.cust_id = i.cust_id;
+------------+---------+----------+---------+
| account_id | cust_id | fname    | lname   |
+------------+---------+----------+---------+
|          1 |       1 | James    | Hadley  |
|          2 |       1 | James    | Hadley  |
|          3 |       1 | James    | Hadley  |
|          4 |       2 | Susan    | Tingley |
|          5 |       2 | Susan    | Tingley |
|          7 |       3 | Frank    | Tucker  |
|          8 |       3 | Frank    | Tucker  |
|         10 |       4 | John     | Hayward |
|         11 |       4 | John     | Hayward |
|         12 |       4 | John     | Hayward |
|         13 |       5 | Charles  | Frasier |
|         14 |       6 | John     | Spencer |
|         15 |       6 | John     | Spencer |
|         17 |       7 | Margaret | Young   |
|         18 |       8 | George   | Blake   |
|         19 |       8 | George   | Blake   |
|         21 |       9 | Richard  | Farley  |
|         22 |       9 | Richard  | Farley  |
|         23 |       9 | Richard  | Farley  |
|         24 |      10 | NULL     | NULL    |
|         25 |      10 | NULL     | NULL    |
|         27 |      11 | NULL     | NULL    |
|         28 |      12 | NULL     | NULL    |
|         29 |      13 | NULL     | NULL    |
+------------+---------+----------+---------+
24 rows in set (0.09 sec)

This query is essentially the reverse of the previous query: first and last names are supplied for the individual customers, whereas the columns are null for the business customers.

Left Versus Right Outer Joins

In each of the outer join examples in the previous section, I specified left outer join. The keyword left indicates that the table on the left side of the join is responsible for determining the number of rows in the result set, whereas the table on the right side is used to provide column values whenever a match is found. Consider the following query:

mysql> SELECT c.cust_id, b.name
    -> FROM customer c LEFT OUTER JOIN business b
    ->   ON c.cust_id = b.cust_id;
+---------+------------------------+
| cust_id | name                   |
+---------+------------------------+
|       1 | NULL                   |
|       2 | NULL                   |
|       3 | NULL                   |
|       4 | NULL                   |
|       5 | NULL                   |
|       6 | NULL                   |
|       7 | NULL                   |
|       8 | NULL                   |
|       9 | NULL                   |
|      10 | Chilton Engineering    |
|      11 | Northeast Cooling Inc. |
|      12 | Superior Auto Body     |
|      13 | AAA Insurance Inc.     |
+---------+------------------------+
13 rows in set (0.00 sec)

The from clause specifies a left outer join, so all 13 rows from the customer table are included in the result set, with the business table contributing values to the second column in the result set for the four business customers. If you execute the same query, but indicate right outer join, you would see the following results:

mysql> SELECT c.cust_id, b.name
    -> FROM customer c RIGHT OUTER JOIN business b
    ->   ON c.cust_id = b.cust_id;
+---------+------------------------+
| cust_id | name                   |
+---------+------------------------+
|      10 | Chilton Engineering    |
|      11 | Northeast Cooling Inc. |
|      12 | Superior Auto Body     |
|      13 | AAA Insurance Inc.     |
+---------+------------------------+
4 rows in set (0.00 sec)

The number of rows in the result set is now determined by the number of rows in the business table, which is why there are only four rows in the result set.

Keep in mind that both queries are performing outer joins; the keywords left and right are there just to tell the server which table is allowed to have gaps in the data. If you want to outer-join tables A and B and you want all rows from A with additional columns from B whenever there is matching data, you can specify either A left outer join B or B right outer join A.

Three-Way Outer Joins

In some cases, you may want to outer-join one table with two other tables. For example, you may want a list of all accounts showing either the customer’s first and last names for individuals or the business name for business customers, as in:

mysql> SELECT a.account_id, a.product_cd,
    ->   CONCAT(i.fname, ' ', i.lname) person_name,
    ->   b.name business_name
    -> FROM account a LEFT OUTER JOIN individual i
    ->   ON a.cust_id = i.cust_id
    ->   LEFT OUTER JOIN business b
    ->   ON a.cust_id = b.cust_id;
+------------+------------+-----------------+------------------------+
| account_id | product_cd | person_name     | business_name          |
+------------+------------+-----------------+------------------------+
|          1 | CHK        | James Hadley    | NULL                   |
|          2 | SAV        | James Hadley    | NULL                   |
|          3 | CD         | James Hadley    | NULL                   |
|          4 | CHK        | Susan Tingley   | NULL                   |
|          5 | SAV        | Susan Tingley   | NULL                   |
|          7 | CHK        | Frank Tucker    | NULL                   |
|          8 | MM         | Frank Tucker    | NULL                   |
|         10 | CHK        | John Hayward    | NULL                   |
|         11 | SAV        | John Hayward    | NULL                   |
|         12 | MM         | John Hayward    | NULL                   |
|         13 | CHK        | Charles Frasier | NULL                   |
|         14 | CHK        | John Spencer    | NULL                   |
|         15 | CD         | John Spencer    | NULL                   |
|         17 | CD         | Margaret Young  | NULL                   |
|         18 | CHK        | George Blake    | NULL                   |
|         19 | SAV        | George Blake    | NULL                   |
|         21 | CHK        | Richard Farley  | NULL                   |
|         22 | MM         | Richard Farley  | NULL                   |
|         23 | CD         | Richard Farley  | NULL                   |
|         24 | CHK        | NULL            | Chilton Engineering    |
|         25 | BUS        | NULL            | Chilton Engineering    |
|         27 | BUS        | NULL            | Northeast Cooling Inc. |
|         28 | CHK        | NULL            | Superior Auto Body     |
|         29 | SBL        | NULL            | AAA Insurance Inc.     |
+------------+------------+-----------------+------------------------+
24 rows in set (0.08 sec)

The results include all 24 rows from the account table, along with either a person’s name or a business name coming from the two outer-joined tables.

I don’t know of any restrictions with MySQL regarding the number of tables that can be outer-joined to the same table, but you can always use subqueries to limit the number of joins in your query. For instance, you can rewrite the previous example as follows:

mysql> SELECT account_ind.account_id, account_ind.product_cd,
    ->   account_ind.person_name,
    ->   b.name business_name
    -> FROM
    ->  (SELECT a.account_id, a.product_cd, a.cust_id,
    ->     CONCAT(i.fname, ' ', i.lname) person_name
    ->   FROM account a LEFT OUTER JOIN individual i
    ->     ON a.cust_id = i.cust_id) account_ind
    ->   LEFT OUTER JOIN business b
    ->   ON account_ind.cust_id = b.cust_id;
+------------+------------+-----------------+------------------------+
| account_id | product_cd | person_name     | business_name          |
+------------+------------+-----------------+------------------------+
|          1 | CHK        | James Hadley    | NULL                   |
|          2 | SAV        | James Hadley    | NULL                   |
|          3 | CD         | James Hadley    | NULL                   |
|          4 | CHK        | Susan Tingley   | NULL                   |
|          5 | SAV        | Susan Tingley   | NULL                   |
|          7 | CHK        | Frank Tucker    | NULL                   |
|          8 | MM         | Frank Tucker    | NULL                   |
|         10 | CHK        | John Hayward    | NULL                   |
|         11 | SAV        | John Hayward    | NULL                   |
|         12 | MM         | John Hayward    | NULL                   |
|         13 | CHK        | Charles Frasier | NULL                   |
|         14 | CHK        | John Spencer    | NULL                   |
|         15 | CD         | John Spencer    | NULL                   |
|         17 | CD         | Margaret Young  | NULL                   |
|         18 | CHK        | George Blake    | NULL                   |
|         19 | SAV        | George Blake    | NULL                   |
|         21 | CHK        | Richard Farley  | NULL                   |
|         22 | MM         | Richard Farley  | NULL                   |
|         23 | CD         | Richard Farley  | NULL                   |
|         24 | CHK        | NULL            | Chilton Engineering    |
|         25 | BUS        | NULL            | Chilton Engineering    |
|         27 | BUS        | NULL            | Northeast Cooling Inc. |
|         28 | CHK        | NULL            | Superior Auto Body     |
|         29 | SBL        | NULL            | AAA Insurance Inc.     |
+------------+------------+-----------------+------------------------+
24 rows in set (0.08 sec)

In this version of the query, the individual table is outer-joined to the account table within a subquery named account_ind, the results of which are then outer-joined to the business table. Thus, each query (the subquery and the containing query) uses only a single outer join. If you are using a database other than MySQL, you may need to utilize this strategy if you want to outer-join more than one table.

Self Outer Joins

In Chapter 5, I introduced you to the concept of the self-join, where a table is joined to itself. Here’s a self-join example from Chapter 5, which joins the employee table to itself to generate a list of employees and their supervisors:

mysql> SELECT e.fname, e.lname,
    ->   e_mgr.fname mgr_fname, e_mgr.lname mgr_lname
    -> FROM employee e INNER JOIN employee e_mgr
    ->   ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname    | lname     | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Susan    | Barker    | Michael   | Smith     |
| Robert   | Tyler     | Michael   | Smith     |
| Susan    | Hawthorne | Robert    | Tyler     |
| John     | Gooding   | Susan     | Hawthorne |
| Helen    | Fleming   | Susan     | Hawthorne |
| Chris    | Tucker    | Helen     | Fleming   |
| Sarah    | Parker    | Helen     | Fleming   |
| Jane     | Grossman  | Helen     | Fleming   |
| Paula    | Roberts   | Susan     | Hawthorne |
| Thomas   | Ziegler   | Paula     | Roberts   |
| Samantha | Jameson   | Paula     | Roberts   |
| John     | Blake     | Susan     | Hawthorne |
| Cindy    | Mason     | John      | Blake     |
| Frank    | Portman   | John      | Blake     |
| Theresa  | Markham   | Susan     | Hawthorne |
| Beth     | Fowler    | Theresa   | Markham   |
| Rick     | Tulman    | Theresa   | Markham   |
+----------+-----------+-----------+-----------+
17 rows in set (0.02 sec)

This query works fine except for one small issue: employees who don’t have a supervisor are left out of the result set. By changing the join from an inner join to an outer join, however, the result set will include all employees, including those without supervisors:

mysql> SELECT e.fname, e.lname,
    ->   e_mgr.fname mgr_fname, e_mgr.lname mgr_lname
    -> FROM employee e LEFT OUTER JOIN employee e_mgr
    ->   ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname    | lname     | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Michael  | Smith     | NULL      | NULL      |
| Susan    | Barker    | Michael   | Smith     |
| Robert   | Tyler     | Michael   | Smith     |
| Susan    | Hawthorne | Robert    | Tyler     |
| John     | Gooding   | Susan     | Hawthorne |
| Helen    | Fleming   | Susan     | Hawthorne |
| Chris    | Tucker    | Helen     | Fleming   |
| Sarah    | Parker    | Helen     | Fleming   |
| Jane     | Grossman  | Helen     | Fleming   |
| Paula    | Roberts   | Susan     | Hawthorne |
| Thomas   | Ziegler   | Paula     | Roberts   |
| Samantha | Jameson   | Paula     | Roberts   |
| John     | Blake     | Susan     | Hawthorne |
| Cindy    | Mason     | John      | Blake     |
| Frank    | Portman   | John      | Blake     |
| Theresa  | Markham   | Susan     | Hawthorne |
| Beth     | Fowler    | Theresa   | Markham   |
| Rick     | Tulman    | Theresa   | Markham   |
+----------+-----------+-----------+-----------+
18 rows in set (0.00 sec)

The result set now includes Michael Smith, who is the president of the bank and, therefore, does not have a supervisor. The query utilizes a left outer join to generate a list of all employees and, if applicable, their supervisor. If you change the join to be a right outer join, you would see the following results:

mysql>  SELECT e.fname, e.lname,
    ->    e_mgr.fname mgr_fname, e_mgr.lname mgr_lname
    ->  FROM employee e RIGHT OUTER JOIN employee e_mgr
    ->   ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname    | lname     | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Susan    | Barker    | Michael   | Smith     |
| Robert   | Tyler     | Michael   | Smith     |
| NULL     | NULL      | Susan     | Barker    |
| Susan    | Hawthorne | Robert    | Tyler     |
| John     | Gooding   | Susan     | Hawthorne |
| Helen    | Fleming   | Susan     | Hawthorne |
| Paula    | Roberts   | Susan     | Hawthorne |
| John     | Blake     | Susan     | Hawthorne |
| Theresa  | Markham   | Susan     | Hawthorne |
| NULL     | NULL      | John      | Gooding   |
| Chris    | Tucker    | Helen     | Fleming   |
| Sarah    | Parker    | Helen     | Fleming   |
| Jane     | Grossman  | Helen     | Fleming   |
| NULL     | NULL      | Chris     | Tucker    |
| NULL     | NULL      | Sarah     | Parker    |
| NULL     | NULL      | Jane      | Grossman  |
| Thomas   | Ziegler   | Paula     | Roberts   |
| Samantha | Jameson   | Paula     | Roberts   |
| NULL     | NULL      | Thomas    | Ziegler   |
| NULL     | NULL      | Samantha  | Jameson   |
| Cindy    | Mason     | John      | Blake     |
| Frank    | Portman   | John      | Blake     |
| NULL     | NULL      | Cindy     | Mason     |
| NULL     | NULL      | Frank     | Portman   |
| Beth     | Fowler    | Theresa   | Markham   |
| Rick     | Tulman    | Theresa   | Markham   |
| NULL     | NULL      | Beth      | Fowler    |
| NULL     | NULL      | Rick      | Tulman    |
+----------+-----------+-----------+-----------+
28 rows in set (0.00 sec)

This query shows each supervisor (still the third and fourth columns) along with the set of employees he or she supervises. Therefore, Michael Smith appears twice as supervisor to Susan Barker and Robert Tyler; Susan Barker appears once as a supervisor to nobody (null values in the first and second columns). All 18 employees appear at least once in the third and fourth columns, with some appearing more than once if they supervise more than one employee, making a total of 28 rows in the result set. This is a very different outcome from the previous query, and it was prompted by changing only a single keyword (left to right). Therefore, when using outer joins, make sure you think carefully about whether to specify a left or right outer join.

Cross Joins

Back in Chapter 5, I introduced the concept of a Cartesian product, which is essentially the result of joining multiple tables without specifying any join conditions. Cartesian products are used fairly frequently by accident (e.g., forgetting to add the join condition to the from clause) but are not so common otherwise. If, however, you do intend to generate the Cartesian product of two tables, you should specify a cross join, as in:

mysql> SELECT pt.name, p.product_cd, p.name
    -> FROM product p CROSS JOIN product_type pt;
+-------------------------------+------------+-------------------------+
| name                          | product_cd | name                    |
+-------------------------------+------------+-------------------------+
| Customer Accounts             | AUT        | auto loan               |
| Customer Accounts             | BUS        | business line of credit |
| Customer Accounts             | CD         | certificate of deposit  |
| Customer Accounts             | CHK        | checking account        |
| Customer Accounts             | MM         | money market account    |
| Customer Accounts             | MRT        | home mortgage           |
| Customer Accounts             | SAV        | savings account         |
| Customer Accounts             | SBL        | small business loan     |
| Insurance Offerings           | AUT        | auto loan               |
| Insurance Offerings           | BUS        | business line of credit |
| Insurance Offerings           | CD         | certificate of deposit  |
| Insurance Offerings           | CHK        | checking account        |
| Insurance Offerings           | MM         | money market account    |
| Insurance Offerings           | MRT        | home mortgage           |
| Insurance Offerings           | SAV        | savings account         |
| Insurance Offerings           | SBL        | small business loan     |
| Individual and Business Loans | AUT        | auto loan               |
| Individual and Business Loans | BUS        | business line of credit |
| Individual and Business Loans | CD         | certificate of deposit  |
| Individual and Business Loans | CHK        | checking account        |
| Individual and Business Loans | MM         | money market account    |
| Individual and Business Loans | MRT        | home mortgage           |
| Individual and Business Loans | SAV        | savings account         |
| Individual and Business Loans | SBL        | small business loan     |
+-------------------------------+------------+-------------------------+
24 rows in set (0.00 sec)

This query generates the Cartesian product of the product and product_type tables, resulting in 24 rows (8 product rows × 3 product_type rows). But now that you know what a cross join is and how to specify it, what is it used for? Most SQL books will describe what a cross join is and then tell you that it is seldom useful, but I would like to share with you a situation in which I find the cross join to be quite helpful.

In Chapter 9, I discussed how to use subqueries to fabricate tables. The example I used showed how to build a three-row table that could be joined to other tables. Here’s the fabricated table from the example:

mysql> SELECT 'Small Fry' name, 0 low_limit, 4999.99 high_limit
    -> UNION ALL
    -> SELECT 'Average Joes' name, 5000 low_limit, 9999.99 high_limit
    -> UNION ALL
    -> SELECT 'Heavy Hitters' name, 10000 low_limit, 9999999.99 high_limit;
+---------------+-----------+------------+
| name          | low_limit | high_limit |
+---------------+-----------+------------+
| Small Fry     |         0 |    4999.99 |
| Average Joes  |      5000 |    9999.99 |
| Heavy Hitters |     10000 | 9999999.99 |
+---------------+-----------+------------+
3 rows in set (0.00 sec)

While this table was exactly what was needed for placing customers into three groups based on their aggregate account balance, this strategy of merging single-row tables using the set operator union all doesn’t work very well if you need to fabricate a large table.

Say, for example, that you want to create a query that generates a row for every day in the year 2008, but you don’t have a table in your database that contains a row for every day. Using the strategy from the example in Chapter 9, you could do something like the following:

SELECT '2008-01-01' dt
UNION ALL
SELECT '2008-01-02' dt
UNION ALL
SELECT '2008-01-03' dt
UNION ALL
...
...
...
SELECT '2008-12-29' dt
UNION ALL
SELECT '2008-12-30' dt
UNION ALL
SELECT '2008-12-31' dt

Building a query that merges together the results of 366 queries is a bit tedious, so maybe a different strategy is needed. What if you generate a table with 366 rows (2008 was a leap year) with a single column containing a number between 0 and 366, and then add that number of days to January 1, 2008? Here’s one possible method to generate such a table:

mysql> SELECT ones.num + tens.num + hundreds.num
    -> FROM
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 1 num UNION ALL
    ->   SELECT 2 num UNION ALL
    ->   SELECT 3 num UNION ALL
    ->   SELECT 4 num UNION ALL
    ->   SELECT 5 num UNION ALL
    ->   SELECT 6 num UNION ALL
    ->   SELECT 7 num UNION ALL
    ->   SELECT 8 num UNION ALL
    ->   SELECT 9 num) ones
    ->   CROSS JOIN
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 10 num UNION ALL
    ->   SELECT 20 num UNION ALL
    ->   SELECT 30 num UNION ALL
    ->   SELECT 40 num UNION ALL
    ->   SELECT 50 num UNION ALL
    ->   SELECT 60 num UNION ALL
    ->   SELECT 70 num UNION ALL
    ->   SELECT 80 num UNION ALL
    ->   SELECT 90 num) tens
    ->   CROSS JOIN
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 100 num UNION ALL
    ->   SELECT 200 num UNION ALL
    ->   SELECT 300 num) hundreds;
+------------------------------------+
| ones.num + tens.num + hundreds.num |
+------------------------------------+
|                                  0 |
|                                  1 |
|                                  2 |
|                                  3 |
|                                  4 |
|                                  5 |
|                                  6 |
|                                  7 |
|                                  8 |
|                                  9 |
|                                 10 |
|                                 11 |
|                                 12 |
...
...
...
|                                391 |
|                                392 |
|                                393 |
|                                394 |
|                                395 |
|                                396 |
|                                397 |
|                                398 |
|                                399 |
+------------------------------------+
400 rows in set (0.00 sec)

If you take the Cartesian product of the three sets {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 10, 20, 30, 40, 50, 60, 70, 80, 90}, and {0, 100, 200, 300} and add the values in the three columns, you get a 400-row result set containing all numbers between 0 and 399. While this is more than the 366 rows needed to generate the set of days in 2008, it’s easy enough to get rid of the excess rows, and I’ll show you how shortly.

The next step is to convert the set of numbers to a set of dates. To do this, I will use the date_add() function to add each number in the result set to January 1, 2008. Then I’ll add a filter condition to throw away any dates that venture into 2009:

mysql> SELECT DATE_ADD('2008-01-01',
    ->   INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt
    -> FROM
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 1 num UNION ALL
    ->   SELECT 2 num UNION ALL
    ->   SELECT 3 num UNION ALL
    ->   SELECT 4 num UNION ALL
    ->   SELECT 5 num UNION ALL
    ->   SELECT 6 num UNION ALL
    ->   SELECT 7 num UNION ALL
    ->   SELECT 8 num UNION ALL
    ->   SELECT 9 num) ones
    ->   CROSS JOIN
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 10 num UNION ALL
    ->   SELECT 20 num UNION ALL
    ->   SELECT 30 num UNION ALL
    ->   SELECT 40 num UNION ALL
    ->   SELECT 50 num UNION ALL
    ->   SELECT 60 num UNION ALL
    ->   SELECT 70 num UNION ALL
    ->   SELECT 80 num UNION ALL
    ->   SELECT 90 num) tens
    ->   CROSS JOIN
    ->  (SELECT 0 num UNION ALL
    ->   SELECT 100 num UNION ALL
    ->   SELECT 200 num UNION ALL
    ->   SELECT 300 num) hundreds
    -> WHERE DATE_ADD('2008-01-01',
    ->   INTERVAL (ones.num + tens.num + hundreds.num) DAY) < '2009-01-01'
    -> ORDER BY 1;
+------------+
| dt         |
+------------+
| 2008-01-01 |
| 2008-01-02 |
| 2008-01-03 |
| 2008-01-04 |
| 2008-01-05 |
| 2008-01-06 |
| 2008-01-07 |
| 2008-01-08 |
| 2008-01-09 |
| 2008-01-10 |
...
...
...
| 2008-02-20 |
| 2008-02-21 |
| 2008-02-22 |
| 2008-02-23 |
| 2008-02-24 |
| 2008-02-25 |
| 2008-02-26 |
| 2008-02-27 |
| 2008-02-28 |
| 2008-02-29 |
| 2008-03-01 |
...
...
...
| 2008-12-20 |
| 2008-12-21 |
| 2008-12-22 |
| 2008-12-23 |
| 2008-12-24 |
| 2008-12-25 |
| 2008-12-26 |
| 2008-12-27 |
| 2008-12-28 |
| 2008-12-29 |
| 2008-12-30 |
| 2008-12-31 |
+------------+
366 rows in set (0.01 sec)

The nice thing about this approach is that the result set automatically includes the extra leap day (February 29) without your intervention, since the database server figures it out when it adds 59 days to January 1, 2008.

Now that you have a mechanism for fabricating all the days in 2008, what should you do with it? Well, you might be asked to generate a query that shows every day in 2008 along with the number of banking transactions conducted on that day, the number of accounts opened on that day, and so forth. Here’s an example that answers the first question:

mysql> SELECT days.dt, COUNT(t.txn_id)
    -> FROM transaction t RIGHT OUTER JOIN
    ->  (SELECT DATE_ADD('2008-01-01',
    ->     INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt
    ->   FROM
    ->    (SELECT 0 num UNION ALL
    ->     SELECT 1 num UNION ALL
    ->     SELECT 2 num UNION ALL
    ->     SELECT 3 num UNION ALL
    ->     SELECT 4 num UNION ALL
    ->     SELECT 5 num UNION ALL
    ->     SELECT 6 num UNION ALL
    ->     SELECT 7 num UNION ALL
    ->     SELECT 8 num UNION ALL
    ->     SELECT 9 num) ones
    ->     CROSS JOIN
    ->    (SELECT 0 num UNION ALL
    ->     SELECT 10 num UNION ALL
    ->     SELECT 20 num UNION ALL
    ->     SELECT 30 num UNION ALL
    ->     SELECT 40 num UNION ALL
    ->     SELECT 50 num UNION ALL
    ->     SELECT 60 num UNION ALL
    ->     SELECT 70 num UNION ALL
    ->     SELECT 80 num UNION ALL
    ->     SELECT 90 num) tens
    ->     CROSS JOIN
    ->    (SELECT 0 num UNION ALL
    ->     SELECT 100 num UNION ALL
    ->     SELECT 200 num UNION ALL
    ->     SELECT 300 num) hundreds
    ->   WHERE DATE_ADD('2008-01-01',
    ->     INTERVAL (ones.num + tens.num + hundreds.num) DAY) <
    ->       '2009-01-01') days
    ->   ON days.dt = t.txn_date
    -> GROUP BY days.dt
    -> ORDER BY 1;
+------------+-----------------+
| dt         | COUNT(t.txn_id) |
+------------+-----------------+
| 2008-01-01 |               0 |
| 2008-01-02 |               0 |
| 2008-01-03 |               0 |
| 2008-01-04 |               0 |
| 2008-01-05 |              21 |
| 2008-01-06 |               0 |
| 2008-01-07 |               0 |
| 2008-01-08 |               0 |
| 2008-01-09 |               0 |
| 2008-01-10 |               0 |
| 2008-01-11 |               0 |
| 2008-01-12 |               0 |
| 2008-01-13 |               0 |
| 2008-01-14 |               0 |
| 2008-01-15 |               0 |
...
| 2008-12-31 |               0 |
+------------+-----------------+
366 rows in set (0.03 sec)

This is one of the more interesting queries thus far in the book, in that it includes cross joins, outer joins, a date function, grouping, set operations (union all), and an aggregate function (count()). It is also not the most elegant solution to the given problem, but it should serve as an example of how, with a little creativity and a firm grasp on the language, you can make even a seldom-used feature like cross joins a potent tool in your SQL toolkit.

Natural Joins

If you are lazy (and aren’t we all), you can choose a join type that allows you to name the tables to be joined but lets the database server determine what the join conditions need to be. Known as the natural join, this join type relies on identical column names across multiple tables to infer the proper join conditions. For example, the account table includes a column named cust_id, which is the foreign key to the customer table, whose primary key is also named cust_id. Thus, you can write a query that uses natural join to join the two tables:

mysql> SELECT a.account_id, a.cust_id, c.cust_type_cd, c.fed_id
    -> FROM account a NATURAL JOIN customer c;
+------------+---------+--------------+-------------+
| account_id | cust_id | cust_type_cd | fed_id      |
+------------+---------+--------------+-------------+
|          1 |       1 | I            | 111-11-1111 |
|          2 |       1 | I            | 111-11-1111 |
|          3 |       1 | I            | 111-11-1111 |
|          4 |       2 | I            | 222-22-2222 |
|          5 |       2 | I            | 222-22-2222 |
|          6 |       3 | I            | 333-33-3333 |
|          7 |       3 | I            | 333-33-3333 |
|          8 |       4 | I            | 444-44-4444 |
|          9 |       4 | I            | 444-44-4444 |
|         10 |       4 | I            | 444-44-4444 |
|         11 |       5 | I            | 555-55-5555 |
|         12 |       6 | I            | 666-66-6666 |
|         13 |       6 | I            | 666-66-6666 |
|         14 |       7 | I            | 777-77-7777 |
|         15 |       8 | I            | 888-88-8888 |
|         16 |       8 | I            | 888-88-8888 |
|         17 |       9 | I            | 999-99-9999 |
|         18 |       9 | I            | 999-99-9999 |
|         19 |       9 | I            | 999-99-9999 |
|         20 |      10 | B            | 04-1111111  |
|         21 |      10 | B            | 04-1111111  |
|         22 |      11 | B            | 04-2222222  |
|         23 |      12 | B            | 04-3333333  |
|         24 |      13 | B            | 04-4444444  |
+------------+---------+--------------+-------------+
24 rows in set (0.02 sec)

Because you specified a natural join, the server inspected the table definitions and added the join condition a.cust_id = c.cust_id to join the two tables.

This is all well and good, but what if the columns don’t have the same name across the tables? For example, the account table also has a foreign key to the branch table, but the column in the account table is named open_branch_id instead of just branch_id. Let’s see what happens if I use natural join between the account and branch tables:

mysql> SELECT a.account_id, a.cust_id, a.open_branch_id, b.name
    -> FROM account a NATURAL JOIN branch b;
+------------+---------+----------------+---------------+
| account_id | cust_id | open_branch_id | name          |
+------------+---------+----------------+---------------+
|          1 |       1 |              2 | Headquarters  |
|          1 |       1 |              2 | Woburn Branch |
|          1 |       1 |              2 | Quincy Branch |
|          1 |       1 |              2 | So. NH Branch |
|          2 |       1 |              2 | Headquarters  |
|          2 |       1 |              2 | Woburn Branch |
|          2 |       1 |              2 | Quincy Branch |
|          2 |       1 |              2 | So. NH Branch |
|          3 |       1 |              2 | Headquarters  |
|          3 |       1 |              2 | Woburn Branch |
|          3 |       1 |              2 | Quincy Branch |
|          3 |       1 |              2 | So. NH Branch |
|          4 |       2 |              2 | Headquarters  |
|          4 |       2 |              2 | Woburn Branch |
|          4 |       2 |              2 | Quincy Branch |
|          4 |       2 |              2 | So. NH Branch |
|          5 |       2 |              2 | Headquarters  |
|          5 |       2 |              2 | Woburn Branch |
|          5 |       2 |              2 | Quincy Branch |
|          5 |       2 |              2 | So. NH Branch |
|          7 |       3 |              3 | Headquarters  |
|          7 |       3 |              3 | Woburn Branch |
|          7 |       3 |              3 | Quincy Branch |
|          7 |       3 |              3 | So. NH Branch |
|          8 |       3 |              3 | Headquarters  |
|          8 |       3 |              3 | Woburn Branch |
|          8 |       3 |              3 | Quincy Branch |
|          8 |       3 |              3 | So. NH Branch |
|         10 |       4 |              1 | Headquarters  |
|         10 |       4 |              1 | Woburn Branch |
|         10 |       4 |              1 | Quincy Branch |
|         10 |       4 |              1 | So. NH Branch |
...
...
...
|         24 |      10 |              4 | Headquarters  |
|         24 |      10 |              4 | Woburn Branch |
|         24 |      10 |              4 | Quincy Branch |
|         24 |      10 |              4 | So. NH Branch |
|         25 |      10 |              4 | Headquarters  |
|         25 |      10 |              4 | Woburn Branch |
|         25 |      10 |              4 | Quincy Branch |
|         25 |      10 |              4 | So. NH Branch |
|         27 |      11 |              2 | Headquarters  |
|         27 |      11 |              2 | Woburn Branch |
|         27 |      11 |              2 | Quincy Branch |
|         27 |      11 |              2 | So. NH Branch |
|         28 |      12 |              4 | Headquarters  |
|         28 |      12 |              4 | Woburn Branch |
|         28 |      12 |              4 | Quincy Branch |
|         28 |      12 |              4 | So. NH Branch |
|         29 |      13 |              3 | Headquarters  |
|         29 |      13 |              3 | Woburn Branch |
|         29 |      13 |              3 | Quincy Branch |
|         29 |      13 |              3 | So. NH Branch |
+------------+---------+----------------+---------------+
96 rows in set (0.07 sec)

It looks like something has gone wrong; the query should return no more than 24 rows, since there are 24 rows in the account table. What has happened is that, since the server couldn’t find two identically named columns in the two tables, no join condition was generated and the two tables were cross-joined instead, resulting in 96 rows (24 accounts × 4 branches).

So, is the reduced wear and tear on the old fingers from not having to type the join condition worth the trouble? Absolutely not; you should avoid this join type and use inner joins with explicit join conditions.

Test Your Knowledge

The following exercises test your understanding of outer and cross joins. Please see Appendix C for solutions.

Exercise 10-1

Write a query that returns all product names along with the accounts based on that product (use the product_cd column in the account table to link to the product table). Include all products, even if no accounts have been opened for that product.

Exercise 10-2

Reformulate your query from Exercise 10-1 to use the other outer join type (e.g., if you used a left outer join in Exercise 10-1, use a right outer join this time) such that the results are identical to Exercise 10-1.

Exercise 10-3

Outer-join the account table to both the individual and business tables (via the account.cust_id column) such that the result set contains one row per account. Columns to include are account.account_id, account.product_cd, individual.fname, individual.lname, and business.name.

Exercise 10-4 (Extra Credit)

Devise a query that will generate the set {1, 2, 3,..., 99, 100}. (Hint: use a cross join with at least two from clause subqueries.)