By now, you should be comfortable with the concept of the inner join, which I introduced in Chapter 5. This chapter focuses on other ways in which you can join tables, including the outer join and the cross join.
In all the examples thus far that have included multiple tables, we haven’t been
concerned that the join conditions might fail to find matches for all the rows in
the tables. For example, when joining the account
table to the customer table, I did not mention
the possibility that a value in the cust_id
column of the account table might not match a
value in the cust_id column of the customer table. If that were the case, then some of
the rows in one table or the other would be left out of the result set.
Just to be sure, let’s check the data in the tables. Here are the account_id and cust_id columns from the account
table:
mysql>SELECT account_id, cust_id->FROM account;+------------+---------+ | account_id | cust_id | +------------+---------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 2 | | 5 | 2 | | 7 | 3 | | 8 | 3 | | 10 | 4 | | 11 | 4 | | 12 | 4 | | 13 | 5 | | 14 | 6 | | 15 | 6 | | 17 | 7 | | 18 | 8 | | 19 | 8 | | 21 | 9 | | 22 | 9 | | 23 | 9 | | 24 | 10 | | 25 | 10 | | 27 | 11 | | 28 | 12 | | 29 | 13 | +------------+---------+ 24 rows in set (1.50 sec)
There are 24 accounts spanning 13 different customers, with customer IDs 1 through
13 having at least one account. Here’s the set of customer IDs from the customer table:
mysql>SELECT cust_id->FROM customer;+---------+ | cust_id | +---------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | | 11 | | 12 | | 13 | +---------+ 13 rows in set (0.02 sec)
There are 13 rows in the customer table with
IDs 1 through 13, so every customer ID is included at least once in the account table. When the two tables are joined on the
cust_id column, therefore, you would expect
all 24 rows to be included in the result set (barring any other filter
conditions):
mysql>SELECT a.account_id, c.cust_id->FROM account a INNER JOIN customer c->ON a.cust_id = c.cust_id;+------------+---------+ | account_id | cust_id | +------------+---------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 2 | | 5 | 2 | | 7 | 3 | | 8 | 3 | | 10 | 4 | | 11 | 4 | | 12 | 4 | | 13 | 5 | | 14 | 6 | | 15 | 6 | | 17 | 7 | | 18 | 8 | | 19 | 8 | | 21 | 9 | | 22 | 9 | | 23 | 9 | | 24 | 10 | | 25 | 10 | | 27 | 11 | | 28 | 12 | | 29 | 13 | +------------+---------+ 24 rows in set (0.06 sec)
As expected, all 24 accounts are present in the result set. But what happens if
you join the account table to one of the
specialized customer tables, such as the business
table?
mysql>SELECT a.account_id, b.cust_id, b.name->FROM account a INNER JOIN business b->ON a.cust_id = b.cust_id;+------------+---------+------------------------+ | account_id | cust_id | name | +------------+---------+------------------------+ | 24 | 10 | Chilton Engineering | | 25 | 10 | Chilton Engineering | | 27 | 11 | Northeast Cooling Inc. | | 28 | 12 | Superior Auto Body | | 29 | 13 | AAA Insurance Inc. | +------------+---------+------------------------+ 5 rows in set (0.10 sec)
Instead of 24 rows in the result set, there are now only five. Let’s look in the
business table to see why this is:
mysql>SELECT cust_id, name->FROM business;+---------+------------------------+ | cust_id | name | +---------+------------------------+ | 10 | Chilton Engineering | | 11 | Northeast Cooling Inc. | | 12 | Superior Auto Body | | 13 | AAA Insurance Inc. | +---------+------------------------+ 4 rows in set (0.01 sec)
Of the 13 rows in the customer table, only four
are business customers, and since one of the business customers has two accounts, a
total of five rows in the account table are
linked to business customers.
But what if you want your query to return all the accounts,
but to include the business name only if the account is linked to a business
customer? This is an example where you would need an outer join
between the account and business tables, as in:
mysql>SELECT a.account_id, a.cust_id, b.name->FROM account a LEFT OUTER JOIN business b->ON a.cust_id = b.cust_id;+------------+---------+------------------------+ | account_id | cust_id | name | +------------+---------+------------------------+ | 1 | 1 | NULL | | 2 | 1 | NULL | | 3 | 1 | NULL | | 4 | 2 | NULL | | 5 | 2 | NULL | | 7 | 3 | NULL | | 8 | 3 | NULL | | 10 | 4 | NULL | | 11 | 4 | NULL | | 12 | 4 | NULL | | 13 | 5 | NULL | | 14 | 6 | NULL | | 15 | 6 | NULL | | 17 | 7 | NULL | | 18 | 8 | NULL | | 19 | 8 | NULL | | 21 | 9 | NULL | | 22 | 9 | NULL | | 23 | 9 | NULL | | 24 | 10 | Chilton Engineering | | 25 | 10 | Chilton Engineering | | 27 | 11 | Northeast Cooling Inc. | | 28 | 12 | Superior Auto Body | | 29 | 13 | AAA Insurance Inc. | +------------+---------+------------------------+ 24 rows in set (0.04 sec)
An outer join includes all of the rows from one table and includes data from the
second table only if matching rows are found. In this case, all rows from the
account table are included, since I specified
left outer join and the account table is on the left side of the join
definition. The name column is null for all rows except for the four business
customers (cust_ids 10, 11, 12, and 13). Here’s a
similar query with an outer join to the individual table instead of the business table:
mysql>SELECT a.account_id, a.cust_id, i.fname, i.lname->FROM account a LEFT OUTER JOIN individual i->ON a.cust_id = i.cust_id;+------------+---------+----------+---------+ | account_id | cust_id | fname | lname | +------------+---------+----------+---------+ | 1 | 1 | James | Hadley | | 2 | 1 | James | Hadley | | 3 | 1 | James | Hadley | | 4 | 2 | Susan | Tingley | | 5 | 2 | Susan | Tingley | | 7 | 3 | Frank | Tucker | | 8 | 3 | Frank | Tucker | | 10 | 4 | John | Hayward | | 11 | 4 | John | Hayward | | 12 | 4 | John | Hayward | | 13 | 5 | Charles | Frasier | | 14 | 6 | John | Spencer | | 15 | 6 | John | Spencer | | 17 | 7 | Margaret | Young | | 18 | 8 | George | Blake | | 19 | 8 | George | Blake | | 21 | 9 | Richard | Farley | | 22 | 9 | Richard | Farley | | 23 | 9 | Richard | Farley | | 24 | 10 | NULL | NULL | | 25 | 10 | NULL | NULL | | 27 | 11 | NULL | NULL | | 28 | 12 | NULL | NULL | | 29 | 13 | NULL | NULL | +------------+---------+----------+---------+ 24 rows in set (0.09 sec)
This query is essentially the reverse of the previous query: first and last names
are supplied for the individual customers, whereas the columns are null for the business customers.
In each of the outer join examples in the previous section, I specified
left outer join. The keyword left indicates that the table on the left side of
the join is responsible for determining the
number of rows in the result set, whereas the table on the right side is used to
provide column values whenever a match is found. Consider the following
query:
mysql>SELECT c.cust_id, b.name->FROM customer c LEFT OUTER JOIN business b->ON c.cust_id = b.cust_id;+---------+------------------------+ | cust_id | name | +---------+------------------------+ | 1 | NULL | | 2 | NULL | | 3 | NULL | | 4 | NULL | | 5 | NULL | | 6 | NULL | | 7 | NULL | | 8 | NULL | | 9 | NULL | | 10 | Chilton Engineering | | 11 | Northeast Cooling Inc. | | 12 | Superior Auto Body | | 13 | AAA Insurance Inc. | +---------+------------------------+ 13 rows in set (0.00 sec)
The from clause specifies a left outer
join, so all 13 rows from the customer table
are included in the result set, with the business table contributing values to the second column in the
result set for the four business customers. If you execute the same query, but
indicate right outer join, you would see the
following results:
mysql>SELECT c.cust_id, b.name->FROM customer c RIGHT OUTER JOIN business b->ON c.cust_id = b.cust_id;+---------+------------------------+ | cust_id | name | +---------+------------------------+ | 10 | Chilton Engineering | | 11 | Northeast Cooling Inc. | | 12 | Superior Auto Body | | 13 | AAA Insurance Inc. | +---------+------------------------+ 4 rows in set (0.00 sec)
The number of rows in the result set is now determined by the number of rows
in the business table, which is why there are
only four rows in the result set.
Keep in mind that both queries are performing outer joins; the keywords
left and right are there just to tell the server which table is allowed to
have gaps in the data. If you want to outer-join tables A and B and you want all
rows from A with additional columns from B whenever there is matching data, you
can specify either A left outer join B or
B right outer join A.
In some cases, you may want to outer-join one table with two other tables. For example, you may want a list of all accounts showing either the customer’s first and last names for individuals or the business name for business customers, as in:
mysql>SELECT a.account_id, a.product_cd,->CONCAT(i.fname, ' ', i.lname) person_name,->b.name business_name->FROM account a LEFT OUTER JOIN individual i->ON a.cust_id = i.cust_id->LEFT OUTER JOIN business b->ON a.cust_id = b.cust_id;+------------+------------+-----------------+------------------------+ | account_id | product_cd | person_name | business_name | +------------+------------+-----------------+------------------------+ | 1 | CHK | James Hadley | NULL | | 2 | SAV | James Hadley | NULL | | 3 | CD | James Hadley | NULL | | 4 | CHK | Susan Tingley | NULL | | 5 | SAV | Susan Tingley | NULL | | 7 | CHK | Frank Tucker | NULL | | 8 | MM | Frank Tucker | NULL | | 10 | CHK | John Hayward | NULL | | 11 | SAV | John Hayward | NULL | | 12 | MM | John Hayward | NULL | | 13 | CHK | Charles Frasier | NULL | | 14 | CHK | John Spencer | NULL | | 15 | CD | John Spencer | NULL | | 17 | CD | Margaret Young | NULL | | 18 | CHK | George Blake | NULL | | 19 | SAV | George Blake | NULL | | 21 | CHK | Richard Farley | NULL | | 22 | MM | Richard Farley | NULL | | 23 | CD | Richard Farley | NULL | | 24 | CHK | NULL | Chilton Engineering | | 25 | BUS | NULL | Chilton Engineering | | 27 | BUS | NULL | Northeast Cooling Inc. | | 28 | CHK | NULL | Superior Auto Body | | 29 | SBL | NULL | AAA Insurance Inc. | +------------+------------+-----------------+------------------------+ 24 rows in set (0.08 sec)
The results include all 24 rows from the account table, along with either a person’s name or a business
name coming from the two outer-joined tables.
I don’t know of any restrictions with MySQL regarding the number of tables that can be outer-joined to the same table, but you can always use subqueries to limit the number of joins in your query. For instance, you can rewrite the previous example as follows:
mysql>SELECT account_ind.account_id, account_ind.product_cd,->account_ind.person_name,->b.name business_name->FROM->(SELECT a.account_id, a.product_cd, a.cust_id,->CONCAT(i.fname, ' ', i.lname) person_name->FROM account a LEFT OUTER JOIN individual i->ON a.cust_id = i.cust_id) account_ind->LEFT OUTER JOIN business b->ON account_ind.cust_id = b.cust_id;+------------+------------+-----------------+------------------------+ | account_id | product_cd | person_name | business_name | +------------+------------+-----------------+------------------------+ | 1 | CHK | James Hadley | NULL | | 2 | SAV | James Hadley | NULL | | 3 | CD | James Hadley | NULL | | 4 | CHK | Susan Tingley | NULL | | 5 | SAV | Susan Tingley | NULL | | 7 | CHK | Frank Tucker | NULL | | 8 | MM | Frank Tucker | NULL | | 10 | CHK | John Hayward | NULL | | 11 | SAV | John Hayward | NULL | | 12 | MM | John Hayward | NULL | | 13 | CHK | Charles Frasier | NULL | | 14 | CHK | John Spencer | NULL | | 15 | CD | John Spencer | NULL | | 17 | CD | Margaret Young | NULL | | 18 | CHK | George Blake | NULL | | 19 | SAV | George Blake | NULL | | 21 | CHK | Richard Farley | NULL | | 22 | MM | Richard Farley | NULL | | 23 | CD | Richard Farley | NULL | | 24 | CHK | NULL | Chilton Engineering | | 25 | BUS | NULL | Chilton Engineering | | 27 | BUS | NULL | Northeast Cooling Inc. | | 28 | CHK | NULL | Superior Auto Body | | 29 | SBL | NULL | AAA Insurance Inc. | +------------+------------+-----------------+------------------------+ 24 rows in set (0.08 sec)
In this version of the query, the individual table is outer-joined to the account table within a subquery named account_ind, the results of which are then outer-joined to the
business table. Thus, each query (the
subquery and the containing query) uses only a single outer join. If you are
using a database other than MySQL, you may need to utilize this strategy if you
want to outer-join more than one table.
In Chapter 5, I introduced you to the concept
of the self-join, where a table is joined to itself. Here’s a self-join example
from Chapter 5, which joins the employee table to itself to generate a list of
employees and their supervisors:
mysql>SELECT e.fname, e.lname,->e_mgr.fname mgr_fname, e_mgr.lname mgr_lname->FROM employee e INNER JOIN employee e_mgr->ON e.superior_emp_id = e_mgr.emp_id;+----------+-----------+-----------+-----------+ | fname | lname | mgr_fname | mgr_lname | +----------+-----------+-----------+-----------+ | Susan | Barker | Michael | Smith | | Robert | Tyler | Michael | Smith | | Susan | Hawthorne | Robert | Tyler | | John | Gooding | Susan | Hawthorne | | Helen | Fleming | Susan | Hawthorne | | Chris | Tucker | Helen | Fleming | | Sarah | Parker | Helen | Fleming | | Jane | Grossman | Helen | Fleming | | Paula | Roberts | Susan | Hawthorne | | Thomas | Ziegler | Paula | Roberts | | Samantha | Jameson | Paula | Roberts | | John | Blake | Susan | Hawthorne | | Cindy | Mason | John | Blake | | Frank | Portman | John | Blake | | Theresa | Markham | Susan | Hawthorne | | Beth | Fowler | Theresa | Markham | | Rick | Tulman | Theresa | Markham | +----------+-----------+-----------+-----------+ 17 rows in set (0.02 sec)
This query works fine except for one small issue: employees who don’t have a supervisor are left out of the result set. By changing the join from an inner join to an outer join, however, the result set will include all employees, including those without supervisors:
mysql>SELECT e.fname, e.lname,->e_mgr.fname mgr_fname, e_mgr.lname mgr_lname->FROM employee e LEFT OUTER JOIN employee e_mgr->ON e.superior_emp_id = e_mgr.emp_id;+----------+-----------+-----------+-----------+ | fname | lname | mgr_fname | mgr_lname | +----------+-----------+-----------+-----------+ | Michael | Smith | NULL | NULL | | Susan | Barker | Michael | Smith | | Robert | Tyler | Michael | Smith | | Susan | Hawthorne | Robert | Tyler | | John | Gooding | Susan | Hawthorne | | Helen | Fleming | Susan | Hawthorne | | Chris | Tucker | Helen | Fleming | | Sarah | Parker | Helen | Fleming | | Jane | Grossman | Helen | Fleming | | Paula | Roberts | Susan | Hawthorne | | Thomas | Ziegler | Paula | Roberts | | Samantha | Jameson | Paula | Roberts | | John | Blake | Susan | Hawthorne | | Cindy | Mason | John | Blake | | Frank | Portman | John | Blake | | Theresa | Markham | Susan | Hawthorne | | Beth | Fowler | Theresa | Markham | | Rick | Tulman | Theresa | Markham | +----------+-----------+-----------+-----------+ 18 rows in set (0.00 sec)
The result set now includes Michael Smith, who is the president of the bank and, therefore, does not have a supervisor. The query utilizes a left outer join to generate a list of all employees and, if applicable, their supervisor. If you change the join to be a right outer join, you would see the following results:
mysql>SELECT e.fname, e.lname,->e_mgr.fname mgr_fname, e_mgr.lname mgr_lname->FROM employee e RIGHT OUTER JOIN employee e_mgr->ON e.superior_emp_id = e_mgr.emp_id;+----------+-----------+-----------+-----------+ | fname | lname | mgr_fname | mgr_lname | +----------+-----------+-----------+-----------+ | Susan | Barker | Michael | Smith | | Robert | Tyler | Michael | Smith | | NULL | NULL | Susan | Barker | | Susan | Hawthorne | Robert | Tyler | | John | Gooding | Susan | Hawthorne | | Helen | Fleming | Susan | Hawthorne | | Paula | Roberts | Susan | Hawthorne | | John | Blake | Susan | Hawthorne | | Theresa | Markham | Susan | Hawthorne | | NULL | NULL | John | Gooding | | Chris | Tucker | Helen | Fleming | | Sarah | Parker | Helen | Fleming | | Jane | Grossman | Helen | Fleming | | NULL | NULL | Chris | Tucker | | NULL | NULL | Sarah | Parker | | NULL | NULL | Jane | Grossman | | Thomas | Ziegler | Paula | Roberts | | Samantha | Jameson | Paula | Roberts | | NULL | NULL | Thomas | Ziegler | | NULL | NULL | Samantha | Jameson | | Cindy | Mason | John | Blake | | Frank | Portman | John | Blake | | NULL | NULL | Cindy | Mason | | NULL | NULL | Frank | Portman | | Beth | Fowler | Theresa | Markham | | Rick | Tulman | Theresa | Markham | | NULL | NULL | Beth | Fowler | | NULL | NULL | Rick | Tulman | +----------+-----------+-----------+-----------+ 28 rows in set (0.00 sec)
This query shows each supervisor (still the third and fourth columns) along
with the set of employees he or she supervises. Therefore, Michael Smith appears
twice as supervisor to Susan Barker and Robert Tyler; Susan Barker appears once
as a supervisor to nobody (null values in the
first and second columns). All 18 employees appear at least once in the third
and fourth columns, with some appearing more than once if they supervise more
than one employee, making a total of 28 rows in the result set. This is a very
different outcome from the previous query, and it was prompted by changing only
a single keyword (left to right). Therefore, when using outer joins, make
sure you think carefully about whether to specify a left or right outer
join.
Back in Chapter 5, I introduced the concept of a
Cartesian product, which is essentially the result of joining multiple tables
without specifying any join conditions. Cartesian products are used fairly
frequently by accident (e.g., forgetting to add the join condition to the from clause) but are not so common otherwise. If,
however, you do intend to generate the Cartesian product of two
tables, you should specify a cross join, as in:
mysql>SELECT pt.name, p.product_cd, p.name->FROM product p CROSS JOIN product_type pt;+-------------------------------+------------+-------------------------+ | name | product_cd | name | +-------------------------------+------------+-------------------------+ | Customer Accounts | AUT | auto loan | | Customer Accounts | BUS | business line of credit | | Customer Accounts | CD | certificate of deposit | | Customer Accounts | CHK | checking account | | Customer Accounts | MM | money market account | | Customer Accounts | MRT | home mortgage | | Customer Accounts | SAV | savings account | | Customer Accounts | SBL | small business loan | | Insurance Offerings | AUT | auto loan | | Insurance Offerings | BUS | business line of credit | | Insurance Offerings | CD | certificate of deposit | | Insurance Offerings | CHK | checking account | | Insurance Offerings | MM | money market account | | Insurance Offerings | MRT | home mortgage | | Insurance Offerings | SAV | savings account | | Insurance Offerings | SBL | small business loan | | Individual and Business Loans | AUT | auto loan | | Individual and Business Loans | BUS | business line of credit | | Individual and Business Loans | CD | certificate of deposit | | Individual and Business Loans | CHK | checking account | | Individual and Business Loans | MM | money market account | | Individual and Business Loans | MRT | home mortgage | | Individual and Business Loans | SAV | savings account | | Individual and Business Loans | SBL | small business loan | +-------------------------------+------------+-------------------------+ 24 rows in set (0.00 sec)
This query generates the Cartesian product of the product and product_type tables,
resulting in 24 rows (8 product rows × 3 product_type rows). But now that you know what a cross
join is and how to specify it, what is it used for? Most SQL books will describe
what a cross join is and then tell you that it is seldom useful, but I would like to
share with you a situation in which I find the cross join to be quite
helpful.
In Chapter 9, I discussed how to use subqueries to fabricate tables. The example I used showed how to build a three-row table that could be joined to other tables. Here’s the fabricated table from the example:
mysql>SELECT 'Small Fry' name, 0 low_limit, 4999.99 high_limit->UNION ALL->SELECT 'Average Joes' name, 5000 low_limit, 9999.99 high_limit->UNION ALL->SELECT 'Heavy Hitters' name, 10000 low_limit, 9999999.99 high_limit;+---------------+-----------+------------+ | name | low_limit | high_limit | +---------------+-----------+------------+ | Small Fry | 0 | 4999.99 | | Average Joes | 5000 | 9999.99 | | Heavy Hitters | 10000 | 9999999.99 | +---------------+-----------+------------+ 3 rows in set (0.00 sec)
While this table was exactly what was needed for placing customers into three
groups based on their aggregate account balance, this strategy of merging single-row
tables using the set operator union all doesn’t
work very well if you need to fabricate a large table.
Say, for example, that you want to create a query that generates a row for every day in the year 2008, but you don’t have a table in your database that contains a row for every day. Using the strategy from the example in Chapter 9, you could do something like the following:
SELECT '2008-01-01' dt UNION ALL SELECT '2008-01-02' dt UNION ALL SELECT '2008-01-03' dt UNION ALL ... ... ... SELECT '2008-12-29' dt UNION ALL SELECT '2008-12-30' dt UNION ALL SELECT '2008-12-31' dt
Building a query that merges together the results of 366 queries is a bit tedious, so maybe a different strategy is needed. What if you generate a table with 366 rows (2008 was a leap year) with a single column containing a number between 0 and 366, and then add that number of days to January 1, 2008? Here’s one possible method to generate such a table:
mysql>SELECT ones.num + tens.num + hundreds.num->FROM->(SELECT 0 num UNION ALL->SELECT 1 num UNION ALL->SELECT 2 num UNION ALL->SELECT 3 num UNION ALL->SELECT 4 num UNION ALL->SELECT 5 num UNION ALL->SELECT 6 num UNION ALL->SELECT 7 num UNION ALL->SELECT 8 num UNION ALL->SELECT 9 num) ones->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 10 num UNION ALL->SELECT 20 num UNION ALL->SELECT 30 num UNION ALL->SELECT 40 num UNION ALL->SELECT 50 num UNION ALL->SELECT 60 num UNION ALL->SELECT 70 num UNION ALL->SELECT 80 num UNION ALL->SELECT 90 num) tens->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 100 num UNION ALL->SELECT 200 num UNION ALL->SELECT 300 num) hundreds;+------------------------------------+ | ones.num + tens.num + hundreds.num | +------------------------------------+ | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | | 11 | | 12 | ... ... ... | 391 | | 392 | | 393 | | 394 | | 395 | | 396 | | 397 | | 398 | | 399 | +------------------------------------+ 400 rows in set (0.00 sec)
If you take the Cartesian product of the three sets {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 10, 20, 30, 40, 50, 60, 70, 80, 90}, and {0, 100, 200, 300} and add the values in the three columns, you get a 400-row result set containing all numbers between 0 and 399. While this is more than the 366 rows needed to generate the set of days in 2008, it’s easy enough to get rid of the excess rows, and I’ll show you how shortly.
The next step is to convert the set of numbers to a set of dates. To do this, I
will use the date_add() function to add each
number in the result set to January 1, 2008. Then I’ll add a filter condition to
throw away any dates that venture into 2009:
mysql>SELECT DATE_ADD('2008-01-01',->INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt->FROM->(SELECT 0 num UNION ALL->SELECT 1 num UNION ALL->SELECT 2 num UNION ALL->SELECT 3 num UNION ALL->SELECT 4 num UNION ALL->SELECT 5 num UNION ALL->SELECT 6 num UNION ALL->SELECT 7 num UNION ALL->SELECT 8 num UNION ALL->SELECT 9 num) ones->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 10 num UNION ALL->SELECT 20 num UNION ALL->SELECT 30 num UNION ALL->SELECT 40 num UNION ALL->SELECT 50 num UNION ALL->SELECT 60 num UNION ALL->SELECT 70 num UNION ALL->SELECT 80 num UNION ALL->SELECT 90 num) tens->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 100 num UNION ALL->SELECT 200 num UNION ALL->SELECT 300 num) hundreds->WHERE DATE_ADD('2008-01-01',->INTERVAL (ones.num + tens.num + hundreds.num) DAY) < '2009-01-01'->ORDER BY 1;+------------+ | dt | +------------+ | 2008-01-01 | | 2008-01-02 | | 2008-01-03 | | 2008-01-04 | | 2008-01-05 | | 2008-01-06 | | 2008-01-07 | | 2008-01-08 | | 2008-01-09 | | 2008-01-10 | ... ... ... | 2008-02-20 | | 2008-02-21 | | 2008-02-22 | | 2008-02-23 | | 2008-02-24 | | 2008-02-25 | | 2008-02-26 | | 2008-02-27 | | 2008-02-28 | | 2008-02-29 | | 2008-03-01 | ... ... ... | 2008-12-20 | | 2008-12-21 | | 2008-12-22 | | 2008-12-23 | | 2008-12-24 | | 2008-12-25 | | 2008-12-26 | | 2008-12-27 | | 2008-12-28 | | 2008-12-29 | | 2008-12-30 | | 2008-12-31 | +------------+ 366 rows in set (0.01 sec)
The nice thing about this approach is that the result set automatically includes the extra leap day (February 29) without your intervention, since the database server figures it out when it adds 59 days to January 1, 2008.
Now that you have a mechanism for fabricating all the days in 2008, what should you do with it? Well, you might be asked to generate a query that shows every day in 2008 along with the number of banking transactions conducted on that day, the number of accounts opened on that day, and so forth. Here’s an example that answers the first question:
mysql>SELECT days.dt, COUNT(t.txn_id)->FROM transaction t RIGHT OUTER JOIN->(SELECT DATE_ADD('2008-01-01',->INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt->FROM->(SELECT 0 num UNION ALL->SELECT 1 num UNION ALL->SELECT 2 num UNION ALL->SELECT 3 num UNION ALL->SELECT 4 num UNION ALL->SELECT 5 num UNION ALL->SELECT 6 num UNION ALL->SELECT 7 num UNION ALL->SELECT 8 num UNION ALL->SELECT 9 num) ones->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 10 num UNION ALL->SELECT 20 num UNION ALL->SELECT 30 num UNION ALL->SELECT 40 num UNION ALL->SELECT 50 num UNION ALL->SELECT 60 num UNION ALL->SELECT 70 num UNION ALL->SELECT 80 num UNION ALL->SELECT 90 num) tens->CROSS JOIN->(SELECT 0 num UNION ALL->SELECT 100 num UNION ALL->SELECT 200 num UNION ALL->SELECT 300 num) hundreds->WHERE DATE_ADD('2008-01-01',->INTERVAL (ones.num + tens.num + hundreds.num) DAY) <->'2009-01-01') days->ON days.dt = t.txn_date->GROUP BY days.dt->ORDER BY 1;+------------+-----------------+ | dt | COUNT(t.txn_id) | +------------+-----------------+ | 2008-01-01 | 0 | | 2008-01-02 | 0 | | 2008-01-03 | 0 | | 2008-01-04 | 0 | | 2008-01-05 | 21 | | 2008-01-06 | 0 | | 2008-01-07 | 0 | | 2008-01-08 | 0 | | 2008-01-09 | 0 | | 2008-01-10 | 0 | | 2008-01-11 | 0 | | 2008-01-12 | 0 | | 2008-01-13 | 0 | | 2008-01-14 | 0 | | 2008-01-15 | 0 | ... | 2008-12-31 | 0 | +------------+-----------------+ 366 rows in set (0.03 sec)
This is one of the more interesting queries thus far in the book, in that it
includes cross joins, outer joins, a date function, grouping, set operations
(union all), and an aggregate function
(count()). It is also not the most elegant
solution to the given problem, but it should serve as an example of how, with a
little creativity and a firm grasp on the language, you can make even a seldom-used
feature like cross joins a potent tool in your SQL toolkit.
If you are lazy (and aren’t we all), you can choose a join type that allows you to
name the tables to be joined but lets the database server determine what the join
conditions need to be. Known as the natural join, this join
type relies on identical column names across multiple tables to infer the proper
join conditions. For example, the account table
includes a column named cust_id, which is the
foreign key to the customer table, whose primary
key is also named cust_id. Thus, you can write a
query that uses natural join to join the two
tables:
mysql>SELECT a.account_id, a.cust_id, c.cust_type_cd, c.fed_id->FROM account a NATURAL JOIN customer c;+------------+---------+--------------+-------------+ | account_id | cust_id | cust_type_cd | fed_id | +------------+---------+--------------+-------------+ | 1 | 1 | I | 111-11-1111 | | 2 | 1 | I | 111-11-1111 | | 3 | 1 | I | 111-11-1111 | | 4 | 2 | I | 222-22-2222 | | 5 | 2 | I | 222-22-2222 | | 6 | 3 | I | 333-33-3333 | | 7 | 3 | I | 333-33-3333 | | 8 | 4 | I | 444-44-4444 | | 9 | 4 | I | 444-44-4444 | | 10 | 4 | I | 444-44-4444 | | 11 | 5 | I | 555-55-5555 | | 12 | 6 | I | 666-66-6666 | | 13 | 6 | I | 666-66-6666 | | 14 | 7 | I | 777-77-7777 | | 15 | 8 | I | 888-88-8888 | | 16 | 8 | I | 888-88-8888 | | 17 | 9 | I | 999-99-9999 | | 18 | 9 | I | 999-99-9999 | | 19 | 9 | I | 999-99-9999 | | 20 | 10 | B | 04-1111111 | | 21 | 10 | B | 04-1111111 | | 22 | 11 | B | 04-2222222 | | 23 | 12 | B | 04-3333333 | | 24 | 13 | B | 04-4444444 | +------------+---------+--------------+-------------+ 24 rows in set (0.02 sec)
Because you specified a natural join, the server inspected the table definitions
and added the join condition a.cust_id =
c.cust_id to join the two tables.
This is all well and good, but what if the columns don’t have the same name across
the tables? For example, the account table also
has a foreign key to the branch table, but the
column in the account table is named open_branch_id instead of just branch_id. Let’s see what happens if I use natural join between the account and branch tables:
mysql>SELECT a.account_id, a.cust_id, a.open_branch_id, b.name->FROM account a NATURAL JOIN branch b;+------------+---------+----------------+---------------+ | account_id | cust_id | open_branch_id | name | +------------+---------+----------------+---------------+ | 1 | 1 | 2 | Headquarters | | 1 | 1 | 2 | Woburn Branch | | 1 | 1 | 2 | Quincy Branch | | 1 | 1 | 2 | So. NH Branch | | 2 | 1 | 2 | Headquarters | | 2 | 1 | 2 | Woburn Branch | | 2 | 1 | 2 | Quincy Branch | | 2 | 1 | 2 | So. NH Branch | | 3 | 1 | 2 | Headquarters | | 3 | 1 | 2 | Woburn Branch | | 3 | 1 | 2 | Quincy Branch | | 3 | 1 | 2 | So. NH Branch | | 4 | 2 | 2 | Headquarters | | 4 | 2 | 2 | Woburn Branch | | 4 | 2 | 2 | Quincy Branch | | 4 | 2 | 2 | So. NH Branch | | 5 | 2 | 2 | Headquarters | | 5 | 2 | 2 | Woburn Branch | | 5 | 2 | 2 | Quincy Branch | | 5 | 2 | 2 | So. NH Branch | | 7 | 3 | 3 | Headquarters | | 7 | 3 | 3 | Woburn Branch | | 7 | 3 | 3 | Quincy Branch | | 7 | 3 | 3 | So. NH Branch | | 8 | 3 | 3 | Headquarters | | 8 | 3 | 3 | Woburn Branch | | 8 | 3 | 3 | Quincy Branch | | 8 | 3 | 3 | So. NH Branch | | 10 | 4 | 1 | Headquarters | | 10 | 4 | 1 | Woburn Branch | | 10 | 4 | 1 | Quincy Branch | | 10 | 4 | 1 | So. NH Branch | ... ... ... | 24 | 10 | 4 | Headquarters | | 24 | 10 | 4 | Woburn Branch | | 24 | 10 | 4 | Quincy Branch | | 24 | 10 | 4 | So. NH Branch | | 25 | 10 | 4 | Headquarters | | 25 | 10 | 4 | Woburn Branch | | 25 | 10 | 4 | Quincy Branch | | 25 | 10 | 4 | So. NH Branch | | 27 | 11 | 2 | Headquarters | | 27 | 11 | 2 | Woburn Branch | | 27 | 11 | 2 | Quincy Branch | | 27 | 11 | 2 | So. NH Branch | | 28 | 12 | 4 | Headquarters | | 28 | 12 | 4 | Woburn Branch | | 28 | 12 | 4 | Quincy Branch | | 28 | 12 | 4 | So. NH Branch | | 29 | 13 | 3 | Headquarters | | 29 | 13 | 3 | Woburn Branch | | 29 | 13 | 3 | Quincy Branch | | 29 | 13 | 3 | So. NH Branch | +------------+---------+----------------+---------------+ 96 rows in set (0.07 sec)
It looks like something has gone wrong; the query should return no more than 24
rows, since there are 24 rows in the account
table. What has happened is that, since the server couldn’t find two identically
named columns in the two tables, no join condition was generated and the two tables
were cross-joined instead, resulting in 96 rows (24 accounts × 4 branches).
So, is the reduced wear and tear on the old fingers from not having to type the join condition worth the trouble? Absolutely not; you should avoid this join type and use inner joins with explicit join conditions.
The following exercises test your understanding of outer and cross joins. Please see Appendix C for solutions.
Write a query that returns all product names along with the accounts based on
that product (use the product_cd column in
the account table to link to the product table). Include all products, even if no
accounts have been opened for that product.
Reformulate your query from Exercise 10-1 to use the other outer join type (e.g., if you used a left outer join in Exercise 10-1, use a right outer join this time) such that the results are identical to Exercise 10-1.
Outer-join the account table to both the
individual and business tables (via the account.cust_id column) such that the result set contains one row
per account. Columns to include are account.account_id, account.product_cd, individual.fname, individual.lname, and business.name.