Learning SQL, 2nd Edition

Chapter 5. Querying Multiple Tables

Back in Chapter 2, I demonstrated how related concepts are broken into separate pieces through a process known as normalization. The end result of this exercise was two tables: person and favorite_food. If, however, you want to generate a single report showing a person’s name, address, and favorite foods, you will need a mechanism to bring the data from these two tables back together again; this mechanism is known as a join, and this chapter concentrates on the simplest and most common join, the inner join. Chapter 10 demonstrates all of the different join types.

What Is a Join?

Queries against a single table are certainly not rare, but you will find that most of your queries will require two, three, or even more tables. To illustrate, let’s look at the definitions for the employee and department tables and then define a query that retrieves data from both tables:

mysql> DESC employee;
+--------------------+----------------------+------+-----+---------+
| Field              | Type                 | Null | Key | Default |
+--------------------+----------------------+------+-----+---------+
| emp_id             | smallint(5) unsigned | NO   | PRI | NULL    |
| fname              | varchar(20)          | NO   |     | NULL    |
| lname              | varchar(20)          | NO   |     | NULL    |
| start_date         | date                 | NO   |     | NULL    |
| end_date           | date                 | YES  |     | NULL    |
| superior_emp_id    | smallint(5) unsigned | YES  | MUL | NULL    |
| dept_id            | smallint(5) unsigned | YES  | MUL | NULL    |
| title              | varchar(20)          | YES  |     | NULL    |
| assigned_branch_id | smallint(5) unsigned | YES  | MUL | NULL    |
+--------------------+----------------------+------+-----+---------+
9 rows in set (0.11 sec)

 mysql> DESC department;
+---------+----------------------+------+-----+---------+
| Field   | Type                 | Null | Key | Default |
+---------+----------------------+------+-----+---------+
| dept_id | smallint(5) unsigned | No   | PRI | NULL    |
| name    | varchar(20)          | No   |     | NULL    |
+---------+----------------------+------+-----+---------+
2 rows in set (0.03 sec)

Let’s say you want to retrieve the first and last names of each employee along with the name of the department to which each employee is assigned. Your query will therefore need to retrieve the employee.fname, employee.lname, and department.name columns. But how can you retrieve data from both tables in the same query? The answer lies in the employee.dept_id column, which holds the ID of the department to which each employee is assigned (in more formal terms, the employee.dept_id column is the foreign key to the department table). The query, which you will see shortly, instructs the server to use the employee.dept_id column as the bridge between the employee and department tables, thereby allowing columns from both tables to be included in the query’s result set. This type of operation is known as a join.

Cartesian Product

The easiest way to start is to put the employee and department tables into the from clause of a query and see what happens. Here’s a query that retrieves the employee’s first and last names along with the department name, with a from clause naming both tables separated by the join keyword:

mysql> SELECT e.fname, e.lname, d.name
    -> FROM employee e JOIN department d;
+----------+-----------+----------------+
| fname    | lname     | name           |
+----------+-----------+----------------+
| Michael  | Smith     | Operations     |
| Michael  | Smith     | Loans          |
| Michael  | Smith     | Administration |
| Susan    | Barker    | Operations     |
| Susan    | Barker    | Loans          |
| Susan    | Barker    | Administration |
| Robert   | Tyler     | Operations     |
| Robert   | Tyler     | Loans          |
| Robert   | Tyler     | Administration |
| Susan    | Hawthorne | Operations     |
| Susan    | Hawthorne | Loans          |
| Susan    | Hawthorne | Administration |
| John     | Gooding   | Operations     |
| John     | Gooding   | Loans          |
| John     | Gooding   | Administration |
| Helen    | Fleming   | Operations     |
| Helen    | Fleming   | Loans          |
| Helen    | Fleming   | Administration |
| Chris    | Tucker    | Operations     |
| Chris    | Tucker    | Loans          |
| Chris    | Tucker    | Administration |
| Sarah    | Parker    | Operations     |
| Sarah    | Parker    | Loans          |
| Sarah    | Parker    | Administration |
| Jane     | Grossman  | Operations     |
| Jane     | Grossman  | Loans          |
| Jane     | Grossman  | Administration |
| Paula    | Roberts   | Operations     |
| Paula    | Roberts   | Loans          |
| Paula    | Roberts   | Administration |
| Thomas   | Ziegler   | Operations     |
| Thomas   | Ziegler   | Loans          |
| Thomas   | Ziegler   | Administration |
| Samantha | Jameson   | Operations     |
| Samantha | Jameson   | Loans          |
| Samantha | Jameson   | Administration |
| John     | Blake     | Operations     |
| John     | Blake     | Loans          |
| John     | Blake     | Administration |
| Cindy    | Mason     | Operations     |
| Cindy    | Mason     | Loans          |
| Cindy    | Mason     | Administration |
| Frank    | Portman   | Operations     |
| Frank    | Portman   | Loans          |
| Frank    | Portman   | Administration |
| Theresa  | Markham   | Operations     |
| Theresa  | Markham   | Loans          |
| Theresa  | Markham   | Administration |
| Beth     | Fowler    | Operations     |
| Beth     | Fowler    | Loans          |
| Beth     | Fowler    | Administration |
| Rick     | Tulman    | Operations     |
| Rick     | Tulman    | Loans          |
| Rick     | Tulman    | Administration |
+----------+-----------+----------------+
54 rows in set (0.23 sec)

Hmmm…there are only 18 employees and 3 different departments, so how did the result set end up with 54 rows? Looking more closely, you can see that the set of 18 employees is repeated three times, with all the data identical except for the department name. Because the query didn’t specify how the two tables should be joined, the database server generated the Cartesian product, which is every permutation of the two tables (18 employees × 3 departments = 54 permutations). This type of join is known as a cross join, and it is rarely used (on purpose, at least). Cross joins are one of the join types that we study in Chapter 10.

Inner Joins

To modify the previous query so that only 18 rows are included in the result set (one for each employee), you need to describe how the two tables are related. Earlier, I showed that the employee.dept_id column serves as the link between the two tables, so this information needs to be added to the on subclause of the from clause:

mysql> SELECT e.fname, e.lname, d.name
    -> FROM employee e JOIN department d
    ->   ON e.dept_id = d.dept_id;
+----------+-----------+----------------+
| fname    | lname     | name           |
+----------+-----------+----------------+
| Michael  | Smith     | Administration |
| Susan    | Barker    | Administration |
| Robert   | Tyler     | Administration |
| Susan    | Hawthorne | Operations     |
| John     | Gooding   | Loans          |
| Helen    | Fleming   | Operations     |
| Chris    | Tucker    | Operations     |
| Sarah    | Parker    | Operations     |
| Jane     | Grossman  | Operations     |
| Paula    | Roberts   | Operations     |
| Thomas   | Ziegler   | Operations     |
| Samantha | Jameson   | Operations     |
| John     | Blake     | Operations     |
| Cindy    | Mason     | Operations     |
| Frank    | Portman   | Operations     |
| Theresa  | Markham   | Operations     |
| Beth     | Fowler    | Operations     |
| Rick     | Tulman    | Operations     |
+----------+-----------+----------------+
18 rows in set (0.00 sec)

Instead of 54 rows, you now have the expected 18 rows due to the addition of the on subclause, which instructs the server to join the employee and department tables by using the dept_id column to traverse from one table to the other. For example, Susan Hawthorne’s row in the employee table contains a value of 1 in the dept_id column (not shown in the example). The server uses this value to look up the row in the department table having a value of 1 in its dept_id column and then retrieves the value 'Operations' from the name column in that row.

If a value exists for the dept_id column in one table but not the other, then the join fails for the rows containing that value and those rows are excluded from the result set. This type of join is known as an inner join, and it is the most commonly used type of join. To clarify, if the department table contains a fourth row for the marketing department, but no employees have been assigned to that department, then the marketing department would not be included in the result set. Likewise, if some of the employees had been assigned to department ID 99, which doesn’t exist in the department table, then these employees would be left out of the result set. If you want to include all rows from one table or the other regardless of whether a match exists, you need to specify an outer join, but we cover this later in the book.

In the previous example, I did not specify in the from clause which type of join to use. However, when you wish to join two tables using an inner join, you should explicitly specify this in your from clause; here’s the same example, with the addition of the join type (note the keyword INNER):

mysql> SELECT e.fname, e.lname, d.name
    -> FROM employee e INNER JOIN department d
    ->   ON e.dept_id = d.dept_id;
+----------+-----------+----------------+
| fname    | lname     | name           |
+----------+-----------+----------------+
| Michael  | Smith     | Administration |
| Susan    | Barker    | Administration |
| Robert   | Tyler     | Administration |
| Susan    | Hawthorne | Operations     |
| John     | Gooding   | Loans          |
| Helen    | Fleming   | Operations     |
| Chris    | Tucker    | Operations     |
| Sarah    | Parker    | Operations     |
| Jane     | Grossman  | Operations     |
| Paula    | Roberts   | Operations     |
| Thomas   | Ziegler   | Operations     |
| Samantha | Jameson   | Operations     |
| John     | Blake     | Operations     |
| Cindy    | Mason     | Operations     |
| Frank    | Portman   | Operations     |
| Theresa  | Markham   | Operations     |
| Beth     | Fowler    | Operations     |
| Rick     | Tulman    | Operations     |
+----------+-----------+----------------+
18 rows in set (0.00 sec)

If you do not specify the type of join, then the server will do an inner join by default. As you will see later in the book, however, there are several types of joins, so you should get in the habit of specifying the exact type of join that you require.

If the names of the columns used to join the two tables are identical, which is true in the previous query, you can use the using subclause instead of the on subclause, as in:

mysql> SELECT e.fname, e.lname, d.name
    -> FROM employee e INNER JOIN department d
    ->   USING (dept_id);
+----------+-----------+----------------+
| fname    | lname     | name           |
+----------+-----------+----------------+
| Michael  | Smith     | Administration |
| Susan    | Barker    | Administration |
| Robert   | Tyler     | Administration |
| Susan    | Hawthorne | Operations     |
| John     | Gooding   | Loans          |
| Helen    | Fleming   | Operations     |
| Chris    | Tucker    | Operations     |
| Sarah    | Parker    | Operations     |
| Jane     | Grossman  | Operations     |
| Paula    | Roberts   | Operations     |
| Thomas   | Ziegler   | Operations     |
| Samantha | Jameson   | Operations     |
| John     | Blake     | Operations     |
| Cindy    | Mason     | Operations     |
| Frank    | Portman   | Operations     |
| Theresa  | Markham   | Operations     |
| Beth     | Fowler    | Operations     |
| Rick     | Tulman    | Operations     |
+----------+-----------+----------------+
18 rows in set (0.01 sec)

Since using is a shorthand notation that you can use in only a specific situation, I prefer always to use the on subclause to avoid confusion.

The ANSI Join Syntax

The notation used throughout this book for joining tables was introduced in the SQL92 version of the ANSI SQL standard. All the major databases (Oracle Database, Microsoft SQL Server, MySQL, IBM DB2 Universal Database, and Sybase Adaptive Server) have adopted the SQL92 join syntax. Because most of these servers have been around since before the release of the SQL92 specification, they all include an older join syntax as well. For example, all these servers would understand the following variation of the previous query:

mysql> SELECT e.fname, e.lname, d.name
    -> FROM employee e, department d
    -> WHERE e.dept_id = d.dept_id;
+----------+-----------+----------------+
| fname    | lname     | name           |
+----------+-----------+----------------+
| Michael  | Smith     | Administration |
| Susan    | Barker    | Administration |
| Robert   | Tyler     | Administration |
| Susan    | Hawthorne | Operations     |
| John     | Gooding   | Loans          |
| Helen    | Fleming   | Operations     |
| Chris    | Tucker    | Operations     |
| Sarah    | Parker    | Operations     |
| Jane     | Grossman  | Operations     |
| Paula    | Roberts   | Operations     |
| Thomas   | Ziegler   | Operations     |
| Samantha | Jameson   | Operations     |
| John     | Blake     | Operations     |
| Cindy    | Mason     | Operations     |
| Frank    | Portman   | Operations     |
| Theresa  | Markham   | Operations     |
| Beth     | Fowler    | Operations     |
| Rick     | Tulman    | Operations     |
+----------+-----------+----------------+
18 rows in set (0.01 sec)

This older method of specifying joins does not include the on subclause; instead, tables are named in the from clause separated by commas, and join conditions are included in the where clause. While you may decide to ignore the SQL92 syntax in favor of the older join syntax, the ANSI join syntax has the following advantages:

Join conditions and filter conditions are separated into two different clauses (the on subclause and the where clause, respectively), making a query easier to understand.
The join conditions for each pair of tables are contained in their own on clause, making it less likely that part of a join will be mistakenly omitted.
Queries that use the SQL92 join syntax are portable across database servers, whereas the older syntax is slightly different across the different servers.

The benefits of the SQL92 join syntax are easier to identify for complex queries that include both join and filter conditions. Consider the following query, which returns all accounts opened by experienced tellers (hired prior to 2007) currently assigned to the Woburn branch:

mysql> SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
    -> FROM account a, branch b, employee e
    -> WHERE a.open_emp_id = e.emp_id
    ->   AND e.start_date < '2007-01-01'
    ->   AND e.assigned_branch_id = b.branch_id
    ->   AND (e.title = 'Teller' OR e.title = 'Head Teller')
    ->   AND b.name = 'Woburn Branch';
+------------+---------+------------+------------+
| account_id | cust_id | open_date  | product_cd |
+------------+---------+------------+------------+
|          1 |       1 | 2000-01-15 | CHK        |
|          2 |       1 | 2000-01-15 | SAV        |
|          3 |       1 | 2004-06-30 | CD         |
|          4 |       2 | 2001-03-12 | CHK        |
|          5 |       2 | 2001-03-12 | SAV        |
|         17 |       7 | 2004-01-12 | CD         |
|         27 |      11 | 2004-03-22 | BUS        |
+------------+---------+------------+------------+
7 rows in set (0.00 sec)

With this query, it is not so easy to determine which conditions in the where clause are join conditions and which are filter conditions. It is also not readily apparent which type of join is being employed (to identify the type of join, you would need to look closely at the join conditions in the where clause to see whether any special characters are employed), nor is it easy to determine whether any join conditions have been mistakenly left out. Here’s the same query using the SQL92 join syntax:

mysql> SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
    -> FROM account a INNER JOIN employee e
    ->   ON a.open_emp_id = e.emp_id
    ->   INNER JOIN branch b
    ->   ON e.assigned_branch_id = b.branch_id
    -> WHERE e.start_date < '2007-01-01'
    ->   AND (e.title = 'Teller' OR e.title = 'Head Teller')
    ->   AND b.name = 'Woburn Branch';
+------------+---------+------------+------------+
| account_id | cust_id | open_date  | product_cd |
+------------+---------+------------+------------+
|          1 |       1 | 2000-01-15 | CHK        |
|          2 |       1 | 2000-01-15 | SAV        |
|          3 |       1 | 2004-06-30 | CD         |
|          4 |       2 | 2001-03-12 | CHK        |
|          5 |       2 | 2001-03-12 | SAV        |
|         17 |       7 | 2004-01-12 | CD         |
|         27 |      11 | 2004-03-22 | BUS        |
+------------+---------+------------+------------+
7 rows in set (0.05 sec)

Hopefully, you will agree that the version using SQL92 join syntax is easier to understand.

Joining Three or More Tables

Joining three tables is similar to joining two tables, but with one slight wrinkle. With a two-table join, there are two tables and one join type in the from clause, and a single on subclause to define how the tables are joined. With a three-table join, there are three tables and two join types in the from clause, and two on subclauses. Here’s another example of a query with a two-table join:

mysql> SELECT a.account_id, c.fed_id
    -> FROM account a INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id
    -> WHERE c.cust_type_cd = 'B';
+------------+------------+
| account_id | fed_id     |
+------------+------------+
|         24 | 04-1111111 |
|         25 | 04-1111111 |
|         27 | 04-2222222 |
|         28 | 04-3333333 |
|         29 | 04-4444444 |
+------------+------------+
5 rows in set (0.15 sec)

This query, which returns the account ID and federal tax number for all business accounts, should look fairly straightforward by now. If, however, you add the employee table to the query to also retrieve the name of the teller who opened each account, it looks as follows:

mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
    -> FROM account a INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id
    ->   INNER JOIN employee e
    ->   ON a.open_emp_id = e.emp_id
    -> WHERE c.cust_type_cd = 'B';
+------------+------------+---------+---------+
| account_id | fed_id     | fname   | lname   |
+------------+------------+---------+---------+
|         24 | 04-1111111 | Theresa | Markham |
|         25 | 04-1111111 | Theresa | Markham |
|         27 | 04-2222222 | Paula   | Roberts |
|         28 | 04-3333333 | Theresa | Markham |
|         29 | 04-4444444 | John    | Blake   |
+------------+------------+---------+---------+
5 rows in set (0.00 sec)

Now three tables, two join types, and two on subclauses are listed in the from clause, so things have gotten quite a bit busier. At first glance, the order in which the tables are named might cause you to think that the employee table is being joined to the customer table, since the account table is named first, followed by the customer table, and then the employee table. If you switch the order in which the first two tables appear, however, you will get the exact same results:

mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
    -> FROM customer c INNER JOIN account a
    ->   ON a.cust_id = c.cust_id
    ->   INNER JOIN employee e
    ->   ON a.open_emp_id = e.emp_id
    -> WHERE c.cust_type_cd = 'B';
+------------+------------+---------+---------+
| account_id | fed_id     | fname   | lname   |
+------------+------------+---------+---------+
|         24 | 04-1111111 | Theresa | Markham |
|         25 | 04-1111111 | Theresa | Markham |
|         27 | 04-2222222 | Paula   | Roberts |
|         28 | 04-3333333 | Theresa | Markham |
|         29 | 04-4444444 | John    | Blake   |
+------------+------------+---------+---------+
5 rows in set (0.09 sec)

The customer table is now listed first, followed by the account table and then the employee table. Since the on subclauses haven’t changed, the results are the same. For the sake of completeness, here’s the same query one last time, but with the table order completely reversed (employee to account to customer):

mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
    -> FROM employee e INNER JOIN account a
    ->   ON e.emp_id = a.open_emp_id
    ->   INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id
    -> WHERE c.cust_type_cd = 'B';
+------------+------------+---------+---------+
| account_id | fed_id     | fname   | lname   |
+------------+------------+---------+---------+
|         24 | 04-1111111 | Theresa | Markham |
|         25 | 04-1111111 | Theresa | Markham |
|         27 | 04-2222222 | Paula   | Roberts |
|         28 | 04-3333333 | Theresa | Markham |
|         29 | 04-4444444 | John    | Blake   |
+------------+------------+---------+---------+
5 rows in set (0.00 sec)

If you are confused about why all three versions of the account/employee/customer query yield the same results, keep in mind that SQL is a nonprocedural language, meaning that you describe what you want to retrieve and which database objects need to be involved, but it is up to the database server to determine how best to execute your query. Using statistics gathered from your database objects, the server must pick one of three tables as a starting point (the chosen table is thereafter known as the driving table), and then decide in which order to join the remaining tables. Therefore, the order in which tables appear in your from clause is not significant.

If, however, you believe that the tables in your query should always be joined in a particular order, you can place the tables in the desired order and then specify the keyword STRAIGHT_JOIN in MySQL, request the FORCE ORDER option in SQL Server, or use either the ORDERED or the LEADING optimizer hint in Oracle Database. For example, to tell the MySQL server to use the customer table as the driving table and to then join the account and employee tables, you could do the following:

mysql> SELECT STRAIGHT_JOIN a.account_id, c.fed_id, e.fname, e.lname
    -> FROM customer c INNER JOIN account a
    ->   ON a.cust_id = c.cust_id
    ->   INNER JOIN employee e
    ->   ON a.open_emp_id = e.emp_id
    -> WHERE c.cust_type_cd = 'B';

One way to think of a query that uses three or more tables is as a snowball rolling down a hill. The first two tables get the ball rolling, and each subsequent table gets tacked on to the snowball as it heads downhill. You can think of the snowball as the intermediate result set, which is picking up more and more columns as subsequent tables are joined. Therefore, the employee table is not really being joined to the account table, but rather the intermediate result set created when the customer and account tables were joined. (In case you were wondering why I chose a snowball analogy, I wrote this chapter in the midst of a New England winter: 110 inches so far, and more coming tomorrow. Oh joy.)

Using Subqueries As Tables

You have already seen several examples of queries that use three tables, but there is one variation worth mentioning: what to do if some of the data sets are generated by subqueries. Subqueries is the focus of Chapter 9, but I already introduced the concept of a subquery in the from clause in the previous chapter. Here’s another version of an earlier query (find all accounts opened by experienced tellers currently assigned to the Woburn branch) that joins the account table to subqueries against the branch and employee tables:

1 SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
2 FROM account a INNER JOIN
3   (SELECT emp_id, assigned_branch_id
4     FROM employee
5     WHERE start_date < '2007-01-01'
6       AND (title = 'Teller' OR title = 'Head Teller')) e
7   ON a.open_emp_id = e.emp_id
8   INNER JOIN
9    (SELECT branch_id
10     FROM branch
11     WHERE name = 'Woburn Branch') b
12   ON e.assigned_branch_id = b.branch_id;

The first subquery, which starts on line 3 and is given the alias e, finds all experienced tellers. The second subquery, which starts on line 9 and is given the alias b, finds the ID of the Woburn branch. First, the account table is joined to the experienced-teller subquery using the employee ID and then the table that results is joined to the Woburn branch subquery using the branch ID. The results are the same as those of the previous version of the query (try it and see for yourself), but the queries look very different from one another.

There isn’t really anything shocking here, but it might take a minute to figure out what’s going on. Notice, for example, the lack of a where clause in the main query; since all the filter conditions are against the employee and branch tables, the filter conditions are all inside the subqueries, so there is no need for any filter conditions in the main query. One way to visualize what is going on is to run the subqueries and look at the result sets. Here are the results of the first subquery against the employee table:

mysql> SELECT emp_id, assigned_branch_id
    -> FROM employee
    -> WHERE start_date < '2007-01-01'
    ->   AND (title = 'Teller' OR title = 'Head Teller');
+--------+--------------------+
| emp_id | assigned_branch_id |
+--------+--------------------+
|      8 |                  1 |
|      9 |                  1 |
|     10 |                  2 |
|     11 |                  2 |
|     13 |                  3 |
|     14 |                  3 |
|     16 |                  4 |
|     17 |                  4 |
|     18 |                  4 |
+--------+--------------------+
9 rows in set (0.03 sec)

Thus, this result set consists of a set of employee IDs and their corresponding branch IDs. When they are joined to the account table via the emp_id column, you now have an intermediate result set consisting of all rows from the account table with the additional column holding the branch ID of the employee that opened each account. Here are the results of the second subquery against the branch table:

mysql> SELECT branch_id
    -> FROM branch
    -> WHERE name = 'Woburn Branch';
+-----------+
| branch_id |
+-----------+
|         2 |
+-----------+
1 row in set (0.02 sec)

This query returns a single row containing a single column: the ID of the Woburn branch. This table is joined to the assigned_branch_id column of the intermediate result set, causing all accounts opened by non-Woburn-based employees to be filtered out of the final result set.

Using the Same Table Twice

If you are joining multiple tables, you might find that you need to join the same table more than once. In the sample database, for example, there are foreign keys to the branch table from both the account table (the branch at which the account was opened) and the employee table (the branch at which the employee works). If you want to include both branches in your result set, you can include the branch table twice in the from clause, joined once to the employee table and once to the account table. For this to work, you will need to give each instance of the branch table a different alias so that the server knows which one you are referring to in the various clauses, as in:

mysql> SELECT a.account_id, e.emp_id,
    ->   b_a.name open_branch, b_e.name emp_branch
    -> FROM account a INNER JOIN branch b_a
    ->   ON a.open_branch_id = b_a.branch_id
    ->   INNER JOIN employee e
    ->   ON a.open_emp_id = e.emp_id
    ->   INNER JOIN branch b_e
    ->   ON e.assigned_branch_id = b_e.branch_id
    -> WHERE a.product_cd = 'CHK';
+------------+--------+---------------+---------------+
| account_id | emp_id | open_branch   | emp_branch    |
+------------+--------+---------------+---------------+
|         10 |      1 | Headquarters  | Headquarters  |
|         14 |      1 | Headquarters  | Headquarters  |
|         21 |      1 | Headquarters  | Headquarters  |
|          1 |     10 | Woburn Branch | Woburn Branch |
|          4 |     10 | Woburn Branch | Woburn Branch |
|          7 |     13 | Quincy Branch | Quincy Branch |
|         13 |     16 | So. NH Branch | So. NH Branch |
|         18 |     16 | So. NH Branch | So. NH Branch |
|         24 |     16 | So. NH Branch | So. NH Branch |
|         28 |     16 | So. NH Branch | So. NH Branch |
+------------+--------+---------------+---------------+
10 rows in set (0.16 sec)

This query shows who opened each checking account, what branch it was opened at, and to which branch the employee who opened the account is currently assigned. The branch table is included twice, with aliases b_a and b_e. By assigning different aliases to each instance of the branch table, the server is able to understand which instance you are referring to: the one joined to the account table, or the one joined to the employee table. Therefore, this is one example of a query that requires the use of table aliases.

Self-Joins

Not only can you include the same table more than once in the same query, but you can actually join a table to itself. This might seem like a strange thing to do at first, but there are valid reasons for doing so. The employee table, for example, includes a self-referencing foreign key, which means that it includes a column (superior_emp_id) that points to the primary key within the same table. This column points to the employee’s manager (unless the employee is the head honcho, in which case the column is null). Using a self-join, you can write a query that lists every employee’s name along with the name of his or her manager:

mysql> SELECT e.fname, e.lname,
    ->   e_mgr.fname mgr_fname, e_mgr.lname mgr_lname
    -> FROM employee e INNER JOIN employee e_mgr
    ->   ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname    | lname     | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Susan    | Barker    | Michael   | Smith     |
| Robert   | Tyler     | Michael   | Smith     |
| Susan    | Hawthorne | Robert    | Tyler     |
| John     | Gooding   | Susan     | Hawthorne |
| Helen    | Fleming   | Susan     | Hawthorne |
| Chris    | Tucker    | Helen     | Fleming   |
| Sarah    | Parker    | Helen     | Fleming   |
| Jane     | Grossman  | Helen     | Fleming   |
| Paula    | Roberts   | Susan     | Hawthorne |
| Thomas   | Ziegler   | Paula     | Roberts   |
| Samantha | Jameson   | Paula     | Roberts   |
| John     | Blake     | Susan     | Hawthorne |
| Cindy    | Mason     | John      | Blake     |
| Frank    | Portman   | John      | Blake     |
| Theresa  | Markham   | Susan     | Hawthorne |
| Beth     | Fowler    | Theresa   | Markham   |
| Rick     | Tulman    | Theresa   | Markham   |
+----------+-----------+-----------+-----------+
17 rows in set (0.00 sec)

This query includes two instances of the employee table: one to provide employee names (with the table alias e), and the other to provide manager names (with the table alias e_mgr). The on subclause uses these aliases to join the employee table to itself via the superior_emp_id foreign key. This is another example of a query for which table aliases are required; otherwise, the server wouldn’t know whether you are referring to an employee or an employee’s manager.

While there are 18 rows in the employee table, the query returned only 17 rows; the president of the bank, Michael Smith, has no superior (his superior_emp_id column is null), so the join failed for his row. To include Michael Smith in the result set, you would need to use an outer join, which we cover in Chapter 10.

Equi-Joins Versus Non-Equi-Joins

All of the multitable queries shown thus far have employed equi-joins, meaning that values from the two tables must match for the join to succeed. An equi-join always employs an equals sign, as in:

ON e.assigned_branch_id = b.branch_id

While the majority of your queries will employ equi-joins, you can also join your tables via ranges of values, which are referred to as non-equi-joins. Here’s an example of a query that joins by a range of values:

SELECT e.emp_id, e.fname, e.lname, e.start_date
FROM employee e INNER JOIN product p
  ON e.start_date >= p.date_offered
  AND e.start_date <= p.date_retired
WHERE p.name = 'no-fee checking';

This query joins two tables that have no foreign key relationships. The intent is to find all employees who began working for the bank while the No-Fee Checking product was being offered. Thus, an employee’s start date must be between the date the product was offered and the date the product was retired.

You may also find a need for a self-non-equi-join, meaning that a table is joined to itself using a non-equi-join. For example, let’s say that the operations manager has decided to have a chess tournament for all bank tellers. You have been asked to create a list of all the pairings. You might try joining the employee table to itself for all tellers (title = 'Teller') and return all rows where the emp_ids don’t match (since a person can’t play chess against himself):

mysql> SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname
    -> FROM employee e1 INNER JOIN employee e2
    ->   ON e1.emp_id != e2.emp_id
    -> WHERE e1.title = 'Teller' AND e2.title = 'Teller';
+----------+----------+----+----------+----------+
| fname    | lname    | vs | fname    | lname    |
+----------+----------+----+----------+----------+
| Sarah    | Parker   | VS | Chris    | Tucker   |
| Jane     | Grossman | VS | Chris    | Tucker   |
| Thomas   | Ziegler  | VS | Chris    | Tucker   |
| Samantha | Jameson  | VS | Chris    | Tucker   |
| Cindy    | Mason    | VS | Chris    | Tucker   |
| Frank    | Portman  | VS | Chris    | Tucker   |
| Beth     | Fowler   | VS | Chris    | Tucker   |
| Rick     | Tulman   | VS | Chris    | Tucker   |
| Chris    | Tucker   | VS | Sarah    | Parker   |
| Jane     | Grossman | VS | Sarah    | Parker   |
| Thomas   | Ziegler  | VS | Sarah    | Parker   |
| Samantha | Jameson  | VS | Sarah    | Parker   |
| Cindy    | Mason    | VS | Sarah    | Parker   |
| Frank    | Portman  | VS | Sarah    | Parker   |
| Beth     | Fowler   | VS | Sarah    | Parker   |
| Rick     | Tulman   | VS | Sarah    | Parker   |
...
| Chris    | Tucker   | VS | Rick     | Tulman   |
| Sarah    | Parker   | VS | Rick     | Tulman   |
| Jane     | Grossman | VS | Rick     | Tulman   |
| Thomas   | Ziegler  | VS | Rick     | Tulman   |
| Samantha | Jameson  | VS | Rick     | Tulman   |
| Cindy    | Mason    | VS | Rick     | Tulman   |
| Frank    | Portman  | VS | Rick     | Tulman   |
| Beth     | Fowler   | VS | Rick     | Tulman   |
+----------+----------+----+----------+----------+
72 rows in set (0.01 sec)

You’re on the right track, but the problem here is that for each pairing (e.g., Sarah Parker versus Chris Tucker), there is also a reverse pairing (e.g., Chris Tucker versus Sarah Parker). One way to achieve the desired results is to use the join condition e1.emp_id < e2.emp_id so that each teller is paired only with those tellers having a higher employee ID (you can also use e1.emp_id > e2.emp_id if you wish):

mysql> SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname
    -> FROM employee e1 INNER JOIN employee e2
    ->   ON e1.emp_id < e2.emp_id
    -> WHERE e1.title = 'Teller' AND e2.title = 'Teller';
+----------+----------+----+----------+----------+
| fname    | lname    | vs | fname    | lname    |
+----------+----------+----+----------+----------+
| Chris    | Tucker   | VS | Sarah    | Parker   |
| Chris    | Tucker   | VS | Jane     | Grossman |
| Sarah    | Parker   | VS | Jane     | Grossman |
| Chris    | Tucker   | VS | Thomas   | Ziegler  |
| Sarah    | Parker   | VS | Thomas   | Ziegler  |
| Jane     | Grossman | VS | Thomas   | Ziegler  |
| Chris    | Tucker   | VS | Samantha | Jameson  |
| Sarah    | Parker   | VS | Samantha | Jameson  |
| Jane     | Grossman | VS | Samantha | Jameson  |
| Thomas   | Ziegler  | VS | Samantha | Jameson  |
| Chris    | Tucker   | VS | Cindy    | Mason    |
| Sarah    | Parker   | VS | Cindy    | Mason    |
| Jane     | Grossman | VS | Cindy    | Mason    |
| Thomas   | Ziegler  | VS | Cindy    | Mason    |
| Samantha | Jameson  | VS | Cindy    | Mason    |
| Chris    | Tucker   | VS | Frank    | Portman  |
| Sarah    | Parker   | VS | Frank    | Portman  |
| Jane     | Grossman | VS | Frank    | Portman  |
| Thomas   | Ziegler  | VS | Frank    | Portman  |
| Samantha | Jameson  | VS | Frank    | Portman  |
| Cindy    | Mason    | VS | Frank    | Portman  |
| Chris    | Tucker   | VS | Beth     | Fowler   |
| Sarah    | Parker   | VS | Beth     | Fowler   |
| Jane     | Grossman | VS | Beth     | Fowler   |
| Thomas   | Ziegler  | VS | Beth     | Fowler   |
| Samantha | Jameson  | VS | Beth     | Fowler   |
| Cindy    | Mason    | VS | Beth     | Fowler   |
| Frank    | Portman  | VS | Beth     | Fowler   |
| Chris    | Tucker   | VS | Rick     | Tulman   |
| Sarah    | Parker   | VS | Rick     | Tulman   |
| Jane     | Grossman | VS | Rick     | Tulman   |
| Thomas   | Ziegler  | VS | Rick     | Tulman   |
| Samantha | Jameson  | VS | Rick     | Tulman   |
| Cindy    | Mason    | VS | Rick     | Tulman   |
| Frank    | Portman  | VS | Rick     | Tulman   |
| Beth     | Fowler   | VS | Rick     | Tulman   |
+----------+----------+----+----------+----------+
36 rows in set (0.00 sec)

You now have a list of 36 pairings, which is the correct number when choosing pairs of 9 distinct things.

Join Conditions Versus Filter Conditions

You are now familiar with the concept that join conditions belong in the on subclause, while filter conditions belong in the where clause. However, SQL is flexible as to where you place your conditions, so you will need to take care when constructing your queries. For example, the following query joins two tables using a single join condition, and also includes a single filter condition in the where clause:

mysql> SELECT a.account_id, a.product_cd, c.fed_id
    -> FROM account a INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id
    -> WHERE c.cust_type_cd = 'B';
+------------+------------+------------+
| account_id | product_cd | fed_id     |
+------------+------------+------------+
|         24 | CHK        | 04-1111111 |
|         25 | BUS        | 04-1111111 |
|         27 | BUS        | 04-2222222 |
|         28 | CHK        | 04-3333333 |
|         29 | SBL        | 04-4444444 |
+------------+------------+------------+
5 rows in set (0.01 sec)

That was pretty straightforward, but what happens if you mistakenly put the filter condition in the on subclause instead of in the where clause?

mysql> SELECT a.account_id, a.product_cd, c.fed_id
    -> FROM account a INNER JOIN customer c
    ->   ON a.cust_id = c.cust_id
    ->     AND c.cust_type_cd = 'B';
+------------+------------+------------+
| account_id | product_cd | fed_id     |
+------------+------------+------------+
|         24 | CHK        | 04-1111111 |
|         25 | BUS        | 04-1111111 |
|         27 | BUS        | 04-2222222 |
|         28 | CHK        | 04-3333333 |
|         29 | SBL        | 04-4444444 |
+------------+------------+------------+
5 rows in set (0.01 sec)

As you can see, the second version, which has both conditions in the on subclause and has no where clause, generates the same results. What if both conditions are placed in the where clause but the from clause still uses the ANSI join syntax?

mysql> SELECT a.account_id, a.product_cd, c.fed_id
    -> FROM account a INNER JOIN customer c
    -> WHERE a.cust_id = c.cust_id
    ->   AND c.cust_type_cd = 'B';
+------------+------------+------------+
| account_id | product_cd | fed_id     |
+------------+------------+------------+
|         24 | CHK        | 04-1111111 |
|         25 | BUS        | 04-1111111 |
|         27 | BUS        | 04-2222222 |
|         28 | CHK        | 04-3333333 |
|         29 | SBL        | 04-4444444 |
+------------+------------+------------+
5 rows in set (0.01 sec)

Once again, the MySQL server has generated the same result set. It will be up to you to put your conditions in the proper place so that your queries are easy to understand and maintain.

Test Your Knowledge

The following exercises are designed to test your understanding of inner joins. Please see Appendix C for the solutions to these exercises.

Exercise 5-1

Fill in the blanks (denoted by <#>) for the following query to obtain the results that follow:

mysql> SELECT e.emp_id, e.fname, e.lname, b.name
    -> FROM employee e INNER JOIN <1> b
    ->   ON e.assigned_branch_id = b.<2>;
+--------+----------+-----------+---------------+
| emp_id | fname    | lname     | name          |
+--------+----------+-----------+---------------+
|      1 | Michael  | Smith     | Headquarters  |
|      2 | Susan    | Barker    | Headquarters  |
|      3 | Robert   | Tyler     | Headquarters  |
|      4 | Susan    | Hawthorne | Headquarters  |
|      5 | John     | Gooding   | Headquarters  |
|      6 | Helen    | Fleming   | Headquarters  |
|      7 | Chris    | Tucker    | Headquarters  |
|      8 | Sarah    | Parker    | Headquarters  |
|      9 | Jane     | Grossman  | Headquarters  |
|     10 | Paula    | Roberts   | Woburn Branch |
|     11 | Thomas   | Ziegler   | Woburn Branch |
|     12 | Samantha | Jameson   | Woburn Branch |
|     13 | John     | Blake     | Quincy Branch |
|     14 | Cindy    | Mason     | Quincy Branch |
|     15 | Frank    | Portman   | Quincy Branch |
|     16 | Theresa  | Markham   | So. NH Branch |
|     17 | Beth     | Fowler    | So. NH Branch |
|     18 | Rick     | Tulman    | So. NH Branch |
+--------+----------+-----------+---------------+
18 rows in set (0.03 sec)

Exercise 5-2

Write a query that returns the account ID for each nonbusiness customer (customer.cust_type_cd = 'I') with the customer’s federal ID (customer.fed_id) and the name of the product on which the account is based (product.name).

Exercise 5-3

Construct a query that finds all employees whose supervisor is assigned to a different department. Retrieve the employees’ ID, first name, and last name.