Back in Chapter 2, I demonstrated how
related concepts are broken into separate pieces through a process known as
normalization. The end result of this exercise was two tables: person and favorite_food. If, however,
you want to generate a single report showing a person’s name, address,
and favorite foods, you will need a mechanism to bring the data
from these two tables back together again; this mechanism is known as a
join, and this chapter concentrates on the simplest and most
common join, the inner join. Chapter 10
demonstrates all of the different join types.
Queries against a single table are certainly not rare, but you will find that most
of your queries will require two, three, or even more tables. To illustrate, let’s
look at the definitions for the employee and
department tables and then define a query
that retrieves data from both tables:
mysql>DESC employee;+--------------------+----------------------+------+-----+---------+ | Field | Type | Null | Key | Default | +--------------------+----------------------+------+-----+---------+ | emp_id | smallint(5) unsigned | NO | PRI | NULL | | fname | varchar(20) | NO | | NULL | | lname | varchar(20) | NO | | NULL | | start_date | date | NO | | NULL | | end_date | date | YES | | NULL | | superior_emp_id | smallint(5) unsigned | YES | MUL | NULL | |dept_id| smallint(5) unsigned | YES | MUL | NULL | | title | varchar(20) | YES | | NULL | | assigned_branch_id | smallint(5) unsigned | YES | MUL | NULL | +--------------------+----------------------+------+-----+---------+ 9 rows in set (0.11 sec) mysql>DESC department;+---------+----------------------+------+-----+---------+ | Field | Type | Null | Key | Default | +---------+----------------------+------+-----+---------+ | dept_id | smallint(5) unsigned | No | PRI | NULL | | name | varchar(20) | No | | NULL | +---------+----------------------+------+-----+---------+ 2 rows in set (0.03 sec)
Let’s say you want to retrieve the first and last names of each employee along
with the name of the department to which each employee is assigned. Your query will
therefore need to retrieve the employee.fname,
employee.lname, and department.name columns. But how can you retrieve data from both
tables in the same query? The answer lies in the employee.dept_id column, which holds the ID of the department to
which each employee is assigned (in more formal terms, the employee.dept_id column is the foreign key to
the department table). The query, which you will
see shortly, instructs the server to use the employee.dept_id column as the bridge between
the employee and department tables, thereby allowing columns from
both tables to be included in the query’s result set. This type of operation is
known as a join.
The easiest way to start is to put the employee and department tables
into the from clause of a query and see what
happens. Here’s a query that retrieves the employee’s first and last names along
with the department name, with a from clause
naming both tables separated by the join
keyword:
mysql>SELECT e.fname, e.lname, d.name->FROM employee e JOIN department d;+----------+-----------+----------------+ | fname | lname | name | +----------+-----------+----------------+ | Michael | Smith | Operations | | Michael | Smith | Loans | | Michael | Smith | Administration | | Susan | Barker | Operations | | Susan | Barker | Loans | | Susan | Barker | Administration | | Robert | Tyler | Operations | | Robert | Tyler | Loans | | Robert | Tyler | Administration | | Susan | Hawthorne | Operations | | Susan | Hawthorne | Loans | | Susan | Hawthorne | Administration | | John | Gooding | Operations | | John | Gooding | Loans | | John | Gooding | Administration | | Helen | Fleming | Operations | | Helen | Fleming | Loans | | Helen | Fleming | Administration | | Chris | Tucker | Operations | | Chris | Tucker | Loans | | Chris | Tucker | Administration | | Sarah | Parker | Operations | | Sarah | Parker | Loans | | Sarah | Parker | Administration | | Jane | Grossman | Operations | | Jane | Grossman | Loans | | Jane | Grossman | Administration | | Paula | Roberts | Operations | | Paula | Roberts | Loans | | Paula | Roberts | Administration | | Thomas | Ziegler | Operations | | Thomas | Ziegler | Loans | | Thomas | Ziegler | Administration | | Samantha | Jameson | Operations | | Samantha | Jameson | Loans | | Samantha | Jameson | Administration | | John | Blake | Operations | | John | Blake | Loans | | John | Blake | Administration | | Cindy | Mason | Operations | | Cindy | Mason | Loans | | Cindy | Mason | Administration | | Frank | Portman | Operations | | Frank | Portman | Loans | | Frank | Portman | Administration | | Theresa | Markham | Operations | | Theresa | Markham | Loans | | Theresa | Markham | Administration | | Beth | Fowler | Operations | | Beth | Fowler | Loans | | Beth | Fowler | Administration | | Rick | Tulman | Operations | | Rick | Tulman | Loans | | Rick | Tulman | Administration | +----------+-----------+----------------+ 54 rows in set (0.23 sec)
Hmmm…there are only 18 employees and 3 different departments, so how did the result set end up with 54 rows? Looking more closely, you can see that the set of 18 employees is repeated three times, with all the data identical except for the department name. Because the query didn’t specify how the two tables should be joined, the database server generated the Cartesian product, which is every permutation of the two tables (18 employees × 3 departments = 54 permutations). This type of join is known as a cross join, and it is rarely used (on purpose, at least). Cross joins are one of the join types that we study in Chapter 10.
To modify the previous query so that only 18 rows are included in the result
set (one for each employee), you need to describe how the two tables are
related. Earlier, I showed that the employee.dept_id column serves as the link between the two
tables, so this information needs to be added to the on subclause of the from
clause:
mysql>SELECT e.fname, e.lname, d.name->FROM employee e JOIN department d->ON e.dept_id = d.dept_id;+----------+-----------+----------------+ | fname | lname | name | +----------+-----------+----------------+ | Michael | Smith | Administration | | Susan | Barker | Administration | | Robert | Tyler | Administration | | Susan | Hawthorne | Operations | | John | Gooding | Loans | | Helen | Fleming | Operations | | Chris | Tucker | Operations | | Sarah | Parker | Operations | | Jane | Grossman | Operations | | Paula | Roberts | Operations | | Thomas | Ziegler | Operations | | Samantha | Jameson | Operations | | John | Blake | Operations | | Cindy | Mason | Operations | | Frank | Portman | Operations | | Theresa | Markham | Operations | | Beth | Fowler | Operations | | Rick | Tulman | Operations | +----------+-----------+----------------+ 18 rows in set (0.00 sec)
Instead of 54 rows, you now have the expected 18 rows due to the addition of
the on subclause, which instructs the server
to join the employee and department tables by using the dept_id column to traverse from one table to the
other. For example, Susan Hawthorne’s row in the employee table contains a value of 1 in the dept_id column (not
shown in the example). The server uses this value to look up the row in the
department table
having a value of 1 in its dept_id column and then retrieves the value
'Operations' from the name column in that row.
If a value exists for the dept_id column in
one table but not the other, then the join fails for the
rows containing that value and those rows are excluded from the result set. This
type of join is known as an inner join, and it is the most
commonly used type of join. To clarify, if the department table contains a fourth row for the marketing
department, but no employees have been assigned to that department, then the
marketing department would not be included in the result set. Likewise, if some
of the employees had been assigned to department ID 99, which doesn’t exist in
the department table, then these employees
would be left out of the result set. If you want to include all rows from one
table or the other regardless of whether a match exists, you need to specify an
outer join, but we cover this later in the
book.
In the previous example, I did not specify in the from clause which type of join to use. However, when you wish to
join two tables using an inner join, you should explicitly specify this in your
from clause; here’s the same example,
with the addition of the join type (note the keyword INNER):
mysql>SELECT e.fname, e.lname, d.name->FROM employee e INNER JOIN department d->ON e.dept_id = d.dept_id;+----------+-----------+----------------+ | fname | lname | name | +----------+-----------+----------------+ | Michael | Smith | Administration | | Susan | Barker | Administration | | Robert | Tyler | Administration | | Susan | Hawthorne | Operations | | John | Gooding | Loans | | Helen | Fleming | Operations | | Chris | Tucker | Operations | | Sarah | Parker | Operations | | Jane | Grossman | Operations | | Paula | Roberts | Operations | | Thomas | Ziegler | Operations | | Samantha | Jameson | Operations | | John | Blake | Operations | | Cindy | Mason | Operations | | Frank | Portman | Operations | | Theresa | Markham | Operations | | Beth | Fowler | Operations | | Rick | Tulman | Operations | +----------+-----------+----------------+ 18 rows in set (0.00 sec)
If you do not specify the type of join, then the server will do an inner join by default. As you will see later in the book, however, there are several types of joins, so you should get in the habit of specifying the exact type of join that you require.
If the names of the columns used to join the two tables are identical, which
is true in the previous query, you can use the using subclause instead of the on subclause, as in:
mysql>SELECT e.fname, e.lname, d.name->FROM employee e INNER JOIN department d->USING (dept_id);+----------+-----------+----------------+ | fname | lname | name | +----------+-----------+----------------+ | Michael | Smith | Administration | | Susan | Barker | Administration | | Robert | Tyler | Administration | | Susan | Hawthorne | Operations | | John | Gooding | Loans | | Helen | Fleming | Operations | | Chris | Tucker | Operations | | Sarah | Parker | Operations | | Jane | Grossman | Operations | | Paula | Roberts | Operations | | Thomas | Ziegler | Operations | | Samantha | Jameson | Operations | | John | Blake | Operations | | Cindy | Mason | Operations | | Frank | Portman | Operations | | Theresa | Markham | Operations | | Beth | Fowler | Operations | | Rick | Tulman | Operations | +----------+-----------+----------------+ 18 rows in set (0.01 sec)
Since using is a shorthand notation that
you can use in only a specific situation, I prefer always to use the on subclause to avoid confusion.
The notation used throughout this book for joining tables was introduced in the SQL92 version of the ANSI SQL standard. All the major databases (Oracle Database, Microsoft SQL Server, MySQL, IBM DB2 Universal Database, and Sybase Adaptive Server) have adopted the SQL92 join syntax. Because most of these servers have been around since before the release of the SQL92 specification, they all include an older join syntax as well. For example, all these servers would understand the following variation of the previous query:
mysql>SELECT e.fname, e.lname, d.name->FROM employee e, department d->WHERE e.dept_id = d.dept_id;+----------+-----------+----------------+ | fname | lname | name | +----------+-----------+----------------+ | Michael | Smith | Administration | | Susan | Barker | Administration | | Robert | Tyler | Administration | | Susan | Hawthorne | Operations | | John | Gooding | Loans | | Helen | Fleming | Operations | | Chris | Tucker | Operations | | Sarah | Parker | Operations | | Jane | Grossman | Operations | | Paula | Roberts | Operations | | Thomas | Ziegler | Operations | | Samantha | Jameson | Operations | | John | Blake | Operations | | Cindy | Mason | Operations | | Frank | Portman | Operations | | Theresa | Markham | Operations | | Beth | Fowler | Operations | | Rick | Tulman | Operations | +----------+-----------+----------------+ 18 rows in set (0.01 sec)
This older method of specifying joins does not include the on subclause; instead, tables are named in the
from clause separated by commas, and join
conditions are included in the where clause.
While you may decide to ignore the SQL92 syntax in favor of the older join
syntax, the ANSI join syntax has the following advantages:
Join conditions and filter conditions are separated into two different
clauses (the on subclause and the
where clause, respectively),
making a query easier to understand.
The join conditions for each pair of tables are contained in their own
on clause, making it less likely
that part of a join will be mistakenly omitted.
Queries that use the SQL92 join syntax are portable across database servers, whereas the older syntax is slightly different across the different servers.
The benefits of the SQL92 join syntax are easier to identify for complex queries that include both join and filter conditions. Consider the following query, which returns all accounts opened by experienced tellers (hired prior to 2007) currently assigned to the Woburn branch:
mysql>SELECT a.account_id, a.cust_id, a.open_date, a.product_cd->FROM account a, branch b, employee e->WHERE a.open_emp_id = e.emp_id->AND e.start_date < '2007-01-01'->AND e.assigned_branch_id = b.branch_id->AND (e.title = 'Teller' OR e.title = 'Head Teller')->AND b.name = 'Woburn Branch';+------------+---------+------------+------------+ | account_id | cust_id | open_date | product_cd | +------------+---------+------------+------------+ | 1 | 1 | 2000-01-15 | CHK | | 2 | 1 | 2000-01-15 | SAV | | 3 | 1 | 2004-06-30 | CD | | 4 | 2 | 2001-03-12 | CHK | | 5 | 2 | 2001-03-12 | SAV | | 17 | 7 | 2004-01-12 | CD | | 27 | 11 | 2004-03-22 | BUS | +------------+---------+------------+------------+ 7 rows in set (0.00 sec)
With this query, it is not so easy to determine which conditions in the
where clause are join conditions and
which are filter conditions. It is also not readily apparent which type of join
is being employed (to identify the type of join, you would need to look closely
at the join conditions in the where clause to
see whether any special characters are employed), nor is it easy to determine
whether any join conditions have been mistakenly left out. Here’s the same query
using the SQL92 join syntax:
mysql>SELECT a.account_id, a.cust_id, a.open_date, a.product_cd->FROM account a INNER JOIN employee e->ON a.open_emp_id = e.emp_id->INNER JOIN branch b->ON e.assigned_branch_id = b.branch_id->WHERE e.start_date < '2007-01-01'->AND (e.title = 'Teller' OR e.title = 'Head Teller')->AND b.name = 'Woburn Branch';+------------+---------+------------+------------+ | account_id | cust_id | open_date | product_cd | +------------+---------+------------+------------+ | 1 | 1 | 2000-01-15 | CHK | | 2 | 1 | 2000-01-15 | SAV | | 3 | 1 | 2004-06-30 | CD | | 4 | 2 | 2001-03-12 | CHK | | 5 | 2 | 2001-03-12 | SAV | | 17 | 7 | 2004-01-12 | CD | | 27 | 11 | 2004-03-22 | BUS | +------------+---------+------------+------------+ 7 rows in set (0.05 sec)
Hopefully, you will agree that the version using SQL92 join syntax is easier to understand.
Joining three tables is similar to joining two tables, but with one slight
wrinkle. With a two-table join, there are two tables and one join type in the
from clause, and a single on subclause to define how the tables are joined. With
a three-table join, there are three tables and two join types in the from clause, and two on subclauses. Here’s another example of a query with a two-table
join:
mysql>SELECT a.account_id, c.fed_id->FROM account a INNER JOIN customer c->ON a.cust_id = c.cust_id->WHERE c.cust_type_cd = 'B';+------------+------------+ | account_id | fed_id | +------------+------------+ | 24 | 04-1111111 | | 25 | 04-1111111 | | 27 | 04-2222222 | | 28 | 04-3333333 | | 29 | 04-4444444 | +------------+------------+ 5 rows in set (0.15 sec)
This query, which returns the account ID and federal tax number for all business
accounts, should look fairly straightforward by now. If, however, you add the
employee table to the query to also retrieve
the name of the teller who opened each account, it looks as follows:
mysql>SELECT a.account_id, c.fed_id, e.fname, e.lname->FROM account a INNER JOIN customer c->ON a.cust_id = c.cust_id->INNER JOIN employee e->ON a.open_emp_id = e.emp_id->WHERE c.cust_type_cd = 'B';+------------+------------+---------+---------+ | account_id | fed_id | fname | lname | +------------+------------+---------+---------+ | 24 | 04-1111111 | Theresa | Markham | | 25 | 04-1111111 | Theresa | Markham | | 27 | 04-2222222 | Paula | Roberts | | 28 | 04-3333333 | Theresa | Markham | | 29 | 04-4444444 | John | Blake | +------------+------------+---------+---------+ 5 rows in set (0.00 sec)
Now three tables, two join types, and two on
subclauses are listed in the from clause, so
things have gotten quite a bit busier. At first glance, the order in which the
tables are named might cause you to think that the employee table is being joined to the customer table, since the account
table is named first, followed by the customer
table, and then the employee table. If you switch
the order in which the first two tables appear, however, you will get the exact same
results:
mysql>SELECT a.account_id, c.fed_id, e.fname, e.lname->FROM customer c INNER JOIN account a->ON a.cust_id = c.cust_id->INNER JOIN employee e->ON a.open_emp_id = e.emp_id->WHERE c.cust_type_cd = 'B';+------------+------------+---------+---------+ | account_id | fed_id | fname | lname | +------------+------------+---------+---------+ | 24 | 04-1111111 | Theresa | Markham | | 25 | 04-1111111 | Theresa | Markham | | 27 | 04-2222222 | Paula | Roberts | | 28 | 04-3333333 | Theresa | Markham | | 29 | 04-4444444 | John | Blake | +------------+------------+---------+---------+ 5 rows in set (0.09 sec)
The customer table is now listed first,
followed by the account table and then the
employee table. Since the on subclauses haven’t changed, the results are the
same. For the sake of completeness, here’s the same query one last time, but with
the table order completely reversed (employee to
account to customer):
mysql>SELECT a.account_id, c.fed_id, e.fname, e.lname->FROM employee e INNER JOIN account a->ON e.emp_id = a.open_emp_id->INNER JOIN customer c->ON a.cust_id = c.cust_id->WHERE c.cust_type_cd = 'B';+------------+------------+---------+---------+ | account_id | fed_id | fname | lname | +------------+------------+---------+---------+ | 24 | 04-1111111 | Theresa | Markham | | 25 | 04-1111111 | Theresa | Markham | | 27 | 04-2222222 | Paula | Roberts | | 28 | 04-3333333 | Theresa | Markham | | 29 | 04-4444444 | John | Blake | +------------+------------+---------+---------+ 5 rows in set (0.00 sec)
One way to think of a query that uses three or more tables is as a snowball
rolling down a hill. The first two tables get the ball rolling, and each subsequent
table gets tacked on to the snowball as it heads downhill. You can think of the
snowball as the intermediate result set, which is picking up
more and more columns as subsequent tables are joined. Therefore, the employee table is not really being joined to the
account table, but rather the intermediate
result set created when the customer and account tables were joined. (In case you were
wondering why I chose a snowball analogy, I wrote this chapter in the midst of a New
England winter: 110 inches so far, and more coming tomorrow. Oh joy.)
You have already seen several examples of queries that use three tables, but
there is one variation worth mentioning: what to do if some of the data sets are
generated by subqueries. Subqueries is the focus of Chapter 9, but I already introduced the concept of a subquery in the from clause in the previous chapter. Here’s
another version of an earlier query (find all accounts opened by experienced
tellers currently assigned to the Woburn branch) that joins the account table to subqueries against the branch and employee tables:
1 SELECT a.account_id, a.cust_id, a.open_date, a.product_cd 2 FROM account a INNER JOIN 3 (SELECT emp_id, assigned_branch_id 4 FROM employee 5 WHERE start_date < '2007-01-01' 6 AND (title = 'Teller' OR title = 'Head Teller')) e 7 ON a.open_emp_id = e.emp_id 8 INNER JOIN 9 (SELECT branch_id 10 FROM branch 11 WHERE name = 'Woburn Branch') b 12 ON e.assigned_branch_id = b.branch_id;
The first subquery, which starts on line 3 and is given the alias e, finds all experienced tellers. The second
subquery, which starts on line 9 and is given the alias b, finds the ID of the Woburn branch. First, the account table is joined to the experienced-teller
subquery using the employee ID and then the table that results is joined to the
Woburn branch subquery using the branch ID. The results are the same as those of
the previous version of the query (try it and see for yourself), but the queries
look very different from one another.
There isn’t really anything shocking here, but it might take a minute to
figure out what’s going on. Notice, for example, the lack of a where clause in the main query; since all the
filter conditions are against the employee
and branch tables, the filter conditions are
all inside the subqueries, so there is no need for any filter conditions in the
main query. One way to visualize what is going on is to run the subqueries and
look at the result sets. Here are the results of the first subquery against the
employee table:
mysql>SELECT emp_id, assigned_branch_id->FROM employee->WHERE start_date < '2007-01-01'->AND (title = 'Teller' OR title = 'Head Teller');+--------+--------------------+ | emp_id | assigned_branch_id | +--------+--------------------+ | 8 | 1 | | 9 | 1 | | 10 | 2 | | 11 | 2 | | 13 | 3 | | 14 | 3 | | 16 | 4 | | 17 | 4 | | 18 | 4 | +--------+--------------------+ 9 rows in set (0.03 sec)
Thus, this result set consists of a set of employee IDs and their
corresponding branch IDs. When they are joined to the account table via the emp_id
column, you now have an intermediate result set consisting of all rows from the
account table with the additional column
holding the branch ID of the employee that opened each account. Here are the
results of the second subquery against the branch table:
mysql>SELECT branch_id->FROM branch->WHERE name = 'Woburn Branch';+-----------+ | branch_id | +-----------+ | 2 | +-----------+ 1 row in set (0.02 sec)
This query returns a single row containing a single column: the ID of the
Woburn branch. This table is joined to the assigned_branch_id column of the intermediate result set, causing
all accounts opened by non-Woburn-based employees to be filtered out of the
final result set.
If you are joining multiple tables, you might find that you need to join the
same table more than once. In the sample database, for example, there are
foreign keys to the branch table from both
the account table (the branch at which the
account was opened) and the employee table
(the branch at which the employee works). If you want to include
both branches in your result set, you can include the
branch table twice in the from clause, joined once to the employee table and once to the account table. For this to work, you will need to
give each instance of the branch table a
different alias so that the server knows which one you are referring to in the
various clauses, as in:
mysql>SELECT a.account_id, e.emp_id,->b_a.name open_branch, b_e.name emp_branch->FROM account a INNER JOIN branch b_a->ON a.open_branch_id = b_a.branch_id->INNER JOIN employee e->ON a.open_emp_id = e.emp_id->INNER JOIN branch b_e->ON e.assigned_branch_id = b_e.branch_id->WHERE a.product_cd = 'CHK';+------------+--------+---------------+---------------+ | account_id | emp_id | open_branch | emp_branch | +------------+--------+---------------+---------------+ | 10 | 1 | Headquarters | Headquarters | | 14 | 1 | Headquarters | Headquarters | | 21 | 1 | Headquarters | Headquarters | | 1 | 10 | Woburn Branch | Woburn Branch | | 4 | 10 | Woburn Branch | Woburn Branch | | 7 | 13 | Quincy Branch | Quincy Branch | | 13 | 16 | So. NH Branch | So. NH Branch | | 18 | 16 | So. NH Branch | So. NH Branch | | 24 | 16 | So. NH Branch | So. NH Branch | | 28 | 16 | So. NH Branch | So. NH Branch | +------------+--------+---------------+---------------+ 10 rows in set (0.16 sec)
This query shows who opened each checking account, what branch it was opened
at, and to which branch the employee who opened the account is currently
assigned. The branch table is included twice,
with aliases b_a and b_e. By assigning different aliases to each instance of the
branch table, the server is able to
understand which instance you are referring to: the one joined to the account table, or the one joined to the employee table. Therefore, this is one example of
a query that requires the use of table aliases.
Not only can you include the same table more than once in the same query, but you
can actually join a table to itself. This might seem like a strange thing to do at
first, but there are valid reasons for doing so. The employee table, for example, includes a self-referencing
foreign key, which means that it includes a column (superior_emp_id) that points to the primary key within
the same table. This column points to the employee’s manager (unless the employee is
the head honcho, in which case the column is null). Using a self-join, you can write a query that
lists every employee’s name along with the name of his or her manager:
mysql>SELECT e.fname, e.lname,->e_mgr.fname mgr_fname, e_mgr.lname mgr_lname->FROM employee e INNER JOIN employee e_mgr->ON e.superior_emp_id = e_mgr.emp_id;+----------+-----------+-----------+-----------+ | fname | lname | mgr_fname | mgr_lname | +----------+-----------+-----------+-----------+ | Susan | Barker | Michael | Smith | | Robert | Tyler | Michael | Smith | | Susan | Hawthorne | Robert | Tyler | | John | Gooding | Susan | Hawthorne | | Helen | Fleming | Susan | Hawthorne | | Chris | Tucker | Helen | Fleming | | Sarah | Parker | Helen | Fleming | | Jane | Grossman | Helen | Fleming | | Paula | Roberts | Susan | Hawthorne | | Thomas | Ziegler | Paula | Roberts | | Samantha | Jameson | Paula | Roberts | | John | Blake | Susan | Hawthorne | | Cindy | Mason | John | Blake | | Frank | Portman | John | Blake | | Theresa | Markham | Susan | Hawthorne | | Beth | Fowler | Theresa | Markham | | Rick | Tulman | Theresa | Markham | +----------+-----------+-----------+-----------+ 17 rows in set (0.00 sec)
This query includes two instances of the employee table: one to provide employee names (with the table alias
e), and the other to provide manager names
(with the table alias e_mgr). The on subclause uses these aliases to join the employee table to itself via the superior_emp_id foreign key. This is another example
of a query for which table aliases are required; otherwise, the server wouldn’t know
whether you are referring to an employee or an employee’s manager.
While there are 18 rows in the employee table,
the query returned only 17 rows; the president of the bank, Michael Smith, has no
superior (his superior_emp_id column is null), so the join failed for his row. To include
Michael Smith in the result set, you would need to use an outer join, which we cover
in Chapter 10.
All of the multitable queries shown thus far have employed equi-joins, meaning that values from the two tables must match for the join to succeed. An equi-join always employs an equals sign, as in:
ON e.assigned_branch_id = b.branch_id
While the majority of your queries will employ equi-joins, you can also join your tables via ranges of values, which are referred to as non-equi-joins. Here’s an example of a query that joins by a range of values:
SELECT e.emp_id, e.fname, e.lname, e.start_date FROM employee e INNER JOIN product p ON e.start_date >= p.date_offered AND e.start_date <= p.date_retired WHERE p.name = 'no-fee checking';
This query joins two tables that have no foreign key relationships. The intent is to find all employees who began working for the bank while the No-Fee Checking product was being offered. Thus, an employee’s start date must be between the date the product was offered and the date the product was retired.
You may also find a need for a self-non-equi-join, meaning
that a table is joined to itself using a non-equi-join. For example, let’s say that
the operations manager has decided to have a chess tournament for all bank tellers.
You have been asked to create a list of all the pairings. You might try joining the
employee table to itself for all tellers
(title = 'Teller') and return all rows where
the emp_ids don’t match (since a person can’t
play chess against himself):
mysql>SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname->FROM employee e1 INNER JOIN employee e2->ON e1.emp_id != e2.emp_id->WHERE e1.title = 'Teller' AND e2.title = 'Teller';+----------+----------+----+----------+----------+ | fname | lname | vs | fname | lname | +----------+----------+----+----------+----------+ | Sarah | Parker | VS | Chris | Tucker | | Jane | Grossman | VS | Chris | Tucker | | Thomas | Ziegler | VS | Chris | Tucker | | Samantha | Jameson | VS | Chris | Tucker | | Cindy | Mason | VS | Chris | Tucker | | Frank | Portman | VS | Chris | Tucker | | Beth | Fowler | VS | Chris | Tucker | | Rick | Tulman | VS | Chris | Tucker | | Chris | Tucker | VS | Sarah | Parker | | Jane | Grossman | VS | Sarah | Parker | | Thomas | Ziegler | VS | Sarah | Parker | | Samantha | Jameson | VS | Sarah | Parker | | Cindy | Mason | VS | Sarah | Parker | | Frank | Portman | VS | Sarah | Parker | | Beth | Fowler | VS | Sarah | Parker | | Rick | Tulman | VS | Sarah | Parker | ... | Chris | Tucker | VS | Rick | Tulman | | Sarah | Parker | VS | Rick | Tulman | | Jane | Grossman | VS | Rick | Tulman | | Thomas | Ziegler | VS | Rick | Tulman | | Samantha | Jameson | VS | Rick | Tulman | | Cindy | Mason | VS | Rick | Tulman | | Frank | Portman | VS | Rick | Tulman | | Beth | Fowler | VS | Rick | Tulman | +----------+----------+----+----------+----------+ 72 rows in set (0.01 sec)
You’re on the right track, but the problem here is that for each pairing (e.g.,
Sarah Parker versus Chris Tucker), there is also a reverse pairing (e.g., Chris
Tucker versus Sarah Parker). One way to achieve the desired results is to use the
join condition e1.emp_id < e2.emp_id so
that each teller is paired only with those tellers having a higher employee ID (you
can also use e1.emp_id > e2.emp_id if you
wish):
mysql>SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname->FROM employee e1 INNER JOIN employee e2->ON e1.emp_id < e2.emp_id->WHERE e1.title = 'Teller' AND e2.title = 'Teller';+----------+----------+----+----------+----------+ | fname | lname | vs | fname | lname | +----------+----------+----+----------+----------+ | Chris | Tucker | VS | Sarah | Parker | | Chris | Tucker | VS | Jane | Grossman | | Sarah | Parker | VS | Jane | Grossman | | Chris | Tucker | VS | Thomas | Ziegler | | Sarah | Parker | VS | Thomas | Ziegler | | Jane | Grossman | VS | Thomas | Ziegler | | Chris | Tucker | VS | Samantha | Jameson | | Sarah | Parker | VS | Samantha | Jameson | | Jane | Grossman | VS | Samantha | Jameson | | Thomas | Ziegler | VS | Samantha | Jameson | | Chris | Tucker | VS | Cindy | Mason | | Sarah | Parker | VS | Cindy | Mason | | Jane | Grossman | VS | Cindy | Mason | | Thomas | Ziegler | VS | Cindy | Mason | | Samantha | Jameson | VS | Cindy | Mason | | Chris | Tucker | VS | Frank | Portman | | Sarah | Parker | VS | Frank | Portman | | Jane | Grossman | VS | Frank | Portman | | Thomas | Ziegler | VS | Frank | Portman | | Samantha | Jameson | VS | Frank | Portman | | Cindy | Mason | VS | Frank | Portman | | Chris | Tucker | VS | Beth | Fowler | | Sarah | Parker | VS | Beth | Fowler | | Jane | Grossman | VS | Beth | Fowler | | Thomas | Ziegler | VS | Beth | Fowler | | Samantha | Jameson | VS | Beth | Fowler | | Cindy | Mason | VS | Beth | Fowler | | Frank | Portman | VS | Beth | Fowler | | Chris | Tucker | VS | Rick | Tulman | | Sarah | Parker | VS | Rick | Tulman | | Jane | Grossman | VS | Rick | Tulman | | Thomas | Ziegler | VS | Rick | Tulman | | Samantha | Jameson | VS | Rick | Tulman | | Cindy | Mason | VS | Rick | Tulman | | Frank | Portman | VS | Rick | Tulman | | Beth | Fowler | VS | Rick | Tulman | +----------+----------+----+----------+----------+ 36 rows in set (0.00 sec)
You now have a list of 36 pairings, which is the correct number when choosing pairs of 9 distinct things.
You are now familiar with the concept that join conditions belong in the on subclause, while filter conditions belong in the
where clause. However, SQL is flexible as to
where you place your conditions, so you will need to take care when constructing
your queries. For example, the following query joins two tables using a single join
condition, and also includes a single filter condition in the where clause:
mysql>SELECT a.account_id, a.product_cd, c.fed_id->FROM account a INNER JOIN customer c->ON a.cust_id = c.cust_id->WHERE c.cust_type_cd = 'B';+------------+------------+------------+ | account_id | product_cd | fed_id | +------------+------------+------------+ | 24 | CHK | 04-1111111 | | 25 | BUS | 04-1111111 | | 27 | BUS | 04-2222222 | | 28 | CHK | 04-3333333 | | 29 | SBL | 04-4444444 | +------------+------------+------------+ 5 rows in set (0.01 sec)
That was pretty straightforward, but what happens if you mistakenly put the filter
condition in the on subclause instead of in the
where clause?
mysql>SELECT a.account_id, a.product_cd, c.fed_id->FROM account a INNER JOIN customer c->ON a.cust_id = c.cust_id->AND c.cust_type_cd = 'B';+------------+------------+------------+ | account_id | product_cd | fed_id | +------------+------------+------------+ | 24 | CHK | 04-1111111 | | 25 | BUS | 04-1111111 | | 27 | BUS | 04-2222222 | | 28 | CHK | 04-3333333 | | 29 | SBL | 04-4444444 | +------------+------------+------------+ 5 rows in set (0.01 sec)
As you can see, the second version, which has both conditions
in the on subclause and has no where clause, generates the same results. What if both
conditions are placed in the where clause but the
from clause still uses the ANSI join
syntax?
mysql>SELECT a.account_id, a.product_cd, c.fed_id->FROM account a INNER JOIN customer c->WHERE a.cust_id = c.cust_id->AND c.cust_type_cd = 'B';+------------+------------+------------+ | account_id | product_cd | fed_id | +------------+------------+------------+ | 24 | CHK | 04-1111111 | | 25 | BUS | 04-1111111 | | 27 | BUS | 04-2222222 | | 28 | CHK | 04-3333333 | | 29 | SBL | 04-4444444 | +------------+------------+------------+ 5 rows in set (0.01 sec)
Once again, the MySQL server has generated the same result set. It will be up to you to put your conditions in the proper place so that your queries are easy to understand and maintain.
The following exercises are designed to test your understanding of inner joins. Please see Appendix C for the solutions to these exercises.
Fill in the blanks (denoted by <#>) for the following query to obtain the results that
follow:
mysql>SELECT e.emp_id, e.fname, e.lname, b.name->FROM employee e INNER JOIN <1> b->ON e.assigned_branch_id = b.<2>;+--------+----------+-----------+---------------+ | emp_id | fname | lname | name | +--------+----------+-----------+---------------+ | 1 | Michael | Smith | Headquarters | | 2 | Susan | Barker | Headquarters | | 3 | Robert | Tyler | Headquarters | | 4 | Susan | Hawthorne | Headquarters | | 5 | John | Gooding | Headquarters | | 6 | Helen | Fleming | Headquarters | | 7 | Chris | Tucker | Headquarters | | 8 | Sarah | Parker | Headquarters | | 9 | Jane | Grossman | Headquarters | | 10 | Paula | Roberts | Woburn Branch | | 11 | Thomas | Ziegler | Woburn Branch | | 12 | Samantha | Jameson | Woburn Branch | | 13 | John | Blake | Quincy Branch | | 14 | Cindy | Mason | Quincy Branch | | 15 | Frank | Portman | Quincy Branch | | 16 | Theresa | Markham | So. NH Branch | | 17 | Beth | Fowler | So. NH Branch | | 18 | Rick | Tulman | So. NH Branch | +--------+----------+-----------+---------------+ 18 rows in set (0.03 sec)
Write a query that returns the account ID for each nonbusiness customer
(customer.cust_type_cd =
'I') with the customer’s federal ID (customer.fed_id) and the name of the product
on which the account is based (product.name).
Construct a query that finds all employees whose supervisor is assigned to a different department. Retrieve the employees’ ID, first name, and last name.