Since this book uses the MySQL server for all the examples, I thought it would be
useful for readers who are planning to continue using MySQL to include an appendix on
MySQL’s extensions to the SQL language. This appendix explores some of MySQL’s
extensions to the select, insert, update, and delete statements that can be very useful in certain
situations.
MySQL’s implementation of the select statement
includes two additional clauses, which are discussed in the following
subsections.
In some situations, you may not be interested in all of
the rows returned by a query. For example, you might construct a query that
returns all of the bank tellers along with the number of accounts opened by each
teller. If your reason for executing the query is to determine the top three
tellers so that they can receive an award from the bank, then you don’t
necessarily need to know who came in fourth, fifth, and so on. To help with
these types of situations, MySQL’s select
statement includes the limit clause, which
allows you to restrict the number of rows returned by a query.
To demonstrate the utility of the limit
clause, I will begin by constructing a query to show the number of accounts
opened by each bank teller:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 13 | 3 | | 16 | 6 | +-------------+----------+ 4 rows in set (0.31 sec)
The results show that four different tellers opened accounts; if you want to
limit the result set to only three records, you can add a limit clause specifying that only three records be
returned:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->LIMIT 3;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 13 | 3 | +-------------+----------+ 3 rows in set (0.06 sec)
Thanks to the limit clause (the fourth line
of the query), the result set now includes exactly three records, and the fourth
teller (employee ID 16) has been discarded from the result set.
While the previous query returns three records, there’s one small problem;
you haven’t described which three of the four records
you are interested in. If you are looking for three
specific records, such as the three tellers who
opened the most accounts, you will need to use the limit clause in concert with an order by clause, as in:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 3;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 16 | 6 | +-------------+----------+ 3 rows in set (0.03 sec)
The difference between this query and the previous query is that the
limit clause is now being applied to
an ordered set, resulting in the three tellers with the most opened accounts
being included in the final result set. Unless you are interested in seeing
only an arbitrary sample of records, you will generally want to use an
order by clause along with a limit clause.
The limit clause is applied after
all filtering, grouping, and ordering have occurred, so it will never
change the outcome of your select
statement other than restricting the number of records returned by the
statement.
Instead of finding the top three tellers, let’s say your goal is to
identify all but the top two tellers (instead of giving awards to top
performers, the bank will be sending some of the less-productive tellers to
assertiveness training). For these types of situations, the limit clause allows for an optional second
parameter; when two parameters are used, the first designates at which
record to begin adding records to the final result set, and the second
designates how many records to include. When specifying a record by number,
remember that MySQL designates the first record as record 0. Therefore, if
your goal is to find the third-best performer, you can do the
following:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 2, 1;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 16 | 6 | +-------------+----------+ 1 row in set (0.00 sec)
In this example, the zeroth and first records are discarded, and records
are included starting at the second record. Since the second parameter in
the limit clause is 1, only a single record is included.
If you want to start at the second position and include
all the remaining records, you can make the second
argument to the limit clause large enough
to guarantee that all remaining records are included. If you do not know how
many tellers opened new accounts, therefore, you might do something like the
following to find all but the top two performers:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 2, 999999999;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 16 | 6 | | 13 | 3 | +-------------+----------+ 2 rows in set (0.00 sec)
In this version of the query, the zeroth and first records are discarded, and up to 999,999,999 records are included starting at the second record (in this case, there are only two more, but it’s better to go a bit overboard rather than taking a chance on excluding valid records from your final result set because you underestimated).
When used in conjunction with an order
by clause, queries that include a limit clause can be called ranking
queries because they allow you to rank your data. While I
have demonstrated how to rank bank tellers by the number of opened accounts,
ranking queries are used to answer many different types of business
questions, such as:
Who are our top five salespeople for 2005?
Who has the third-most home runs in the history of baseball?
Other than The Holy Bible and Quotations from Chairman Mao, what are the next 98 best-selling books of all time?
What are our two worst-selling flavors of ice cream?
So far, I have shown how to find the top three tellers, the third-best teller, and all but the top two tellers. If I want to do something analogous to the fourth example (i.e., find the worst performers), I need only reverse the sort order so that the results proceed from lowest number of accounts opened to highest number of accounts opened, as in:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many ASC->LIMIT 2;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 13 | 3 | | 16 | 6 | +-------------+----------+ 2 rows in set (0.24 sec)
By simply changing the sort order (from ORDER BY
how_many DESC to ORDER BY how_many
ASC), the query now returns the two worst-performing tellers.
Therefore, by using a limit clause with
either an ascending or descending sort order, you can produce ranking
queries to answer most types of business questions.
If you want the output from your query to be written to a file, you could
highlight the query results, copy them to the buffer, and paste them into your
favorite editor. However, if the query’s result set is sufficiently large, or if
the query is being executed from within a script, you will need a way to write
the results to a file without your intervention. To aid in such situations,
MySQL includes the into outfile clause to
allow you to provide the name of a file into which the results will be written.
Here’s an example that writes the query results to a file in my c:\temp directory:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list.txt'->FROM employee;Query OK, 18 rows affected (0.20 sec)
If you remember from Chapter 7, the backslash is used to escape another character within a string. If you’re a Windows user, therefore, you will need to enter two backslashes in a row when building pathnames.
Rather than showing the query results on the screen, the result set has been written to the emp_list.txt file, which looks as follows:
1 Michael Smith 2001-06-22 2 Susan Barker 2002-09-12 3 Robert Tyler 2000-02-09 4 Susan Hawthorne 2002-04-24 ... 16 Theresa Markham 2001-03-15 17 Beth Fowler 2002-06-29 18 Rick Tulman 2002-12-12
The default format uses tabs ('\t') between
columns and newlines ('\n') after each
record. If you want more control over the format of the data, several additional
subclauses are available with the into
outfile clause. For example, if you want the data to be in what is
referred to as pipe-delimited format, you can use the
fields subclause to ask that the '|' character be placed between each column, as
in:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list_delim.txt'->FIELDS TERMINATED BY '|'->FROM employee;Query OK, 18 rows affected (0.02 sec)
MySQL does not allow you to overwrite an existing file when using into outfile, so you will need to remove an
existing file first if you run the same query more than once.
The contents of the emp_list_delim.txt file look as follows:
1|Michael|Smith|2001-06-22 2|Susan|Barker|2002-09-12 3|Robert|Tyler|2000-02-09 4|Susan|Hawthorne|2002-04-24 ... 16|Theresa|Markham|2001-03-15 17|Beth|Fowler|2002-06-29 18|Rick|Tulman|2002-12-12
Along with pipe-delimited format, you may need your data in
comma-delimited format, in which case you would use
fields terminated by ','. If the data
being written to a file includes strings, however, using commas as field
separators can prove problematic, as commas are much more likely to appear
within strings than the pipe character. Consider the following query, which
writes a number and two strings delimited by commas to the comma1.txt file:
mysql>SELECT data.num, data.str1, data.str2->INTO OUTFILE 'C:\\TEMP\\comma1.txt'->FIELDS TERMINATED BY ','->FROM->(SELECT 1 num, 'This string has no commas' str1,->'This string, however, has two commas' str2) data;Query OK, 1 row affected (0.04 sec)
Since the third column in the output file (str2) is a string containing commas, you might think that an
application attempting to read the comma1.txt file will encounter problems when parsing each line
into columns, but the MySQL server has made provisions for such situations. Here
are the contents of comma1.txt:
1,This string has no commas,This string\, however\, has two commas
As you can see, the commas within the third column have been escaped by
putting a backslash before the two commas embedded in the str2 column. If you run the same query but use
pipe-delimited format, the commas will not be escaped,
since they don’t need to be. If you want to use a different escape character,
such as using another comma, you can use the fields
escaped by subclause to specify the escape character to use for
your output file.
Along with specifying column separators, you can also specify the character
used to separate the different records in your datafile. If you would like each
record in the output file to be separated by something other than the newline
character, you can use the lines subclause,
as in:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list_atsign.txt'->FIELDS TERMINATED BY '|'->LINES TERMINATED BY '@'->FROM employee;Query OK, 18 rows affected (0.03 sec)
Because I am not using a newline character between records, the emp_list_atsign.txt file looks like a single long
line of text when viewed, with each record separated by the '@' character:
1|Michael|Smith|2001-06-22@2|Susan|Barker|2002-09-12@3|Robert|Tyler|2000-02- 09@4|Susan|Hawthorne|2002-04-24@5|John|Gooding|2003-11-14@6|Helen|Fleming|2004-03- 17@7|Chris|Tucker|2004-09-15@8|Sarah|Parker|2002-12-02@9|Jane|Grossman|2002-05- 03@10|Paula|Roberts|2002-07-27@11|Thomas|Ziegler|2000-10-23@12|Samantha|Jameson|2003- 01-08@13|John|Blake|2000-05-11@14|Cindy|Mason|2002-08-09@15|Frank|Portman|2003-04- 01@16|Theresa|Markham|2001-03-15@17|Beth|Fowler|2002-06-29@18|Rick|Tulman|2002-12-12@
If you need to generate a datafile to be loaded into a spreadsheet application
or sent within or outside your organization, the into
outfile clause should provide enough flexibility for whatever file
format you need.
Let’s say that you have been asked to create a table to capture information about
which of the bank’s branches are visited by which customers. The table needs to
contain the customer’s ID, the branch’s ID, and a datetime column indicating the last time the customer visited the
branch. Rows are added to the table whenever a customer visits a certain branch, but
if the customer has already visited the branch, then the existing row should simply
have its datetime column updated. Here’s the
table definition:
CREATE TABLE branch_usage (branch_id SMALLINT UNSIGNED NOT NULL, cust_id INTEGER UNSIGNED NOT NULL, last_visited_on DATETIME, CONSTRAINT pk_branch_usage PRIMARY KEY (branch_id, cust_id) );
Along with the three column definitions, the branch_usage table defines a primary key constraint on the branch_id and cust_id columns. Therefore, the server will reject any row added to
the table whose branch/customer pair already exists in the table.
Let’s say that, after the table is in place, customer ID 5 visits the main branch
(branch ID 1) three times in the first week. After the first visit, you can insert a
record into the branch_usage table, since no
record exists yet for customer ID 5 and branch ID 1:
mysql>INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)->VALUES (1, 5, CURRENT_TIMESTAMP());Query OK, 1 row affected (0.02 sec)
The next time the customer visits the same branch, however, you will need to update the existing record rather than inserting a new record; otherwise, you will receive the following error:
ERROR 1062 (23000): Duplicate entry '1-5' for key 1
To avoid this error, you can query the branch_usage table to see whether a given customer/branch pair exists and then either insert
a record if no record is found or update the existing row if it already exists. To
save you the trouble, however, the MySQL designers have extended the insert statement to allow you to specify that one or
more columns be modified if an insert statement
fails due to a duplicate key. The following statement instructs the server to modify
the last_visited_on column if the given customer
and branch already exist in the branch_usage
table:
mysql>INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)->VALUES (1, 5, CURRENT_TIMESTAMP())->ON DUPLICATE KEY UPDATE last_visited_on = CURRENT_TIMESTAMP();Query OK, 2 rows affected (0.02 sec)
The on duplicate key clause allows this same
statement to be executed every time customer ID 5 conducts business in branch ID 1.
If run 100 times, the first execution results in a single row being added to the
table, and the next 99 executions result in the last_visited_on column being changed to the current time. This type
of operation is often referred to as an upsert, since it is a
combination of an update and an insert statement.
Earlier in the appendix, I showed you how to write queries using the limit clause in conjunction with an order by clause to generate rankings, such as the top
three tellers in terms of accounts opened. MySQL also allows the limit and order by
clauses to be used in both update and delete statements, thereby allowing you to modify or
remove specific rows in a table based on a ranking. For example, imagine that you
are asked to remove records from a table used to track customer logins to the bank’s
online banking system. The table, which tracks the customer ID and date/time of
login, looks as follows:
CREATE TABLE login_history (cust_id INTEGER UNSIGNED NOT NULL, login_date DATETIME, CONSTRAINT pk_login_history PRIMARY KEY (cust_id, login_date) );
The following statement populates the login_history table with some data by generating a cross join between
the account and customer tables and using the account’s open_date column as a basis for generating login dates:
mysql>INSERT INTO login_history (cust_id, login_date)->SELECT c.cust_id,->ADDDATE(a.open_date, INTERVAL a.account_id * c.cust_id HOUR)->FROM customer c CROSS JOIN account a;Query OK, 312 rows affected (0.03 sec) Records: 312 Duplicates: 0 Warnings: 0
The table is now populated with 312 rows of relatively random data. Your task is
to look at the data in the login_history table
once a month, generate a report for your manager showing who is using the online
banking system, and then delete all but the 50 most-recent records from the table.
One approach would be to write a query using order
by and limit to find the
50th most recent login, such as:
mysql>SELECT login_date->FROM login_history->ORDER BY login_date DESC->LIMIT 49,1;+---------------------+ | login_date | +---------------------+ | 2004-07-02 09:00:00 | +---------------------+ 1 row in set (0.00 sec)
Armed with this information, you can then construct a delete statement that removes all rows whose login_date column is less than the date returned by the query:
mysql>DELETE FROM login_history->WHERE login_date < '2004-07-02 09:00:00';Query OK, 262 rows affected (0.02 sec)
The table now contains the 50 most-recent logins. Using MySQL’s extensions,
however, you can achieve the same result with a single delete statement using limit and
order by clauses. After returning the
original 312 rows to the login_history table, you
can run the following:
mysql>DELETE FROM login_history->ORDER BY login_date ASC->LIMIT 262;Query OK, 262 rows affected (0.05 sec)
With this statement, the rows are sorted by login_date in ascending order, and then the first 262 rows are
deleted, leaving the 50 most recent rows.
In this example, I had to know the number of rows in the table to construct the limit clause (312 original rows − 50 remaining rows = 262 deletions). It would be better if you could sort the rows in descending order and tell the server to skip the first 50 rows and then delete the remaining rows, as in:
DELETE FROM login_history ORDER BY login_date DESC LIMIT 49, 9999999;
However, MySQL does not allow the optional second parameter when using the
limit clause in delete or update
statements.
Along with deleting data, you can use the limit
and order by clauses when modifying data as well.
For example, if the bank decides to add $100 to each of the 10 oldest accounts to
help retain loyal customers, you can do the following:
mysql>UPDATE account->SET avail_balance = avail_balance + 100->WHERE product_cd IN ('CHK', 'SAV', 'MM')->ORDER BY open_date ASC->LIMIT 10;Query OK, 10 rows affected (0.06 sec) Rows matched: 10 Changed: 10 Warnings: 0
This statement sorts accounts by the open date in ascending order and then modifies the first 10 records, which, in this case, are the 10 oldest accounts.
In certain situations, you might need to modify or delete data from several
different tables to perform a given task. If you discover that the bank’s database
contains a dummy customer left over from system testing, for example, you might need
to remove data from the account, customer, and individual tables.
For this section, I will create a set of clones for the account, customer, and individual
tables, called account2, customer2, and individual2. I am doing so both to protect the sample data from
being altered and to avoid any problems with foreign key constraints between the
tables (more on this later in the section). Here are the create table statements used to generate the three
clone tables:
CREATE TABLE individual2 AS SELECT * FROM individual; CREATE TABLE customer2 AS SELECT * FROM customer; CREATE TABLE account2 AS SELECT * FROM account;
If the customer ID of the dummy customer is 1, you could generate three individual
delete statements against each of the three
tables, as in:
DELETE FROM account2 WHERE cust_id = 1; DELETE FROM customer2 WHERE cust_id = 1; DELETE FROM individual2 WHERE cust_id = 1;
Instead of writing individual delete
statements, however, MySQL allows you to write a single
multitable
delete statement, which, in this case, looks as
follows:
mysql>DELETE account2, customer2, individual2->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON customer2.cust_id = individual2.cust_id->WHERE individual2.cust_id = 1;Query OK, 5 rows affected (0.02 sec)
This statement removes a total of five rows, one from each of the individual2 and customer2 tables, and three from the account2 table (customer ID 1 has three accounts). The statement
comprises three separate clauses:
delete
Specifies the tables targeted for deletion.
from
Specifies the tables used to identify the rows to be deleted. This
clause is identical in form and function to the from clause in a select statement, and not all tables named herein need to
be included in the delete
clause.
where
Contains filter conditions used to identify the rows to be deleted.
The multitable delete statement looks a lot
like a select statement, except that a delete clause is used instead of a select clause. If you are deleting rows from a single
table using a multitable delete format, the
difference becomes even less noticeable. For example, here’s a select statement that finds the account IDs of all
accounts owned by John Hayward:
mysql>SELECT account2.account_id->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON individual2.cust_id = customer2.cust_id->WHERE individual2.fname = 'John'->AND individual2.lname = 'Hayward';+------------+ | account_id | +------------+ | 8 | | 9 | | 10 | +------------+ 3 rows in set (0.01 sec)
If, after viewing the results, you decide to delete all three of John’s accounts
from the account2 table, you need only replace
the select clause in the previous query with a
delete clause naming the account2 table, as in:
mysql>DELETE account2->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON customer2.cust_id = individual2.cust_id->WHERE individual2.fname = 'John'->AND individual2.lname = 'Hayward';Query OK, 3 rows affected (0.01 sec)
Hopefully, this gives you a better idea of what the delete and from clauses are used
for in a multitable delete statement. This
statement is functionally identical to the following single-table delete statement, which uses a subquery to identify
the customer ID of John Hayward:
DELETE FROM account2 WHERE cust_id = (SELECT cust_id FROM individual2 WHERE fname = 'John' AND lname = 'Hayward';
When using a multitable delete statement to
delete rows from a single table, you are simply choosing to use a querylike format
involving table joins rather than a traditional delete statement using subqueries. The real power of multitable
delete statements lies in the ability to
delete from multiple tables in a single statement, as I demonstrated in the first
statement in this section.
Along with the ability to delete rows from multiple tables, MySQL also gives you
the ability to modify rows in multiple tables using a
multitable update. Let’s say that your bank is merging with
another bank, and the databases from both banks have overlapping customer IDs. Your
management decides to fix the problem by incrementing each customer ID in your
database by 10,000 so that the second bank’s data can be safely imported. The
following statement shows how to modify the ID of customer ID 3 across the individual2, customer2, and account2 tables
using a single statement:
mysql>UPDATE individual2 INNER JOIN customer2->ON individual2.cust_id = customer2.cust_id->INNER JOIN account2->ON customer2.cust_id = account2.cust_id->SET individual2.cust_id = individual2.cust_id + 10000,->customer2.cust_id = customer2.cust_id + 10000,->account2.cust_id = account2.cust_id + 10000->WHERE individual2.cust_id = 3;Query OK, 4 rows affected (0.01 sec) Rows matched: 5 Changed: 4 Warnings: 0
This statement modifies four rows: one in each of the individual2 and customer2 tables,
and two in the account2 table. The multitable
update syntax is very similar to that of the
single-table update, except that the update clause contains multiple tables and their
corresponding join conditions rather than just naming a single table. Just like the
single-table update, the multitable version
includes a set clause, the difference being that
any tables referenced in the update clause may be
modified via the set clause.
If you are using the InnoDB storage engine, you will most likely not be able
to use multitable delete and update statements if the tables involved have
foreign key constraints. This is because the engine does not guarantee that the
changes will be applied in an order that won’t violate the constraints. Instead,
you should use multiple single-table statements in the proper order so that
foreign key constraints are not violated.