MySQL in a Nutshell, 2nd Edition

Name

SELECT

Synopsis

SELECT [flags] {*|column|expression}[, ...]
  FROM table[, ...] 
  [WHERE condition]
  [GROUP BY {column|expression|position}[ASC|DESC], ...
     [WITH ROLLUP]]
  [HAVING condition]
  [ORDER BY {column|expression|position}[ASC|DESC] , ...]
  [LIMIT {[offset,] count|count OFFSET offset}]
  [PROCEDURE procedure(arguments)]
  options

Use this statement to retrieve and display data from tables within a database. It has many clauses and options, but for simple data retrieval many of them can be omitted. The basic syntax for the statement is shown. After the SELECT keyword, some keywords to control the whole operation may be given. Next comes an asterisk to retrieve all columns, a list of columns to retrieve, or expressions returning values to display, separated by commas.

Data can be retrieved from one or more tables, given in a comma-separated list. If multiple tables are specified, other clauses must define how the tables are joined. The remaining clauses may be called on to refine the data to be retrieved, to order it, and so forth. These various keywords, options, and clauses are detailed in subsections of this statement explanation. To start, here is a simple example of how you can use the SELECT statement:

SELECT name_first, name_last, telephone_home,
DATEDIFF(now( ), last_review)
AS 'Days Since Last Review'
FROM employees;

In this example, three columns and the results of an expression based on a fourth column are to be displayed. The first and last name of each employee, each employee’s home telephone number, and the difference between the date of the employee’s last employment review and the date now are listed. This last field has the addition of the AS keyword to set the column heading of the results set, and to name an alias for the field. An alias may be referenced in subsequent clauses of the same statement (e.g., the ORDER BY clause). To select all columns in the table, the wildcard * can be given instead of the column names.

SELECT statement keywords

SELECT
[ALL|DISTINCT|DISTINCTROW]
[HIGH_PRIORITY] [STRAIGHT_JOIN]
[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
[SQL_CACHE|SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
{*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition] [other clauses] [options]

Between the initial SELECT keyword and list of columns and expressions, several keywords may be given. They are shown in the preceding syntax, with the other components of the statement abbreviated.

When a WHERE clause is used with the SELECT statement, rows in the results may contain duplicate data. If you want all rows that meet the selection conditions to be displayed, you may include the ALL keyword. This is the default, so it’s not necessary to give this keyword. If you want to display only the first occurrence of a row, include the DISTINCT keyword or its synonym DISTINCTROW. Here is an example:

SELECT DISTINCT dept 
FROM employees;

This statement will list the names of all departments for which we have employees listed in the employees table. Even though there are several employees in the same department, it will list only one row for each department.

By default, any UPDATE statements that are issued have priority over SELECT statements submitted by other client sessions at the same time; the updates are run first. To give a particular SELECT statement higher priority than any UPDATE statements, use the HIGH_PRIORITY keyword.

Multiple tables may be selected with the SELECT statement. The column on which they should be joined is given with the WHERE clause or the JOIN clause. The JOIN clause is described earlier in this chapter. For the purposes of this section, you just need to know that in order to optimize retrieval, MySQL might not join tables in the order that they are listed in the SQL statement. To insist on joining in the order given, you must use the STRAIGHT_JOIN keyword.

When you know that the results of a SELECT statement using the DISTINCT keyword or the GROUP BY clause (discussed later) will be small, you can use the SQL_SMALL_RESULT keyword. This will cause MySQL to use temporary tables, with a key based on the GROUP BY clause elements, to sort the results and possibly make for faster data retrieval. If you expect the results to be large, you can use the SQL_BIG_RESULT keyword. This will cause MySQL to use temporary tables on the filesystem. Regardless of whether you use DISTINCT or GROUP BY, the SQL_BUFFER_RESULT keyword may be given for any SELECT statement to have MySQL use a temporary table to buffer the results. You can use only one of the SQL_*_RESULT keywords in each statement.

If the MySQL server is not using the query cache by default, you can force its use by including the SQL_CACHE keyword. If the server does use the query cache by default, you can use the SQL_NO_CACHE to instruct MySQL not to use the cache for this particular SELECT statement. To determine whether the server uses query cache by default, enter SHOW VARIABLES LIKE 'query_cache_type';. A value of ON indicates that it is in use.

The last keyword available is SQL_CALC_FOUND_ROWS, which counts the number of rows that meet the conditions of the statement. This is not affected by a LIMIT clause. The results of this count must be retrieved in a separate SELECT statement with the FOUND_ROWS() function. See the end of this chapter for information on this function:

SELECT SQL_CALC_FOUND_ROWS 
name_first, name_last, telephone_home,
DATEDIFF(now( ), last_review)
AS 'Days Since Last Review'
FROM employees 
WHERE dept = 'sales'
ORDER BY last_review DESC
LIMIT 10;

SELECT FOUND_ROWS();

The first statement retrieves a list of sales people to review, limited to the 10 who have gone the longest without a performance review. The second gets a count of how many employees there are to review in the sales department.

Exporting SELECT results

SELECT [flags] {*|columns|expression}[, ...]
[INTO OUTFILE '/path/filename'
  [FIELDS TERMINATED BY 'character']
  [FIELDS ENCLOSED BY 'character']
  [ESCAPED BY 'character' ]
  [LINES [STARTING BY 'character'] [TERMINATED BY 'character']]
|INTO DUMPFILE '/path/filename'
|INTO 'variable'[, ...]
[FOR UPDATE|LOCK IN SHARE MODE]]
FROM table[, ...]
[WHERE condition]
[other clauses] [options]

The INTO clause is used to export data from a SELECT statement to an external text file or a variable. Only the results will be exported, not the column names or other information.

Various clauses set delimiter and control characters in the output:

ESCAPED BY: Character used to escape special characters in the output. The default is a backslash.
FIELDS ENCLOSED BY: Character to use before and after each field. By default, no character is used.
FIELDS TERMINATED BY: Character with which to separate fields. The default is a tab.
LINES STARTING BY: Character used to start each line. By default, no character is used.
LINES TERMINATED BY: Character used to end each line. The default is a newline character.

FILE privilege is necessary to use the INTO clause of the SELECT statement. This statement and clause combination is essentially the counterpart of the LOAD DATA INFILE statement. See the explanation of that statement earlier in this chapter for more details on the options for this clause. Here is an example of this clause and these options:

SELECT * FROM employees
INTO OUTFILE '/tmp/employees.txt'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
ESCAPED BY '\\';

The text file created by this SQL statement will contain a separate line for each row selected. Each field will end with a vertical bar. Any special characters (e.g., an apostrophe) will be preceded by a backslash. Because a backslash is an escape character within an SQL statement, two backslashes are needed in the ESCAPE BY clause because the first escapes the second. To import the resulting data text file, use the FOUND_ROWS() statement.

The second syntax uses the clause INTO DUMPFILE and exports only one row into an external text file. It does not allow any field or line terminators like the INTO OUTFILE clause. Here is an example of its use:

SELECT photograph
INTO DUMPFILE '/tmp/bobs_picture.jpeg'
FROM employees
WHERE emp_id = '1827';

This statement exports the contents of the photograph column for an employee’s record. It’s a BLOB type column and contains an image file. The result of the exported file is a complete and usable image file.

You can also use the INTO clause to store a value in a user variable or a system variable for reuse. Here’s an example:

SET @sales = 0;

SELECT SUM(total_order) AS Sales
INTO @sales
FROM orders
WHERE YEAR(order_date) = YEAR(CURDATE());

This example creates the user variable @sales. Then we calculate the total sales for the current year and store it into that variable for reuse in subsequent statements in the session.

Grouping SELECT results

SELECT [flags] {*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition]
[GROUP BY {column|expression|position} [ASC|DESC], ...
  [WITH ROLLUP]]
[other clauses] [options]

A SELECT statement sometimes produces more meaningful results if you group together rows containing the same value for a particular column. The GROUP BY clause specifies one or more columns by which MySQL is to group the data retrieved. This is used with aggregate functions so that the values of numeric columns for the rows grouped will be aggregated.

For instance, suppose that a SELECT statement is to list the sales representatives for a business and their orders for the month. Without a GROUP BY clause, one line would be displayed for each sales representative for each order. Here’s an example of how this might be resolved:

SELECT CONCAT(name_first, ' ', name_last) AS 'Sales Rep.',
SUM(total_order) AS 'Sales for Month'
FROM orders, employees
WHERE employees.emp_id = sales_rep
AND MONTH(order_date) = MONTH(CURDATE( ))
GROUP BY sales_rep;

This statement concatenates the first and last name of each sales representative who placed an order for a customer during the current month. The GROUP BY clause groups together the rows found for each sales representative. The SUM() function adds the values of the total_order column for each row within each group. See Chapter 10 for more information on the SUM() function and other aggregate functions.

You can specify multiple columns in the GROUP BY clause. Instead of stating a column’s name, you can state its position in the table, where a value of 1 represents the first column in the table. Expressions may be given as well.

The GROUP BY clause does its own sorting and cannot be used with the ORDER BY clause. To set the sorting to ascending order explicitly for a column, enter the ASC keyword after the column in the clause that is to be set. This is not necessary, though, since it is the default setting. To sort in descending order, add DESC after each column that is to be sorted in reverse.

When grouping rows by one column, it may be desirable not only to have a total of the values for certain columns, but also to display a total for all of the grouped rows at the end of the results set. To do this, use the WITH ROLLUP keyword. Here is an example:

SELECT location AS Branch,
CONCAT(name_first, ' ', name_last) AS 'Sales Rep.',
SUM(total_order) AS 'Sales for Month'
FROM orders, employees, branches
WHERE sales_rep = employees.emp_id
AND MONTH(order_date) = MONTH(CURDATE( ))
AND employees.branch_id = branches.branch_id
GROUP BY Branch, sales_rep WITH ROLLUP;

+---------------+-----------------+-----------------+
| Branch        | Sales Rep.      | Sales for Month |
+---------------+-----------------+-----------------+
| Boston        | Ricky Adams     |            2472 |
| Boston        | Morgan Miller   |            1600 |
| Boston        | Morgan Miller   |            4072 |
| New Orleans   | Marie Dyer      |            1750 |
| New Orleans   | Tom Smith       |            6407 |
| New Orleans   | Simone Caporale |            5722 |
| New Orleans   | Simone Caporale |           13879 |
| San Francisco | Geoffrey Dyer   |             500 |
| San Francisco | Kenneth Dyer    |             500 |
| San Francisco | Kenneth Dyer    |            1000 |
| NULL          | Kenneth Dyer    |           18951 |
+---------------+-----------------+-----------------+

This statement groups and adds up the total for each sales representative. When there aren’t any more sales representatives for a branch, a row in the display for the subtotal is generated. It displays the branch name and the name of the last representative. When there are no more branches, a row for the grand total of sales is generated. The branch shows NULL. For clarity, I’ve boldfaced the subtotals and the grand total in the results set.

Having SELECT results

SELECT [flags] {*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition]
[GROUP BY condition]
[HAVING condition]
[other clauses] [options]

The HAVING clause is similar to the WHERE clause, but it is used for conditions returned by aggregate functions (e.g., AVG(), MIN(), and MAX()). For older versions of MySQL, you must use aliases for aggregate functions in the main clause of the SELECT statement. Here is an example of how you can use this clause:

SELECT CONCAT(name_first, ' ', name_last) AS 'Name', total_order
FROM orders
JOIN employees ON sales_rep = emp_id
JOIN branches USING (branch_id)
WHERE location = 'New Orleans'
GROUP BY sales_rep
HAVING MAX(total_order);

This SQL statement retrieves from the employees table a list of employee names for all employees located in the New Orleans branch office. From this list, the statement refines the results by grouping the data for each representative together and determines the sum of each one’s total_order column. Because of the MAX() function, it displays data only for the row with the maximum number. The JOIN clause is described in its own section earlier in this chapter.

Ordering SELECT results

SELECT [flags] {*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition]
[ORDER BY {column|expression|position} [ASC|DESC], ...]
[other clauses] [options]

The results of a SELECT statement, by default, are displayed in the order in which the rows of data are found in the table, which may be the order in which they were entered into the table. To change the order of a results set, use the ORDER BY clause. As a basis for ordering the results, list one or more columns separated by commas. The order in which columns are listed is the order in which sorts will be conducted. You can also use aliases for columns, column combinations, or expressions that were established earlier in the same SELECT statement. Instead of stating a column’s name, you can also state its position, where a value of 1 represents the first column in the table. Here is an example of a SELECT statement using the ORDER BY clause:

SELECT CONCAT(name_first, ' ', name_last) AS Name,
MONTH(birth_date) AS 'Birth Month', email_address
FROM employees
ORDER BY 'Birth Month' ASC, Name ASC;

Here a list of employees, the months in which they were born, and their email addresses are extracted. For the name, the CONCAT() function is used to put the first and last name together, separated by a space. The AS clause establishes an alias of Name. The MONTH() function is used to extract the month from the birth_date column, and the AS clause sets up the alias Birth Month. In the ORDER BY clause, the alias for the birth date is used for the initial sort and the name is used for the secondary sort. The result will be that all of the employees who have a birth date in the same month will be listed together and in alphabetical order by name. Both aliases are followed by the ASC keyword to indicate that the results should be sorted in ascending order. This is unnecessary, as ascending order is the default. However, to change an ordering method to descending, use the DESC keyword.

You can also order the results using expressions, which may be based on columns or aliases. Here is an example of a SELECT statement using an expression for ordering:

SELECT CONCAT(name_first, ' ', name_last) AS name,
pay_rate, hours
FROM employees
ORDER BY pay_rate * hours DESC;

In this example, the first and last names are selected and concatenated together under the name column heading in the results set. The pay_rate column lists the hourly dollar rate an employee is paid, and the hours column contains the typical number of hours a week that an employee works. In the ORDER BY clause, the product of the hourly pay rate and the number of hours is determined for the ordering of the results set. The rows are to be listed in descending order per the DESC keyword based on the expression.

Limiting SELECT results

SELECT [flags] {*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition]
[other clauses]
[LIMIT {[offset,] count|count OFFSET offset}]
[PROCEDURE procedure(arguments)]
[FOR UPDATE|LOCK IN SHARE MODE]]
[other clauses] [options]

The LIMIT clause is used to limit the number of rows displayed by the SELECT statement. The most straightforward method of limiting the number of rows is to specify the maximum row count to be displayed, like this:

SELECT * FROM employees
LIMIT 5;

To begin listing rows after a specific number of records, an offset may be given. The offset for the first row is 0. Two formats accomplish this. One gives the amount of the offset, followed by a comma and then the maximum count of rows to display. The other syntax structure specifies the count, followed by the OFFSET keyword, followed by the amount of the offset. Here is an example of the first structure, which is preferred:

SELECT * FROM employees
LIMIT 10, 5;

In this example, after the 10th record is reached, the next 5 records will be displayed—in other words, results 11 through 15 are returned. The offset and count for the LIMIT clause are based on the rows in the results set, not necessarily on the rows in the tables. So the amount of the offset is related to the order of the rows retrieved from the tables based on clauses, such as the WHERE clause and the ORDER BY clause. See the description of the LIMIT clause earlier in this chapter for more details.

Other SELECT clauses and options

SELECT [flags] {*|column|expression}[, ...]
FROM table[, ...]
[WHERE condition]
[other clauses]
[PROCEDURE procedure(arguments)]
[LOCK IN SHARE MODE|FOR UPDATE]

To send the results of a SELECT statement as standard input to a procedure, use the PROCEDURE clause. The PROCEDURE keyword is followed by the name of the procedure, which can be followed by parentheses containing parameters to be passed to the procedure. Here is an example:

SELECT * FROM employees
PROCEDURE ANALYSE(10, 225);

In this statement, the results of the SELECT statement are sent to the built-in function ANALYSE() along with two numeric parameters. See ANALYSE() near the end of this chapter for more information on this function.

To lock the rows that are being selected from a table, LOCK IN SHARE MODE may be given at the end of the SELECT statement. This prevents other clients from changing the data while the SELECT statement is running. The FOR UPDATE option instructs MySQL to invoke a temporary write lock on the rows being selected. Both of these locks will be terminated when the statement is finished running.

Table of Contents for MySQL in a Nutshell, 2nd Edition