In certain situations, you may want your SQL logic to branch in one direction or another depending on the values of certain columns or expressions. This chapter focuses on how to write statements that can behave differently depending on the data encountered during statement execution.
Conditional logic is simply the ability to take one of several paths during
program execution. For example, when querying customer information, you might want
to retrieve either the fname/lname columns from the individual table or the name
column from the business table depending on what
type of customer is encountered. Using outer joins, you could return both strings
and let the caller figure out which one to use, as in:
mysql>SELECT c.cust_id, c.fed_id, c.cust_type_cd,->CONCAT(i.fname, ' ', i.lname) indiv_name,->b.name business_name->FROM customer c LEFT OUTER JOIN individual i->ON c.cust_id = i.cust_id->LEFT OUTER JOIN business b->ON c.cust_id = b.cust_id;+---------+-------------+--------------+-----------------+------------------------+ | cust_id | fed_id | cust_type_cd | indiv_name | business_name | +---------+-------------+--------------+-----------------+------------------------+ | 1 | 111-11-1111 | I | James Hadley | NULL | | 2 | 222-22-2222 | I | Susan Tingley | NULL | | 3 | 333-33-3333 | I | Frank Tucker | NULL | | 4 | 444-44-4444 | I | John Hayward | NULL | | 5 | 555-55-5555 | I | Charles Frasier | NULL | | 6 | 666-66-6666 | I | John Spencer | NULL | | 7 | 777-77-7777 | I | Margaret Young | NULL | | 8 | 888-88-8888 | I | Louis Blake | NULL | | 9 | 999-99-9999 | I | Richard Farley | NULL | | 10 | 04-1111111 | B | NULL | Chilton Engineering | | 11 | 04-2222222 | B | NULL | Northeast Cooling Inc. | | 12 | 04-3333333 | B | NULL | Superior Auto Body | | 13 | 04-4444444 | B | NULL | AAA Insurance Inc. | +---------+-------------+--------------+-----------------+------------------------+ 13 rows in set (0.13 sec)
The caller can look at the value of the cust_type_cd column and decide whether to use the indiv_name or business_name column. Instead, however, you could use conditional
logic via a case expression to determine the type of customer
and return the appropriate string, as in:
mysql>SELECT c.cust_id, c.fed_id,->CASE->WHEN c.cust_type_cd = 'I'->THEN CONCAT(i.fname, ' ', i.lname)->WHEN c.cust_type_cd = 'B'->THEN b.name->ELSE 'Unknown'->END name->FROM customer c LEFT OUTER JOIN individual i->ON c.cust_id = i.cust_id->LEFT OUTER JOIN business b->ON c.cust_id = b.cust_id;+---------+-------------+------------------------+ | cust_id | fed_id | name | +---------+-------------+------------------------+ | 1 | 111-11-1111 | James Hadley | | 2 | 222-22-2222 | Susan Tingley | | 3 | 333-33-3333 | Frank Tucker | | 4 | 444-44-4444 | John Hayward | | 5 | 555-55-5555 | Charles Frasier | | 6 | 666-66-6666 | John Spencer | | 7 | 777-77-7777 | Margaret Young | | 8 | 888-88-8888 | Louis Blake | | 9 | 999-99-9999 | Richard Farley | | 10 | 04-1111111 | Chilton Engineering | | 11 | 04-2222222 | Northeast Cooling Inc. | | 12 | 04-3333333 | Superior Auto Body | | 13 | 04-4444444 | AAA Insurance Inc. | +---------+-------------+------------------------+ 13 rows in set (0.00 sec)
This version of the query returns a single name
column that is generated by the case expression starting on the second
line of the query, which, in this example, checks the value of the
cust_type_cd column and returns either the
individual’s first/last names or the business name.
All of the major database servers include built-in functions designed to mimic the
if-then-else statement found in most programming languages (examples include
Oracle’s decode() function, MySQL’s if() function, and SQL Server’s coalesce() function). Case expressions are also
designed to facilitate if-then-else logic but enjoy two advantages over built-in
functions:
The next two subsections introduce the two different types of case expressions, and then I show you some examples of case expressions in action.
The case expression demonstrated earlier in the chapter is an example of a searched case expression, which has the following syntax:
CASE WHEN C1 THEN E1 WHEN C2 THEN E2 ... WHEN CN THEN EN [ELSE ED] END
In the previous definition, the symbols C1,
C2,..., CN represent conditions, and the symbols E1, E2,..., EN represent expressions to be returned by the
case expression. If the condition in a when
clause evaluates to true, then the case
expression returns the corresponding expression. Additionally, the ED symbol represents the default expression, which
the case expression returns if none of the conditions
C1, C2,..., CN evaluate to true (the else
clause is optional, which is why it is enclosed in square brackets). All the
expressions returned by the various when
clauses must evaluate to the same type (e.g., date, number, varchar).
Here’s an example of a searched case expression:
CASE
WHEN employee.title = 'Head Teller'
THEN 'Head Teller'
WHEN employee.title = 'Teller'
AND YEAR(employee.start_date) > 2007
THEN 'Teller Trainee'
WHEN employee.title = 'Teller'
AND YEAR(employee.start_date) < 2006
THEN 'Experienced Teller'
WHEN employee.title = 'Teller'
THEN 'Teller'
ELSE 'Non-Teller'
ENDThis case expression returns a string that can be used to determine hourly pay
scales, print name badges, and so forth. When the case expression is evaluated,
the when clauses are evaluated in order from
top to bottom; as soon as one of the conditions in a when clause evaluates to true,
the corresponding expression is returned and any remaining when clauses are ignored. If none of the when clause conditions evaluate to true, then the expression in the else clause is returned.
Although the previous example returns string expressions, keep in mind that
case expressions may return any type of
expression, including subqueries. Here’s another version of the
individual/business name query from earlier in the chapter that uses subqueries
instead of outer joins to retrieve data from the individual and business
tables:
mysql>SELECT c.cust_id, c.fed_id,->CASE->WHEN c.cust_type_cd = 'I' THEN->(SELECT CONCAT(i.fname, ' ', i.lname)->FROM individual i->WHERE i.cust_id = c.cust_id)->WHEN c.cust_type_cd = 'B' THEN->(SELECT b.name->FROM business b->WHERE b.cust_id = c.cust_id)->ELSE 'Unknown'->END name->FROM customer c;+---------+-------------+------------------------+ | cust_id | fed_id | name | +---------+-------------+------------------------+ | 1 | 111-11-1111 | James Hadley | | 2 | 222-22-2222 | Susan Tingley | | 3 | 333-33-3333 | Frank Tucker | | 4 | 444-44-4444 | John Hayward | | 5 | 555-55-5555 | Charles Frasier | | 6 | 666-66-6666 | John Spencer | | 7 | 777-77-7777 | Margaret Young | | 8 | 888-88-8888 | Louis Blake | | 9 | 999-99-9999 | Richard Farley | | 10 | 04-1111111 | Chilton Engineering | | 11 | 04-2222222 | Northeast Cooling Inc. | | 12 | 04-3333333 | Superior Auto Body | | 13 | 04-4444444 | AAA Insurance Inc. | +---------+-------------+------------------------+ 13 rows in set (0.01 sec)
This version of the query includes only the customer table in the from
clause and uses correlated subqueries to retrieve the appropriate name for each
customer. I prefer this version over the outer join version from earlier in the
chapter, since the server reads from the individual and business tables
only as needed instead of always joining all three tables.
The simple case expression is quite similar to the searched case expression but is a bit less flexible. Here’s the syntax:
CASE V0 WHEN V1 THEN E1 WHEN V2 THEN E2 ... WHEN VN THEN EN [ELSE ED] END
In the preceding definition, V0 represents
a value, and the symbols V1, V2,..., VN
represent values that are to be compared to V0. The symbols E1, E2,..., EN
represent expressions to be returned by the case expression, and ED represents the expression to be returned if
none of the values in the set V1, V2,..., VN
match the V0 value.
Here’s an example of a simple case expression:
CASE customer.cust_type_cd
WHEN 'I' THEN
(SELECT CONCAT(i.fname, ' ', i.lname)
FROM individual I
WHERE i.cust_id = customer.cust_id)
WHEN 'B' THEN
(SELECT b.name
FROM business b
WHERE b.cust_id = customer.cust_id)
ELSE 'Unknown Customer Type'
ENDSimple case expressions are less powerful than searched case expressions because you can’t specify your own conditions; instead, equality conditions are built for you. To show you what I mean, here’s a searched case expression having the same logic as the previous simple case expression:
CASE
WHEN customer.cust_type_cd = 'I' THEN
(SELECT CONCAT(i.fname, ' ', i.lname)
FROM individual I
WHERE i.cust_id = customer.cust_id)
WHEN customer.cust_type_cd = 'B' THEN
(SELECT b.name
FROM business b
WHERE b.cust_id = customer.cust_id)
ELSE 'Unknown Customer Type'
ENDWith searched case expressions, you can build range conditions, inequality
conditions, and multipart conditions using and/or/not, so I would recommend using searched case
expressions for all but the simplest logic.
The following sections present a variety of examples illustrating the utility of conditional logic in SQL statements.
You may have run into a situation where you are performing aggregations over a finite set of values, such as days of the week, but you want the result set to contain a single row with one column per value instead of one row per value. As an example, let’s say you have been asked to write a query that shows the number of accounts opened in the years 2000 through 2005:
mysql>SELECT YEAR(open_date) year, COUNT(*) how_many->FROM account->WHERE open_date > '1999-12-31'->AND open_date < '2006-01-01'->GROUP BY YEAR(open_date);+------+----------+ | year | how_many | +------+----------+ | 2000 | 3 | | 2001 | 4 | | 2002 | 5 | | 2003 | 3 | | 2004 | 9 | +------+----------+ 5 rows in set (0.00 sec)
However, you have also been instructed to return a single row of data with six columns (one for each year in the data range). To transform this result set into a single row, you will need to create six columns and, within each column, sum only those rows pertaining to the year in question:
mysql>SELECT->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2000 THEN 1->ELSE 0->END) year_2000,->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2001 THEN 1->ELSE 0->END) year_2001,->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2002 THEN 1->ELSE 0->END) year_2002,->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2003 THEN 1->ELSE 0->END) year_2003,->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2004 THEN 1->ELSE 0->END) year_2004,->SUM(CASE->WHEN EXTRACT(YEAR FROM open_date) = 2005 THEN 1->ELSE 0->END) year_2005->FROM account->WHERE open_date > '1999-12-31' AND open_date < '2006-01-01';+-----------+-----------+-----------+-----------+-----------+-----------+ | year_2000 | year_2001 | year_2002 | year_2003 | year_2004 | year_2005 | +-----------+-----------+-----------+-----------+-----------+-----------+ | 3 | 4 | 5 | 3 | 9 | 0 | +-----------+-----------+-----------+-----------+-----------+-----------+ 1 row in set (0.01 sec)
Each of the six columns in the previous query are identical, except for the
year value. When the extract() function
returns the desired year for that column, the case expression returns the value
1; otherwise, it returns a 0. When summed over all accounts opened since
2000, each column returns the number of accounts opened for that year.
Obviously, such transformations are practical for only a small number of values;
generating one column for each year since 1905 would quickly become
tedious.
Although it is a bit advanced for this book, it is worth pointing out that
both SQL Server and Oracle Database 11g include
PIVOT clauses specifically for these
types of queries.
Back in Chapter 9, I showed a partial solution for an
example that demonstrated how to find accounts whose account balances don’t
agree with the raw data in the transaction table. The reason for the partial solution was that a
full solution requires the use of conditional logic, so all the pieces are now
in place to finish the job. Here’s where I left off in Chapter 9:
SELECT CONCAT('ALERT! : Account #', a.account_id,
' Has Incorrect Balance!')
FROM account a
WHERE (a.avail_balance, a.pending_balance) <>
(SELECT SUM(<expression to generate available balance>),
SUM(<expression to generate pending balance>)
FROM transaction t
WHERE t.account_id = a.account_id);The query uses a correlated subquery on the transaction table to sum together the individual transactions for
a given account. When summing transactions, you need to consider the following
two issues:
Transaction amounts are always positive, so you need to look at the transaction type to see whether the transaction is a debit or a credit and flip the sign (multiply by −1) for debit transactions.
If the date in the funds_avail_date
column is greater than the current day, the transaction should be added
to the pending balance total but not to the available balance
total.
While some transactions need to be excluded from the available balance, all transactions are included in the pending balance, making it the simpler of the two calculations. Here’s the case expression used to calculate the pending balance:
CASE
WHEN transaction.txn_type_cd = 'DBT'
THEN transaction.amount * −1
ELSE transaction.amount
ENDThus, all transaction amounts are multiplied by −1 for debit transactions and
are left as is for credit transactions. This same logic applies to the available
balance calculation as well, but only transactions that have become available
should be included. Therefore, the case expression used to calculate available
balance includes one additional when
clause:
CASE
WHEN transaction.funds_avail_date > CURRENT_TIMESTAMP()
THEN 0
WHEN transaction.txn_type_cd = 'DBT'
THEN transaction.amount * −1
ELSE transaction.amount
ENDWith the first when clause in place,
unavailable funds, such as checks that have not cleared, will contribute $0 to
the sum. Here’s the final query with the two case expressions in
place:
SELECT CONCAT('ALERT! : Account #', a.account_id,
' Has Incorrect Balance!')
FROM account a
WHERE (a.avail_balance, a.pending_balance) <>
(SELECT
SUM(CASE
WHEN t.funds_avail_date > CURRENT_TIMESTAMP()
THEN 0
WHEN t.txn_type_cd = 'DBT'
THEN t.amount * −1
ELSE t.amount
END),
SUM(CASE
WHEN t.txn_type_cd = 'DBT'
THEN t.amount * −1
ELSE t.amount
END)
FROM transaction t
WHERE t.account_id = a.account_id);By using conditional logic, the sum()
aggregate functions are being fed manipulated data by the two case expressions,
allowing the appropriate amounts to be summed.
Sometimes you will want to determine whether a relationship exists between two entities without regard for the quantity. For example, you might want to know whether a customer has any checking or savings accounts, but you don’t care whether a customer has more than one of each type of account. Here’s a query that uses multiple case expressions to generate two output columns, one to show whether the customer has any checking accounts and the other to show whether the customer has any savings accounts:
mysql>SELECT c.cust_id, c.fed_id, c.cust_type_cd,->CASE->WHEN EXISTS (SELECT 1 FROM account a->WHERE a.cust_id = c.cust_id->AND a.product_cd = 'CHK') THEN 'Y'->ELSE 'N'->END has_checking,->CASE->WHEN EXISTS (SELECT 1 FROM account a->WHERE a.cust_id = c.cust_id->AND a.product_cd = 'SAV') THEN 'Y'->ELSE 'N'->END has_savings->FROM customer c;+---------+-------------+--------------+--------------+-------------+ | cust_id | fed_id | cust_type_cd | has_checking | has_savings | +---------+-------------+--------------+--------------+-------------+ | 1 | 111-11-1111 | I | Y | Y | | 2 | 222-22-2222 | I | Y | Y | | 3 | 333-33-3333 | I | Y | N | | 4 | 444-44-4444 | I | Y | Y | | 5 | 555-55-5555 | I | Y | N | | 6 | 666-66-6666 | I | Y | N | | 7 | 777-77-7777 | I | N | N | | 8 | 888-88-8888 | I | Y | Y | | 9 | 999-99-9999 | I | Y | N | | 10 | 04-1111111 | B | Y | N | | 11 | 04-2222222 | B | N | N | | 12 | 04-3333333 | B | Y | N | | 13 | 04-4444444 | B | N | N | +---------+-------------+--------------+--------------+-------------+ 13 rows in set (0.00 sec)
Each case expression includes a correlated subquery against the account table; one looks for checking accounts,
the other for savings accounts. Since each when clause uses the exists
operator, the conditions evaluate to true as
long as the customer has at least one of the desired accounts.
In other cases, you may care how many rows are encountered, but only up to a
point. For example, the next query uses a simple case expression to count the
number of accounts for each customer, and then returns either 'None', '1',
'2', or '3+':
mysql>SELECT c.cust_id, c.fed_id, c.cust_type_cd,->CASE (SELECT COUNT(*) FROM account a->WHERE a.cust_id = c.cust_id)->WHEN 0 THEN 'None'->WHEN 1 THEN '1'->WHEN 2 THEN '2'->ELSE '3+'->END num_accounts->FROM customer c;+---------+-------------+--------------+--------------+ | cust_id | fed_id | cust_type_cd | num_accounts | +---------+-------------+--------------+--------------+ | 1 | 111-11-1111 | I | 3+ | | 2 | 222-22-2222 | I | 2 | | 3 | 333-33-3333 | I | 2 | | 4 | 444-44-4444 | I | 3+ | | 5 | 555-55-5555 | I | 1 | | 6 | 666-66-6666 | I | 2 | | 7 | 777-77-7777 | I | 1 | | 8 | 888-88-8888 | I | 2 | | 9 | 999-99-9999 | I | 3+ | | 10 | 04-1111111 | B | 2 | | 11 | 04-2222222 | B | 1 | | 12 | 04-3333333 | B | 1 | | 13 | 04-4444444 | B | 1 | +---------+-------------+--------------+--------------+ 13 rows in set (0.01 sec)
For this query, I didn’t want to differentiate between customers having more
than two accounts, so the case expression simply creates a '3+' category. Such a query might be useful if you
were looking for customers to contact regarding opening a new account with the
bank.
When performing calculations that include division, you should always take
care to ensure that the denominators are never equal to zero. Whereas some
database servers, such as Oracle Database, will throw an error when a zero
denominator is encountered, MySQL simply sets the result of the calculation to
null, as demonstrated by the following:
mysql> SELECT 100 / 0;
+---------+
| 100 / 0 |
+---------+
| NULL |
+---------+
1 row in set (0.00 sec)To safeguard your calculations from encountering errors or, even worse, from
being mysteriously set to null, you should
wrap all denominators in conditional logic, as demonstrated by the
following:
mysql>SELECT a.cust_id, a.product_cd, a.avail_balance /->CASE->WHEN prod_tots.tot_balance = 0 THEN 1->ELSE prod_tots.tot_balance->END percent_of_total->FROM account a INNER JOIN->(SELECT a.product_cd, SUM(a.avail_balance) tot_balance->FROM account a->GROUP BY a.product_cd) prod_tots->ON a.product_cd = prod_tots.product_cd;+---------+------------+------------------+ | cust_id | product_cd | percent_of_total | +---------+------------+------------------+ | 10 | BUS | 0.000000 | | 11 | BUS | 1.000000 | | 1 | CD | 0.153846 | | 6 | CD | 0.512821 | | 7 | CD | 0.256410 | | 9 | CD | 0.076923 | | 1 | CHK | 0.014488 | | 2 | CHK | 0.030928 | | 3 | CHK | 0.014488 | | 4 | CHK | 0.007316 | | 5 | CHK | 0.030654 | | 6 | CHK | 0.001676 | | 8 | CHK | 0.047764 | | 9 | CHK | 0.001721 | | 10 | CHK | 0.322911 | | 12 | CHK | 0.528052 | | 3 | MM | 0.129802 | | 4 | MM | 0.321915 | | 9 | MM | 0.548282 | | 1 | SAV | 0.269431 | | 2 | SAV | 0.107773 | | 4 | SAV | 0.413723 | | 8 | SAV | 0.209073 | | 13 | SBL | 1.000000 | +---------+------------+------------------+ 24 rows in set (0.13 sec)
This query computes the ratio of each account balance to the total balance for all accounts of the same product type. Since some product types, such as business loans, could have a total balance of zero if all loans were currently paid in full, it is best to include the case expression to ensure that the denominator is never zero.
When updating rows in a table, you sometimes need to decide what values to set
certain columns to. For example, after inserting a new transaction, you need to
modify the avail_balance, pending_balance, and last_activity_date columns in the account table. Although the last two columns are easily updated,
to correctly modify the avail_balance column you need to know whether the funds from the
transaction are immediately available by checking the funds_avail_date column in the transaction table. Given that transaction ID 999 has just been inserted, you can use the
following update statement to modify the
three columns in the account table:
1 UPDATE account 2 SET last_activity_date = CURRENT_TIMESTAMP(), 3 pending_balance = pending_balance + 4 (SELECT t.amount * 5 CASE t.txn_type_cd WHEN 'DBT' THEN −1 ELSE 1 END 6 FROM transaction t 7 WHERE t.txn_id = 999), 8 avail_balance = avail_balance + 9 (SELECT 10 CASE 11 WHEN t.funds_avail_date > CURRENT_TIMESTAMP() THEN 0 12 ELSE t.amount * 13 CASE t.txn_type_cd WHEN 'DBT' THEN −1 ELSE 1 END 14 END 15 FROM transaction t 16 WHERE t.txn_id = 999) 17 WHERE account.account_id = 18 (SELECT t.account_id 19 FROM transaction t 20 WHERE t.txn_id = 999);
This statement contains a total of three case expressions : two of them (lines 5 and 13) are used to flip the sign on the transaction amount for debit transactions, and the third case expression (line 10) is used to check the funds availability date. If the date is in the future, then zero is added to the available balance; otherwise, the transaction amount is added.
While null values are the appropriate thing
to store in a table if the value for a column is unknown, it is not always
appropriate to retrieve null values for
display or to take part in expressions. For example, you might want to display
the word unknown on a data entry screen rather than leaving
a field blank. When retrieving the data, you can use a case expression to
substitute the string if the value is null,
as in:
SELECT emp_id, fname, lname,
CASE
WHEN title IS NULL THEN 'Unknown'
ELSE title
END
FROM employee;For calculations, null values often cause a
null result, as demonstrated by the
following:
mysql> SELECT (7 * 5) / ((3 + 14) * null);
+-----------------------------+
| (7 * 5) / ((3 + 14) * null) |
+-----------------------------+
| NULL |
+-----------------------------+
1 row in set (0.08 sec)When performing calculations, case expressions are useful for translating a
null value into a number (usually 0 or 1)
that will allow the calculation to yield a non-null value. If you are performing a calculation that includes the
account.avail_balance column, for
example, you could substitute a 0 (if doing addition or subtraction) or a 1 (if
doing multiplication or division) for those accounts that have been established
but haven’t yet been funded:
SELECT <some calculation> +
CASE
WHEN avail_balance IS NULL THEN 0
ELSE avail_balance
END
+ <rest of calculation>
...If a numeric column is allowed to contain null values, it is generally a good idea to use conditional logic
in any calculations that include the column so that the results are
usable.
Challenge your ability to work through conditional logic problems with the examples that follow. When you’re done, compare your solutions with those in Appendix C.
Rewrite the following query, which uses a simple case expression, so that the
same results are achieved using a searched case expression. Try to use as few
when clauses as possible.
SELECT emp_id,
CASE title
WHEN 'President' THEN 'Management'
WHEN 'Vice President' THEN 'Management'
WHEN 'Treasurer' THEN 'Management'
WHEN 'Loan Manager' THEN 'Management'
WHEN 'Operations Manager' THEN 'Operations'
WHEN 'Head Teller' THEN 'Operations'
WHEN 'Teller' THEN 'Operations'
ELSE 'Unknown'
END
FROM employee;Rewrite the following query so that the result set contains a single row with
four columns (one for each branch). Name the four columns branch_1 through branch_4.
mysql>SELECT open_branch_id, COUNT(*)->FROM account->GROUP BY open_branch_id;+----------------+----------+ | open_branch_id | COUNT(*) | +----------------+----------+ | 1 | 8 | | 2 | 7 | | 3 | 3 | | 4 | 6 | +----------------+----------+ 4 rows in set (0.00 sec)