Second Normal Form
The First Normal Form deals with duplicate data (or redundancy) across multiple columns. The Second Normal Form is all about redundancy across multiple rows. To achieve Second Normal Form, your tables must already be in First Normal Form. Once this has been done, you achieve Second Normal Form by identifying columns whose data repeats in different places and then removing them to their own tables.
So, let’s look again at Table 9-2. Notice how Darren Ryder bought two books and therefore his details are duplicated. This tells us that the Customer columns need to be pulled into their own table. Table 9-4 shows the result of removing the Customer columns from Table 9-2.
Table 9-4. The new Titles table
| ISBN |
Title |
Price |
| 0596101015 |
PHP Cookbook |
44.99 |
| 0596527403 |
Dynamic HTML |
59.99 |
| 0596005436 |
PHP and MySQL |
44.95 |
| 0596006815 |
Programming PHP |
39.99 |
As you can see, all that’s left in Table 9-4 are the ISBN, Title, and Price columns for four unique books, so this now constitutes an efficient and self-contained table that satisfies the requirements of both the First and Second Normal Forms. Along the way, we’ve managed to reduce the information to data closely related to book titles. This table could also include years of publication, page counts, numbers of reprints, and so on, as these details are also closely related. The only rule is that we can’t put in any column that could have multiple values for a single book, because then we’d have to list the same book in multiple rows and would thus violate Second Normal Form. Restoring an Author column, for instance, would violate this normalization.
However, looking at the extracted Customer columns, now in Table 9-5, we can see that there’s still more normalization work to do, because Darren Ryder’s details are still duplicated. And it could also be argued that First Normal Form Rule 2 (all columns should contain a single value) has not been properly complied with, because the addresses really need to be broken into separate columns for Address, City, State, and Zip.
Table 9-5. The customer details from Table 9-2
| ISBN |
Customer name |
Customer address |
Purchase date |
| 0596101015 |
Emma Brown |
1565 Rainbow Road, Los Angeles, CA 90014 |
Mar 03 2009 |
| 0596527403 |
Darren Ryder |
4758 Emily Drive, Richmond, VA 23219 |
Dec 19 2008 |
| 0596005436 |
Earl B. Thurston |
862 Gregory Lane, Frankfort, KY 40601 |
Jun 22 2009 |
| 0596101015 |
Darren Ryder |
4758 Emily Drive, Richmond, VA 23219 |
Dec 19 2008 |
| 0596006815 |
David Miller |
3647 Cedar Lane, Waltham, MA 02154 |
Jan 16 2009 |
What we have to do is split this table further to ensure that each customer’s details are entered only once. Because the ISBN is not and cannot be used as a primary key to identify customers (or authors), a new key must be created.
Table 9-6 is the result of normalizing the Customers table into both First and Second Normal Forms. Each customer now has a unique customer number called CustNo that is the table’s primary key, and that will most likely have been created via AUTO_INCREMENT. All the parts of customer addresses have also been separated into distinct columns to make them easily searchable and updateable.
Table 9-6. The new Customers table
| CustNo |
Name |
Address |
City |
State |
Zip |
| 1 |
Emma Brown |
1565 Rainbow Road |
Los Angeles |
CA |
90014 |
| 2 |
Darren Ryder |
4758 Emily Drive |
Richmond |
VA |
23219 |
| 3 |
Earl B. Thurston |
862 Gregory Lane |
Frankfort |
KY |
40601 |
| 4 |
David Miller |
3647 Cedar Lane |
Waltham |
MA |
02154 |
At the same time, in order to normalize Table 9-6, we had to remove the information on customer purchases, because otherwise, there would be multiple instances of customer details for each book purchased. Instead, the purchase data is now placed in a new table called Purchases (see Table 9-7).
Table 9-7. The new Purchases table
| CustNo |
ISBN |
Date |
| 1 |
0596101015 |
Mar 03 2009 |
| 2 |
0596527403 |
Dec 19 2008 |
| 2 |
0596101015 |
Dec 19 2008 |
| 3 |
0596005436 |
Jun 22 2009 |
| 4 |
0596006815 |
Jan 16 2009 |
Here the CustNo column from Table 9-6 is reused as a key to tie the Customers and Purchases tables together. Because the ISBN column is also repeated here, this table can be linked with the Authors and Titles tables, too.
The CustNo column may be a useful key in the Purchases table, but it’s not a primary key. A single customer can buy multiple books (and even multiple copies of one book), so the CustNo column is not a primary key. In fact, the Purchases table has no primary key. That’s all right, because we don’t expect to need to keep track of unique purchases. If one customer buys two copies of the same book on the same day, we’ll just allow two rows with the same information. For easy searching, we can define both CustNo and ISBN as keys—just not as primary keys.
Note
There are now four tables, one more than the three we had initially assumed would be needed. We arrived at this decision through the normalization process, by methodically following the First and Second Normal Form rules, which made it plain that a fourth table called Purchases would also be required.
The tables we now have are Authors (Table 9-3), Titles (Table 9-4), Customers (Table 9-6), and Purchases (Table 9-7), and we can link each table to any other using either the CustNo or the ISBN key.
For example, to see which books Darren Ryder has purchased, you can look him up in Table 9-6, the Customers table, where you will see his CustNo is 2. Armed with this number, you can now go to Table 9-7, the Purchases table; looking at the ISBN column here, you will see that he purchased titles 0596527403 and 0596101015 on December 19, 2008. This looks like a lot of trouble for a human, but it’s not so hard for MySQL.
To determine what these titles were, you can then refer to Table 9-4, the Titles table, and see that the books he bought were Dynamic HTML and PHP Cookbook. Should you wish to know the authors of these books, you could also use the ISBNs you just looked up on Table 9-3, the Authors table, and you would see that ISBN 0596527403, Dynamic HTML, was written by Danny Goodman, and that ISBN 0596101015, PHP Cookbook, was written by David Sklar and Adam Trachtenberg.