Along with storing all of the data that various users insert into a database, a database server also needs to store information about all of the database objects (tables, views, indexes, etc.) that were created to store this data. The database server stores this information, not surprisingly, in a database. This chapter discusses how and where this information, known as metadata, is stored, how you can access it, and how you can use it to build flexible systems.
Metadata is essentially data about data. Every time you create a database object, the database server needs to record various pieces of information. For example, if you were to create a table with multiple columns, a primary key constraint, three indexes, and a foreign key constraint, the database server would need to store all the following information:
Table name
Table storage information (tablespace, initial size, etc.)
Storage engine
Column names
Column data types
Default column values
NOT NULL column constraints
Primary key columns
Primary key name
Name of primary key index
Index names
Index types (B-tree, bitmap)
Indexed columns
Index column sort order (ascending or descending)
Index storage information
Foreign key name
Foreign key columns
Associated table/columns for foreign keys
This data is collectively known as the data dictionary or
system catalog. The database server needs to store this
data persistently, and it needs to be able to quickly retrieve this data in order to
verify and execute SQL statements. Additionally, the database server must safeguard
this data so that it can be modified only via an appropriate mechanism, such as the
alter table statement.
While standards exist for the exchange of metadata between different servers, every database server uses a different mechanism to publish metadata, such as:
A set of views, such as Oracle Database’s user_tables and all_constraints views
A set of system-stored procedures, such as SQL Server’s sp_tables procedure or Oracle Database’s
dbms_metadata package
A special database, such as MySQL’s information_schema database
Along with SQL Server’s system-stored procedures, which are a vestige of its
Sybase lineage, SQL Server also includes a special schema called information_schema that is provided automatically
within each database. Both MySQL and SQL Server provide this interface to conform
with the ANSI SQL:2003 standard. The remainder of this chapter discusses the
information_schema objects that are available
in MySQL and SQL Server.
All of the objects available within the information_schema database (or schema, in the case of SQL Server)
are views. Unlike the describe utility, which I
used in several chapters of this book as a way to show the structure of various
tables and views, the views within information_schema can be queried, and, thus, used programmatically
(more on this later in the chapter). Here’s an example that demonstrates how to
retrieve the names of all of the tables in the bank database:
mysql>SELECT table_name, table_type->FROM information_schema.tables->WHERE table_schema = 'bank'->ORDER BY 1;+----------------------+------------+ | table_name | table_type | +----------------------+------------+ | account | BASE TABLE | | branch | BASE TABLE | | branch_activity_vw | VIEW | | business | BASE TABLE | | business_customer_vw | VIEW | | customer | BASE TABLE | | customer_vw | VIEW | | department | BASE TABLE | | employee | BASE TABLE | | employee_vw | VIEW | | individual | BASE TABLE | | nh_customer_vw | VIEW | | officer | BASE TABLE | | product | BASE TABLE | | product_type | BASE TABLE | | transaction | BASE TABLE | +----------------------+------------+ 16 rows in set (0.02 sec)
Along with the various tables we created back in Chapter 2, the results show several of the
views that I demonstrated in Chapter 14. If you want to exclude the
views, simply add another condition to the where
clause:
mysql>SELECT table_name, table_type->FROM information_schema.tables->WHERE table_schema = 'bank' AND table_type = 'BASE TABLE'->ORDER BY 1;+--------------+------------+ | table_name | table_type | +--------------+------------+ | account | BASE TABLE | | branch | BASE TABLE | | business | BASE TABLE | | customer | BASE TABLE | | department | BASE TABLE | | employee | BASE TABLE | | individual | BASE TABLE | | officer | BASE TABLE | | product | BASE TABLE | | product_type | BASE TABLE | | transaction | BASE TABLE | +--------------+------------+ 11 rows in set (0.01 sec)
If you are only interested in information about views, you can query information_schema.views. Along
with the view names, you can retrieve additional information, such as a flag that
shows whether a view is updatable:
mysql>SELECT table_name, is_updatable->FROM information_schema.views->WHERE table_schema = 'bank'->ORDER BY 1;+----------------------+--------------+ | table_name | is_updatable | +----------------------+--------------+ | branch_activity_vw | NO | | business_customer_vw | YES | | customer_vw | YES | | employee_vw | YES | | nh_customer_vw | YES | +----------------------+--------------+ 5 rows in set (1.83 sec)
Additionally, you can retrieve the view’s underlying query using the view_definition column, as long as the query is small
enough (4,000 characters or fewer for MySQL).
Column information for both tables and views is available via the columns view. The following query shows column
information for the account table:
mysql>SELECT column_name, data_type, character_maximum_length char_max_len,->numeric_precision num_prcsn, numeric_scale num_scale->FROM information_schema.columns->WHERE table_schema = 'bank' AND table_name = 'account'->ORDER BY ordinal_position;+--------------------+-----------+--------------+-----------+-----------+ | column_name | data_type | char_max_len | num_prcsn | num_scale | +--------------------+-----------+--------------+-----------+-----------+ | account_id | int | NULL | 10 | 0 | | product_cd | varchar | 10 | NULL | NULL | | cust_id | int | NULL | 10 | 0 | | open_date | date | NULL | NULL | NULL | | close_date | date | NULL | NULL | NULL | | last_activity_date | date | NULL | NULL | NULL | | status | enum | 6 | NULL | NULL | | open_branch_id | smallint | NULL | 5 | 0 | | open_emp_id | smallint | NULL | 5 | 0 | | avail_balance | float | NULL | 10 | 2 | | pending_balance | float | NULL | 10 | 2 | +--------------------+-----------+--------------+-----------+-----------+ 11 rows in set (0.02 sec)
The ordinal_position column is included merely
as a means to retrieve the columns in the order in which they were added to the
table.
You can retrieve information about a table’s indexes via the information_schema.statistics view as demonstrated by
the following query, which retrieves information for the indexes built on the
account table:
mysql>SELECT index_name, non_unique, seq_in_index, column_name->FROM information_schema.statistics->WHERE table_schema = 'bank' AND table_name = 'account'->ORDER BY 1, 3;+----------------+------------+--------------+----------------+ | index_name | non_unique | seq_in_index | column_name | +----------------+------------+--------------+----------------+ | acc_bal_idx | 1 | 1 | cust_id | | acc_bal_idx | 1 | 2 | avail_balance | | fk_a_branch_id | 1 | 1 | open_branch_id | | fk_a_emp_id | 1 | 1 | open_emp_id | | fk_product_cd | 1 | 1 | product_cd | | PRIMARY | 0 | 1 | account_id | +----------------+------------+--------------+----------------+ 6 rows in set (0.09 sec)
The account table has a total of five indexes,
one of which has two columns (acc_bal_idx) and
one of which is a unique index (PRIMARY).
You can retrieve the different types of constraints (foreign key, primary key,
unique) that have been created via the information_schema.table_constraints view. Here’s a query that
retrieves all of the constraints in the bank
schema:
mysql>SELECT constraint_name, table_name, constraint_type->FROM information_schema.table_constraints->WHERE table_schema = 'bank'->ORDER BY 3,1;+--------------------+--------------+-----------------+ | constraint_name | table_name | constraint_type | +--------------------+--------------+-----------------+ | fk_a_branch_id | account | FOREIGN KEY | | fk_a_cust_id | account | FOREIGN KEY | | fk_a_emp_id | account | FOREIGN KEY | | fk_b_cust_id | business | FOREIGN KEY | | fk_dept_id | employee | FOREIGN KEY | | fk_exec_branch_id | transaction | FOREIGN KEY | | fk_e_branch_id | employee | FOREIGN KEY | | fk_e_emp_id | employee | FOREIGN KEY | | fk_i_cust_id | individual | FOREIGN KEY | | fk_o_cust_id | officer | FOREIGN KEY | | fk_product_cd | account | FOREIGN KEY | | fk_product_type_cd | product | FOREIGN KEY | | fk_teller_emp_id | transaction | FOREIGN KEY | | fk_t_account_id | transaction | FOREIGN KEY | | PRIMARY | branch | PRIMARY KEY | | PRIMARY | account | PRIMARY KEY | | PRIMARY | product | PRIMARY KEY | | PRIMARY | department | PRIMARY KEY | | PRIMARY | customer | PRIMARY KEY | | PRIMARY | transaction | PRIMARY KEY | | PRIMARY | officer | PRIMARY KEY | | PRIMARY | product_type | PRIMARY KEY | | PRIMARY | employee | PRIMARY KEY | | PRIMARY | business | PRIMARY KEY | | PRIMARY | individual | PRIMARY KEY | | dept_name_idx | department | UNIQUE | +--------------------+--------------+-----------------+ 26 rows in set (2.28 sec)
Table 15-1 shows the entire set of
information_schema views that are available
in MySQL version 6.0.
|
View name |
Provides information about… |
|
|
Databases |
|
|
Tables and views |
|
|
Columns of tables and views |
|
|
Indexes |
|
|
Who has privileges on which schema objects |
|
|
Who has privileges on which databases |
|
|
Who has privileges on which tables |
|
|
Who has privileges on which columns of which tables |
|
|
What character sets are available |
|
|
What collations are available for which character sets |
|
|
Which character sets are available for which collation |
|
|
The unique, foreign key, and primary key constraints |
|
|
The constraints associated with each key column |
|
|
Stored routines (procedures and functions) |
|
|
Views |
|
|
Table triggers |
|
|
Server plug-ins |
|
|
Available storage engines |
|
|
Table partitions |
|
|
Scheduled events |
|
|
Running processes |
|
|
Foreign keys |
|
|
Server status information |
|
|
Session status information |
|
|
Server status variables |
|
|
Session status variables |
|
|
Stored procedure and function parameters |
|
|
User profiling information |
While some of these views, such as engines,
events, and plugins, are specific to MySQL, many of these views are available in
SQL Server as well. If you are using Oracle Database, please consult the online
Oracle Database
Reference Guide for information about the user_, all_, and dba_ views.
As I mentioned earlier, having the ability to retrieve information about your schema objects via SQL queries opens up some interesting possibilities. This section shows several ways in which you can make use of metadata in your applications.
While some project teams include a full-time database designer who oversees
the design and implementation of the database, many projects take the
“design-by-committee” approach, allowing multiple people to create database
objects. After several weeks or months of development, you may need to generate
a script that will create the various tables, indexes, views, and so on that the
team has deployed. Although a variety of tools and utilities will generate these
types of scripts for you, you can also query the information_schema views and generate the
script yourself.
As an example, let’s build a script that will create the bank.customer table. Here’s the command used to
build the table, which I extracted from the script used to build the example
database:
create table customer
(cust_id integer unsigned not null auto_increment,
fed_id varchar(12) not null,
cust_type_cd enum('I','B') not null,
address varchar(30),
city varchar(20),
state varchar(20),
postal_code varchar(10),
constraint pk_customer primary key (cust_id)
);Although it would certainly be easier to generate the script with the use of a
procedural language (e.g., Transact-SQL or Java), since this is a book about SQL
I’m going to write a single query that will generate the create table statement. The first step is to query
the information_schema.columns table to
retrieve information about the columns in the table:
mysql>SELECT 'CREATE TABLE customer (' create_table_statement->UNION ALL->SELECT cols.txt->FROM->(SELECT concat(' ',column_name, ' ', column_type,->CASE->WHEN is_nullable = 'NO' THEN ' not null'->ELSE ''->END,->CASE->WHEN extra IS NOT NULL THEN concat(' ', extra)->ELSE ''->END,->',') txt->FROM information_schema.columns->WHERE table_schema = 'bank' AND table_name = 'customer'->ORDER BY ordinal_position->) cols->UNION ALL->SELECT ')';+-----------------------------------------------------+ | create_table_statement | +-----------------------------------------------------+ | CREATE TABLE customer ( | | cust_id int(10) unsigned not null auto_increment, | | fed_id varchar(12) not null , | | cust_type_cd enum('I','B') not null , | | address varchar(30) , | | city varchar(20) , | | state varchar(20) , | | postal_code varchar(10) , | | ) | +-----------------------------------------------------+ 9 rows in set (0.04 sec)
Well, that got us pretty close; we just need to add queries against the
table_constraints
and key_column_usage views to retrieve
information about the primary key constraint:
mysql>SELECT 'CREATE TABLE customer (' create_table_statement->UNION ALL->SELECT cols.txt->FROM->(SELECT concat(' ',column_name, ' ', column_type,->CASE->WHEN is_nullable = 'NO' THEN ' not null'->ELSE ''->END,->CASE->WHEN extra IS NOT NULL THEN concat(' ', extra)->ELSE ''->END,->',') txt->FROM information_schema.columns->WHERE table_schema = 'bank' AND table_name = 'customer'->ORDER BY ordinal_position->) cols->UNION ALL->SELECT concat(' constraint primary key (')->FROM information_schema.table_constraints->WHERE table_schema = 'bank' AND table_name = 'customer'->AND constraint_type = 'PRIMARY KEY'->UNION ALL->SELECT cols.txt->FROM->(SELECT concat(CASE WHEN ordinal_position > 1 THEN ' ,'->ELSE ' ' END, column_name) txt->FROM information_schema.key_column_usage->WHERE table_schema = 'bank' AND table_name = 'customer'->AND constraint_name = 'PRIMARY'->ORDER BY ordinal_position->) cols->UNION ALL->SELECT ' )'->UNION ALL->SELECT ')';+-----------------------------------------------------+ | create_table_statement | +-----------------------------------------------------+ | CREATE TABLE customer ( | | cust_id int(10) unsigned not null auto_increment, | | fed_id varchar(12) not null , | | cust_type_cd enum('I','B') not null , | | address varchar(30) , | | city varchar(20) , | | state varchar(20) , | | postal_code varchar(10) , | | constraint primary key ( | | cust_id | | ) | | ) | +-----------------------------------------------------+ 12 rows in set (0.02 sec)
To see whether the statement is properly formed, I’ll paste the query output
into the mysql tool (I’ve changed the table
name to customer2 so that it won’t step on
our other table):
mysql>CREATE TABLE customer2 (->cust_id int(10) unsigned not null auto_increment,->fed_id varchar(12) not null ,->cust_type_cd enum('I','B') not null ,->address varchar(30) ,->city varchar(20) ,->state varchar(20) ,->postal_code varchar(10) ,->constraint primary key (->cust_id->)->);Query OK, 0 rows affected (0.14 sec)
The statement executed without errors, and there is now a customer2 table in the bank database. In order for the query to generate a well-formed
create table statement for
any table, more work is required (such as handling
indexes and foreign key constraints), but I’ll leave that as an
exercise.
Many organizations allow for database maintenance windows, wherein existing
database objects may be administered (such as adding/dropping partitions) and
new schema objects and code can be deployed. After the deployment scripts have
been run, it’s a good idea to run a verification script to ensure that the new
schema objects are in place with the appropriate columns, indexes, primary keys,
and so forth. Here’s a query that returns the number of columns, number of
indexes, and number of primary key constraints (0 or 1) for each table in the
bank schema:
mysql>SELECT tbl.table_name,->(SELECT count(*) FROM information_schema.columns clm->WHERE clm.table_schema = tbl.table_schema->AND clm.table_name = tbl.table_name) num_columns,->(SELECT count(*) FROM information_schema.statistics sta->WHERE sta.table_schema = tbl.table_schema->AND sta.table_name = tbl.table_name) num_indexes,->(SELECT count(*) FROM information_schema.table_constraints tc->WHERE tc.table_schema = tbl.table_schema->AND tc.table_name = tbl.table_name->AND tc.constraint_type = 'PRIMARY KEY') num_primary_keys->FROM information_schema.tables tbl->WHERE tbl.table_schema = 'bank' AND tbl.table_type = 'BASE TABLE'->ORDER BY 1;+--------------+-------------+-------------+------------------+ | table_name | num_columns | num_indexes | num_primary_keys | +--------------+-------------+-------------+------------------+ | account | 11 | 6 | 1 | | branch | 6 | 1 | 1 | | business | 4 | 1 | 1 | | customer | 7 | 1 | 1 | | department | 2 | 2 | 1 | | employee | 9 | 4 | 1 | | individual | 4 | 1 | 1 | | officer | 7 | 2 | 1 | | product | 5 | 2 | 1 | | product_type | 2 | 1 | 1 | | transaction | 8 | 4 | 1 | +--------------+-------------+-------------+------------------+ 11 rows in set (13.83 sec)
You could execute this statement before and after the deployment and then verify any differences between the two sets of results before declaring the deployment a success.
Some languages, such as Oracle’s PL/SQL and Microsoft’s Transact-SQL, are supersets of the SQL language, meaning that they include SQL statements in their grammar along with the usual procedural constructs, such as “if-then-else” and “while.” Other languages, such as Java, include the ability to interface with a relational database, but do not include SQL statements in the grammar, meaning that all SQL statements must be contained within strings.
Therefore, most relational database servers, including SQL Server, Oracle
Database, and MySQL, allow SQL statements to be submitted to the server as
strings. Submitting strings to a database engine rather than utilizing its SQL
interface is generally known as dynamic SQL execution.
Oracle’s PL/SQL language, for example, includes an execute immediate command, which you can use to submit a string
for execution, while SQL Server includes a system stored procedure called
sp_executesql for executing SQL
statements dynamically.
MySQL provides the statements prepare,
execute, and deallocate to allow for dynamic SQL execution. Here’s a simple
example:
mysql>SET @qry = 'SELECT cust_id, cust_type_cd, fed_id FROM customer';Query OK, 0 rows affected (0.07 sec) mysql>PREPARE dynsql1 FROM @qry;Query OK, 0 rows affected (0.04 sec) Statement prepared mysql>EXECUTE dynsql1;+---------+--------------+-------------+ | cust_id | cust_type_cd | fed_id | +---------+--------------+-------------+ | 1 | I | 111-11-1111 | | 2 | I | 222-22-2222 | | 3 | I | 333-33-3333 | | 4 | I | 444-44-4444 | | 5 | I | 555-55-5555 | | 6 | I | 666-66-6666 | | 7 | I | 777-77-7777 | | 8 | I | 888-88-8888 | | 9 | I | 999-99-9999 | | 10 | B | 04-1111111 | | 11 | B | 04-2222222 | | 12 | B | 04-3333333 | | 13 | B | 04-4444444 | | 99 | I | 04-9999999 | +---------+--------------+-------------+ 14 rows in set (0.27 sec) mysql>DEALLOCATE PREPARE dynsql1;Query OK, 0 rows affected (0.00 sec)
The set statement simply assigns a string
to the qry variable, which is then submitted
to the database engine (for parsing, security checking, and optimization) using
the prepare statement. After executing the
statement by calling execute, the statement
must be closed using deallocate prepare,
which frees any database resources (e.g., cursors) that have been utilized
during execution.
The next example shows how you could execute a query that includes placeholders so that conditions can be specified at runtime:
mysql>SET @qry = 'SELECT product_cd, name, product_type_cd, date_offered, date_retired FROM product WHERE product_cd = ?';Query OK, 0 rows affected (0.00 sec) mysql>PREPARE dynsql2 FROM @qry;Query OK, 0 rows affected (0.00 sec) Statement prepared mysql>SET @prodcd = 'CHK';Query OK, 0 rows affected (0.00 sec) mysql>EXECUTE dynsql2 USING @prodcd;+------------+------------------+-----------------+--------------+-------------+ | product_cd | name | product_type_cd | date_offered | date_retired| +------------+------------------+-----------------+--------------+-------------+ | CHK | checking account | ACCOUNT | 2004-01-01 | NULL | +------------+------------------+-----------------+--------------+-------------+ 1 row in set (0.01 sec) mysql>SET @prodcd = 'SAV';Query OK, 0 rows affected (0.00 sec) mysql>EXECUTE dynsql2 USING @prodcd;+------------+-----------------+-----------------+--------------+--------------+ | product_cd | name | product_type_cd | date_offered | date_retired | +------------+-----------------+-----------------+--------------+--------------+ | SAV | savings account | ACCOUNT | 2004-01-01 | NULL | +------------+-----------------+-----------------+--------------+--------------+ 1 row in set (0.00 sec) mysql>DEALLOCATE PREPARE dynsql2;Query OK, 0 rows affected (0.00 sec)
In this sequence, the query contains a placeholder (the ? at the end of the statement) so that the product
code can be submitted at runtime. The statement is prepared once and then
executed twice, once for product code 'CHK'
and again for product code 'SAV', after which
the statement is closed.
What, you may wonder, does this have to do with metadata? Well, if you are
going to use dynamic SQL to query a table, why not build the query string using
metadata rather than hardcoding the table definition? The following example
generates the same dynamic SQL string as the previous example, but it retrieves
the column names from the information_schema.columns view:
mysql>SELECT concat('SELECT ',->concat_ws(',', cols.col1, cols.col2, cols.col3, cols.col4,->cols.col5, cols.col6, cols.col7, cols.col8, cols.col9),->' FROM product WHERE product_cd = ?')->INTO @qry->FROM->(SELECT->max(CASE WHEN ordinal_position = 1 THEN column_name->ELSE NULL END) col1,->max(CASE WHEN ordinal_position = 2 THEN column_name->ELSE NULL END) col2,->max(CASE WHEN ordinal_position = 3 THEN column_name->ELSE NULL END) col3,->max(CASE WHEN ordinal_position = 4 THEN column_name->ELSE NULL END) col4,->max(CASE WHEN ordinal_position = 5 THEN column_name->ELSE NULL END) col5,->max(CASE WHEN ordinal_position = 6 THEN column_name->ELSE NULL END) col6,->max(CASE WHEN ordinal_position = 7 THEN column_name->ELSE NULL END) col7,->max(CASE WHEN ordinal_position = 8 THEN column_name->ELSE NULL END) col8,->max(CASE WHEN ordinal_position = 9 THEN column_name->ELSE NULL END) col9->FROM information_schema.columns->WHERE table_schema = 'bank' AND table_name = 'product'->GROUP BY table_name->) cols;Query OK, 1 row affected (0.02 sec) mysql>SELECT @qry;+------------------------------------------------------------------------------- ---------------------+ | @qry | +------------------------------------------------------------------------------- ---------------------+ | SELECT product_cd,name,product_type_cd,date_offered,date_retired FROM product WHERE product_cd = ? | +------------------------------------------------------------------------------- ---------------------+ 1 row in set (0.00 sec) mysql>PREPARE dynsql3 FROM @qry;Query OK, 0 rows affected (0.01 sec) Statement prepared mysql>SET @prodcd = 'MM';Query OK, 0 rows affected (0.00 sec) mysql>EXECUTE dynsql3 USING @prodcd;+------------+----------------------+-----------------+--------------+--------------+ | product_cd | name | product_type_cd | date_offered | date_retired | +------------+----------------------+-----------------+--------------+--------------+ | MM | money market account | ACCOUNT | 2004-01-01 | NULL | +------------+----------------------+-----------------+--------------+--------------+ 1 row in set (0.00 sec) mysql>DEALLOCATE PREPARE dynsql3;Query OK, 0 rows affected (0.00 sec)
The query pivots the first nine columns in the product table, builds a query string using the concat and concat_ws functions, and assigns the string to the qry variable. The query string is then executed as
before.
Generally, it would be better to generate the query using a procedural language that includes looping constructs, such as Java, PL/SQL, Transact-SQL, or MySQL’s Stored Procedure Language. However, I wanted to demonstrate a pure SQL example, so I had to limit the number of columns retrieved to some reasonable number, which in this example is nine.
The following exercises are designed to test your understanding of metadata. When you’re finished, please see Appendix C for the solutions.
Write a query that lists all of the indexes in the bank schema. Include the table names.