This lesson should take about 35 minutes to complete. It will
introduce you to basic concepts about relational database management systems,
such as tables, primary keys, and indexed columns.
Relational Database System Concepts
After completing this lesson, please answer the following questions:
1. What element or group of elements uniquely identifies a
row in a table?
2. Describe how one table can be joined to another.
3. How can you improve the performance of a query?
4. When should a query include a condition that explicity deals
with null values?
5. Describe the four clauses that make up an SQL (Standard
Query Lanaguage) statement. Be sure to identify the function of each clause
in the SQL statement.
The data in the Data Warehouse is stored using a relational database
management system (or RDBMS) called Oracle. Relational databases store
their data in two-dimensional tables (rather like spreadsheets).
Each record within a table is stored in a row, and each data element
(or field) within a row is stored in a column. (The two dimensions of
a table are row and column.) As shown in Figure 1, a database might have a BLOOD_DONORS table and a DONATIONS
A primary key (PK), consisting of one or more columns, uniquely identifies
each row in a table.
For example, the primary key of the BLOOD_DONORS table is DONOR_ID, and
the primary key of the DONATIONS table is the combination of DONOR_ID
and DONATION_DATE. Figure 2 further illustrates the example: In the BLOOD_DONORS
table, primary key value 123-45-6789 uniquely identifies a record. In
the DONATIONS table, it is the combination of DONOR_ID and DONATION_DATE
that uniquely identifies a record.
Tables may be joined to each other.
For example, the BLOOD_DONORS table may be joined to the DONATIONS table
by matching DONOR_ID in the BLOOD_DONORS table to DONOR_ID in the DONATIONS
table to get a report as shown in Figure 3.
As shown in Figure 4, queries involving selection
conditions (to determine which rows are returned) may run faster when
indexed columns are used. An index works just like the index in the back
of a book. For example, you can do a sequential read and look at every
page of a book to find the references to John Doe, or you and find those
references faster by doing an indexed read, using the index (which happens
to point to just 3 pages out of a 500 page book). Oracle makes the decision
as to whether it is quicker to do a sequential read or an indexed read
for a given query, but it won't try to do an indexed read at all if your
query row selection conditions do not mention an indexed column.
Indexed columns and primary keys. The primary key is indexed. If
the primary key consists of more than one column, the index is created
from the concatenation of all the columns in the primary key. (For example,
the index for the primary key for the DONATIONS table has entries for
123-45-6789 1991/11/25, 234-56-7890 1992/01/21, and 234-56-7890 1993/10/25.)
The primary key index may be used if your conditions mention all components
of the primary key, or just the first one.
Individual columns may also have indexes created for them.
Unless otherwise specified, a column in an Oracle table may have a null
value. Any column that is part of the primary key, or that has been specified
as being Not Null, may not contain a null value. All other columns may
contain null values.
It is important to know about nulls when you set conditions (other than
simple = conditions) on your query. For example, Temperature may be null.
If you ask for rows with a Temperature equal to 98.6 (WHERE TEMPERATURE
= 98.6), the query will not return rows that have a null value for Temperature,
and that is probably what you want. But if you ask for rows with a Temperature
less than 98.6 (WHERE TEMPERATURE (98.6), the query will also not return
rows that have a null value for Temperature, because null means unknown--it
might be less than 98.6, or it might not. To be safe, include a condition
that explicitly deals with null values. For example, you could say WHERE
(TEMPERATURE (98.6) OR (TEMPERATURE IS NULL). Dealing with nulls is most
commonly an issue for negative conditions--for example, WHERE (TEMPERATURE
<> 98.6) OR (TEMPERATURE IS NULL) would get all the rows where the
Temperature is not 98.6, including rows where the Temperature has a null
SQL. Structured Query Language (SQL) is the ANSI (American National
Standards Institute) standard language for use with relational database
management systems. Query tools (such as Business Objects) automatically
translate your query specifications into SQL.
The SQL for a query generally has four parts:
- The SELECT clause, a required clause that specifies the column(s)
to be retrieved
- The FROM clause, a required clause that specifies the table(s) from
which the data is to be retrieved
- The WHERE clause, an optional clause that specifies the conditions
for selecting specific rows to be retrieved
- The ORDER BY clause, an optional clause that specifies the sort order
in which the retrieved rows are to be written to the query output