What the Warehouse Contains
This document describes how data in the Warehouse is organized and explains
table and data element terminology. It is subdivided as follows:
· Data Collections
· Tables
· Table Help
· Data Elements
· Data Element Help
· Summary Data
Data Collections
The Warehouse contains data from transaction systems. The information
from each transaction system is referred to as a data collection. For
example, information from SRS (Student Records System) is referred to
as the Student Data Collection, while account balance information from
the BEN Financials General Ledger is referred to as the Genera Ledger
Data Collection.
Note that depending on your authorization level, you may or may not have
access to a specific data collection. If you have questions about a specific
data collection or want access to a collection, contact
Data Administration.
Tables
Each data collection in the Warehouse is organized in a set of related
tables. Each table consists of data elements that describe or qualify
an item of business significance. For example, the Student Data Collection
has an ADDRESS table with data elements such as SSN (Social Security Number),
street address, etc. The collection also has a PERSON table with data
elements such as SSN, name, birth date, etc. The ADDRESS table is related
to the PERSON table through the SSN. Because the data is stored in tables,
it is easy to access just the data you need, rather than having to plow
through all the data in the data collection. A table may be a part of
more than one data collection.
Note that some data elements in tables are indexed (indexed columns are
noted in the documentation for each table). Indexing enables the system
to execute queries faster. A query with a record selection condition using
an indexed data element tells the system to go directly to the rows in
the table that contain the value indicated and to stop retrieving data
when the value is no longer found. If a query does not select records
based on an indexed data element in its record selection condition, the
system starts searching at the first row in the table and works through
every row until it reaches the last row in the table. Tables can contain
hundreds of thousands or even millions of rows (for example, one table
contains 93,000 students, about 250,000 addresses, and approximately 700,000
enrollments). Thus, queries that do not use indexed data elements for
record selection will run slowly.
Table Help. Help documentation is available
for each table in the Warehouse, accessible via a hyperlink from the table
name. Help describes the basic contents of the table. If applicable, it
also gives the following information:
Explanation. Describes the physical makeup or content of the
table.
Common uses. Describes some queries that would make use of the
table.
Primary key. Lists the data elements that are the primary keys
in the table.
Indexed data elements. Lists the data elements that are indexed
in the table. Since tables can consist of many rows, queries that include
record selection conditions based on indexed data elements provide faster
results.
Related tables. Identifies other tables that may be meaningful
to your query. That is, tables that are good candidates for containing
information that you may want to include in your results. For example,
if you are using the Enrollment table to list students in a specific course
section, you may want to use the Person table to get the students' names.
Cautions. Provides additional guidance, help, or explanation
about a table. It can also include recommendations that must be followed
to prevent poor query results.
Data Elements
The smallest unit of data that you can work with is called a data element.
A data element cannot be logically divided any further without losing
its meaning or context. Zip code, last name, and SSN are examples of data
elements that cannot be logically divided any further without becoming
meaningless. In contrast, student and address are not data elements because
they can be logically divided into more units of data.
Data Element Help
Help documentation is available for each data element in the Warehouse.
Help describes the data element and includes its indexed, format, and
not null values. If applicable, it also provides a list of valid values
for the data element. Values for data elements can be listed in alphabetical
order or in the order most frequently used.
The primary datatypes used in the Date Warehouse are CHAR (character),
DATE and NUMBER. Element formats are indicated by the datatype, length
and, for NUMBER types, precision and scale. The format of Name column
in the EMPLOYEE_GENERAL table, for example, is listed as CHAR(30), meaning
that the column is of character datatype, and holds a maximum of 30 characters.
Numeric datatypes (such as Payment_Amount in the EMPLOYEE_PAYMENT table)
have a specified precision and scale. Precision is the total maximum
length of the column, while scale represents the number of places
to the right of the decimal. For example, the format for EMPLOYEE_PAYMENT.Payment_Amount
is represented by NUMBER(9,2), meaning that the column is of numeric datatype,
with a total of 9 characters of which 2 are to the right of the decimal
point; thus, the maximum value is 9999999.99.
Summary Data
One major advantage of the Data Warehouse is that it contains data at
different levels of summarization. For example, you could retrieve data
as individual transactions or as summaries by week, by month, or by year.
Note that additional levels of summarization can be added to the Warehouse
as needed and as resources allow without impacting existing data.
Summary data are different for every data collection in the Warehouse.
For example, the Student Data Collection includes a student detail level
which consists of the basic SRS tables, and the student census level which
is a snapshot of student term activity taken at the census date (one week
after the end of the drop period). The student detail level changes daily.
The student census level, once loaded for a given term, remains static
and never changes for that term. The advantage of the student census level
is that you can run hundreds of queries on a hundred different days, and
run them ten years from now and still have the same numbers for comparison.
The census level is the data Penn uses for official enrollment statistics,
for example, for providing data to the state and federal governments.
In the General Ledger Data Collection, the SUMMARY_BALANCES table contains
budget, encumbrance, and actual balances for summary-level Accounting
Flexfields by accounting period. Balances are available for the month,
the fiscal year-to-date, and the project year-to-date.
If you need to know what summary data are available for a specific data
collection, refer to the documentation for that data collection.
Related Documents
How the Warehouse Works
Data
Warehouse: Information Under One Roof, Tad Davis, PennPrintout, April
1995, Vol 11:6
|