Penn Computing

Penn Computing

Computing Menu Computing A-Z
Computing Home Information Systems & Computing Penn

  
  UPDATES
Current Load Status
Regular Availability
  
  INFORMATION
FAQs & Tips
Password Changer
Support services
Security
About the Data Warehouse
Data Administration
  
  DATA COLLECTIONS
Advancement
Assets
BRIM
Express Mail
Facilities
Faculty
GAR
General Ledger
Infrastructure
ISSS-iOffice
Learning Management
Position Inventory
Research-PennERA Proposals
Salary Management
Student
Travel Expense Management
Tuition Distribution
Cross-Collection

What the Warehouse Contains

This document describes how data in the Warehouse is organized and explains table and data element terminology. It is subdivided as follows:
· Data Collections
· Tables
· Table Help
· Data Elements
· Data Element Help
· Summary Data

Data Collections
The Warehouse contains data from transaction systems. The information from each transaction system is referred to as a data collection. For example, information from SRS (Student Records System) is referred to as the Student Data Collection, while account balance information from the BEN Financials General Ledger is referred to as the Genera Ledger Data Collection.

Note that depending on your authorization level, you may or may not have access to a specific data collection. If you have questions about a specific data collection or want access to a collection, contact Data Administration.

Tables
Each data collection in the Warehouse is organized in a set of related tables. Each table consists of data elements that describe or qualify an item of business significance. For example, the Student Data Collection has an ADDRESS table with data elements such as SSN (Social Security Number), street address, etc. The collection also has a PERSON table with data elements such as SSN, name, birth date, etc. The ADDRESS table is related to the PERSON table through the SSN. Because the data is stored in tables, it is easy to access just the data you need, rather than having to plow through all the data in the data collection. A table may be a part of more than one data collection.

Note that some data elements in tables are indexed (indexed columns are noted in the documentation for each table). Indexing enables the system to execute queries faster. A query with a record selection condition using an indexed data element tells the system to go directly to the rows in the table that contain the value indicated and to stop retrieving data when the value is no longer found. If a query does not select records based on an indexed data element in its record selection condition, the system starts searching at the first row in the table and works through every row until it reaches the last row in the table. Tables can contain hundreds of thousands or even millions of rows (for example, one table contains 93,000 students, about 250,000 addresses, and approximately 700,000 enrollments). Thus, queries that do not use indexed data elements for record selection will run slowly.

Table Help. Help documentation is available for each table in the Warehouse, accessible via a hyperlink from the table name. Help describes the basic contents of the table. If applicable, it also gives the following information:

Explanation. Describes the physical makeup or content of the table.

Common uses. Describes some queries that would make use of the table.

Primary key. Lists the data elements that are the primary keys in the table.

Indexed data elements. Lists the data elements that are indexed in the table. Since tables can consist of many rows, queries that include record selection conditions based on indexed data elements provide faster results.

Related tables. Identifies other tables that may be meaningful to your query. That is, tables that are good candidates for containing information that you may want to include in your results. For example, if you are using the Enrollment table to list students in a specific course section, you may want to use the Person table to get the students' names.

Cautions. Provides additional guidance, help, or explanation about a table. It can also include recommendations that must be followed to prevent poor query results.

Data Elements
The smallest unit of data that you can work with is called a data element. A data element cannot be logically divided any further without losing its meaning or context. Zip code, last name, and SSN are examples of data elements that cannot be logically divided any further without becoming meaningless. In contrast, student and address are not data elements because they can be logically divided into more units of data.

Data Element Help
Help documentation is available for each data element in the Warehouse. Help describes the data element and includes its indexed, format, and not null values. If applicable, it also provides a list of valid values for the data element. Values for data elements can be listed in alphabetical order or in the order most frequently used.

The primary datatypes used in the Date Warehouse are CHAR (character), DATE and NUMBER. Element formats are indicated by the datatype, length and, for NUMBER types, precision and scale. The format of Name column in the EMPLOYEE_GENERAL table, for example, is listed as CHAR(30), meaning that the column is of character datatype, and holds a maximum of 30 characters. Numeric datatypes (such as Payment_Amount in the EMPLOYEE_PAYMENT table) have a specified precision and scale. Precision is the total maximum length of the column, while scale represents the number of places to the right of the decimal. For example, the format for EMPLOYEE_PAYMENT.Payment_Amount is represented by NUMBER(9,2), meaning that the column is of numeric datatype, with a total of 9 characters of which 2 are to the right of the decimal point; thus, the maximum value is 9999999.99.

Summary Data
One major advantage of the Data Warehouse is that it contains data at different levels of summarization. For example, you could retrieve data as individual transactions or as summaries by week, by month, or by year. Note that additional levels of summarization can be added to the Warehouse as needed and as resources allow without impacting existing data.

Summary data are different for every data collection in the Warehouse. For example, the Student Data Collection includes a student detail level which consists of the basic SRS tables, and the student census level which is a snapshot of student term activity taken at the census date (one week after the end of the drop period). The student detail level changes daily. The student census level, once loaded for a given term, remains static and never changes for that term. The advantage of the student census level is that you can run hundreds of queries on a hundred different days, and run them ten years from now and still have the same numbers for comparison. The census level is the data Penn uses for official enrollment statistics, for example, for providing data to the state and federal governments. In the General Ledger Data Collection, the SUMMARY_BALANCES table contains budget, encumbrance, and actual balances for summary-level Accounting Flexfields by accounting period. Balances are available for the month, the fiscal year-to-date, and the project year-to-date.

If you need to know what summary data are available for a specific data collection, refer to the documentation for that data collection.

Related Documents
How the Warehouse Works
Data Warehouse: Information Under One Roof, Tad Davis, PennPrintout, April 1995, Vol 11:6

top

Information Systems and Computing
University of Pennsylvania
Comments & Questions


Penn Computing University of Pennsylvania
Information Systems and Computing, University of Pennsylvania