How the Data Warehouse Works
· Relationship of Warehouse to
Transaction Systems
· How the Warehouse is Built and Refreshed
· Important Warehouse Facts
Relationship of the Warehouse to Transaction Systems
Transaction systems handle the day-to-day functions of a business--recording,
processing, and storing operational data. Examples of transaction systems
at Penn include the Student Records System, BEN Financials General Ledger,
Online Time Reporting, and Property Management. These systems handle operations
such as enrolling a student in a course, changing a student's major, creating
journal entries, printing a paycheck, or recording the location of equipment.
Transaction systems, however were not designed to provide managers with
the information they need to conduct trend analyses or to undertake longitudinal
studies that require use of historical University data. Data Warehouses,
unlike transaction systems, are specifically designed to handle these
types of management queries and analyses.
How the Warehouse is Built and Refreshed
To build the Warehouse, data from the transaction systems was analyzed
(each piece of data evaluated for inclusion into the Warehouse), "cleaned"
(old codes converted into new codes, existing data redefined), and restructured
(entities broken apart and new entities created). The data was then summarized
and arranged in a format to support analysis and reporting. In some instances,
additional data elements were calculated to support a particular analysis.
The Warehouse is refreshed periodically, using the transaction systems
as its source of data. That is, programs extract data from a transaction
system and translate it into Warehouse format. The formatted files are
than transferred and loaded into the Warehouse. The data is than available
for analyses and reports.
Note that each data collection has its own refresh cycle governing when
its data is periodically updated in the Warehouse. For example, one data
collection may be refreshed daily, while others may be refreshed hourly
or once a term. Thus, data in the Warehouse reflects the most recent refresh
cycles for the data collections. Recipients of reports from the Data Warehouse
need to be aware of this "delay" in Warehouse data.
Refresh schedule information is noted in the documentation for each collection.
Refresh cycles are subject to change; please refer to the Warehouse updates page for schedule changes and announcements.
Important Warehouse Facts
The following are important facts that you should know about the University
Data Warehouse:
- The refresh cycle varies for each data collection in the
Warehouse. Do not execute queries against a collection in the Data Warehouse
when it is being refreshed. Your queries will not be completed because
the refresh process involves dropping index pointers from tables.
- As previously mentioned, there is a "data delay" between the Warehouse
and the transaction system. Keep this time frame in mind when comparing
data between the two systems or when analyzing query results--it may
help you to understand and account for any data discrepancies found.
Always indicate or communicate the date on which your Warehouse results
were obtained and identify the Warehouse, and not the transaction system,
as the source of the information. For example, include the date of the
query in the report header and the source of the information, the University
Data Warehouse, in the report footer.
- Some University data are more dynamic--change more frequently--than
other data or are subject to change at specific times during the fiscal
year. Take this into consideration when writing queries against the
Warehouse. Examples of data that change are course enrollments during
the add/drop period and student grades during the grading period.
- Data in the Warehouse is for queries only. That is, you cannot add,
change, or delete data in the Warehouse. These activities are done only
in the transaction system.
|