
Report to the Data Policy Committee
on Data Access Strategy


It is our wish to create a query environment that draws on a common base of data and supports easy access by non-technical users. This environment needs to support query access to operational data as well as strategic analysis, planning, and decision making. In order to support both the University's data access requirements and its operational needs, the query and operational environments must be separate. The Decision Support Environment (DSE) will comprise a common base of data organized in a relational database, a data dictionary/repository, and a variety of tools to support query, analysis, and reporting. The long-term audience for the DSE will be any end user who needs information, including support staff, managers, executive managers, and information analysts. The DSE is a critical component of the overall Cornerstone effort. Its implementation will be based upon the University's Enterprise Data Model, and it will become a portion of the common base of data.

Strategy


As the University implements new administrative systems over time, better information will become available and access to that information will become easier. In light of the current financial constraints and management information needs, however, implementation of the DSE cannot wait. It is clear that we must begin to construct the DSE concurrently with the implementation of the new administrative systems, and that the end result must be fully compatible with the future administrative operational environment.
Therefore, a strategy to implement the DSE must allow a progressive implementation of that environment, occurring in tandem with the implementation of a new operational environment and initially utilizing existing operational data and new data as it becomes available. This effort will set the strategy for future Cornerstone query initiatives and will define the look and feel for unstructured query access to operational data as well as data in the Decision Support Environment. A suggested implementation model for the DSE is depicted in Figure 1; this model is described in more detail in the Appendix.
A progressive implementation of the DSE will allow some flexibility in modifying and adapting the implementation strategy as the new operational environment establishes itself. Progressive implementation of the DSE will occur on four levels:
Data Availability
At the outset, the DSE will contain only selected logical subsets of the University's administrative data. Additional subsets will be made available over time until all relevant University administrative data, both centrally and locally managed, and any required data from sources external to the University have been addressed.
Level of Data Aggregation
Eventually, the DSE will contain multiple collections of data characterized by varying degrees of granularity or detail. These collections will range from data that is essentially a time-variant snapshot of selected operational detail, both current and historic, to data that may be combined with data from external sources and then synthesized, aggregated, or otherwise "packaged" for analytical purposes. Not all collections need be implemented immediately. In the initial implementation, only the collections of fairly fine granularity, i.e., detailed or lightly summarized data, will be addressed; additional collections with coarser granularity (a higher level of aggregation) can be added later.
Tools
The DSE will need to include a variety of tools to support query, analysis, and reporting by users who a) have varying levels of technical expertise; b) have varying needs for data access (e.g., simple query lists, complex queries, statistical modeling); and c) will be operating from a variety of platforms. While some of the current end-user toolset, such as SAS and Focus, may satisfy some needs, these tools are clearly insufficient for the full range of access that the DSE is envisioned to support. The new tools must enable easy access by users with little or no technical expertise and must support multi-dimensional views of data. Eventually the environment will also include Executive Information System applications. At the outset, however, only those tools, both desktop and server, that most closely match the needs of the initial customer base will be required; additional tools and applications can be added over time.
Repository
Once fully implemented, a data repository will be used to store metadata (data about data) and make it available to those who need it. The repository will be used during data and systems analysis to store models, definitions, and business rules; used in generating systems and databases; and made available to end users for understanding and locating data. In addition to the obvious need for standards for repository use and metadata creation, implementation of a repository will also require easy-to-use applications and query tools to support access to metadata. The initial implementation of the repository will focus on support for the query environment and will therefore be limited to the standards and accessibility requirements needed to support the first phase of the DSE.

Benefits


The DSE can provide the University with the opportunity to make data accessible from legacy systems, bridging the gap until the new systems and data are in place. Currently, accessing data often requires a technologist, and current levels of technical support across the campus cannot meet the demand. The problem is further complicated when data is inaccurate, inconsistent, or unavailable. The DSE can help solve these problems by:
  • making data more accessible to non-technical users with desktop tools that will integrate with their current environment;
  • providing historical data which the operational systems are currently unable to provide, designed in a manner to support longitudinal analysis of data;
  • providing appropriate security and privacy rules so that previously secured data can be provided to a greater number of people;
  • integrating data from separate operational systems;
  • providing a mechanism for identifying and correcting inaccuracies in operational data;
  • providing clear and accessible documentation about data; and
  • reducing the need for separate local systems which are kept for reporting purposes.

Pilot Proposal


Given the sheer volume of administrative data, combined with the fact that the administrative base of data will be something of a "moving target" over the next several years, a progressive implementation strategy suggests selecting a pilot project that identifies a set of administrative data that is currently available and useful to end users. To provide a reasonable chance for success, such a pilot should:
  • satisfy an already identified set of information needs;
  • be reasonably limited in scope with respect to data entities;
  • be based on a relatively stable set of administrative data (that is, data that is not likely to be among the first addressed in the implementation of the new administrative systems);
  • have participants who are knowledgeable, both technically and in the subject area, and who are available for work on the project team; and
  • be delivered to customers who are comfortable in using desktop software and knowledgeable about the subject data area.

For these reasons, the data access working group makes the following proposal:
Subject Data Areas
The proposed pilot will cover the following subject areas; detailed attributes, aggregation levels, and historical data requirements will be determined during the requirements specification phase of the pilot.
  • Student data (including student biographic/demographic and registration data) which resides in the Student Records System.
  • Course data which resides in the Student Records System.
  • Faculty data which currently resides in the Payroll system and in a local database within the Office of the Provost.
  • Grant data which currently resides in the ORA database.
  • Sponsor data which also resides in the ORA database.

The combination of these subject areas will provide data to support departmental teaching analysis, research activity, listings of current students, grade analysis, and similar needs. Many of these needs were identified in the interviews with school and departmental personnel conducted through the Cornerstone project. In addition, the pilot will satisfy many of the departments' student data requirements for such purposes as advising, grant acquisition and review, and student distributions.
Audience
The DSE pilot will be directed towards school and departmental personnel who hold non-technical positions, routinely need to gather information, and have some experience with desktop hardware and software. In addition, it will support the continuing needs of information analysts.
Tools
The pilot of the DSE will need tools to support simple and complex queries and the integration of extracted data with other desktop software such as spreadsheets and word processors. In addition, more powerful server-based tools will be needed for statistical analysis. The identification, development, and support of these tools will be done in conjunction with the other Cornerstone initiatives. The desktop environment will be consistent with the standards set by the Cornerstone technical architecture.
Database and System Software
The pilot DSE will be deployed in the relational database to be acquired through the Cornerstone RFP process, which will be the same database that the new financial application will use. The system software will be UNIX-based, and the platform will be chosen based upon the criteria set during the requirements specification phase of the pilot.
Repository
This pilot will be the starting point for the data repository which will be used for the storage and provision of data documentation (metadata). The extent to which we deploy the repository will be determined during the requirements specification phase of the pilot.

Next Steps


  • Establish a team with support from across the University, whose members must possess a thorough understanding of the subject data areas, the needs of the audience, and the technology.
  • Begin the Requirements Specification for the data, the tools, and the repository.
  • Gather detailed data requirements, including attributes, aggregation level, and historical data. This will require focus groups with school and departmental personnel.
  • Coordinate our efforts with other Cornerstone activities to leverage resources in some of the data definition areas (e.g., organizational data) and in tool acquisition.

Issues for the Committee


  • Is this the correct target for data and audience?
  • How will we get the necessary resources, with representation from across the University? We will need people to help on an individual as well as a collective basis. SAS, Institutional Planning, Wharton Computing, Data Administration, and UMIS have agreed to provide resources.
  • Who are the appropriate people to define the data requirements, and how do we make contact with them?
  • Some work has been done in defining the data; however, it does not include the clinical perspective. Is there interest in defining the clinical data? If so, how should we organize that effort?
  • Is there a need to phase in pieces of this pilot (e.g., by subject area)?
  • What is an appropriate implementation time frame?
  • Other issues?


Information Systems and Computing
University of Pennsylvania