Report to the Data Policy Committee
on Data Access Strategy
It is our wish to create a query environment which utilizes a common base of data
and which supports easy access by non-technical users. This environment needs
to support query access for operational as well as strategic analysis, planning,
and decision making. In order to support both the University's data access requirements
and its operational needs, the query and operational environments must be separate.
The Decision Support Environment (DSE) will be comprised of a common base of data
organized in a relational database, a data dictionary/repository, and a variety
of tools to support query, analysis, and reporting. The long-term audience for
the DSE will be any end user who needs information. This would include support
staff, managers, executive managers, and information analysts. DSE is a critical
component of the overall Cornerstone effort. Its implementation will be based
upon the University's Enterprise Data Model and will become a portion of the common
base of data.
As the University implements new administrative systems over time, better information
will become available and access to that information will become easier. In light
of the current financial constraints and management information needs, the implementation
of the DSE cannot wait. It is clear that we must begin to construct the DSE concurrently
with the implementation of the new administrative systems and that the end result
must be fully compatible with the future administrative operational environment.
Therefore, a strategy to implement the DSE must allow a progressive implementation
of that environment, occurring in tandem with the implementation of a new operational
environment and initially utilizing existing operational data and new data as
it becomes available. This effort will set the strategy for future Cornerstone
query initiatives and will define the look and feel for unstructured query access
to operational data as well as data in the Decision Support Environment. A suggested
implementation model for the DSE is depicted in Figure 1; this model is described
in more detail in the Appendix.
A progressive implementation of the DSE will allow some flexibility in modifying
and adapting the implementation strategy as the new operational environment establishes
itself. Progressive implementation of the DSE will occur on four levels:
- Data Availability
- At the outset, the DSE will contain only selected logical subsets of the
University's administrative data. Additional subsets will be made available
over time until all relevant University administrative data, both centrally
and locally managed, and any required data from sources external to the University
have been addressed.
- Level of Data Aggregation
- Eventually, the DSE will contain multiple collections of data which are
characterized by varying degrees of granularity or detail. These collections
will range from data which is basically a time-variant snapshot of selected
operational detail, both current and historic, to data which may be combined
with data from external sources and then synthesized, aggregated, or otherwise
"packaged" for analytical purposes. All collections need not be implemented
immediately. In the initial implementation, only the collections of fairly
fine granularity, i.e., detailed or lightly summarized data, will be addressed;
additional collections with coarser granularity (higher level of aggregation)
can be added later.
- Tools
- The DSE will need to include a variety of tools to support query, analysis
and reporting by users who a) have varying levels of technical expertise;
b) have varying needs for data access (e.g., simple query lists, complicated
query/ statistical modeling, etc.); and c) will be operating from a variety
of platforms. While some of the current end-user toolset, such as SAS and
Focus, may satisfy some needs, these are clearly insufficient for the full
range of access that the DSE is envisioned to support. The new tools must
enable easy access by users with little or no technical expertise and must
support multi-dimensional views of data. Eventually the environment will also
include Executive Information System applications. At the outset, however,
only those tools, both desktop and server, which most closely match the needs
of the initial customer base will be required; additional tools and applications
can be added over time.
- Data Repository
- Once fully implemented, a data repository will be used to store and to make
metadata (data about data) available to those who need it. The repository
will be used during data and systems analysis for storage of models, definitions
and business rules, used in generating systems and databases, and made available
to end users for understanding and locating data. In addition to the obvious
need for standards for repository use and metadata creation, implementation
of a repository will also require an easy-to-use application and query tools
to support access to metadata. The initial implementation of the repository
will focus on support for the query environment and, therefore, will be limited
to the standards and accessibility requirements needed to support the first
phase of DSE.
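The levels of data aggregation described above can be illustrated with a minimal sketch. It assumes a hypothetical detail-level table of student registrations and derives a more highly aggregated collection from it. The table names, column names, and sample rows are illustrative assumptions only and do not come from the Enterprise Data Model; SQLite stands in for whichever relational database is eventually selected.

```python
import sqlite3

# Hypothetical detail-level collection: one row per student registration.
# All names and values here are illustrative assumptions.
DETAIL_ROWS = [
    ("term-1", "HIST", "s001", 3.0),
    ("term-1", "HIST", "s002", 4.0),
    ("term-1", "MATH", "s001", 3.0),
    ("term-2", "HIST", "s003", 3.0),
]


def build_collections(rows):
    """Load the detail collection and derive a summarized collection from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registration_detail ("
        "term TEXT, department TEXT, student_id TEXT, credits REAL)"
    )
    conn.executemany(
        "INSERT INTO registration_detail VALUES (?, ?, ?, ?)", rows
    )
    # A higher level of aggregation: one row per term and department.
    conn.execute(
        "CREATE TABLE registration_summary AS "
        "SELECT term, department, COUNT(*) AS registrations, "
        "SUM(credits) AS total_credits "
        "FROM registration_detail GROUP BY term, department"
    )
    return conn


conn = build_collections(DETAIL_ROWS)
summary = conn.execute(
    "SELECT term, department, registrations, total_credits "
    "FROM registration_summary ORDER BY term, department"
).fetchall()
```

In this sketch the detail table corresponds to the collections proposed for the initial implementation, while the summary table represents the kind of more aggregated collection that could be added later without changing the detail data.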
The DSE can provide the University with the opportunity to make data accessible
from legacy systems, bridging the gap until the new systems and data are in place.
Currently, accessing data often requires a technologist, and our current levels
of technical support across the campus cannot meet the demand. The problem is
further complicated when data is inaccurate, inconsistent or unavailable. The
DSE can help solve these problems by:
- making data more accessible to non-technical users with desktop tools that
will integrate with their current environment;
- providing historical data which the operational systems are currently unable
to provide, designed in a manner to support longitudinal analysis of data;
- providing appropriate security and privacy rules so that previously secured
data can be provided to a greater number of people;
- integrating data from separate operational systems;
- providing a mechanism for identifying and correcting inaccuracies in operational
data;
- providing clear and accessible documentation about data; and
- reducing the need for separate local systems which are kept for reporting.
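As a minimal sketch of two of the points above (integrating data from separate operational systems and identifying inaccuracies), the example below joins a hypothetical faculty extract to a hypothetical grant extract on a shared identifier and sets aside any grant whose investigator cannot be found. The record layouts, identifiers, and sample values are assumptions made for illustration; they do not describe the actual Payroll or ORA data.

```python
from typing import NamedTuple


class Faculty(NamedTuple):
    # Hypothetical layout of a faculty extract.
    faculty_id: str
    name: str
    department: str


class Grant(NamedTuple):
    # Hypothetical layout of a grant extract.
    grant_id: str
    investigator_id: str
    amount: float


def integrate(faculty, grants):
    """Join grants to faculty on a shared identifier.

    Returns (matched, unmatched). Matched rows combine both sources.
    Unmatched grants reference an identifier that is missing from the
    faculty extract, an inconsistency that could be reported back to
    the owners of the source system for correction.
    """
    by_id = {f.faculty_id: f for f in faculty}
    matched, unmatched = [], []
    for g in grants:
        f = by_id.get(g.investigator_id)
        if f is None:
            unmatched.append(g)
        else:
            matched.append((f.department, f.name, g.grant_id, g.amount))
    return matched, unmatched


faculty = [
    Faculty("f01", "A. Example", "HIST"),
    Faculty("f02", "B. Sample", "MATH"),
]
grants = [
    Grant("g100", "f01", 50000.0),
    Grant("g101", "f99", 12000.0),  # investigator not in the faculty extract
]
matched, unmatched = integrate(faculty, grants)
```

Keeping the unmatched records, rather than silently dropping them, is one way the query environment could surface data problems instead of hiding them.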
Given the sheer volume of administrative data combined with the fact that the
administrative base of data will be something of a "moving target" over the next
several years, a progressive implementation strategy suggests the selection
of a pilot project to identify a set of administrative data that is currently
available and provides utility to end users. To provide a reasonable chance for
success, such a pilot should:
- satisfy an already identified set of information needs;
- be reasonably limited in scope with respect to data entities;
- be based on a relatively stable set of administrative data (that is, the
existing data is not likely to be the first set of data addressed in the implementation
of the new administrative systems);
- have participants who are knowledgeable, both technically and in the subject
area, and who are available for work on the project team; and
- be delivered to customers who are comfortable in using desktop software
and knowledgeable about the subject data area.
For these reasons, the data access working group makes the following proposal:
- Subject Data Areas
- The proposed pilot will be for the following subject areas with detailed
attributes, aggregation levels and historical data requirements to be determined
during the requirements specification phase of the pilot.
- Student data (including student biographic/ demographic and registration
data) which resides on the Student Record System.
- Course data which resides in the Student Records System.
- Faculty data which currently resides in the Payroll system and a local
database within the Office of the Provost.
- Grant data which currently resides in the ORA database.
- Sponsor data which also resides in the ORA database.
The combination of these subject areas will provide data to support departmental
teaching analysis, research activity, listings of current students, grade
analysis, and similar needs. Many of these needs were identified in the interviews with
school and departmental personnel conducted through the Cornerstone project.
In addition, this will satisfy a large number of student data requirements
of departments for such things as advising, grant acquisition and review,
and student distributions.
- The DSE pilot will be directed towards school and departmental personnel
who have non-technical positions, routinely need to gather information and
have some experience with desktop hardware and software. In addition, it will
support the continuing needs of information analysts.
- The pilot of the DSE will need to have tools to support simple and complex
queries and the integration of extracted data with other desktop software
such as spreadsheets and word processors. In addition, more powerful server-based
tools will be needed for statistical analysis. The identification, development
and support of these tools will be done in conjunction with the other Cornerstone
initiatives. The desktop environment will be consistent with the standards
set by the Cornerstone technical architecture.
- Database and System Software
- The pilot DSE will be deployed in the relational database to be acquired
through the Cornerstone RFP process and will be the same database that the
new financial application will utilize. The system software will be UNIX-based,
and the platform will be chosen based upon the criteria set during the requirements
specification phase of the pilot.
- This pilot will be the starting point for the data repository which will
be used for the storage and provision of data documentation (metadata). The
extent to which we deploy the repository will be determined during the requirements
specification phase of the pilot.
- Establish a team with support from across the University, whose members
must possess a thorough understanding of the subject data areas, the needs
of the audience, and the technology.
- Begin the Requirements Specification for the data, the tools, and the repository.
- Gather detailed data requirements including attributes, aggregation level
and historical data. This will require focus groups with school and departmental
personnel.
- Coordinate our efforts with other Cornerstone activities to leverage resources
in some of the data definition areas (i.e. organizational data) and tool acquisition.
Issues for the Committee
- Is this the correct target for data and audience?
- How will we get the necessary resources with representation from across
the University? We will need people to help on an individual as well as collective
basis. SAS, Institutional Planning, Wharton Computing, Data Administration,
and UMIS have agreed to provide resources.
- Who are the appropriate people for the data requirements and how do we
make contact with them?
- Some work has been done in defining the data; however, it does not include
the clinical perspective. Is there interest in defining the clinical data?
If so, how should we organize defining it?
- Is there a need to phase pieces of this pilot (i.e., by subject area)?
- What is an appropriate implementation time frame?
- Other issues?
[ Data Administration ]