Penn Community
Phase I Deliverables and General Concepts
PHASE I DELIVERABLES
SRS and U. of PA Payroll data feeds (essentially the same as current feeds to PennCard
with some modifications
A new associate payroll feed. The first user is intended to be one or more or the
UPHS (University of PA Health System) systems. HUP has been identified as a pilot.
A Penn Community data model and database
Database views
A cleanup of PennCard data
A program to initially load Penn Community tables from PennCard data
Processes to load Penn Community tables from the SRS, U. or PA Payroll and associate
Payroll feeds (these are all sources of 'persistent' data)
A program to batch load non-persistent (short lived) data to Penn Community.
Processes to update PennCard tables from Penn Community whenever Penn Community
is updated. This includes:
From the three data feeds
From the online data entry application.
From the SSN change process
From conflicts being resolved
From cleanup programs
An SSN change process for Penn Community to replace the one currently in PennCard.
A PINs table with unique PINs including an expiration date field to be set when
PINs are distributed.
A conflict resolution application
A web-based data-entry and view application.
Cleanup and audit programs for Penn Community data
An audit log to track PennName changes. There is currently no reporting or distribution
process planned.
Modifications to ISC/Networking's Email bulk load processes to eliminate the creation
of 'DG' rows. This change will be made by ISC/Networking
GENERAL CONCEPTS
Penn-ids will be always be generated from processes that insert rows to Penn Community
tables and never from from processes that insert rows to Penn Card tables.
The database will include a central 'member' table with one row per penn-id and
an 'affiliation' table with a one to many relationship with the member table allowing multiple affiliations per
member to be stored.
DATA
Penn Community will store the following information:
All denigraphic fields currently stored in PennCard
New data supplied by associate Payroll feeds.
A few new fields from the SRS system to support PIN distribution.
Currently the two data feeds to PennCard supply only one affiliation per person
at a time. This will continue to be the case after Phase I of Penn Community. If a need is determined during subsequent
phases the feeds could be modified to supply more than one affiliation per person at a time.
AFFILIATION TYPES
A distinction between ‘persistent’ and ‘non-persistent’ affiliations will be made.
Persistent affiliations are required to have SSN, birth-date, first name and last name and an attempt will be made
to match the data with existing records. Non-persistent affiliations only need the name fields and no attempt will
be made to match with existing rows.
Persistent affiliations will be supplied by the Payroll, SRS feeds and the new
associate payroll batch feeds and by the online data-entry application.
Non-persistent affiliations to be supplied by a batch load process and by the online
data-entry application
All non-persistent affiliations and all affiliations entered online will expire
after a certain time that will be specified in a reference table.
SSN ISSUES
SSN will be a unique field on the ‘members’ table and a non-unique field on the
affiliations table.
A distinction will be made between USA SSNs and Penn-assigned-SSNs (those generated
by University systems from pre-assigned ranges).
Penn-assigned-SSNs will be stored on the affilation table and USA SSNs will be
stored on the member table.
A given penn-id can have only one USA SSN but multiple Penn-assigned-SSNs (because
the systems generating the Penn-assigned-SSNs each have their own pre-assigned range)
An SSN-change process will be written to replace the current one in the PennCard
system.
DATA LOADS AND MATCHING
An algorithm will be used to attempt to match persistent affiliations with existing
rows to determine if the data represents the same person. The algorithm will be use the first and last names and
birth date and will match using the Oracle 'soundx' function which finds names that sound alike.
Multiple data sources do not have to have consistent match information per SSN.
For example a person can have SSN 123456789 in both PayRoll and SRS with the name ‘Joe Smith’ in one and ‘Joseph
Smith’ in the other
Records will be put in suspense if they would create any kind of ‘bad’ data or
are incomplete. For instance:
An incoming USA SSN is new but has the same match data as an existing row
An incoming USA SSN matches an existing row but has the same match data as a different
row
An incoming record for a persistent affiliation doesn’t have all its required fields
Once a conflict has been resolved once and determined to not really be a conflict
there will be flags set to let future data be processed without creating new conflicts. For example, if we have
a ‘Joseph Smith’ in Payroll with SSN 100000001 and birth date 1/1/1980 and a ‘Joe Smith’ in SRS with SSN 100000002
and birth date 1/1/1980 and they really are different people we would flag both penn-ids as not conflicting with
each other so that future updates won’t create conflicts.