Penn Computing

Tuesday, May 13, 2008

Searching for Sensitive Data

The unintentional disclosure of sensitive data (Social Security Numbers, credit card or student information, etc.) can expose individuals to privacy risks and expose the University to serious compliance, financial and reputational risks.

In addition to manually checking your systems for sensitive data on a regular basis, you may wish to use an automated tool to help identify sensitive data on machines that are your responsibility.

Spider

The Office of Information Security currently recommends using Cornell's Spider, which is available for Windows, Mac & Linux:

http://www.cit.cornell.edu/computer/security/tools/

In its default configuration, Spider uses Unix regular expressions to find SSNs as well as 15- and 16-digit credit card numbers. You can also add custom expressions, and the tool offers a broad array of configuration options for scanning and logging.
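To make the pattern-matching approach concrete, here is a minimal Python sketch of a regex scanner in the same spirit. It looks for the XXX-XX-XXXX SSN layout and for 15- or 16-digit card numbers, and lets you merge in custom expressions. The names `DEFAULT_PATTERNS`, `scan_text` and `scan_file`, and the exact patterns, are hypothetical assumptions for illustration; they are not Spider's actual configuration, and Python's `re` syntax differs in places from Unix regular expression flavors.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical default patterns for illustration only; they are NOT
# Spider's actual expressions.
DEFAULT_PATTERNS: Dict[str, str] = {
    # SSN in the standard XXX-XX-XXXX layout
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    # 15 or 16 unbroken digits, or 16 digits written as 4-4-4-4 with dashes
    "credit_card": r"\b(?:\d{15,16}|\d{4}-\d{4}-\d{4}-\d{4})\b",
}


def scan_text(text: str,
              extra_patterns: Optional[Dict[str, str]] = None
              ) -> List[Tuple[str, int, str]]:
    """Return (pattern_name, line_number, matched_text) for every match."""
    patterns = dict(DEFAULT_PATTERNS)
    if extra_patterns:
        # Custom expressions are merged with (and may override) the defaults.
        patterns.update(extra_patterns)
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, rx in compiled.items():
            for m in rx.finditer(line):
                hits.append((name, lineno, m.group()))
    return hits


def scan_file(path: str,
              extra_patterns: Optional[Dict[str, str]] = None
              ) -> List[Tuple[str, int, str]]:
    """Scan one text file, skipping undecodable bytes.
    Note: reading the file may update its access time."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fh:
        return scan_text(fh.read(), extra_patterns)


if __name__ == "__main__":
    sample = "SSN 123-45-6789\nCard 4111111111111111\nSSN:123456789"
    for hit in scan_text(sample):
        print(hit)
```

Running the file directly reports the SSN on line 1 and the card number on line 2, while `SSN:123456789` is missed, the same kind of false negative described in the caveats. Passing a custom expression such as `{"ssn_nodash": r"SSN:\s*\d{9}\b"}` as `extra_patterns` would catch that particular form.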

Prior to using Spider, it is important to be aware of the following:

  • False positives - Spider performs simple pattern matching (e.g., any XXX-XX-XXXX string is treated as an SSN), so false positives are very common. You will need to review and interpret the results critically.
  • False negatives - If SSN or credit card data is entered in a non-standard format, Spider will not identify it. For example, "SSN:123456789" is not found with Spider's default configuration. Likewise, remember that Spider looks only for SSN and credit card data; the machine may hold other types of sensitive data.
  • File access - Unless you use the Linux version and mount the partition to be scanned read-only, Spider will update the access timestamp of every file it scans. For that reason, the Windows version is recommended only for auditing; for incident response, where the original access times may matter, use only the Linux version.
  • Clean-up - Delete the log file as soon as you are finished. Otherwise you may be leaving behind a file that points attackers directly to the sensitive data on your machine!
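Some of these caveats can be narrowed, though not eliminated, in code. The Python sketch below is an illustration under stated assumptions, not a description of anything Spider does: `luhn_valid` applies the Luhn (mod 10) checksum that genuine payment card numbers satisfy, `plausible_ssn` discards SSN-shaped strings in number ranges the Social Security Administration does not issue, and `is_read_only_mount` checks, on Linux or another Unix system, whether the filesystem to be scanned is mounted read-only. All three function names are hypothetical.

```python
import os


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def plausible_ssn(ssn: str) -> bool:
    """Structural check only: reject XXX-XX-XXXX strings that are never issued
    (area 000, 666 or 900-999; group 00; serial 0000)."""
    area, group, serial = ssn.split("-")
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def is_read_only_mount(path: str) -> bool:
    """True if the filesystem holding `path` is mounted read-only (Unix only).
    Scanning a read-only mount avoids updating file access times."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

A hit that passes these filters still needs human review, and filtering does nothing for false negatives. The read-only check only reports the mount state; an administrator would still need to remount the partition read-only (for example with `mount -o remount,ro`) before scanning.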

The Office of Information Security periodically offers training on this tool and can provide assistance; contact security@isc.upenn.edu.

The Computing Support group for Penn's School of Arts and Sciences has prepared a Spider Best Practices document.

Other Tools

Other universities have also developed sensitive data scanning tools. Although these tools are not currently supported, they include:

http://www.purdue.edu/securepurdue/services/scanningTools.cfm

https://source.its.utexas.edu/groups/its-iso/projects/senf/

Last updated: Wednesday, November 7, 2007

Information Systems and Computing
University of Pennsylvania