
Searching for Sensitive Data
The unintentional disclosure of sensitive data (Social Security Numbers,
credit card or student information, etc.) creates privacy risks for
individuals and serious compliance, financial, and reputational risks for the
University.
In addition to manually checking your systems for sensitive data on a regular
basis, you may wish to use an automated tool to help identify sensitive data on
machines that are your responsibility.
Spider
The Office of Information Security currently recommends using Cornell's Spider, which is available for PC, Mac & Linux:
http://www.cit.cornell.edu/computer/security/tools/
In its default configuration, Spider uses Unix regular expressions to find
SSNs as well as 15- and 16-digit credit card numbers. Custom expressions can also be added,
and the tool provides a broad array of configuration options for scanning and
logging.
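As a rough illustration of this kind of pattern matching, here is a minimal Python sketch. The two expressions and the find_candidates helper are hypothetical and written only for this example; they are not Spider's actual expressions or code.

```python
import re

# Hypothetical patterns for illustration only; not Spider's actual expressions.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # XXX-XX-XXXX
CARD_PATTERN = re.compile(r"\b\d{15,16}\b")          # 15 or 16 consecutive digits


def find_candidates(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in (("ssn", SSN_PATTERN), ("card", CARD_PATTERN)):
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Applicant 123-45-6789 paid with card 4111111111111111."
print(find_candidates(sample))
```

Each hit is only a candidate: the sketch checks the shape of the digits, not whether they are a real SSN or card number.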
Prior to using Spider, it is important to be aware of the following:
- False positives - Spider does simple pattern matching (e.g., XXX-XX-XXXX
= SSN), so false positives are very common. You will need to review and
interpret the results critically.
- False negatives - If SSN or credit card data is entered in a non-standard
format, Spider will not identify it. For example, "SSN:123456789" is not found
with Spider's default configuration. Along similar lines, remember that Spider
looks only for SSN and credit card data. You may have other types of
sensitive data on the machine.
- File Access - Unless the Linux version is used and the partition to be
scanned is mounted read-only, Spider will update the access timestamp on
every file it scans. The Windows version is therefore recommended only for
auditing purposes; for incident response, use only the Linux version.
- Clean-up - Remember to delete the log file as soon as you are finished or
you may be creating a pointer file for hackers to find sensitive data on your
machine!
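The false positive and false negative cases above can be sketched in a few lines of Python. Both expressions and the classify helper are assumptions made for this example: DASHED_SSN stands in for a default-style XXX-XX-XXXX pattern, and LABELED_SSN is one possible custom expression for nine undashed digits following an "SSN" label. Neither is taken from Spider itself.

```python
import re

# Hypothetical default-style pattern (XXX-XX-XXXX); not Spider's actual expression.
DASHED_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical custom expression: nine undashed digits after an "SSN" label.
LABELED_SSN = re.compile(r"SSN\s*:?\s*(\d{9})\b", re.IGNORECASE)


def classify(line):
    """Return (dashed_hit, labeled_hit) for a single line of text."""
    return (bool(DASHED_SSN.search(line)), bool(LABELED_SSN.search(line)))


lines = [
    "Part number 555-12-3456 ships Monday",  # SSN-shaped, but not an SSN
    "SSN:123456789",                         # an SSN entered without dashes
]

for line in lines:
    print(line, classify(line))
```

The part number matches the dashed pattern even though it is not an SSN (a false positive), while "SSN:123456789" is missed by the dashed pattern and caught only by the custom one. A custom expression like this will produce false positives of its own, so its results need the same critical review.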
The Office of Information Security periodically provides training for this
tool and can provide assistance. To request help, contact
security@isc.upenn.edu.
The Computing Support group for Penn's School of Arts and Sciences has prepared a Spider Best Practices document.
Other Tools
Other universities have also developed sensitive data scanning tools.
Although they are not currently supported, they include:
http://www.purdue.edu/securepurdue/services/scanningTools.cfm
https://source.its.utexas.edu/groups/its-iso/projects/senf/
Last updated: Wednesday, November 7, 2007