Penn Computing


Monday, September 8, 2008

 

Using the Spider Search Tool: Best Practices

Including Technical Recommendations for Implementation on Personal Computers Running Windows Operating Systems

Table of Contents

Introduction
Recommendations for Using the Spider Tool
How to Use the Spider Tool
Download the Spider Tool
Spider Updates

Introduction

The Spider search tool, which was developed by Cornell University, scans files searching for Social Security numbers or credit card numbers; other search patterns can also be created. Based on the scan, it produces a list of files that may contain confidential data. These files should be reviewed, and steps should be taken to protect any confidential data that is found.

The Spider tool is gaining increasing attention at Penn, in part as a result of growing scrutiny of the use and storage of Social Security numbers. The University has determined that SSN data requires strong protections, given the clear identity theft risks that are involved. Long-term efforts in this area have been underway, and a University-wide SSN policy has been adopted. The Spider tool automates the search for electronically-stored SSNs, and greatly facilitates implementation of the SSN policy.
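To make the scanning idea concrete, the following is a minimal sketch of a pattern-based file scan. It is not Spider's code: the function name `scan_tree` and the hyphen-delimited pattern are assumptions chosen for illustration, and the sketch reads files as plain text only.

```python
import os
import re

# Illustrative pattern only (an assumption, not Spider's own rule):
# three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_tree(root):
    """Return a sorted list of (path, match_count) for files under root
    whose text contains at least one SSN-like string."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than stop the scan
            hits = SSN_PATTERN.findall(text)
            if hits:
                flagged.append((path, len(hits)))
    return flagged and sorted(flagged)
```

The sketch records only a file path and a match count, never the matched numbers themselves, so that the resulting list does not itself become a copy of the confidential data.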

Recommendations for Using the Spider Tool: Organizational Steps

Note: Technical best practices for using the Spider Tool are addressed in the next section.

  • Buy-in. Before undertaking any Spider scanning activities, communicate with relevant leadership about the reasons for the project and the proposed work plan. (See the Work Plan item below for the elements of a work plan.) When approval for the project is given, request that a communication be issued to let staff know that the project is about to get underway and has high-level support.
  • Team Approach. A team approach is recommended, in order to have access to the appropriate expertise and to have enough resources to complete the work within the established timeframe. For example, an appropriate team could consist of the LSP and the computer users themselves.
  • Work Plan. Develop a work plan that makes responsibilities clear and establishes reasonable deadlines. Include proposed team members, anticipated staff time requirements and any other resources required. Keep in mind that the scanning work may need to be phased, so that the volume of scanning results (the file logs) does not get ahead of the capacity to review the results and take appropriate action to secure confidential data.
    • Also, be sure that the work plan includes provision of separate notice to each computer "owner" concerning the date and time when his/her computer is scheduled to be scanned. The notice should be provided at least 3 business days in advance of the scheduled scan, and should identify a person to contact if there are questions or concerns.
  • Managing Expectations. Let the individuals participating in the project know in advance that the file logs produced by the Spider tool will include false positives and false negatives. It will take individual staff time and effort to review the files listed in the logs. (Investigate false positives; if the reasons for false positives can be identified, a regular expression or configuration rule can be crafted so that those false positives will not reappear in future scans.)
  • Identifying Computers to be Scanned. If the organizational unit is relatively small (i.e., has no more than about 25 computers), it probably makes sense to run the Spider tool on all computers in the unit. For larger organizations, it may be necessary, depending upon available resources, to run the tool on selected machines only, in the initial scanning phase. In the latter case, it is recommended that the following types of machines be given priority:
    • Central data stores;
    • Mobile devices;
    • Computers used by individuals who have access to sensitive data stores for which they had to receive special permission;
    • Machines used by staff who have been in their jobs the longest (who may be more likely to have stored and potentially forgotten sensitive data on their computers); and
    • Machines used by staff who are most likely to have stored sensitive data because of their job functions.
  • Security of Log Files. Running the Spider tool results in a log pointing to files that potentially contain sensitive data. This log must be very well secured since it is a roadmap to possibly confidential information.
  • Secure Deletion of Files. If files containing sensitive data are identified and deletion is desired, keep in mind that sending the files to the computer's "recycle" bin and emptying the bin will not result in secure file deletion; these files can still be retrieved by relatively simple means. For information regarding secure deletion of electronic files visit the following site: http://www.upenn.edu/privacy/specialtopics_electronicfiles.html
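The Work Plan item above asks that each computer "owner" be notified at least 3 business days before a scheduled scan. As a planning aid, the sketch below computes the latest date on which that notice could be sent. The function name `latest_notice_date` is hypothetical, and the sketch assumes that business days are Monday through Friday and ignores holidays, so local calendars may require an earlier date.

```python
from datetime import date, timedelta


def latest_notice_date(scan_date, business_days=3):
    """Return the last date on which notice can be sent so that it is
    `business_days` business days ahead of the scan date.

    Assumption: business days are Monday through Friday; holidays are ignored.
    """
    current = scan_date
    remaining = business_days
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


# A scan scheduled for Wednesday, March 19, 2008 needs notice by the
# preceding Friday, because the weekend does not count.
print(latest_notice_date(date(2008, 3, 19)))  # 2008-03-14
```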

How to Use the Spider Tool

Spider is available for Windows, Mac, and Linux operating systems. We are working to produce technical guidelines for each version, and will provide details below as they become available.

Spider on Windows

  • Download the .zip archive from here. This archive includes all the tools you will need to run Spider3, including the Microsoft .NET Framework 2.0 installer, the Cornell Spider3 installer, and the Penn-specific batch file for running Spider3 in accordance with our recommended best practices.
  • Install the .NET Framework 2.0 from the archive if it is not already installed.
  • Install Spider3 and accept the default installation location.
  • Run Spider3 by double-clicking the batch file included in the .zip archive.
  • Follow the instructions provided within the CMD window.

Spider Updates

(i) False Positives

Many users have reported that the Spider scan produces an inordinately large number of false positives (that is, many files are flagged as containing SSNs when they do not actually contain them). Based on these reports, the batch file that provides the recommended configuration for running the scan has been modified (on the above-referenced website) by removing the SSN9 regular expression, which matched any 9 consecutive digits. Running the scan with the modified batch file should reduce the number of false positives.
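The effect of dropping a 9-consecutive-digit rule can be illustrated with a small comparison. Both patterns below are assumptions written for this example, not the expressions that Spider or the Penn batch file actually use, and the sample strings are invented.

```python
import re

# Both patterns are illustrative assumptions, not Spider's own expressions.
NINE_DIGITS = re.compile(r"\b\d{9}\b")            # any run of exactly nine digits
DELIMITED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # hyphen-delimited 3-2-4 grouping

# Invented sample lines: only the first one looks like an SSN.
samples = [
    "SSN: 123-45-6789",
    "Order number 200803140",
    "ZIP+4 191046228",
    "Phone 215-555-0100",
]


def count_matches(pattern, lines):
    """Count how many lines contain at least one match for pattern."""
    return sum(1 for line in lines if pattern.search(line))


print(count_matches(NINE_DIGITS, samples))  # 2: the order number and the ZIP+4
print(count_matches(DELIMITED, samples))    # 1: only the hyphenated SSN-style string
```

The trade-off runs in both directions: the delimited pattern avoids flagging unrelated 9-digit numbers, but it would also miss an SSN stored without hyphens, which is one source of the false negatives mentioned earlier.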

It is important to note that, even with this modification, there will still be significant numbers of false positives due to limitations of the current technology. It is strongly recommended that the scan be run by IT staff, or with such staff present, so that IT staff can review the results log and point the end user toward the files most likely to hold SSNs.

(ii) Preserving Encryption

It is also worth noting that the above-referenced best practices batch file configures the Spider program to produce an encrypted log and to close when the scan is completed. To take advantage of this encryption feature, enter an encryption password before the scan begins. The scan can then be left to run unattended, since the resulting log is encrypted and the program closes by itself. To view the log afterward, restart the Spider program, select File -> View Log, and browse to the location of the encrypted log file. Once the log file is selected, enter the same encryption password to display the decrypted scan results.

(iii) Maintaining Computer Performance

Finally, it is not advisable to run Spider while a user is logged in and using another program, since the user would experience slowed computer performance.

In summary, it is recommended that (1) Spider be run using the above-referenced updated batch file, and (2) IT staff remain very involved as the Spider scan is run and the output log is reviewed to address the matter of false positives in an effective manner and to ensure that the encryption feature is utilized throughout the process. Spider is a valuable tool but it does not completely automate the search for SSNs.

Last updated: Friday, March 14, 2008


Information Systems and Computing
University of Pennsylvania