University of Pennsylvania Home Page

Advanced Search

Penn Web Search Technical Information

The Penn Web Search indexes the content on about 375 web servers across Penn's campus. Most of these are administered directly by schools and departments of the University. The central web server provides web sites for University of Pennsylvania schools, departments, centers, and institutes that do not have access to a web server within their school or department; information about housing a web site on the central server is available.

The service is hosted on a pair of fully redundant Compaq AlphaServer DS20E systems running Tru64 UNIX 5.1. Each has dual Alpha 21264 667 MHz processors and two gigabytes of memory, and is connected to a Compaq StorageWorks EMA12000 Fibre Channel RAID System. The configuration is designed to survive the failure of any component, including the loss of an entire machine room, power grid, local network, or Internet connection.

This highly redundant, survivable configuration is hosted in two data centers approximately five blocks apart. Each location houses one of the hosts and one side of the fully mirrored storage array, is served by its own power grid, and includes fully redundant power and HVAC systems. Data is replicated in real time between the two locations.

Apache 1.3.26 is the web server daemon. You can search Penn's web using two different tools: the Simple Search, based on Google's index of the Penn Web, or the Advanced Search, using AltaVista Search INTRANET 2.3A.

The Advanced Penn Web Search starts indexing from the central web server homepage and follows links down until it covers all of the web servers running within Penn's domain. The index is replaced every two weeks, and the installation date of the current index is posted online. We have provided information to help you maintain your documents so that they are better indexed by Penn Web Search and other search engines.

A site will be automatically included in the central Advanced Search index if there is a link to the site's top-level page somewhere within the hierarchy of links. Approximately 250,000 pages are being indexed. If your site (or server) is not being indexed by the central index, it may not be linked from anywhere in the hierarchy, and we will have to explicitly include your site in the central index. Please ask your web administrator to send us mail giving the starting URL of your web site.
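The link-following behavior and the inclusion rule described above can be illustrated with a small sketch. This is not how the AltaVista indexer is implemented; it is a minimal breadth-first crawl over a made-up, in-memory set of pages. The `example.edu` hostnames, the `PAGES` table, and the `crawl` and `LinkExtractor` names are all hypothetical and chosen only for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, domain):
    """Breadth-first crawl from start_url, staying inside `domain`.

    `fetch` is any callable that returns the HTML for a URL, or None
    if the page is unavailable. Returns the set of in-domain URLs
    discovered by following links.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            host = urlparse(target).hostname or ""
            in_domain = host == domain or host.endswith("." + domain)
            if in_domain and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical pages standing in for real web servers.
PAGES = {
    "http://www.example.edu/": (
        '<a href="/about.html">About</a> '
        '<a href="http://dept.example.edu/">Department</a>'
    ),
    "http://www.example.edu/about.html": (
        '<a href="http://other.example.com/">Elsewhere</a>'
    ),
    "http://dept.example.edu/": '<a href="http://www.example.edu/">Home</a>',
    "http://orphan.example.edu/": "<p>No page links here.</p>",
}

indexed = crawl("http://www.example.edu/", PAGES.get, "example.edu")
```

In this sketch, `orphan.example.edu` never appears in `indexed` because no crawled page links to it, which mirrors the case where a site must be added to the central index explicitly.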

If there are documents on your particular site that should not be indexed, your web administrator will need to maintain a robots.txt file that tells the central indexer to skip these documents.

Example of excluding with a robots.txt file
If you had a directory called "test" where you stored pages under development and you didn't want these documents indexed, your server administrator would create a file called robots.txt in the document root directory that looks like this:
        User-agent: *
        Disallow: /test/
For more detailed information about excluding web pages from being indexed, please see the Standard for Robot Exclusion.
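To check what a rule set like the one above actually excludes, the rules can be fed to Python's standard `urllib.robotparser` module. This is a sketch for testing your own file, not a description of how the central indexer evaluates robots.txt; the `ExampleIndexer` agent name and the `is_indexable` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example robots.txt file above.
RULES = """\
User-agent: *
Disallow: /test/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_indexable(path, agent="ExampleIndexer"):
    """Return True if the rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


print(is_indexable("/index.html"))
print(is_indexable("/test/draft.html"))
```

Note that the rule matches by path prefix, so `/test/draft.html` is excluded while a page such as `/testing.html` is not.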

Copyright © 2006, University of Pennsylvania
3451 Walnut Street, Philadelphia, PA 19104 · 215-898-5000