University of Pennsylvania Home Page

Advanced Search

Penn Web Search Technical Information

The Penn Web Search indexes the content on about 375 web servers across Penn's campus. Most of these are administered directly by schools and departments of the University. The central web server, www.upenn.edu, provides web sites for University of Pennsylvania schools, departments, centers, and institutes that do not have access to a web server within their school or department; information about housing a web site on www.upenn.edu is available.

The www.upenn.edu service is hosted on a pair of fully redundant Sun 40Z systems running Linux. Each has four AMD Opteron 2.6 GHz processors and 16 gigabytes of memory, and is connected to a Sun StorEdge 9970 Storage Area Network. The configuration is designed to survive the failure of any component, including the loss of an entire machine room, power grid, local network, or Internet connection.

This highly redundant, survivable configuration is hosted in two data centers approximately five blocks apart. Each location houses one of the hosts and one side of the fully mirrored storage array, is served by its own power grid, and includes fully redundant power and HVAC systems. Data is replicated in real time between the two locations.

In addition to the redundant hardware, we use Akamai caching, so pages that Akamai has already cached can still be served if both servers fail.

Apache 1.3.x is the web server daemon. You can search Penn's web using Google's index of the Penn Web.

We have provided information to help you maintain your documents so that they are better indexed by Penn Web Search and other search engines.

If there are documents on your site that should not be indexed, your web administrator will need to maintain a robots.txt file that tells the central indexer, and other well-behaved crawlers, to skip these documents.

Example of excluding with a robots.txt file
If you had a directory called "test" where you stored pages under development and you didn't want these documents indexed, your server administrator would create a file called robots.txt in the document root directory that looks like this:
        User-agent: *
        Disallow: /test/
For more detailed information about excluding web pages from being indexed, please see the Standard for Robot Exclusion.
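
To check that a robots.txt file excludes what you intend before publishing it, the rules can be tested locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the host name www.example.edu, the crawler name "ExampleBot", and the helper is_allowed are illustrative assumptions and not part of the Penn Web Search configuration.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above: block every crawler from /test/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    "ExampleBot" is a hypothetical crawler name used only for illustration.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.edu is a placeholder host, not a real Penn server.
    for path in ("/test/draft.html", "/index.html"):
        url = "http://www.example.edu" + path
        status = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(path, status)
```

With the rules shown, the page under /test/ is reported as blocked and /index.html as allowed. Note that robots.txt is advisory: it keeps compliant crawlers out of the index but does not restrict access to the documents themselves.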

Copyright © 2006, University of Pennsylvania
3451 Walnut Street, Philadelphia, PA 19104 · 215-898-5000