Penn Computing
PENN WEB TEAM - Search Announcement for Web Administrators

After evaluation and testing, the Penn Web Team has chosen AltaVista as Penn's central search engine and purchased a single-server site license. Under this license the software resides on one central server, but it can index any number of servers within the Penn domain and supports unlimited access to the indexes it creates.

Penn's central search index, powered by AltaVista, will be available on www.upenn.edu as part of the new Penn Web pages, which can be previewed throughout August and September. The new Penn Web and the AltaVista search will officially debut on October 1, 1997. Documentation will be available showing web administrators how to create a customized search screen that limits searches to pages on a particular server.

Coordination will be needed if Penn is to have a central search engine that indexes all public web documents on campus. Since AltaVista is a "crawler" that starts at a given URL and then follows the subsequent hierarchy of links from that URL, your site will be automatically included in the central search index if there is a link to your site's top level page somewhere within that hierarchy. If you are not sure that your site will be included in the central index, please send your top level URL to www-help@isc.upenn.edu.
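
The inclusion rule described above amounts to a reachability check: a site is indexed only if some chain of links leads to it from the starting URL. The following is a minimal sketch of that idea, not AltaVista's actual crawler. The function name, the in-memory link table, and the department and orphan host names are all hypothetical and stand in for fetching and parsing real pages.

```python
from collections import deque


def reachable_pages(start_url, links):
    """Return the set of URLs reachable from start_url by following links.

    `links` maps each URL to the list of URLs that page links to. It is a
    hypothetical in-memory stand-in for downloading and parsing pages.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:  # visit each page only once
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph; the department and orphan hosts are illustrative.
links = {
    "http://www.upenn.edu/": ["http://www.example-dept.upenn.edu/"],
    "http://www.example-dept.upenn.edu/": [
        "http://www.example-dept.upenn.edu/courses.html"
    ],
    # Nothing links to this site, so a crawl from the top never finds it.
    "http://www.orphan-site.upenn.edu/": [
        "http://www.orphan-site.upenn.edu/about.html"
    ],
}

indexed = reachable_pages("http://www.upenn.edu/", links)
```

In this sketch the department pages end up in `indexed` because the top-level page links to them, while the orphan site does not. A site in that position is the kind whose top level URL should be sent to www-help@isc.upenn.edu.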

If there are documents on your particular server that should not be indexed, you will need to maintain a robots.txt file that will prevent the software from indexing the documents.

We would appreciate your creating your robots.txt file before July 31, 1997. At that time, we will start to build the central search index in earnest.

Example of excluding with a robots.txt file
If you had a directory called "stats" where you store the statistics of usage of your web server and you didn't want these documents indexed, you would create a file called robots.txt in your document root directory that looks like this:
        User-agent: *
        Disallow: /stats/
For more detailed information about excluding web pages from being indexed, please see the Standard for Robot Exclusion.
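
Before the index is built, it can be useful to confirm that a robots.txt file blocks what you expect. The sketch below uses the robots.txt parser from Python's standard library to test the example rules above against two URLs. The helper name `is_allowed`, the crawler name "ExampleCrawler", and the host www.example.upenn.edu are assumptions made for illustration. It checks how a standards-following robot would read the file, not how AltaVista itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this announcement.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical host and crawler name, used only for this check.
stats_ok = is_allowed(
    ROBOTS_TXT, "ExampleCrawler", "http://www.example.upenn.edu/stats/usage.html"
)
home_ok = is_allowed(
    ROBOTS_TXT, "ExampleCrawler", "http://www.example.upenn.edu/index.html"
)
```

Here `stats_ok` is False because the path falls under /stats/, and `home_ok` is True because no rule covers it.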

The AltaVista license will also allow the administrator of any server other than the central server to purchase a limited indexing copy of AltaVista that will index all of the documents on that server only. The price for this software is $1000 per server. For more information about purchasing a copy of this software, contact www-help@isc.upenn.edu.

In addition to the central search index, the separate subject search indexes currently available on the Penn Web will also be available under AltaVista and will debut throughout the fall. Penn will continue to evaluate the need for these or other separate indexes.

Information Systems and Computing
University of Pennsylvania