University of Pennsylvania Home Page


Advanced Search


Searching the Penn Web


Simple and Advanced Search

You can search Penn's web using two different tools: the Simple Search (based on Google's index of the Penn Web) or the Advanced Search (using AltaVista Intranet). Some kinds of searches, such as those for the home pages of organizations, may yield more relevant results using the Simple Search. Complex searches for documents or topical information may benefit from the advanced query features of the AltaVista Advanced Search. Although Simple Search is updated once a month, please note that the index is maintained by Google on its own site; we cannot guarantee its comprehensiveness or update frequency. Should Penn's Internet trunk or the Google site itself be unavailable for any reason, your query will be forwarded automatically to Advanced Search, which is locally maintained. For help using Simple Search, consult Search Tips on the Google site.


Overview of Penn Web's Advanced Search

The Penn Web Advanced Search is based on AltaVista Intranet search software, a variant of the AltaVista software for searching the entire Internet. When you initiate a Penn Web Advanced search, AltaVista Intranet searches the current index to the Penn Web, a very large index of the pages on more than 300 Web servers in the upenn.edu domain. The current index is not updated dynamically, as Web pages are added, deleted, or changed. Instead, a new index is compiled in the background and installed every two weeks, on the first and third Sunday of each month. See http://www.upenn.edu/search/updates.html for the date the current index was installed.

To create a new Penn Web Advanced Search index, AltaVista's web crawler module starts at the Penn home page and follows hyperlinks throughout the upenn.edu domain. Every document in the upenn.edu domain that is linked from another document that the crawler was able to index is included in the index. That means that several hundred thousand documents are indexed and retrievable. Depending on when in the indexing cycle a new document is linked or an outdated document is removed, it can take anywhere from a few days to about two weeks for the document to appear in, or disappear from, the Penn Web Advanced Search index.

The only documents that are not included in the current index are:

  • those excluded from indexing by the administration of the servers on which the documents are housed
  • password-protected documents
  • those put onto a Web server and linked after the web crawler visited the server as it was compiling the current index

Because of the timing of Penn Web Advanced Search index updates, it's possible that a Penn document you can't find using the Advanced Search has already been indexed by the Internet version of AltaVista. You can automatically retry a query on the AltaVista site by clicking on the "Try your search for ..." link at the bottom of each Penn Web Advanced Search results page.


Special AltaVista searches

In addition to the main Penn Web Advanced search, there are AltaVista Intranet-based searches available for several subsections of the Penn Web.

The Publications Search indexes the Almanac, The Penn Current and The Pennsylvania Gazette web sites. In addition to the standard AltaVista search functions, the Publications search allows users to request only current or archival materials from the publications it indexes. Search results indicate whether a returned page is current or archival. In this search, "current material" includes articles from the most recent issues of all publications, plus policy-related items from back issues of Almanac that have not been superseded by more recent material. The index for this search is updated whenever a new issue is added. A detailed technical explanation of the Publications search is available.

Other AltaVista Intranet-based Penn Web subsite searches include the Computing Web Search and the University Archives and Records Center Search. Unlike the Penn Web Advanced Search index, which is updated every two weeks, these search indexes are updated monthly. Since they are relatively small, they can be compiled and installed on the same day. The Computing Web Search is updated on the second Sunday of each month. The University Archives and Records Center Search is updated on the third Sunday of each month. See http://www.upenn.edu/search/updates.html for the dates of the last update.

Because the Penn Web Advanced Search index is updated more frequently than the special indexes, it's possible that a document you can't find using one of the special searches is already available in the Penn Web Advanced Search. You can automatically retry a special search query using the Penn Web Advanced Search by clicking on the "Try your search ..." link at the bottom of each special search results page.
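
The update schedules described above (first and third Sundays for the Penn Web Advanced Search, second Sunday for the Computing Web Search, third Sunday for the University Archives and Records Center Search) can be worked out for any month with a short Python sketch. The function names nth_sunday and update_dates are made up for this illustration; the authoritative install dates remain those listed at http://www.upenn.edu/search/updates.html.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday (n = 1, 2, ...) of the given month."""
    first = date(year, month, 1)
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))

def update_dates(year, month):
    """Scheduled index installs for one month, following the schedule in the text."""
    return {
        "Penn Web Advanced Search": [nth_sunday(year, month, 1),
                                     nth_sunday(year, month, 3)],
        "Computing Web Search": [nth_sunday(year, month, 2)],
        "University Archives and Records Center Search": [nth_sunday(year, month, 3)],
    }
```

Calling update_dates(2006, 2), for instance, returns the scheduled Sundays in February 2006 for each index.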


Advanced Search basics

All AltaVista search functions use the same rules regarding phrasing, case sensitivity, and finding related words.

How a word is defined

AltaVista Search defines a word as any string of letters and digits that is separated by either:

  • White space -- such as spaces, tabs, line ends, or the start or end of a document
  • Special characters and punctuation -- such as %, $, /, #, and _

Example: AltaVista Search interprets and indexes HAL5000, 60258, www, http, and EasierSaidThanDone all as single words, because they are continuous strings of characters, surrounded by characters that are neither letters nor digits. AltaVista Search indexes all words that it finds on a web page, regardless of whether the word exists in a dictionary or is spelled correctly.
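
The word definition above can be approximated in a few lines of Python. This is only a sketch of the rule as stated (runs of letters and digits, split at white space, punctuation, and special characters such as the underscore); it is not AltaVista's actual tokenizer, and the function name words is an assumption made for the illustration.

```python
import re

def words(text):
    """Split text into 'words': unbroken runs of letters and digits."""
    # [^\W_] matches a single letter or digit (including accented letters),
    # so white space, punctuation, and the underscore all act as separators.
    return re.findall(r"[^\W_]+", text)
```

With this sketch, words("HAL5000 costs $60258 at http://www") yields HAL5000, costs, 60258, at, http, and www as separate words.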

How to search for phrases using quotation marks

You can search for phrases or groups of related words that appear next to each other. To indicate a phrase in a search query, put quotation marks around the words. Using quotation marks tells AltaVista Search to find the words together, instead of looking for separate instances of each word individually. You can also use punctuation to indicate phrases.

Example: To look for the phrase penn reading project, type "penn reading project"

If you did not use quotation marks, AltaVista Search would find instances of "penn" alone, "reading" alone and "project" alone, as well as any instances where the three words happen to appear together. Enclosing the words in double quotes indicates that you want to find only instances of all three words together.

How punctuation and special characters are interpreted

AltaVista Search ignores punctuation except to interpret it as a separator for words. Placing punctuation or special characters between each word, with no spaces between the characters and the words, is also a way to indicate a phrase. As an example of when punctuation might be useful in indicating a phrase, consider searching for a telephone number. Entering 1-800-555-1212 is easier than entering "1 800 555 1212", which is an equally acceptable syntax, but is less natural. Hyphenated words, such as CD-ROM, also automatically form a phrase because of the hyphen.

Normally, however, using quotation marks to indicate a phrase is recommended over the use of punctuation between words, because some special characters have additional meaning:

  • In AltaVista searches, you can use the asterisk (*) as a wildcard indicating that you want to find all words containing a match for the specified pattern of letters.
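
To see why 1-800-555-1212 and "1 800 555 1212" behave the same way, consider this Python sketch. It assumes that a phrase matches when its words appear consecutively in the document, which is how the text above describes phrase searching; the helper names tokens and contains_phrase are hypothetical, and the real matching inside AltaVista may differ.

```python
import re

def tokens(text):
    """Split text into words at white space, punctuation, and special characters."""
    return re.findall(r"[^\W_]+", text)

def contains_phrase(document, phrase):
    """True if the words of the phrase appear next to each other, in order."""
    doc, want = tokens(document), tokens(phrase)
    if not want:
        return False
    return any(doc[i:i + len(want)] == want
               for i in range(len(doc) - len(want) + 1))
```

Because the hyphens only separate words, tokens("1-800-555-1212") and tokens("1 800 555 1212") produce the same list, so either form of the query finds the same documents.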

    Case-sensitive searches

    A search in AltaVista may be case sensitive depending on whether you type the query in quotation marks or not.

    • A query in all lowercase letters will result in a case-insensitive search.

      Example: Typing reading in the query field will find all occurrences of the word reading, including those spelled REading, READING, reading.

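    • A capitalized query typed in quotation marks will result in a case-sensitive search.

      Example: Typing "Reading" in the query field will find only occurrences of the word capitalized as Reading.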
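
The case rule described above can be sketched in Python as follows. This is a simplified model that compares one query word with one document word; the function name word_matches and its quoted parameter are assumptions made for the illustration, not part of AltaVista.

```python
def word_matches(query_word, document_word, quoted=False):
    """Compare a query word with a document word under the case rule above."""
    if quoted:
        # A word typed in quotation marks keeps its capitalization.
        return query_word == document_word
    # Otherwise uppercase and lowercase are treated the same.
    return query_word.lower() == document_word.lower()
```

For example, word_matches("reading", "READING") is true, while word_matches("Reading", "reading", quoted=True) is false.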

    Multinational characters

    AltaVista Search supports exact-match searches for characters in the ISO Latin-1 character set. That is, you can enter a word containing an accent or other diacritical mark, and AltaVista Search will find only documents with the accented spelling of the word.

    For example, if you search for the French word éléphant, AltaVista Search will find only documents containing an exact match for the French spelling of the word.

    Entering a word with mixed case and an accent, (for example, Éléphant) would produce only results that match the word in terms of both case and accent.

    If you omit accents and other diacritical marks from a search query, AltaVista Search finds documents containing words both with and without the special marks. Although this feature might produce some irrelevant results for users doing an English language search, it enables users to enter queries for non-English words even when they do not have international support on their keyboard.

    To support searching for special characters without their diacritical marks, AltaVista Search makes a mapping to the closest possible plain character or combination of characters. The software then indexes words in both forms: with special characters as they appear, and also with special characters replaced by the mappings. The following table illustrates the special characters and their mappings:

    Character(s)   Mapping   Character(s)   Mapping
    Æ              AE        æ              ae
    Á Â À Å Ã Ä    A         á â à å ã ä    a
    Ç              C         ç              c
    Ð              D         ð              d
    É Ê È Ë        E         é ê è ë        e
    Í Î Ì Ï        I         í î ì ï        i
    Ñ              N         ñ              n
    Ó Ô Ò Ø Õ Ö    O         ó ô ò ø õ ö    o
    Þ              TH        þ              th
    Ú Û Ù Ü        U         ú û ù ü        u
    Ý              Y         ý ÿ            y
    ß              ss
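
The mapping table can be expressed as a small Python lookup. The sketch below shows how a word might be stored in both forms, as the text describes; the names MAPPINGS, plain_form, and index_forms are hypothetical, and the actual index structure used by AltaVista is not documented here.

```python
# Mappings copied from the table above.
MAPPINGS = {
    "Æ": "AE", "æ": "ae", "Ç": "C", "ç": "c", "Ð": "D", "ð": "d",
    "Ñ": "N", "ñ": "n", "Þ": "TH", "þ": "th", "ß": "ss",
}
for chars, plain in [
    ("ÁÂÀÅÃÄ", "A"), ("áâàåãä", "a"), ("ÉÊÈË", "E"), ("éêèë", "e"),
    ("ÍÎÌÏ", "I"), ("íîìï", "i"), ("ÓÔÒØÕÖ", "O"), ("óôòøõö", "o"),
    ("ÚÛÙÜ", "U"), ("úûùü", "u"), ("Ý", "Y"), ("ýÿ", "y"),
]:
    for ch in chars:
        MAPPINGS[ch] = plain

def plain_form(word):
    """Replace each special character with its plain mapping."""
    return "".join(MAPPINGS.get(ch, ch) for ch in word)

def index_forms(word):
    """The forms under which a word would be indexed: as written, and mapped."""
    return {word, plain_form(word)}
```

Under this model, a page containing éléphant is indexed as both éléphant and elephant, which is why a query without accents still finds it, while a query with accents finds only the accented spelling.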

    Finding related words (wildcards and truncation)

    Wildcard searching (truncation) is convenient for finding derivatives and spelling variants of the same word.

    1. Use the asterisk wildcard notation ( * ) to search for a group of words that contain the same pattern.

      Example: To look for the word sing and any derivatives, such as singer, singers, and singing, type sing* in the query field. Searching for the stem cantalo* will produce matches for cantaloup, cantaloupe, cantalope, and their plurals.

    2. Specify at least three letters in front of the asterisk *. (This is required in order to limit extraneous searching.)

    3. The asterisk * matches only lowercase characters (not capital letters or digits), and interchanges with a maximum of five letters.

    4. A wildcard search can produce words that match the pattern of your query but are unrelated to what you are looking for. It is sometimes possible to change the placement of the wildcard character to reduce the number of irrelevant results.

      Example: If you want to find matches for both color and colour, a query of the form col*r could also find matches for the words collector and collider. Submitting a query for colo*r is more precise, and results in matches for both color and colour.
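
The wildcard rules above (at least three letters before the asterisk, and each asterisk standing for up to five lowercase letters) can be modeled with a regular expression. The Python sketch below is an approximation for experimenting with wildcard placement, assuming lowercase queries and words; wildcard_to_regex and wildcard_matches are names invented for this example.

```python
import re

def wildcard_to_regex(query):
    """Translate a query such as colo*r into a compiled regular expression."""
    if "*" in query and len(query.split("*", 1)[0]) < 3:
        raise ValueError("specify at least three letters in front of the asterisk")
    # Each * stands for zero to five lowercase letters (no capitals or digits).
    literal_parts = [re.escape(part) for part in query.split("*")]
    return re.compile("[a-z]{0,5}".join(literal_parts))

def wildcard_matches(query, word):
    """True if the whole word matches the wildcard query."""
    return wildcard_to_regex(query).fullmatch(word) is not None
```

Trying wildcard_matches("col*r", "collector") and wildcard_matches("colo*r", "collector") shows the difference that moving the asterisk makes: the first query matches the unrelated word and the second does not.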

Error messages

AltaVista Search will display a message similar to the following if your query results in matches that are too numerous to be meaningful.

Example: Ignored inte*: 4292323

This message means that there are more than four million instances in the index of words starting with "inte." Consequently, AltaVista Search does not return any results, because the query is not specific enough to be useful.


Choosing between Simple and Advanced Search

The Simple and Advanced Searches are equally powerful and flexible.

Advantages of Simple Search

The main advantages of Simple Search are:

  • Google only returns web pages that contain all the words in your query; refining or narrowing your search is as simple as adding more words to the search terms you have already entered. Your new query will return a smaller subset of the pages Google found for your original "too-broad" query.
    For details, see How Search Results are Ranked.

  • By default, Google only returns pages that include all of your search terms. There is no need to include "and" between terms. Please note that the order in which the terms are typed will affect the search results. To further restrict a search, include more terms.

    For example, to plan a vacation to Hawaii, simply type: vacation hawaii.

  • Google ignores common words and characters such as "where" and "how", as well as certain single digits and single letters, because they tend to slow down your search.

  • To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches. Google searches for exactly the words that you enter in the search box. This will help your query yield more accurate results.

  • You can use the plus (+) in front of a common word if it is essential to use the common word in your search. Be sure to include a space before the "+" sign.

For details, see Simple Searching.

Advantages of Advanced Search

The main advantages of Advanced Search are:

  • Entering several words separated by spaces indicates that you want to find documents containing any or all of the words (documents containing all of the words will be listed first).

    For example, suppose you want to find a recipe for muffins that includes either apples or pears, but ideally would contain both fruits. You could enter the series of words apple pear muffin recipe. If any document contains all four words, automatic ranking places that document at the top of your results list. Documents containing only some of the words would be next, and documents containing only one of the words would be ranked last.

  • The Advanced Search interface requires a more precise, logical syntax which, although it is more exacting, also gives you more control over the results of your search. Using the apple pear muffin recipe example, suppose you decide that you do not want to see any documents unless they contain at least the words muffin and recipe. Marking those two words with a plus sign (apple pear +muffin +recipe) requires both of them to be present.

  • AltaVista Search ranks results automatically based on a series of factors that ensure that the most relevant documents appear at the top of the results list.

  • You can type your query in the form of a Natural Language question, as if you were conversing with another person. For example, you can type in the query field What is Penn's policy on vacation leave? and AltaVista Search will sort out the most important words in the phrase and return a list of documents containing those words.

  • You can get a count of the number of documents that meet your search criteria. This is useful, for example, if you want to get an idea of how many web pages contain links to your own home page.

For additional information on using the Advanced search, see Advanced Searching.


Simple Searching

To enter a query, just type in a few descriptive words and click the Search button (or hit the "enter" key) for a list of relevant web pages. Since Google only returns pages that contain all the words in your query, narrowing your search is as easy as adding more words to the search terms you have already entered. Your new query will return a smaller subset of the pages Google found for your original query.
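
The narrowing behavior described above follows directly from the "all words" rule, as this small Python sketch shows. It is a toy model over a handful of page texts, not Google's implementation; the function name simple_search and the pages argument are assumptions made for the illustration.

```python
def simple_search(pages, query):
    """Return the names of pages whose text contains every word in the query.

    pages is a dict mapping a page name to its text.
    """
    terms = query.lower().split()
    return {name for name, text in pages.items()
            if all(term in text.lower().split() for term in terms)}
```

Because every term must be present, the result for vacation hawaii can only be a subset of the result for vacation alone.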

Simple Search examples

To find the documents most relevant to your needs, construct your query as precisely as you can.

  • Google will ignore common words and characters unless you put a plus (+) sign in front of the word.

    Example: Star Wars Episode +1

  • Another method to include common words or characters in the search is to put quotation marks around two or more words.

    Example: "Star Wars Episode 1"

  • Google searches are not case sensitive. All letters will be understood as lower case, regardless of how you type them.

    Example:
    Searches for University of Pennsylvania and university of pennsylvania will return the same results.

  • Since Google does not support "stemming" or "wildcard" searches, Google searches for exactly the words you enter into the search box.

    Example: Searches for penn* or pennsy* will not yield pennsylvania.


Advanced Searching

  1. Enter as many words as you wish. AltaVista will find web pages that include some or all of the words. Those with more of your search terms will rank higher.

  2. Use quotation marks to indicate phrases:
    "affirmative action" hiring employment
  3. Use a plus sign + to mark words that must be present:
    +rodin president administration
  4. Use a minus sign - to mark terms that must not be present:
    athletics sports -football
  5. Use an asterisk * as a wildcard.

    a. Search word stems and find the stem and more:

    penn* finds penn penns pennsylvania pennsylvanian

    b. Search for spelling variants or terms with internal spelling differences:

    wom*n finds woman women

    lab*r finds labor labour

  6. Uppercase and lowercase are treated the same. To maintain a certain capitalization, put the word in quotes:

    spruce finds spruce Spruce

    "Spruce Street" finds Spruce Street

Advanced Search examples

To find the documents most relevant to your needs, construct your query as precisely as you can.

  • To increase the likelihood that the most relevant documents will appear at the top of the list, enter several synonyms for the topic for which you are searching.

    Example: Querying for sandals leather footwear instead of just one of those words increases the chance of finding documents about leather sandals.

  • Use quotation marks to group several words into a phrase.

    Example: bicycle "for sale" finds documents that contain both the phrase for sale and the word bicycle.

  • Use the wildcard notation (*) at the end of a word stem to find related words.

    Example: quilt* finds the words quilts, quilter, quilting, and quilted.

  • Use the + and - signs to further refine your search. Do not type a space between the sign and the search term.

    Example: noir +film -"pinot noir" finds documents containing both noir and film but not the phrase pinot noir.
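
To make the query syntax concrete, here is a Python sketch that splits an Advanced Search query into required (+), excluded (-), and optional terms, treating a quoted phrase as a single term. It only illustrates the notation described above; parse_query and the dictionary keys are invented for this example, and AltaVista's own parser is not shown in this document.

```python
import re

# An optional + or - sign, followed by either a quoted phrase or a run of
# non-space characters.
TOKEN = re.compile(r'([+-]?)("[^"]*"|\S+)')

def parse_query(query):
    """Sort the terms of a query into required, excluded, and optional lists."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, term in TOKEN.findall(query):
        bucket = {"+": "required", "-": "excluded"}.get(sign, "optional")
        parsed[bucket].append(term.strip('"'))
    return parsed
```

For the query noir +film -"pinot noir", this sketch reports film as required, the phrase pinot noir as excluded, and noir as an optional term that influences ranking.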


Displaying search results

You can choose how to display the results of an AltaVista search by selecting one of the following options from the dropdown menu on the search screen.

Form                Resulting action
In Standard Form    Displays a hot link to the title and the URL of each document; the first several lines of the document; the size; and the date the document was posted to the web.
In Compact Form     Displays a hot link to the title of each document; the date posted; and the first several words. The information about each document fits on one line.
As a Count Only     Displays the total number of documents that match the search, without any additional information. This option is available only from the Advanced Search screen.

Search results are displayed in a list

Search results are displayed in a list, with best matches first. Fifteen results are displayed per page. Click on a number at the bottom of the page or click [Next] to display a new list of search results. To return to the list of search results just viewed, click [Prev].

Maximum 200 documents are displayed for each search

AltaVista Search displays a maximum of 200 documents regardless of how many documents it found that matched the search criteria. For information about how AltaVista Search chooses the documents to display first, see How Search Results are Ranked.
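
The two limits described above (fifteen results per page, and no more than 200 documents displayed) determine how many result pages a search can produce. The Python sketch below works through the arithmetic; the constant and function names are assumptions made for the illustration.

```python
PAGE_SIZE = 15        # results shown per page
MAX_DISPLAYED = 200   # documents displayed per search, at most

def result_pages(ranked_matches):
    """Split a best-first list of matches into the pages a user would see."""
    shown = ranked_matches[:MAX_DISPLAYED]
    return [shown[i:i + PAGE_SIZE] for i in range(0, len(shown), PAGE_SIZE)]
```

A search with 1,000 matches therefore produces 14 pages: 13 full pages of fifteen results, and a final page with the remaining five.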


How search results are ranked

Ranking Simple Search results

Google ranks the results of a search based on a score that includes these criteria:

  • Google applies its patented PageRank™ technology to rank the sites based on their importance.

  • PageRank™ uses the web's vast link structure as an indicator of an individual page's value. Basically, Google interprets a link from page A to page B as a vote, by page A, for page B. Google looks at more than just the volume of votes, or links, a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

  • Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google looks beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.
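
The "link as a vote" idea can be illustrated with the textbook form of the PageRank calculation. The Python sketch below is a simplified classroom version, not Google's production ranking: the damping value of 0.85, the fixed number of iterations, and the function name pagerank are all assumptions, and every linked page is assumed to appear as a key in the links dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In a small web where pages A, C, and D all link to B, and B links only to C, page B ends up with the highest score, and C outranks A and D because its single vote comes from an important page.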

Ranking Advanced Search results

AltaVista ranks the results of a search based on a score that includes these criteria:

  • Whether the words or phrases are found in the first few lines of the document (for example, in the title of a web page).

  • The frequency of occurrence of a query word or phrase. Rare words in a query are weighted more heavily than common words (rarity is determined by the number of occurrences of the word in the index).

  • Whether all of the specified words or phrases appear in a document. A document containing all three words specified in a three-word query would rank higher than a document containing only two or one of the words.

  • Whether multiple query words or phrases are found close to each other in a document.
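
Two of the criteria above (how many of the query words a document contains, and the extra weight given to rare words) can be combined in a toy scoring function. The Python sketch below is only an illustration of how such criteria interact: the formula is invented for this example, it measures rarity by the number of documents containing a word rather than by total occurrences, and it ignores word position and proximity. It is not AltaVista's scoring.

```python
from collections import Counter

def rank(documents, query):
    """Order document names best-first for a query of space-separated words.

    documents is a dict mapping a document name to its text.
    """
    terms = set(query.lower().split())
    doc_words = {name: set(text.lower().split())
                 for name, text in documents.items()}
    # Number of documents in which each word appears.
    doc_freq = Counter(word for words in doc_words.values() for word in words)
    # Each matched query word adds to the score; rarer words add more.
    scores = {name: sum(1.0 / doc_freq[term] for term in terms & words)
              for name, words in doc_words.items()}
    return sorted((name for name in scores if scores[name] > 0),
                  key=lambda name: scores[name], reverse=True)
```

For the query apple pear muffin recipe, a document containing all four words is listed ahead of one containing only muffin and recipe, which in turn is listed ahead of one containing only apple.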

 


Using keywords to refine searches

Advanced Search supports the use of keywords to restrict your searches to pages that meet specific criteria regarding the structure and contents of a web page. Using keywords, you can search based on a URL or portion of a URL, or based on the links, art, text, and coding that a web page contains. With keywords, you can do useful things such as

  • Find all pages on a certain host or in a specific naming domain.
  • Find all pages that contain links pointing to your own web page.
  • Find all pages that contain a specific class of Java applets.

To search based on keywords, enter a query in the format keyword:search-criterion where keyword is any of a list of special items for which AltaVista can search, and search-criterion is the string or condition that you want to match.

You must enter the keyword in lowercase, followed immediately by a colon. The conventions for specifying a phrase in the search criterion are the same as for specifying a phrase in a regular query; the most convenient method is to enclose the phrase in quotation marks (double quotes).

The following table describes the keywords that AltaVista Search supports:
Keyword              Function
anchor:text          Finds pages that contain the specified word or phrase in the text of a hyperlink.
applet:class         Finds pages that contain a Java applet of the specified class.
domain:domainname    Finds pages with the specified word or phrase in the domain name of the web server where the page exists (the rightmost portion of an Internet hostname is the domain name).
host:name            Finds pages with the specified word or phrase in the hostname of the web server where the page exists.
image:filename       Finds pages that have an image tag with the specified filename.
link:URLtext         Finds pages that contain at least one link to a page with the specified text in its URL.
text:text            Finds pages that contain the specified text in any part of the page other than an image tag, link, or URL.
title:text           Finds pages that contain the specified word or phrase in the title.
url:text             Finds pages that contain the specified word or phrase in the URL.

The url, host, and domain keywords all serve a similar purpose in that they search for URLs based on a specific portion of the URL itself, or on the hostname or domain name where the web page exists.

The link and anchor keywords are similar in that they both look for information about jumps. The link keyword looks for text in a URL that is the target of a jump (for example, http://www.abc.org/help.html), whereas the anchor keyword looks for the actual text of a hyperlink as users would see it on a web page (for example, click here).

The text and title keywords both search the contents of the document itself. The text keyword finds any visible text (other than tags, links, and URLs) within a document, whereas the title keyword restricts the search to text that the document's author coded as part of the <title> tag. The title is what appears in the window banner of your web browser. The title keyword can be a good way to narrow your search to only the most significant pages about a topic.

For additional information on advanced search operators, see Advanced Searching.

Examples

url:http://host1.myagency.org/volunteer
Finds all pages with the text http://host1.myagency.org/volunteer in the URL (the result is a listing of pages advertising volunteer opportunities in the Myagency organization).

host:host1.myagency
Matches pages with host1.myagency in the hostname of the Web server.

domain:org
Matches pages with the domain name org in the hostname of the web server.

image:demo_screens.jpg
Matches pages that contain an image tag with a reference to demo_screens.jpg.

anchor:"click here"
Matches pages with the phrase click here in the text of a hyperlink.

link:http://www.abc.org/mypage.html
Matches pages that contain at least one link to a page with the URL http://www.abc.org/mypage.html.

link:http://myhost.abc.org/mypage.html -host:myhost.abc.org
Finds only external pages containing links to the specified URL (the - operator eliminates pages on the same web server as the page of interest).

text:training
Matches pages that contain the word training in any part of the visible text of a page (not in a hyperlink or image tag).

title:"The Wall Street Journal"
Matches pages with the phrase The Wall Street Journal in the title.

applet:NervousText
Matches pages containing the Java applet class named NervousText.
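
The keyword:search-criterion format used in the examples above can be taken apart with a few lines of Python. This sketch only shows how the two halves of such a term are separated and checked against the keyword table; parse_keyword_term is a hypothetical helper, not part of AltaVista.

```python
# Keywords listed in the table above; they must be entered in lowercase.
KEYWORDS = {"anchor", "applet", "domain", "host", "image",
            "link", "text", "title", "url"}

def parse_keyword_term(term):
    """Split keyword:search-criterion into its two parts.

    Returns (keyword, criterion), or None if the term is not a keyword search.
    """
    # Split at the first colon only, so that URLs in the criterion stay intact.
    keyword, colon, criterion = term.partition(":")
    if not colon or keyword not in KEYWORDS:
        return None
    return keyword, criterion.strip('"')
```

Note that a term such as Title:"The Wall Street Journal" is not recognized by this sketch, because the keyword is not in lowercase.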

Copyright © 2006, University of Pennsylvania
3451 Walnut Street, Philadelphia, PA 19104 · 215-898-5000