Simple and Advanced Search
You can search Penn's web using two different tools: the Simple
Search (based
on Google's index of the Penn Web) or the Advanced Search (using
AltaVista Intranet). Some kinds of
searches, such as those for the home pages of organizations, may yield more
relevant results using the Simple Search. Complex searches for
documents or topical information
may benefit from the advanced query features of the AltaVista
Advanced Search.
Although Simple Search is updated once a month, please note that the index is
maintained by Google on its own site; we cannot guarantee its
comprehensiveness
or update frequency. Should Penn's Internet trunk or the Google site
itself be unavailable for any reason, your query will be forwarded
automatically to
Advanced Search, which is locally maintained. For help using Simple
Search, consult Search Tips on the Google site.
Overview of Penn Web's Advanced Search
The Penn Web Advanced Search is based on AltaVista Intranet
search software,
a variant of the AltaVista software for searching the entire Internet. When
you initiate a Penn Web Advanced search, AltaVista Intranet
searches the current
index to the Penn Web, a very large index of the pages on more than
300 Web servers
in the upenn.edu domain. The current index is not updated dynamically, as Web
pages are added, deleted, or changed. Instead, a new index is compiled in the
background and installed every two weeks, on the first and third
Sunday of each
month. See
http://www.upenn.edu/search/updates.html for the date the current index
was installed.
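As a rough illustration of the schedule described above, the following Python sketch computes the first and third Sunday of a given month. The function names are hypothetical, and the code only models the stated calendar rule; it says nothing about how the index installation is actually triggered.

```python
import calendar
from datetime import date


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday (1-based) of the given month."""
    # Weeks run Monday..Sunday; days outside the month are reported as 0.
    cal = calendar.Calendar(firstweekday=calendar.MONDAY)
    weeks = cal.monthdayscalendar(year, month)
    sundays = [week[-1] for week in weeks if week[-1] != 0]
    return date(year, month, sundays[n - 1])


def index_install_dates(year: int, month: int) -> list[date]:
    """First and third Sunday of the month, the install days stated in the text."""
    return [nth_sunday(year, month, 1), nth_sunday(year, month, 3)]


print(index_install_dates(2024, 1))
```

A page linked just after one install date would, under this rule, wait for the next date in the list before becoming searchable.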
To create a new Penn Web Advanced Search index, AltaVista's web
crawler module
starts at the Penn home page and follows hyperlinks throughout the upenn.edu
domain. Every document in the upenn.edu domain that is linked from
another document
that the crawler was able to index is included in the index. That means that
several hundred thousand documents are indexed and retrievable. Depending on
when in the indexing cycle a new document is linked or an outdated document
is removed, it can take anywhere from a few days to about two weeks for the
document to appear in, or disappear from, the Penn Web Advanced
Search index.
The only documents that are not included
in the current index are:
- those excluded from indexing by the administration of the servers on which
the documents are housed
- password-protected documents
- those put onto a Web server and linked after the web crawler visited the
server as it was compiling the current index
Because of the timing of Penn Web
Advanced Search index updates, it's possible that a Penn document
you can't find using
the Advanced Search has already been indexed by the Internet
version of AltaVista.
You can automatically retry a query on the AltaVista site by clicking
on the "Try your search for ..." link at the bottom of each Penn
Web Advanced Search
results page.
Special AltaVista searches
In addition to the main Penn Web Advanced search, there are
AltaVista Intranet-based
searches available for several subsections of the Penn Web.
The Publications Search indexes the Almanac, The Penn Current, and
The Pennsylvania Gazette web sites. In addition to the standard AltaVista
search functions, the Publications search allows users to request only
current or archival materials from the publications it indexes. Search
results indicate whether a returned page is current or archival. In this
search, "current material" includes articles from the most recent issues
of all publications, plus policy-related items from back issues of Almanac
that have not been superseded by more recent material. The index for this
search is updated whenever a new issue is added. A detailed technical
explanation of the Publications search is available.
Other AltaVista Intranet-based Penn Web subsite searches include the
Computing Web Search and
the University
Archives and Records Center
Search.
Unlike the Penn Web Advanced Search index, which is updated every two
weeks, these search indexes are updated monthly.
Since they are relatively small, they can be compiled and installed on
the same day.
The Computing Web Search is updated on the second Sunday of each month.
The University Archives and Records Center Search is updated on the
third Sunday of each month.
See
http://www.upenn.edu/search/updates.html for the
dates of the last update.
Because the Penn Web Advanced Search index is updated
more frequently than the special indexes,
it's possible that a document
you can't find using one of the special searches is already available in
the Penn Web Advanced Search. You can
automatically retry a special search query using the Penn Web
Advanced Search by clicking on the
"Try your search ..." link at the
bottom of each special search results page.
Advanced Search basics
All AltaVista search functions use the same rules regarding phrasing, case
sensitivity, and finding related words.
How a word is defined
AltaVista Search defines a word as any string of letters and
digits that is separated
by either:
- White space -- such as spaces, tabs, line ends, or the start or
end of a document
- Special characters and punctuation -- such as %, $, /, #, and _
Example: AltaVista Search
interprets and indexes HAL5000, 60258, www, http, and
EasierSaidThanDone all as single words, because they are continuous
strings of characters, surrounded by
characters that are neither letters nor digits. AltaVista Search
indexes all words that it finds on a web page,
regardless of whether the word exists in a dictionary or is spelled correctly.
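The word definition above can be approximated with a regular expression. This is only a sketch of the rule as stated (runs of letters and digits, with everything else acting as a separator); the name split_words is hypothetical, and AltaVista's real tokenizer may differ in details.

```python
import re

# [^\W_] matches any letter or digit but not the underscore,
# so "_" acts as a separator, as the text describes.
WORD_RE = re.compile(r"[^\W_]+")


def split_words(text: str) -> list[str]:
    """Split text into words: maximal runs of letters and digits."""
    return WORD_RE.findall(text)


print(split_words("HAL5000 http://www.upenn.edu/my_page 50%"))
```

Note that a URL is broken into several words (http, www, and so on), which matches the examples given above.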
How to search for phrases using quotation marks
You can search for phrases or groups of related words that appear
next to each other. To
indicate a phrase in a search query, put quotation marks around the
words. Using quotation marks
tells AltaVista Search to find the words together, instead of looking
for separate instances of each word individually.
You can also use punctuation to indicate phrases.
Example: To look for the
phrase penn reading project, type
"penn reading project"
If you did not use quotation marks, AltaVista Search would find
instances of "penn" alone,
"reading" alone and "project" alone, as well as any instances where
the three words happen to appear together. Enclosing the words
in double quotes indicates that you want to find only instances of
all three words together.
How punctuation and special characters are interpreted
AltaVista Search ignores punctuation except to interpret it as a
separator for words. Placing punctuation
or special characters between each word, with no spaces between the
characters and the words, is also a way to
indicate a phrase. As an example of when punctuation might be useful
in indicating a phrase, consider searching
for a telephone number. Entering 1-800-555-1212 is easier than entering "1 800 555 1212", which is an equally
acceptable syntax, but is less natural.
Hyphenated words, such as CD-ROM, also automatically form a phrase
because of the hyphen.
Normally, however, using quotation marks to indicate a phrase is
recommended over the use of punctuation between
words, because some special characters have additional meaning:
- In AltaVista searches, you can use the asterisk (*) as a wildcard
indicating that you want to find all words containing a match for
the specified
pattern of letters.
Case-sensitive searches
A search in AltaVista may be case sensitive depending on whether
you type the
query in quotation marks or not.
- A query in all lowercase letters, typed without quotation marks, will result
in a case-insensitive search.
Example: Typing reading
in the query field will find all occurrences of
the word reading, including those spelled REading, READING, and reading.
- A query typed in quotation marks will result in a case-sensitive search.
Example: Typing "Reading" in
the query field will find all occurrences of the word spelled Reading.
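The case rule can be sketched as a small matching function. This assumes, as the text states, that quotation marks are what switch on case sensitivity; the function name term_matches is hypothetical and the code compares a single query term against a single indexed word.

```python
def term_matches(query_term: str, word: str) -> bool:
    """Compare one query term with one indexed word."""
    quoted = len(query_term) >= 2 and query_term[0] == '"' and query_term[-1] == '"'
    if quoted:
        # Quoted term: capitalization must match exactly.
        return query_term[1:-1] == word
    # Unquoted term: capitalization is ignored.
    return query_term.lower() == word.lower()


print(term_matches("reading", "READING"))
print(term_matches('"Reading"', "reading"))
```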
Multinational characters
AltaVista Search supports exact-match searches for characters in
the ISO Latin-1 character set. That is,
you can enter a word containing an accent or other diacritical mark,
and AltaVista Search will find only documents
with the accented spelling of the word.
For example, if you search for the French word
éléphant, AltaVista Search will find only documents
containing an exact match for the French spelling of the word.
Entering a word with mixed case and an accent (for example,
Éléphant) would produce only
results that match the word in terms of both case and accent.
If you omit accents and other diacritical marks from a search
query, AltaVista Search finds documents
containing words both with and without the special marks. Although
this feature might produce
some irrelevant results for users doing an English language search,
it enables users to enter queries for non-English
words even when they do not have international support on their keyboard.
To support searching for special characters without their diacritical marks,
AltaVista Search makes a mapping to the closest possible plain character or
combination of characters. The software then indexes words in both
forms: with
special characters as they appear, and also with special characters replaced
by the mappings. The following table illustrates the special characters and
their mappings:
| Character(s) | Mapping | Character(s) | Mapping |
| Æ | AE | æ | ae |
| Á Â À Å Ã Ä | A | á â à å ã ä | a |
| Ç | C | ç | c |
| Ð | D | ð | d |
| É Ê È Ë | E | é ê è ë | e |
| Í Î Ì Ï | I | í î ì ï | i |
| Ñ | N | ñ | n |
| Ó Ô Ò Ø Õ Ö | O | ó ô ò ø õ ö | o |
| Þ | TH | þ | th |
| Ú Û Ù Ü | U | ú û ù ü | u |
| Ý | Y | ý ÿ | y |
| ß | ss |
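The mapping table can be expressed as a lookup, with each word stored in both forms. This is a minimal sketch built from the table above; the names GROUPS, plain_form, and indexed_forms are hypothetical, and the real indexer is not described in enough detail to claim this is how it works.

```python
# Plain replacement -> special characters that map to it (from the table).
GROUPS = {
    "AE": "Æ", "ae": "æ",
    "A": "ÁÂÀÅÃÄ", "a": "áâàåãä",
    "C": "Ç", "c": "ç",
    "D": "Ð", "d": "ð",
    "E": "ÉÊÈË", "e": "éêèë",
    "I": "ÍÎÌÏ", "i": "íîìï",
    "N": "Ñ", "n": "ñ",
    "O": "ÓÔÒØÕÖ", "o": "óôòøõö",
    "TH": "Þ", "th": "þ",
    "U": "ÚÛÙÜ", "u": "úûùü",
    "Y": "Ý", "y": "ýÿ",
    "ss": "ß",
}

# Invert into a per-character lookup: special character -> plain replacement.
TO_PLAIN = {ch: plain for plain, chars in GROUPS.items() for ch in chars}


def plain_form(word: str) -> str:
    """Replace each special character with its plain mapping."""
    return "".join(TO_PLAIN.get(ch, ch) for ch in word)


def indexed_forms(word: str) -> set[str]:
    """Both forms under which a word would be indexed."""
    return {word, plain_form(word)}


print(indexed_forms("éléphant"))
```

Because both forms are stored, a query for elephant can reach a document that contains éléphant, while a query for éléphant matches only the accented spelling.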
Finding related words (wildcards and truncation)
Wildcard searching (truncation) is convenient for finding
derivatives and spelling
variants of the same word.
- Use the asterisk wildcard notation ( * ) to search for a group of words
that contain the same pattern.
Example: To look for the word sing
and any derivatives, such as singer, singers, and singing,
type sing*
in the query field. Searching for the stem cantalo* will produce
matches for cantaloup, cantaloupe, cantalope, and their plurals.
- Specify at least three letters in front of the asterisk *.
(This is required
in order to limit extraneous searching.)
- The asterisk * matches only lowercase letters (not capital letters or
digits), and stands in for a maximum of five letters.
- A wildcard search can produce words that match the pattern of your query
but are unrelated to what you are looking for. It is sometimes possible to
change the placement of the wildcard character to reduce the
number of irrelevant
results.
Example: If you want to find
matches for
both color and colour, a query of the form col*r could
also find matches
for the words collector and collider. Submitting a query for colo*r
is more precise, and results in matches for both color and
colour.
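The wildcard rules above can be modeled by translating a pattern into a regular expression. This is a sketch under stated assumptions: the asterisk stands for zero to five lowercase letters, at least three characters must precede it, and only one asterisk is used per pattern. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a single-asterisk wildcard pattern into a regex."""
    stem, star, rest = pattern.partition("*")
    if not star:
        raise ValueError("pattern contains no asterisk")
    if len(stem) < 3:
        raise ValueError("at least three characters must precede the asterisk")
    # Zero to five lowercase letters in place of the asterisk.
    return re.compile(re.escape(stem) + "[a-z]{0,5}" + re.escape(rest))


def wildcard_matches(pattern: str, word: str) -> bool:
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for word in ["color", "colour", "collector", "collider"]:
    print(word, wildcard_matches("col*r", word), wildcard_matches("colo*r", word))
```

Running the loop shows why moving the asterisk helps: col*r accepts all four words, while colo*r accepts only color and colour.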
Error messages
AltaVista Search will display a message similar to the following if your
query results in matches that are too numerous to be meaningful.
Example: Ignored inte*: 4292323
This message means that there are more than four million instances in the
index of words starting with "inte." Consequently, AltaVista
Search does not
return any results, because the query is not specific enough to be useful.
The Simple and Advanced Searches are equally powerful and flexible.
Advantages of Simple Search
The main advantages of Simple Search are:
- Google only returns web pages that contain all the words in your query;
refining or narrowing your search is as simple as adding more words to the
search terms you have already entered. Your new query will return a smaller
subset of the pages Google found for your original
"too-broad" query.
For details, see How Search
Results are Ranked.
- By default, Google only returns pages that include all of your
search terms.
There is no need to include "and" between terms. Please note that
the order in which the terms are typed will affect the search
results. To further restrict a search, include more terms.
For example, to plan a vacation to Hawaii, simply type:
vacation hawaii.
- Google ignores common words and characters such as "where" and "how", as
well as certain single digits and single letters, because they tend to slow
down your search.
- To provide the most accurate results, Google does not use
"stemming"
or support "wildcard" searches. Google searches for exactly the
words that you enter in the search box. This will help your query yield
more accurate results.
- You can use the plus (+) in front of a common word if it is essential to
use the common word in your search. Be sure to include a space before the
"+" sign.
For details, see Simple Searching.
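The "all words must match" behavior described above explains why adding a term can only shrink the result set. The following toy sketch illustrates that property; the sample pages and the function name are invented for illustration, and real ranking and common-word handling are ignored.

```python
# Hypothetical miniature "web": page name -> page text.
PAGES = {
    "a.html": "Hawaii vacation packages and hotels",
    "b.html": "Vacation policy for staff",
    "c.html": "Hawaii volcano research",
}


def pages_with_all_terms(pages: dict[str, str], query: str) -> set[str]:
    """Return the pages whose text contains every term in the query."""
    terms = query.lower().split()
    return {
        name
        for name, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    }


broad = pages_with_all_terms(PAGES, "vacation")
narrow = pages_with_all_terms(PAGES, "vacation hawaii")
print(sorted(broad), sorted(narrow))
```

Every page returned for the longer query is also returned for the shorter one, so the narrower result is always a subset of the broader one.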
Advantages of Advanced Search
The main advantages of Advanced Search are:
- The Advanced Search interface requires a more precise, logical
syntax which,
although it is more exacting, also gives you more control over the results
of your search. Using the apple pear muffin recipe example,
suppose you decide
that you do not want to see any documents unless they contain at least the
words muffin and recipe.
- Entering several words separated by spaces indicates that you
want to find
documents containing any or all of the words (documents containing all of
the words will be listed first).
For example, suppose you want to find a recipe for muffins that
includes either
apples or pears, but ideally would contain both fruits. You could enter the
series of words apple pear muffin recipe. If any document contains
all four words, automatic ranking places that document at the top of your
results list. Documents containing only some of the words would
be next, and
documents containing only one of the words would be ranked last.
- AltaVista Search ranks results automatically based on a series of factors
that ensure that the most relevant documents appear at the top of
the results
list.
- You can type your query in the form of a Natural Language question, as if
you were conversing with another person. For example, you can type in the
query field What is Penn's policy on vacation leave? and AltaVista
Search will sort out the most important words in the phrase and
return a list
of documents containing those words.
- You can get a count of the number of documents that meet
your search
criteria. This is useful, for example, if you want to get an idea
of how many
web pages contain links to your own home page.
For additional information on using the Advanced search, see Advanced Searching.
To enter a query, just type in a few descriptive words and click
the Search button (or hit the "enter" key) for a list of relevant
web pages. Since Google only returns pages that contain all the words in your
query, narrowing your search is as easy as adding more words to the
search terms
you have already entered. Your new query will return a smaller subset of the
pages Google found for your original query.
Simple Search examples
To find the documents most relevant to your needs, construct your query as
precisely as you can.
- Google will ignore common words and characters unless you put a plus (+)
sign in front of the word.
Example: Star Wars
Episode +1
- Another method to include common words or characters in the search would
be to put quotation marks around two or more words.
Example: "Star
Wars Episode
1"
- Google searches are not case sensitive. All letters will be understood as
lower case, regardless of how you type them.
Example: Searches for University of
Pennsylvania
and university of pennsylvania will return the
same results.
- Since Google does not support "stemming" or
"wildcard"
searches, Google searches for exactly the words you enter into the search
box.
Example: Searches for
penn*
or pennsy* will not yield
pennsylvania.
- Enter as many words as you wish. AltaVista will find web pages
that include
some or all of the words. Those with more of your search terms
will rank higher.
- Use quotation marks to indicate phrases:
"affirmative action" hiring employment
- Use a plus sign + to mark words that must be present:
+rodin president administration
- Use a minus sign - to mark terms that must not be present:
athletics sports -football
- Use an asterisk * as a wildcard.
a. Search word stems and find the stem and more:
penn* finds penn penns pennsylvania pennsylvanian
b. Search for spelling variants or terms with internal
spelling differences:
wom*n finds woman women
lab*r finds labor labour
- Uppercase and lowercase are treated the same. To maintain a certain capitalization,
put the word in quotes:
spruce finds spruce Spruce
"Spruce Street" finds Spruce Street
Advanced Search examples
To find the documents most relevant to your needs, construct your
query as precisely as you can.
- To increase the likelihood that the most relevant documents will appear
at the top of the list, enter several synonyms for the topic for which you
are searching.
Example:
Querying for sandals leather footwear instead of just one of those
words increases the chance of finding documents about leather sandals.
- Use quotation marks to group several words into a phrase.
Example: bicycle "for sale"
finds documents that contain both the phrase for sale and the word
bicycle.
- Use the wildcard notation (*) at the end of a word stem
to find related
words.
Example: quilt*
finds the words quilts, quilter,
quilting, and quilted.
- Use the + and - signs to further refine your search. Do not
type a space between the sign and the search term.
Example: noir
+film -"pinot
noir" finds documents containing both noir and film but
not the phrase pinot noir.
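The query syntax in these examples (quoted phrases, + and - prefixes with no space before the term) can be split apart with a short parser. This is a sketch only: parse_query and its bucket names are hypothetical, and it does not reproduce how AltaVista evaluates the terms afterwards.

```python
import re

# An optional + or - sign, followed by either a quoted phrase or a bare word.
TERM_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query: str) -> dict[str, list[str]]:
    """Sort query terms into required (+), excluded (-), and plain terms."""
    parsed: dict[str, list[str]] = {"required": [], "excluded": [], "plain": []}
    for sign, phrase, word in TERM_RE.findall(query):
        term = phrase or word
        bucket = {"+": "required", "-": "excluded"}.get(sign, "plain")
        parsed[bucket].append(term)
    return parsed


print(parse_query('noir +film -"pinot noir"'))
```

A hyphen inside a word, as in CD-ROM, is not treated as a minus sign here because the sign is only recognized at the start of a term.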
You can choose how to display the results of an AltaVista search
by selecting
one of the following options from the dropdown menu on the search
screen.
| Form | Resulting action |
| In Standard Form | Displays a hot link to the title and the URL of each document; the first several lines of the document; the size; and the date the document was posted to the web. |
| In Compact Form | Displays a hot link to the title of each document; the date posted; and the first several words. The information about each document fits on one line. |
| As a Count Only | Displays the total number of documents that match the search, without any additional information. This option is available only from the Advanced Search screen. |
Search results are displayed in a list
Search results are displayed in a list, with best matches first.
Fifteen results
are displayed per page. Click on a number at the bottom of the page or click
[Next] to display a new list of search results. To return to the
list of search
results just viewed, click [Prev].
Maximum 200 documents are displayed for each search
AltaVista Search displays a maximum of 200 documents regardless of how many
documents it found that matched the search criteria. For information about how
AltaVista Search chooses the documents to display first, see How
Search Results are Ranked.
Ranking Simple Search results
Google ranks the results of a search based on a score that
includes these criteria:
- Google applies its patented PageRank™ technology to rank the sites
based on their importance.
- PageRank™ uses the web's vast link structure as an indicator of an
individual page's value. Basically, Google interprets a link from page A to
page B as a vote, by page A, for page B. Google looks at more than
just the volume of votes, or links, a page receives; it also
analyzes the page that casts the vote. Votes cast by pages that are
themselves "important"
weigh more heavily and help to make other pages "important."
- Important, high-quality sites receive a higher PageRank, which
Google remembers
each time it conducts a search. Google combines PageRank with sophisticated
text-matching techniques to find pages that are both important and relevant
to your search. Google looks beyond the number of times a term appears
on a page and examines all aspects of the page's content (and the content
of the pages linking to it) to determine if it's a good match for
your query.
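The "links as votes" idea can be illustrated with a simplified, textbook-style power iteration. This is not Google's production algorithm: the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are all assumptions chosen for the sketch, and the text-matching side of ranking is left out entirely.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph, C receives votes from both A and B and ends up ranked first; A is voted for by the highly ranked C and so outranks B, which receives no votes at all.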
Ranking Advanced Search results
AltaVista ranks the results of a search based on a score that includes these
criteria:
- Whether the words or phrases are found in the first few lines
of the document
(for example, in the title of a web page).
- The frequency of occurrence of a query word or phrase. Rare
words in a query
are weighted more heavily than common words (rarity is determined
by the number
of occurrences of the word in the index).
- Whether all of the specified words or phrases appear in a
document. A document
containing all three words specified in a three-word query would
rank higher
than a document containing only two or one of the words.
- Whether multiple query words or phrases are found close to each other in
a document.
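Some of the criteria listed above can be combined into a toy scoring function. The weights below are arbitrary placeholders, document frequency is used as a stand-in for "number of occurrences in the index", and proximity between query words is not modeled at all; the actual AltaVista formula is not given in this document.

```python
import math


def relevance_score(doc_words: list[str], query_words: list[str],
                    doc_freq: dict[str, int], total_docs: int,
                    lead: int = 10) -> float:
    """Toy relevance score for one document (all weights are assumptions)."""
    score = 0.0
    present = [word for word in query_words if word in doc_words]
    for word in present:
        # Rare words weigh more: the fewer documents contain a word, the larger its weight.
        score += 1.0 + math.log(total_docs / doc_freq.get(word, 1))
        # Bonus when the word appears among the first few words of the document.
        if word in doc_words[:lead]:
            score += 0.5
    # Bonus when every query word is present.
    if present and len(present) == len(query_words):
        score += 1.0
    return score


DOC_FREQ = {"apple": 50, "muffin": 10, "recipe": 20}
QUERY = ["apple", "muffin", "recipe"]
full = relevance_score(["apple", "muffin", "recipe"], QUERY, DOC_FREQ, 100)
partial = relevance_score(["apple", "muffin", "tin"], QUERY, DOC_FREQ, 100)
print(full > partial)
```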
Advanced Search supports the use of keywords
to restrict your searches to pages that meet specific criteria regarding
the structure and contents of a web page. Using keywords, you can
search based
on a URL or portion of a URL, or based on the links, art, text, and coding
that a web page contains. With keywords, you can do useful things such as:
- Find all pages on a certain host or in a specific
naming domain.
- Find all pages that contain links pointing to your
own web page.
- Find all pages that contain a specific class of Java applets.
To search based on keywords, enter a query in the format
keyword:search-criterion
where keyword is any of a list of special items for which AltaVista
can search, and search-criterion is the string or condition that you
want to match.
You must enter the keyword in lowercase, followed immediately by a colon.
The conventions for specifying a phrase in the search criterion
are the same
as for specifying a phrase in a regular query; the most convenient method
is to enclose the phrase in quotation marks (double quotes).
The following table describes the keywords that AltaVista Search supports:
| Keyword | Function |
| anchor:text | Finds pages that contain the specified word or phrase in the text of a hyperlink. |
| applet:class | Finds pages that contain a Java applet of the specified class. |
| domain:domainname | Finds pages with the specified word or phrase in the domain name of the web server where the page exists (the rightmost portion of an Internet hostname is the domain name). |
| host:name | Finds pages with the specified word or phrase in the hostname of the web server where the page exists. |
| image:filename | Finds pages that have an image tag with the specified filename. |
| link:URLtext | Finds pages that contain at least one link to a page with the specified text in its URL. |
| text:text | Finds pages that contain the specified text in any part of the page other than an image tag, link, or URL. |
| title:text | Finds pages that contain the specified word or phrase in the title. |
| url:text | Finds pages that contain the specified word or phrase in the URL. |
The url, host, and domain keywords all
serve a similar
purpose in that they search for URLs based on a specific portion of the URL
itself, or on the hostname or domain name where the web page exists.
The link and anchor keywords are similar in that they both
look for information about jumps. The link keyword looks for text in
a URL that is the target of a jump (for example,
http://www.abc.org/help.html),
whereas the anchor keyword looks for the actual text of a
hyperlink as users
would see it on a web page (for example, click here).
The text and title keywords both search for the
contents of a document
itself. The text keyword finds any visible text (other
than tags, links,
and URLs) within a document, whereas the title keyword restricts the
search to text that the document's author coded as part of the
<title>
tag. The title is what appears in the window banner of your web
browser. The
title keyword can be a good way to hone your search to only the most
significant pages about a topic.
For additional information on advanced search operators, see Advanced Searching.
Examples
- url:http://host1.myagency.org/volunteer
- Finds all pages with the words
http://host1.myagency.org/volunteer/
in the URL (the result is a listing of pages advertising
volunteer opportunities
in the Myagency organization).
- host:host1.myagency
- Matches pages with host1.myagency in the hostname of
the Web server.
- domain:org
- Matches pages with the domain name org in the hostname
of the web server.
- image:demo_screens.jpg
- Matches pages that contain an image tag with a reference to
demo_screens.jpg.
- anchor:"click here"
- Matches pages with the phrase click here in the text
of a hyperlink.
- link:http://www.abc.org/mypage.html
- Matches pages that contain at least one link to a page with
the URL http://www.abc.org/mypage.html.
- link:http://myhost.abc.org/mypage.html
-host:myhost.abc.org
- Finds only external pages containing links to the specified URL (the -
operator eliminates pages on the same web server as the page of
interest).
- text:training
- Matches pages that contain the word training
in any part
of the visible text of a page (not in a hyperlink or image tag).
- title:"The Wall Street Journal"
- Matches pages with the phrase The Wall Street Journal
in the title.
- applet:NervousText
- Matches pages containing the Java applet class named
NervousText.
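The link and -host combination shown in the examples above can be generated for any page. This is a small convenience sketch; the function name is hypothetical, and it only builds the query string to paste into the Advanced Search field.

```python
from urllib.parse import urlsplit


def external_backlink_query(url: str) -> str:
    """Build a query for pages on other hosts that link to the given URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL must include a hostname")
    # link: finds pages linking to the URL; -host: drops pages on the same server.
    return f"link:{url} -host:{host}"


print(external_backlink_query("http://myhost.abc.org/mypage.html"))
```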