Penn Computing

Table of evaluation criteria for Penn Web index/search packages

Criterion Alta Vista Excite Webinator
User Functionality
Ability for user to define subsets of Penn Web to be searched. Ability to construct query filter (a la Deja News), select from predefined list of searches (schools, etc.), or otherwise limit data to be searched Yes.

For the user to select from a predefined list of indexes, as is currently available on the PennWeb, we would have to install the Alta Vista software in multiple directories, one for each separate index. This seems manageable, since the software requires so little disk space, uses so few system resources, and requires almost no administrative intervention, but the product is badly designed for this function.

Index covers only one server. Searches cannot be made across servers. Yes.

In general, the administrative overhead for Webinator is high. Lists of starting URLs for each server must be maintained, and gathering and indexing are two separate, interdependent steps that must each be regularly scheduled.

Ability for user to search for Internet web sites from Penn interface. Alta Vista: Not part of standard configuration, but can be added. Excite: Yes. Standard configuration can pass queries to eXcite's commercial Internet search site. Webinator: No. Webinator does not have an Internet-wide search engine.
Ability to index/search multiple data formats: html, txt, pdf (note increasing pdf use in Wharton, Library, and elsewhere) Searches html and txt only.

Search Intranet v2.0 will index HTML, text, MS Office documents, PDF, Postscript and 200 other file formats.

Excite: Indexes html and ASCII only. Webinator: Indexes html, txt and rtf. pdf files can be indexed with a separately purchased extension ($600).
Ability to specify protocol types indexed/searched (news, http, etc.) No. An additional plug-in should allow indexing of news servers and is due in the last quarter of '97. No. Files to be indexed need to be contained within the htdocs file system. No. Files to be indexed need to be contained within the htdocs file system.
Text-only browser support Yes Yes, depends only on the query and results pages we design. Yes
Ability to index/search defined fields (in combination with free text), including meta tags and keyword-only searches Yes. Can also constrain search to specific html elements (image tags, link anchors, etc.). Key word searches are possible but not very well documented. Meta tags can be used to generate document summaries, but they cannot be searched as fields. Support for meta tags coming in new version. No release date yet, but should be "very soon." They are using new version internally.
Ability to express queries in:
Natural language
Set logic operators
Boolean logic operators
Special pattern matchers (regular expressions, quantities, fuzzy patterns)
Proximity operations
Thesaurus mode
or any combination of these
Alta Vista: Yes to natural language, set logic operators (2), and proximity (1 level). No mention of thesaurus mode in documentation. Excite: Yes to natural language, set logic operators and Boolean logic operators. No to special pattern matching, proximity operators and thesaurus mode. Webinator: Yes to all except Boolean logic. Webinator uses set logic instead of Boolean. 3 set logic operators, including 'permute'; 5 levels of proximity.
Ability to present results, including:
Relevance-ranked order, automatic
Relevance-ranked order, at user request only
Date-ranked order
Document similarity searches (doc surfing)
Link reference reports (lists docs linking to each result, allows backwards navigation)
In-context result listings
URLs displayed
Alta Vista: Yes to automatic relevance ranking, no doc surfing. Displays URLs, but link references can only be seen by doing another search. No date ranking, but can search by date range. Excite: Yes to automatic relevance-ranked order and document similarity searches (query-by-example). No to relevance-ranked order at user request, date-ranked order, link reference reports and in-context result listings. Can be configured to display URLs. Webinator: Yes on everything except date ranking. User must click on 'show context' icon to see URL for each result. No search by date range.
Administration capabilities
Ability to index all servers in the Penn Web. Default search should be the widest possible. - REQUIRED Yes. No. Only indexes local server. Can index any server, would have to add list of servers to walker's to-do list (or start walker on a page that links to all servers to be indexed) if they were to all be in the same database or start a new walking process if they were to be in separate databases.
Ability to define and control exclusions from indexing/searching; ability to search across predefined (not user-defined) subsets of the Penn Web (control exclusion "from top down"). Yes. Very fine control of what is indexed via file lists and file filters. Yes
Ability to recognize and respond to authorial exclusions from indexing, e.g., "robots.txt." (control exclusion "from bottom up.") Yes. No, because can only index documents on same server as install. Yes.
Ability to customize query and results interfaces Yes Query page is standard html, results can be customized via perl programming. Yes
Ability to search while index is updating -REQUIRED Yes. Could not find this directly, but product does not do incremental indexing which probably means searching CANNOT be done during indexing. Yes
Access to technical support Alta Vista: Yes, with purchased version. Excite: Unsupported product. FAQs and online docs available on web site. Webinator: Yes, but only a listserv in demo version. Purchased version has full tech support.
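The "bottom up" exclusion criterion above (honoring an author's robots.txt) can be illustrated with a short sketch. It uses Python's standard `urllib.robotparser` module rather than any of the three evaluated packages. The robots.txt content, the `penn-indexer` agent name, and the `example.edu` URLs are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a page author might publish on a server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

# Hypothetical URLs an indexer has discovered on that server.
CANDIDATES = [
    "http://www.example.edu/index.html",
    "http://www.example.edu/private/notes.html",
    "http://www.example.edu/drafts/plan.txt",
    "http://www.example.edu/library/hours.html",
]


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, agent, urls):
    """Return only the URLs that the given user agent is permitted to fetch."""
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in allowed_urls(parser, "penn-indexer", CANDIDATES):
        print(url)
```

An indexer that applies a check like this before fetching each page respects authorial exclusions. It does not cover the separate "top down" exclusions that an administrator would configure.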
Cost of ownership/operation:
Cost of license/renewals $11,900 for site license and $5,599 for the distribution. Free $700 or $4395 - Two versions contain slightly different features. See Webinator Homepage for features description. Updates are free.
Cost of technical support subscription Free with purchased version. Not Available. Free with purchased version.
Cost of any necessary hardware purchases or enhancements Alta Vista: None. Excite: Would require purchase of a whole new system, approx. $30,000 to $50,000. Webinator: None.
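To compare the cost rows above at a glance, the figures can be summed into a rough first-cost range per package. This is only a sketch under stated assumptions: it treats the Alta Vista site license and distribution charges as additive (the table does not say whether both apply), treats the two Webinator prices as the low and high ends of a range, and counts free or unavailable support as $0.

```python
# Dollar figures from the cost rows, stored as (low, high) ranges.
# Assumption: the Alta Vista license ($11,900) and distribution ($5,599)
# charges are both paid; the table does not confirm this.
COSTS = {
    "Alta Vista": {
        "license": (11_900 + 5_599, 11_900 + 5_599),
        "support": (0, 0),
        "hardware": (0, 0),
    },
    "Excite": {
        "license": (0, 0),
        "support": (0, 0),  # support is not available, so no fee is counted
        "hardware": (30_000, 50_000),
    },
    "Webinator": {
        "license": (700, 4_395),  # two versions with different features
        "support": (0, 0),
        "hardware": (0, 0),
    },
}


def total_range(items):
    """Sum the low ends and the high ends of every cost item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


if __name__ == "__main__":
    for package, items in COSTS.items():
        low, high = total_range(items)
        print(f"{package}: ${low:,} to ${high:,}")
```

Under these assumptions the free Excite license is outweighed by its hardware requirement, while Webinator has the lowest first cost. Staff time for administration is not captured here.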
Strategic fit
Should not preclude use of any hardware platform or server software under consideration for future deployment in the Penn Web Software: Should work with most web servers. Hardware: Platforms listed below. Software: Should work with most web servers. Hardware: Platforms listed below. Software: Should work with most web servers. Hardware: Platforms listed below.
Availability on multiple platforms including DEC Alpha UNIX and Sun Solaris Alta Vista: Only supported on Windows NT, 95 and DEC Alpha.

Recent announcement of port to SUN/Solaris. Should be available 7/15/97 or shortly thereafter.

Excite: Current platforms: SunOS, Solaris, SGI Irix, HP-UX, IBM AIX, BSDI, Windows NT, others "coming soon." Not supported on DEC Unix. Webinator: Current platforms: Unixware, Irix, Linux, Solaris, Sparc, SunOS, SCO, AIX, HP-UX, DEC Alpha (UX), SVR4 386, BSDI, DG/UX, Windows NT.

Results of user test of above criteria


Information Systems and Computing
University of Pennsylvania
Comments & Questions
