Penn Computing

Evaluation criteria for Penn Web index/search packages

User functionality:

  • Ability to index all servers in the Penn Web. Default search should be the widest possible. - REQUIRED (exclusion capability covered below under 'Administration capabilities')

  • Speed of returning results

  • Ability for user to define subsets of Penn Web to be searched. Ability to construct query filter (a la Deja News), select from predefined list of searches (schools, etc.), or otherwise limit data to be searched

  • Simple navigation interface for queries and presentation of results (e.g., easy for user to perceive and understand tasks and results reports; few steps required to perform user tasks)

  • Ability to index/search multiple data formats: html, txt, pdf (note increasing pdf use in Wharton, Library, and elsewhere)

  • Ability to specify protocol types indexed/searched (news, http, etc.)

  • Degree of support for text-only browsers

  • Ability to index/search defined fields (in combination with free text), including meta tags and keyword-only searches

  • Ability to express queries in:
    • Natural language
    • Set logic operators
    • Boolean logic operators
    • Special pattern matchers (regular expressions, quantities, fuzzy patterns)
    • Proximity operations
    • Thesaurus mode
    or any combination of these

  • Ability to present results, including:
    • Relevance-ranked order, automatic
    • Relevance-ranked order, at user request only
    • Date-ranked order
    • Document similarity searches (doc surfing)
    • Link reference reports (lists docs linking to each result, allows backwards navigation)
    • In-context result listings
    • URLs displayed
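
To make the query-expression criteria above concrete, here is a minimal Python sketch of what Boolean and proximity operators do when applied to one document. It is an illustration only, not the behavior of any package under evaluation; the function names (tokenize, matches_all, matches_any, within) and the sample text are assumptions made up for this example.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all(text, terms):
    """Boolean AND: every term must appear in the text."""
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)

def matches_any(text, terms):
    """Boolean OR: at least one term must appear in the text."""
    tokens = set(tokenize(text))
    return any(term.lower() in tokens for term in terms)

def within(text, first, second, max_gap):
    """Proximity: both terms appear at most max_gap word positions apart."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)

# Hypothetical sample document, invented for this example.
doc = "The Penn Library publishes course reserves in PDF format."
```

With this sample text, matches_all(doc, ["penn", "pdf"]) succeeds, while within(doc, "penn", "pdf", 3) fails because the two words are six positions apart. A real package would evaluate such operators against an index rather than rescanning each document.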

Administration capabilities:

  • Ability/flexibility to define and control exclusions from indexing/searching; ability to search across predefined (not user-defined) subsets of the Penn Web (control exclusion "from top down").

    Descriptive criterion from Helen Anderson at SEAS: "I want to be able to search the official SEAS pages but not the student pages. This means that we cross about five servers. Some whole file hierarchies get searched, but other cases just have a list of files we want to include. I need reasonable tools to handle what gets searched."

  • Ability to recognize and respond to authorial exclusions from indexing, e.g., "robots.txt" (control exclusion "from bottom up").

  • Ability to customize query and results interfaces

  • Ability to search while index is updating - REQUIRED

  • Access to and quality of technical support
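
The two exclusion controls above ("from top down" and "from bottom up") can be combined in a single indexing decision. The Python sketch below is one possible arrangement under stated assumptions: the host names, path prefixes, file list, and the "penn-indexer" agent name are all hypothetical placeholders, not real Penn servers or settings. The bottom-up check uses the standard library's urllib.robotparser.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical administrator rules ("top down"); these hosts are placeholders.
INCLUDE_PREFIXES = [
    ("www.seas.example.edu", "/"),            # a whole file hierarchy
    ("dept.seas.example.edu", "/official/"),  # one subtree on another server
]
EXCLUDE_PREFIXES = [
    ("www.seas.example.edu", "/~"),           # student home pages
]
INCLUDE_FILES = {
    # individually listed files on a server that is otherwise not indexed
    "http://labs.seas.example.edu/projects/index.html",
}

def allowed_top_down(url):
    """Apply the administrator's include/exclude rules to one URL."""
    if url in INCLUDE_FILES:
        return True
    parts = urlparse(url)
    host, path = parts.netloc, parts.path or "/"
    if any(host == h and path.startswith(p) for h, p in EXCLUDE_PREFIXES):
        return False
    return any(host == h and path.startswith(p) for h, p in INCLUDE_PREFIXES)

def allowed_bottom_up(url, robots_txt, agent="penn-indexer"):
    """Honor the author's robots.txt, passed in as already-fetched text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

def should_index(url, robots_txt):
    """A URL is indexed only if both levels of control permit it."""
    return allowed_top_down(url) and allowed_bottom_up(url, robots_txt)
```

Passing the robots.txt content in as text keeps the sketch runnable offline; an actual indexer would fetch and cache that file once per server. Requiring both checks to pass means neither the administrator nor the page author can be overridden by the other.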

Cost of ownership/operation:

  • Cost of license/renewals
  • Cost of technical support subscription
  • Cost of any necessary hardware purchases or enhancements

Strategic fit:

  • Market strength of vendor

  • Should not preclude use of any hardware platform or server software under consideration for future deployment in the Penn Web

  • Availability on multiple platforms including DEC Alpha UNIX and Sun Solaris

Comparison of major contenders based on above criteria

Information Systems and Computing
University of Pennsylvania