PENN PRINTOUT
The University of Pennsylvania's Online Computing Magazine

PENN PRINTOUT October 1992 - Volume 9:2



Electronic dreaming: the paperless library

By Michael Halperin

"Books on a Chip," "The Hypertext Novel," "The Multimedia Encyclopedia," "The Virtual Library"--although electronic substitutes for the printed page are impressive, there is little substitution of machine-readable information for print information in libraries. Most of the computer-readable information now in libraries (such as online catalogs and computer-readable indexes and abstracts) is designed to make the collection of printed journals and books easier to use.

There are two principal reasons for this state of affairs. First, there isn't much text available in electronic form compared to print. Numeric data, especially financial data, are well represented, but books and journals are not: Only about 5 percent of currently published journals and much less than 1 percent of currently published books exist in commercially available electronic files; about one hundred journals and newsletters are published directly in machine-readable form. Although electronic availability is increasing, the volume of printed publications is increasing as well. The available machine-readable text files are heavily concentrated in the subjects of law and business. Wharton's Lippincott Library, for example, can fill only about 10 percent of its requests for text from electronic sources, even though it uses almost all the available full-text machine-readable sources in business and economics.

The second reason for the dearth of machine-readable information is that the available text files are represented by a jumble of competing formats and search software that often make them awkward to use as a substitute for print. This point requires some elaboration. Published documents in electronic form first became widely available when Mead's LEXIS timesharing system became operational in 1972. LEXIS contained the complete text of court cases, regulations and legislative law. During the 1980s several database services, including Mead's NEXIS, Dow Jones News Retrieval, and DIALOG, became active in supplying the "full text" of journals, newspapers, and newsletters. The services have these features in common:

  • They can be searched remotely using modems.

  • They allow us to search the complete text of a document for the words and phrases we wish to see.

  • They display the text of the document.

  • They allow the text to be saved in electronic form (downloaded).

For example, on the NEXIS system we can enter a search statement such as this:

OMNI;electronic w/1 book and date aft 1/1/92

This statement will screen the more than 500 journals and newspapers in the OMNI database for any mention of "electronic" or "electronics" occurring within one word of "book" or "books" in publications issued after January 1, 1992. The search will take about 10 seconds to run and will retrieve several hundred documents. We can then display the results of the search as brief citations, as keywords in context, or as the complete text of the original article. Similar searches can be done on several competing online systems such as DIALOG and Dow Jones News Retrieval. However, the search languages, passwords, costs, and information retrieved will be different on each system searched.
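
The logic of a proximity search like the one above can be sketched in a few lines of Python. This is only an illustration, not the way NEXIS is implemented: the function names, the sample documents, and the rule that a trailing "s" matches the plural are all assumptions made for the example, and "within one word" is taken here to mean that the two terms sit next to each other, in either order.

```python
import re
from datetime import date


def term_matches(word, term):
    # Assumption: a term also matches its simple plural ("book" / "books").
    return word == term or word == term + "s"


def within_n_words(text, term_a, term_b, n):
    """Return True if term_a and term_b occur within n word positions of each other."""
    words = re.findall(r"[a-z]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if term_matches(w, term_a)]
    positions_b = [i for i, w in enumerate(words) if term_matches(w, term_b)]
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


def search(documents, term_a, term_b, n, issued_after):
    """Return titles of documents issued after a date that pass the proximity test."""
    return [
        doc["title"]
        for doc in documents
        if doc["date"] > issued_after
        and within_n_words(doc["text"], term_a, term_b, n)
    ]


# Hypothetical documents standing in for a full-text database.
SAMPLE_DOCUMENTS = [
    {"title": "Publishing Weekly", "date": date(1992, 3, 2),
     "text": "Publishers are testing the electronic book market."},
    {"title": "Library Notes", "date": date(1991, 11, 15),
     "text": "Electronic books are coming to libraries."},
    {"title": "Trade Letter", "date": date(1992, 5, 20),
     "text": "Electronics makers see little demand; no book dealer agrees."},
    {"title": "Tech Review", "date": date(1992, 2, 10),
     "text": "New books, electronics and software."},
]

hits = search(SAMPLE_DOCUMENTS, "electronic", "book", 1, date(1992, 1, 1))
print(hits)
```

In this sample, "Library Notes" is dropped because of its date and "Trade Letter" because the two terms are too far apart. A real service would also have to rank, count, and display the matches, which the sketch leaves out.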


Online full-text

Online full-text files are effective research tools. A researcher can often find and retrieve information in seconds that would be impossible to locate through printed sources. The ability to both find the desired text and display it immediately is very attractive. It overcomes the frustrations inherent in using a collection of printed documents where the desired journal may be missing, mutilated, or not owned.

Despite their positive features, full-text files have many limitations:

  • The label "full-text" is misleading. Because they are stored as text (ASCII files), journal and newspaper files do not include photographs and with a few minor exceptions do not include other graphic information. In addition, they usually exclude data in tabular form, such as stock quotations. Full-text files rarely include non-article material, such as letters to the editor, announcements, or advertisements.

  • The online files are usually less current than the printed documents they represent. Journals and magazines often appear online months after the publication of their print counterparts. The texts of newspapers are sometimes available online the day they appear in print. A few newsletters appear online in advance of print publication.

  • Online full-text files are expensive. Costs of 2 or 3 cents per line printed or displayed are typical.

In general, ASCII full-text files are a poor (but rarely cheap) substitute for the original printed document.


CD-ROM files

Storing text as images overcomes some of the problems associated with ASCII files. Cover-to-cover facsimiles of printed documents are commercially available on CD-ROM. Unfortunately, image files have limitations of their own.

  • We cannot search the document directly as we can with ASCII files because the characters are simply pictures composed of dots.

  • CD-ROM image files require much more storage space than ASCII files. About 250,000 pages of text can be squeezed onto one CD-ROM disc, but one CD can hold only about 6,000 pages of images. This is not much more than the capacity of a microfilm cartridge. For example, one commercial product, Business Periodicals Ondisc, requires more than 300 CDs to contain the images of 400 journals for the past four years.

  • Wide area networking of CD-ROM images is difficult because of the large amount of data that must be transmitted to form an image.

  • The images are usually in black and white. Although facsimiles in shades of gray are adequate for many purposes, color is essential in such fields as art, architecture, and geography. Full-text color facsimiles of journals often require too much disk space to be practical. For example, about one issue of the National Geographic in color would fit on one CD-ROM.
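
The storage figures in this list can be checked with some simple arithmetic. The sketch below uses only the page counts quoted above, plus two figures that are assumptions and do not come from any vendor: a disc capacity of about 650 million bytes and a 9,600 bits-per-second telephone link. It ignores compression and transmission overhead, so the results are rough orders of magnitude.

```python
# Figures quoted in the article.
TEXT_PAGES_PER_CD = 250_000
IMAGE_PAGES_PER_CD = 6_000
PRODUCT_CDS = 300        # "more than 300 CDs"
PRODUCT_JOURNALS = 400
PRODUCT_YEARS = 4

# Assumptions made for this example only.
CD_CAPACITY_BYTES = 650 * 10**6   # assumed disc capacity
LINK_BITS_PER_SECOND = 9_600      # assumed telephone link speed

# How many times more space a page image needs than a page of ASCII text.
image_to_text_ratio = TEXT_PAGES_PER_CD / IMAGE_PAGES_PER_CD

# Implied size of one page in each format.
bytes_per_text_page = CD_CAPACITY_BYTES / TEXT_PAGES_PER_CD
bytes_per_image_page = CD_CAPACITY_BYTES / IMAGE_PAGES_PER_CD

# Time to send one page over the assumed link.
seconds_per_text_page = bytes_per_text_page * 8 / LINK_BITS_PER_SECOND
seconds_per_image_page = bytes_per_image_page * 8 / LINK_BITS_PER_SECOND

# The 300-disc product, if every disc were full of page images.
product_image_pages = PRODUCT_CDS * IMAGE_PAGES_PER_CD
pages_per_journal_year = product_image_pages / (PRODUCT_JOURNALS * PRODUCT_YEARS)
discs_if_stored_as_text = product_image_pages / TEXT_PAGES_PER_CD

print(f"Image/text space ratio: {image_to_text_ratio:.1f}")
print(f"Bytes per text page: {bytes_per_text_page:,.0f}")
print(f"Bytes per image page: {bytes_per_image_page:,.0f}")
print(f"Seconds to send a text page: {seconds_per_text_page:.1f}")
print(f"Seconds to send an image page: {seconds_per_image_page:.1f}")
print(f"Pages per journal per year: {pages_per_journal_year:,.0f}")
print(f"Discs needed as ASCII text: {discs_if_stored_as_text:.1f}")
```

On these assumptions a page image takes roughly 40 times the space of a page of ASCII text and about a minute and a half to send over the telephone link, against about two seconds for the text. The same pages that fill 300 discs as images would fit on fewer than ten discs as plain text.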

ASCII text files have been combined with images to get some of the benefits of both technologies, but this is expensive. A few years ago, the Bechtel Corporation combined text searching capability with CD-ROM images of U.S. Securities and Exchange Commission Reports.


The future

Today's electronic substitutes for print are represented by a patchwork of technologies, each having different strengths and none completely adequate. When can we expect to see the all-electronic library? Raymond Kurzweil, the inventor of the Kurzweil reading machine, lists the improvements that must take place before the electronic book can replace the printed book. ("The Future of Libraries: Part 1: The Technology of the Book," Library Journal, January 1992, pp. 80, 82.)

  • Screen presentations must be equal to the quality of the printed page. This will require improvements in screen contrast, resolution, and color capability.

  • The size, weight, and cost of the electronic book must be competitive with print.

  • A substantial part of the millions of volumes of existing printed material must be available in electronic format.

Mr. Kurzweil forecasts that the price and performance of the electronic book will be competitive with the printed book in about ten years. Replacing what he calls "the enormous installed base of print books" will take much longer. How much longer will depend on improved methods of converting print to electronic forms and on the willingness of publishers, the academic community, and the public to accept electronic rather than print formats. Electronic technology will certainly supplant print technology eventually. The arrival of the paperless library, like the paperless office, may take longer than we expect.


Sidebar: Electronic text at Penn

The Penn Libraries have access to a wide variety of electronic text and image files. For information call Van Pelt Reference (898-7555) or Lippincott Reference (898-5924). For examples of electronic journals and a directory of electronic publications on PennInfo, look in "Electronic Journals" under "Libraries."


MICHAEL HALPERIN is Librarian of Wharton's Lippincott Library.