November 1994 - Volume 11:3

Word search: Library puts the OED online

By James English and Stephen Lehmann

The monumental Oxford English Dictionary, the world's greatest lexicon, is now available to the Penn community in state-of-the-art electronic form, thus becoming the Library's first locally loaded full-text database. In its 12-volume printed format the OED has been an indispensable research tool for scholars of literature, history, and other fields for over half a century. But with the advent of a computerized version, the OED becomes both more accessible and far more powerful than ever before.

The dictionary, first published in 1928 and now in its second edition, is valued not just for the scope of its definitions (500,000 words) and the reliability of its etymologies, but for the vast pool of quotations it incorporates to illustrate histories of usage. All told, there are nearly two million citations of usage, drawn from texts as old as the thirteenth-century Life of Beket or as recent as Monty Python's Life of Brian. For scholars, these quotations are often of considerable interest quite apart from their lexicographical value. Many of them, gathered by Victorian researchers, come from authors or works which have since faded into near oblivion; and as such they serve as an important indicator of the ongoing process of revaluation and reassessment that shapes what we too often think of as a fixed hierarchy of literary value.

Unprecedented search capabilities

One of the major advantages of an electronic OED is that this enormous set of quotations becomes available for rapid searching. Students and scholars are no longer restricted to alphabetical word lookups, as with the printed version, but can, for example, search for all the citations of a particular work or author across the entire OED database. With a few keystrokes, Professor Stuart Curran of the English Department obtained a complete listing of the OED citations of Charlotte Smith, an all but forgotten English poet of the Romantic period. That the computer found no fewer than three hundred and eleven such citations was, says Curran, an astonishing result which "told me more about 19th-century England than anything I've seen in years."
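For readers curious how an author search of this kind works in principle, here is a minimal sketch in Python. It is not the Library's system or the PAT search program: it assumes the quotations have already been extracted into simple records, and it substitutes a plain linear scan for PAT's indexed full-text search. The `Citation` record, the function names, and the sample data are all hypothetical placeholders invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One illustrative quotation record (hypothetical layout, not the OED's)."""
    headword: str   # dictionary entry the quotation appears under
    author: str     # author as cited
    year: int       # year of the quoted text
    text: str       # the quotation itself


def citations_by_author(citations, author):
    """Return every citation whose author field contains `author`, ignoring case."""
    needle = author.casefold()
    return [c for c in citations if needle in c.author.casefold()]


def count_by_decade(citations):
    """Tally citations by decade, to show how usage evidence is spread over time."""
    return Counter((c.year // 10) * 10 for c in citations)


# Placeholder records only; none of these are real OED quotations.
SAMPLE = [
    Citation("headword-a", "Example Author", 1784, "placeholder quotation one"),
    Citation("headword-b", "Example Author", 1793, "placeholder quotation two"),
    Citation("headword-c", "Another Writer", 1927, "placeholder quotation three"),
]

hits = citations_by_author(SAMPLE, "example author")
for c in hits:
    print(c.year, c.headword, c.text)
```

The point of the sketch is that the search is organized by the quotation's author rather than by headword, which is what the alphabetical printed volumes cannot offer.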

This kind of author search by no means exhausts the capabilities of the online OED. Though designed to be user-friendly, it is such a powerful tool that even faculty in the English Department are only just beginning to see some of the ways they can take advantage of it. One professor in the department has already given students an online OED assignment (involving the changing usage of the terms "imperialism" and "colonialism" in England), and two of the department's spring semester courses will be taught in computerized classrooms where the tool will be used extensively. Students in these classes will be asked not simply to perform a particular search, but to come up with their own ways of using the OED to gain knowledge about particular authors, periods, or cultural issues.

It is expected that by next fall the online OED will be widely integrated into the undergraduate program in English. At that time it is likely that the department's strong Renaissance group, several of whom have been using the print OED in their undergraduate teaching and have been enthusiastic supporters of the Library's online OED project, will be at the forefront in helping students to conduct their own electronic lexicographical research.

The OED Task Force

Implementing or "mounting" the dictionary on the Library's computer system was not as simple a project as it may sound. A number of decisions needed to be made about hardware, software, points of access, security, and so on. With these concerns in view, an OED Task Force, including members of both the Library staff and the English Department, was assembled last May to begin considering the options.

The first and most important decision facing the task force was what sort of interface to adopt. The Library acquired the machine-readable text of the OED from Oxford University Press, and the "search engine," a very powerful but exceedingly arcane full-text search program called PAT, from the Canadian firm Open Text. All libraries that have acquired the online OED have found it necessary to devise some kind of "front end," or friendlier interface, to enable users to work effectively with PAT.

After much searching, the group settled on an interface developed at the University of Virginia by John Price-Wilkin. It was relatively straightforward and clear, it seemed to meet the needs of most users, and it was adaptable to Penn's extremely varied computer environment. Also decisive was the fact that the Virginia interface is implemented for the World-Wide Web, which makes it available to graphical clients and consistent with the Web implementations offered by SAS, the English Department, and other campus Web servers. Above all, the Library wanted an interface that would enable users both to take full advantage of the wealth of historical and linguistic information in the OED, and to do simple dictionary word lookups - and the Virginia interface seemed to offer this kind of flexibility.

Implementing the interface

The original charge to the Task Force by Paul Mosher, Vice Provost and Director of Libraries, was to implement the OED in such a way that it would be a suitable tool for scholarly research and teaching, as well as a conventional online dictionary for simple definition lookups. Although the Virginia interface was quite usable "off the shelf," it didn't fully meet these requirements, and the Task Force set out to improve both its appearance and its functionality - adding a definition-only feature and the ability to combine search terms, for example.

Another important consideration facing the task force was that Penn's OED would need to be accessible both to graphical browsers such as Mosaic (with their ability to display many of the phonetic and other non-standard characters used by the OED), and to more rudimentary, text-only browsers such as Lynx, which remain the most common vehicles of access for remote modem users and for VT100-type terminals on campus. The Library's systems experts crafted an interface that works well for almost everyone - although perhaps not always as elegantly as a system designed for a single platform.

The vastness and richness of the OED make it a tremendously versatile research tool, one that works differently for different scholars. Online this will be truer than ever. It is not, however, an expanded Bartlett's quotation dictionary. Central to an effective use of the OED is the understanding that its quotations are chosen to illustrate usage, not necessarily wit or wisdom. Quoted almost as often as Virginia Woolf, for example, is Lady Bird Johnson ("Luci walked in...happy as a lark, saying, 'Mama, I probably aced it [her zoology final]'.") Which is not to say that wit is entirely absent: for example, "Dr. privily known throughout Germany as Wotan's Mickey Mouse" (Sinclair Lewis), and a W. H. Auden quotation whose inclusion must have given the OED editors, and Auden himself, wry satisfaction: "One of my great ambitions is to get into the OED, as the first person to have used in print a new word."

JAMES ENGLISH is an Associate Professor in the English Department; STEPHEN LEHMANN is the Library's Humanities Bibliographer.