Higher Education in the Information Age

Digital Libraries: The Revolution in Scholarly Information

Peter Lyman


Paul Mosher: I first got to know Peter Lyman, who is presently University Librarian and Professor in the School of Information Management and Systems, when he was working on the Tyro project at Stanford, a project to introduce humanities professors to personal computers and how they might be used. In a way, Peter's interests have since evolved from political science, in which he has a Ph.D. from Stanford, too, I think, into the sociology and anthropology of information, as I understand it. And I hope that more and more of us become interested in that. To introduce him, I might simply give you the names of a few of his recent publications: "Access is the Killer Application", "What is a Digital Library: Technology, Intellectual Property, and the Public Interest", "What is the Place of Computer Literacy in Higher Education?", and "Problem Solving as Skill and Social Relation". Peter.

Peter Lyman: My assignment was to talk about the economic and operational implications of the digital library. It all reminds me of my favorite line in "Star Trek: The Next Generation". The starship Enterprise is being pulled into a black hole, a virus is raging through the crew, killing them, my TV set is flickering, and an engineer turns to Captain Picard and says, "Well, all we have to do is reduce the polarity of DNA," and Picard says, "Make it so." And that's my job. You guys decide, "Oh, Mike Lesk is right: digital library. Make it so." So I'm going to describe what happens when you say that.

As an ethnographer--and I have to look at economics from an ethnographer's point of view because it's all I know--I study today not computers and not information and not cyberspace; I study Cyberia. Cyberia, C-Y-B-E-R-I-A, is first of all a word ethnographers use to describe people's experience of communication in the network. And as we've heard from all of the students today and from many of you, it's not the experience of interacting with a computer. We wouldn't talk about a child/toy relation or a musician/instrument relation, so I don't know why we talk about human/computer interfaces. Our experience of these things is the experience of a place: of being in a place and participating in a community. I have to admit that I don't share that experience with the rest of you. I guess part of being an ethnographer is that you learn to watch and never quite get to join.

The second part of the concept of Cyberia, and there's a lovely article on this by Arturo Escobar, an anthropologist at Smith College, is that we're dealing not with technology in the traditional sense of a means to an end, but with technologies, information technology and biotechnology, that create culture. They create different kinds of social relationships. I'm going to try to describe that in two ways. First as a case study--I do have two hours, right, Greg? Lock the doors, please!--a case study of a problem I'm facing as a librarian in the transition to the digital library. And second, some other implications. But this is not a new means to the same end. The digital library is not the same place that the print library is, and I'd like to explain why.

It's been fascinating listening to the discussions today. From my point of view, they're an extended fantasy about technology: a fantasy that a number of very highly subsidized experiments can be scaled into a national infrastructure that provides high-quality library services. And it's not a foolish fantasy, because it reflects one of our most important core values, which is that we are a community. One of the ways an academic community is a community is that it shares a gift culture. That is, we exchange information and knowledge freely with each other, without economic constraints, and that's the way we build relationships to each other. Disciplines exist through gift exchange; it shapes our relationships to each other as faculty and departments, and our relationship to our students. And almost every description today of the communication that occurs in Cyberia has been a description of a gift culture. But the digital library is not going to be a gift culture. It's going to be a market culture. So let me describe the case study.

Within the next two years, three to five thousand science, technology, medical, and business journals are going to be available on-line. They are the best and most expensive journals that we buy. Perhaps as much as 60% of our acquisitions budget goes to those 5,000 journals: Elsevier, Springer-Verlag, John Wiley, Academic Press, Blackwell's. The core fact about that information in science, technology, medicine, and business is that its price has been increasing at double-digit rates every year for the last two decades, and there's no reason to believe that isn't going to continue. Those publishers have now offered us contracts to place those journals on-line. So I've gone through a series of thoughts about that transition and how we at Berkeley should make it.

First of all, each year for the last two years I've gone to the meeting of the faculty senate library committee, and the faculty ask me questions like, "How much money are you spending on digital?" And I'll say, "Two and a half percent of the budget." And they'll say, "Why are you wasting our money?" Then I'll go directly to my budget hearing with the chancellor, the vice chancellor, and the head of the university budget system, and they'll say, "How much money are you spending on print?" And I'll say, "97 1/2 percent." And they'll say, "Why are you wasting our money?" So the first problem I have is a major disconnect between the faculty and the administration.

Now, as I have told Stan, and he didn't get too annoyed with me, from my point of view technology is the drug of choice of administration in higher education. Well, maybe that's why you're not speaking to me. But will digital save money? Here's some of the thought process. First of all, looking at national copyright and intellectual property policy, there is no indication that the Clinton-Gore administration has any interest in extending to digital information the fair use provisions that apply to print, the provisions that allow you and your students to use copyrighted material for educational purposes without a fee. In fact, the Green Paper proposes the somewhat ridiculous and almost ineffable concept that every time you make a copy, you should pay a copyright fee. I think of tens of thousands or hundreds of thousands of copies being made per second, and I think of paying copyright fees, well, millions of times per second. It's a wonderful thought. So first of all, we do not live in a universe in which copyright will allow an exception for educational uses of information, which is what fair use is. That's reality number one.

Reality number two is that we're not going to be offered these licenses under copyright anyway. We're going to be offered them under contract, and the terms of those contracts are going to completely change our relationship to the use of information. I don't know everything they're going to say, but I do know that the exceptions to copyright that evolved for print are not going to apply to digital, and even if they did, we're not going to be offered this on-line information within a copyright environment anyway. It's going to be a contract. Some of you have probably heard about shrink-wrap licenses being applied not only to digital things but to printed things. And many of us, in fact, are quite worried that the very restrictive terms of use that apply in digital environments will be reverse-engineered back onto print, and we'll receive printed books with shrink-wrap on them that will not allow libraries, for example, to loan them. That changes the whole relationship of higher education to the use of information.

The second thought is that the contracts we're being offered price digital journals at 90% of the cost of print. And if you want both print and digital, it'll cost between 110 and 130% of the print price. Of course, that sounds like a good deal until we ask who is going to tell the faculty that, as Jim said, we're going to harvest the savings of the digital library by canceling print the same day we bring the digital version on-line. I think that's your job, basically, not the librarian's job.

Thirdly, a lot of what looks like a cost savings is a cost shift, and the more I look at it, the more frightening it becomes. First of all, there is the trade-off between library space and network infrastructure. Even the Berkeley stacks, which are far more beautiful than your computer, Mike, are a good deal compared to the network that you're going to have to build to manage intellectual property. It is not the network that you build as part of your ARPA or NSF grant. It has special requirements, requirements that look a lot like a regulated marketplace. Authentication: who are you, and what right do you have to use this information? All of your interactions in this network are going to be under surveillance, because your use of information is of value to the publishers. It helps create marketing information. At my Safeway, when I use my debit card, the back of my little printout gives me free coupons, because everything I've bought with that card has been put into a database and I'm being marketed things that I'm likely to buy. It's always interesting to see what inferences they make about my consumption. The same thing is going to happen here. The way you use information will not be private; it will be part of a marketing database. So we need authentication. We need version control so that all of the people around Penn have the same kind of access. We need digital cash and accounting, because it's very possible this will be charged for on a per-use basis, certainly after we get through the transitional period. We need intellectual property management and all of those things. And if that's not frightening enough, we have to have network printing. So the network is not the same network that scientists use to connect UNIX workstations into communities of researchers and learners. This is a network to keep your use of information under surveillance so that you can pay.

Secondly, the software for this is going to have to replace the functions of the library catalog and the technical services costs, which, after all, only organize information in a logical way. With these wonderful search engines, of course, we don't need that anymore. It's wonderful to have the best 20 million citations. I mean, it's so interesting that we talk about the information problem as if it's hard to find information, when these wonderful computers can swamp you with information. Well, in my universe, people don't have the problem of having too little information. They have the problem of having too much information. So what is the digital equivalent of the library catalog? Or, to speak of a DNA-based system, what is the digital equivalent of the librarian: somebody who helps you focus on what your question is and helps you judge the quality of information?

Just to give you some hint of the way a social scientist looks at your information behavior: the University of California shares a union catalog called "Melvyl". Why we name things after dead librarians, I don't know, but it's part of our tradition. It has about 28 million citations in it. We had to ban some searches because people would do searches like "United States", which would produce 18 million records and swamp the entire system. You see, the ability to search is not part of a Kantian catalogical imperative. It's not built into the structure of human thinking. Someone has to teach you. And these are very primitive systems. I mean, any system that you give commands to and that has control keys is something I don't want to have anything to do with. So we're going to need software that provides the kinds of information filtering that catalogs and librarians do, and if you believe intelligent systems are going to do that for you--well, I won't say that. I'm sure you're right.

Thirdly, this technology has only recently become operational. I think the problem of information quality, and information quality is what we're talking about when we talk about a library, is unsolved; I think the economic model is unsolved; and I think there are social consequences that we don't yet understand and haven't solved. The first of these is privacy and surveillance. Privacy has been a core value of libraries: what you do research on is confidential, even from the FBI. But everything you do on the Web is under surveillance, and that's a feature, not a bug. Secondly, there is a conflict between the Net culture, if that isn't an oxymoron, and local norms. I think you see that already in court cases about pornography.

I think maybe the most interesting use of the technology is by political and social movements in repressive regimes, which are able to organize underground information sources and cultures and coordinate with each other. But there's at least an issue of local culture and the ability to govern local culture. And then there's the question of social relationships. Sexual harassment on the Web is the most obvious case. I wanted to mention that there is a little bit of research on this, and it's odd: for something that is so hyped, there is so little research on the impact of these technologies on social relationships. There is evidence that electronic mail and teleconferencing are much more successful if the participants have face-to-face relationships. As for the quality of communication, face-to-face relationships and shared cultures on the one hand and the use of the technology on the other have a complementary impact. And you tend to get the most violent or aggressive kinds of interactions the less face-to-face knowledge people have of each other. So there is a kind of sociology of these communications.

I think, finally, on the economic model: given the cost, the increasing cost of high-quality information (not the local opinion on whether the earth goes around the sun or vice versa), higher education faces two really difficult alternatives. On the one hand, there is a world of limiting supply, in which you simply say that access will be on demand, on a fee-for-service basis. Ultimately, the way to control the cost of information, and the rise in that cost, is to pass the cost on to the user. And that gets to the other side, which is educating demand. We've always had a model of an infinite supply of information. Now we're going to have to begin to think about the demand side. One of the ways of moderating demand, or of giving people a sense of information ecology, is to have them share some part of the cost, or at least be aware of what the cost is. When I started working for the Stanford computer center a long time ago, at the end of every session it would tell you how much the session cost. We weren't charged for it; we were just told. And I was thinking how useful it would be if we had that with every library interaction: "This session just cost $700." Simply to educate people about the costs of what they take for granted.

On that score, faced with cutting library budgets, I began a series of studies of the use of information, thinking: I'll get even with these foreign capitalists who are driving up the price of information; I'm going to find out how much this stuff is used. And what I found, especially in the studies of the engineering library and some of the natural science laboratories, is that on a cost-per-use basis the highest-priced journals are the best bargain in the entire library. These things are priced fairly on a per-use basis, and as a matter of fact it's the medieval historians who are so expensive. These publishers know something.

Audience Member: It's a lot of work!

Lyman: Well, you know when I was a faculty member, I used to be thrilled to walk into a library and find, gee, the last time anybody read this book was 1923. That was at Stanford. They don't read too much at Stanford. Now as university librarian, I hate that. So just on a per use basis, publishers know something about the value of their information.

Well, I wanted to talk just a little bit about some of the impacts on the system of scholarly communication. You don't change one part of the system without changing all of it. On certification, which Mike and a lot of other people have brought up, I want to say three things. First of all, we had an emergency meeting of the deans at Berkeley. One of the faculty members had published an article, a good article, on the Web. What should we do? And we had a pretty good discussion: was it peer reviewed? Was the editorial board reputable? And then one of the deans covered his head and said, "You know, if this keeps up, we're going to have to read the faculty's work." Only at Cal.

But there are two other suggestions on the table right now that I wanted to mention, as a second way of making the point that this is a cultural change, not just a technology change. The first is the idea, which has come up before today, of collaborative filtering: maybe the way to evaluate scholarship is not through authoritative filters before the fact, but by looking at the use and impact of the information after it's published. The citation indices have been used that way. Counting the use of information on the Web lets you look at its impact and use. And maybe that's just as effective, maybe even a better way of understanding the quality of information than filtering by editorial boards. It certainly is feasible technically.

The second is the recent discussion at Caltech by provosts, and I know Stan was part of that and perhaps others here were as well, that we can separate the certification process from the publication process: that the AAU can set up a national system for evaluating the quality of scholarship through review boards that are separate from the publication process, getting some sense of the quality of information without having to publish it and buy it back. I'm on the AAU task force that's pursuing that. Generally, when I think of higher education organizing anything on a national basis, I think of the NCAA as the example of our organizational prowess, and I think the publishers must laugh at us when we talk about organizing anything. But those two suggestions are at least on the table: taking advantage of technology to solve the problem of certification by separating it from the publication process itself.

Well, on the word "information", with which Paul began: I'm probably the last person in the United States to buy an OED on paper, and I got a really good price on it, too. They couldn't believe anybody wanted to buy it. The word "information" goes back to religious education in the Middle Ages, and it had to do with putting form into the mind through the reading of books. And a number of comments have been made, by Mark among others, that this kind of information also puts form into the mind. I think that's the most interesting part of electronic publishing: the creation of new genres of knowledge. That's really where the exciting revolution is. I don't see digital as a replacement for print, except in this transitional stage. What the digital revolution is doing is creating new genres, things like scientific visualization and multimedia arts. Those are new libraries. They're not replacements for print libraries.

Well, I was told to be controversial. I hope I succeeded.



