Paul Mosher: What we're trying to do in the spirit of what
we're talking about today is identify the academic domain on the Internet.
Since clearly both advertising and entertainment control vast
economic resources, if somebody doesn't worry about management
of the content of the Internet--if we don't do it, nobody will.
I'd like to welcome you to the session on Digital Libraries: The
Revolution in Scholarly Information. "Information"
is the information in the "Information Age". It's the
subject of scholarly information. Information is both the food
and the product of colleges and universities that we all represent.
It's the only, so far as I know, absolutely inexhaustible human
resource. As long as Homo sapiens exists, there will be information.
Information, however, is a very Silly Putty word, I think, for
those of you who think about it. Now it is, and I will use it
in the broadest sense here to mean really words, data, knowledge,
objects, sounds, information properly so called--you name it, it's
included I think today.
Now, the amount of information available in the world has increased
exponentially each year--this is published information--since about
the middle of the 18th century, the Age of Enlightenment, spurred
by the growth and spread of popular learning, the popular press,
and the invention of scientific information. The rate of information
growth has further accelerated as a result of the introduction
of digital information. Earlier you talked about junk. Even
the non-junk is becoming too much.
During the last several hundred years, societies have looked
to libraries to provide repositories for information and to make
it available to readers, lookers, listeners, and scholars. The
beginning of the Information Age, which I believe is more accurately
termed "The Age of the Small Machine", has radically
transformed the nature of libraries. For two generations, technologists
predicted the death of libraries, the function of which would
simply be replaced by machine functions. Not only has that disappearance
failed to take place, but the role of libraries in the Age of
the Small Machine is becoming clearer, if not truly clear. In
fact, all kinds of assemblies or stores of digital information
or data exist. They are often called, another approximate term,
"digital libraries".
What are digital libraries? Will they end up replacing libraries
as we have known them? What does it all mean? What will be the
role of libraries in a higher education environment, transformed
by the uses of technology? To answer these and certainly to introduce
others, we have two very different kinds of librarians with us
today who are going to talk about digital libraries, different
kinds of digital libraries, and what they may mean for us.
The first of our two speakers, and I will introduce them seriatim
and then ask if you'll hold your comments until both have spoken,
is Michael Lesk, who on the one hand I identified as Chief Scientist
at Bellcore, on the other hand I identified from his home page
as big enchilada of the computer science research department at
Bellcore. And any of you who are interested in extremely knowledgeable
and entertaining home pages should look up Michael's, if only
to see his three photographs: the professional one, the amateur
one, and the coin operated one. Michael worked with the group
that built UNIX and wrote UNIX tools for word processing. He's
worked on a large chemical information system, the Core Project,
with Cornell, OCLC, ACS, and CAS. He's been a visiting professor
in computer science at University College, London and has received
a number of other honors and awards. I'm looking forward to the
appearance of his new promised book, Practical Digital Libraries:
Books, Bytes, and Bucks, which is supposed to appear this
summer.
Michael Lesk: O.K. What I'm going to say, basically,
is that technically, we know how to build digital libraries.
What we don't know how to do is make a self-sustaining system
that will pay for itself. And so the message I'm going to give
you is that libraries need to be expansionist. I don't know how
to solve the economic problems within the context of libraries
as they're defined today. I see some possibilities if we expand
the concept and include some of the other economic flows. This
is going to be an eat-or-be-eaten world.
We can build desktop journal delivery today. It's available
from people like J-STOR. It provides better services to the
readers. They can get things on their desk. They don't have
to walk all the way to the library. But it doesn't produce any
more money for the libraries. I have yet to see the government
department that says, "You've given us American Economic
Review on-line on our desktop. We'll vote for more money
for the library next go around." Web publishing. We can
get professors' papers out into the world faster, but by a system
that doesn't involve any payment so that somebody can support
it. We can do long distance cooperation. I've heard a lot
this morning about long distance cooperation, but it breaks down
the individual loyalties within the universities. We can scan
the contents of book stacks cheaper than we can build central
campus libraries, but there's no university I know where those
are exchangeable kinds of money--library operations and a building.
We can also imagine, from the competitive side, that publishers
will say, "Wait a minute. There's a big market for undergraduate
text books. I'll deliver those straight to the students, bypass
the library." Well, then what's left in the library that
needs existence? So, I summarize with this line from Yogi Berra
that what we have here is an insurmountable opportunity. What?
Pogo? Sorry. I saw it from Yogi Berra. Maybe he was quoting
Pogo.
Dean Farrington this morning talked about 15th century people
who didn't believe in printing, and as it happens I have some
quotes with me by people who said the world has gotten along without
printing; printing will never be as good as copying out by hand.
And in fact, 2,000 years before these people, Socrates told his
people that writing was just a crutch for memory, that, you know,
people who wrote things down instead of memorized them would have
the show of wisdom but not the reality. A free reprint of a Scientific
American article on the Internet to any of the classicists
in the audience if you could tell me which Platonic dialogue.
Audience Member: Phaedrus.
Lesk: Right.
Audience Member: It was written down.
Lesk: Here's another one. This is a Wall Street Journal
ad from a British building materials conglomerate named Hanson.
It's got a picture of a chip and it says, "This is the latest
thing in microprocessors. In 12 months it will be obsolete.
This is a brick. In 12 months it will still be the latest thing
in bricks. We're Hanson. We invest in things of permanent value."
So there are a lot of Luddites. And of course, we've seen some
of the promises of new libraries before. But on the other hand
we're also making a lot of progress. This is the drawing
Life magazine made for Vannevar Bush's memex in 1945,
the viewing screens and microfilm reels and the little chemical
processing plant to create new microfilm, all inside the desk.
But more seriously about the progress. This is a chart of the
number of bytes of text on the Web. It's on a log scale. It's
going up a factor of ten every year. It's now at two terabytes.
The Library of Congress is normally thought to be about 20 terabytes,
which means basically next year there'll be as much text on the
Web as there is in the Library of Congress. You may say, "Well,
poo poo. I don't believe your numbers." So if I'm off by
12 months, you know, big deal. It will be another year. In terms,
of course, of English material in the last two years, the Web
is already ahead of the Library of Congress. In terms of evaluated
material, well, that's another story.
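As a rough check on the arithmetic behind that projection, here is a minimal Python sketch. It uses only the figures quoted above (a Web text total of 2, a Library of Congress total of 20 in the same unit, and tenfold growth per year) and assumes the growth rate stays constant. The function and variable names are illustrative and are not the speaker's.

```python
import math

def years_to_reach(current, target, growth_per_year):
    """Years for a quantity growing by a constant annual factor to reach target."""
    return math.log(target / current) / math.log(growth_per_year)

# Figures quoted in the talk, both in the same unit.
web_text = 2.0
library_of_congress = 20.0
annual_growth = 10.0  # "a factor of ten every year"

years = years_to_reach(web_text, library_of_congress, annual_growth)
print(f"Years until the Web matches the Library of Congress: {years:.1f}")

# Hypothetical case: if the Web estimate were too high by a factor of ten,
# the crossover would simply move out by one more year.
years_if_overestimated = years_to_reach(web_text / 10, library_of_congress, annual_growth)
print(f"Same, if the Web figure were 10x too high: {years_if_overestimated:.1f}")
```

With tenfold annual growth, an estimate that is wrong by a factor of ten shifts the crossover by only a year, which is the point of the "big deal" remark.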
Everybody uses libraries on-line. Let me ask a question. Every
one of you, think of the last time you had to look up something.
There was something you wanted to know and you didn't remember
it. And I want to ask for a show of hands, looking it up on paper
versus looking it up on the screen. How many on paper? How many
on a screen? O.K., overwhelmingly for the screens. That's the
way it is. As I've said, here are a lot of examples of researchers
who depend critically on the electronic libraries. The
preprint physics server at Los Alamos by Paul Ginsparg. The
protein and genome databases. 4,000 books, all US legal decisions.
Now the problem is there are still tenure committees who sort
of have this attitude of "Well, what's a digital library?"
A lot of people know the story of Greg Crane who got denied tenure
with no consideration given to the Perseus work. But on the other
side, there are undergraduates going around saying, "What's
a paper library?" One MIT professor told me that he had
to put a stipulation on his students: 10% of all the references
in their papers had to be not URLs. And I was visiting Cornell
and one of the professors shared that he complained to one of
his undergraduates about no paper references, and the student
threw his hands up and said, "I don't do libraries."
There's a lot of stuff in digital libraries. I don't have the
demos that Silicon Graphics had, but I have my own things. O.K.
We're getting into audio, into video, into images. Now you have
to understand, images are, indeed, expensive. This is three bytes.
This is 1,000 bytes. That's 12,000 bytes. Nevertheless, large
image libraries are common. We don't know how to search them,
but we're getting them and we're going to be dependent on them.
We get sound. For several years I ran a system where I like
to listen to the news in my office, and I couldn't listen to it
at the right time. So I plugged a radio into my work station
in the US and I digitized National Public Radio all day long,
and then I could listen to the news programs whenever I wanted
to. I also got a friend in the UK to plug a radio into his computer,
and it recorded Radio Four all day long. And I'd drag over the
news program every morning, and I could listen to the BBC news
if I had the time. And I could speed it up, and I could clip
things out and I could save them. What?
Audience Member: Do you ever go back and listen to them?
Lesk: Yes! In fact, I still cite one of the results
which I don't know a written source for, which somebody on NPR
said that 10% of the expenses in the clothing industry were wages
and 27% were information costs. And so, you know, better information
was worth more than low wage labor. And I've never seen that
anywhere else, so I still have to cite it.
There's also a lot of interest in maps. These are four maps
of Cranford, NJ. I'm afraid perhaps only the people in the front
can see this. This line is the same railroad in each map. This
is 1878, the modern USGS quad, SPOT satellite imagery, aerial photographs.
This stuff is now all digital and on the Web. This stuff is
all digital and will be on NASA's site soon. This admittedly
we still have to scan. The old maps--the Library of Congress has
just announced they're going to be able to scan all of the Cranford
Fire Insurance maps. This is an example of what you might do.
There's a little lake here which has disappeared from all the
modern representations. Presumably it got filled in. If you
wanted to dig a big basement at that point in Cranford, you'd
probably like to know about that.
This is one reason why the Congress is willing to support work
on the Internet. This is the US balance of trade in data and
information services. We run a ten to one positive balance of
trade with the rest of the world and--anybody here ever logged
into a Japanese information server? They log into ours all the
time. That's why. So Congress thinks this is good. Now, in
fact, let me try another one. How many of you have looked up
something on the Web when you knew that it was available on paper
in your local library? Good show of hands. How many of you have
gone to your local library to look up something which you knew
was on the Web? O.K. One. Not many. So we actually have a
lot of enthusiasm for this, but how do we pay for it?
Here's some numbers. Library budgets. Typical in the United
States. Library spends something on buildings, services, processing,
acquisitions. Suppose it goes electronic. Well, you save some
money on building. Everything else goes up. And no university
lets you transfer building money to other services, so you're
behind the eight ball. What about the publishers? These are
numbers from the American Economic Association, courtesy of Malcolm
Getz. They get 38% of their revenue from individual membership
subscriptions. They only spend 23% on printing. So they look
at these numbers and say, "If we switched to all electronic
distribution, and the library copies serve the purpose within
the university, we probably lose most of our membership subscriptions.
We wouldn't save enough to compensate. We'd have to double the
bill for libraries." That's not very attractive. So we
don't know how to make that one work.
Here's another one we don't know how to make work. Here's the
comparable cost of scanning books versus building libraries.
J-STOR is paying 20 cents a page to scan books. That would be
about $60 a book. The Cornell "CLASS" project paid about $30 per book
scanned. The Core project was about seven cents a page. That
would be about $21 per book. The Making of America has just put
out a lot of scanning at eight and a half cents a page. I think
the price could come down to $3 a book if we really did this on
a large scale. J-STOR may show whether that's true. But we're
talking about numbers from $20 to $30 a book.
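The per-book figures above follow from the per-page rates once a book length is fixed. The sketch below reproduces them in Python. The 300-page book is an assumption, inferred from the quoted pairing of seven cents a page with about $21 per book; the names are illustrative only.

```python
# Assumption: 300 pages per book, implied by "seven cents a page" ~ "$21 per book".
PAGES_PER_BOOK = 300

# Per-page scanning rates quoted in the talk, in cents.
rates_cents_per_page = {
    "J-STOR": 20.0,
    "Core project": 7.0,
    "Making of America": 8.5,
}

def cost_per_book(cents_per_page, pages=PAGES_PER_BOOK):
    """Convert a per-page scanning rate in cents to dollars per book."""
    return cents_per_page * pages / 100.0

costs = {name: cost_per_book(rate) for name, rate in rates_cents_per_page.items()}

for name, dollars in costs.items():
    print(f"{name}: ${dollars:.2f} per book")
```

Under that assumption the three rates come to roughly $60, $21, and $25.50 a book, so the lower two fall inside the $20 to $30 range mentioned.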
What about building? Cornell built a new stack recently at $30
a book--or $20. Berkeley built one at $30. Now, admittedly, the
Berkeley one, here it's yours, Peter, it's, you know it's built
to withstand a Force A earthquake, but it is more expensive than
it would have been to scan the books. I'm sorry what?
Audience Member: You scan it once and everybody can have
it.
Lesk: Yes! That's the J-STOR principle. Share the scanning.
J-STOR negotiated deals with its things. I mean, you know, each of
these libraries, the British Library
and the Bibliothèque de France, cost far more to build than it
would have cost to scan their content. In fact, somebody once
told me back in the days of microfilm that if, instead of building
the British Library, they had microfilmed every book in London,
then taken all the books and put them in warehouses in North Wales,
built a smaller building at the same price per square foot in
London to hold the film, and then to deal with the complaints
of the readers that the books were out in North Wales, given each
reader a lifetime season ticket on British Rail to North Wales,
they would still have saved 100 million pounds over what actually
went on.
Now, there are a lot of problems. One of the things libraries
worry about is permanence and preservation. And in fact, Don
recently wrote, Don Waters, recently wrote a report on preservation,
but let's see. How many people know the first telephone call?
"Watson, come here, I need you!" We know that one.
We know the first telegram. And these both are over a hundred
years old. Anybody know the first e-mail? We do not know the
first e-mail. We don't even know in which city it was sent.
So we really have problems on preservation. We're going to have
to learn that one. Now, what are some of the--
Audience Member: Just going to have to make it up!
Lesk: I mean one of my friends in the early '80s went
looking for the first e-mail message, and even though it was less
than 20 years, or about 20 years after the date, she could not
pin it down. Nobody had accurate enough notes. All right.
What are some of the problems? Well, some of these have been
already--quality. Gene Spafford says that Usenet is a herd of
elephants with diarrhea. There was a comment this morning about
did anyone teach these students how to evaluate things they found
on the Web. It's not just students that have the problem. A
number of American reporters have been writing stories about some
town in Northern Ireland based on information they got off a Sinn Fein
Web site without realizing that Sinn Fein had generated it.
Loyalty. I asked this question before. I went through one random journal,
the only journal for which Bellcore still had 30 years on paper in its library.
I counted the number of papers which had been co-authored where
all of the co-authors were from the same institution, and it ran
about 30%. And now it's started to drop off. And I say this
is the influence of fax machines and e-mail. People can collaborate
with anyone in the world.
Do we get shared experiences? Universities have been introducing
things like core curriculum so that all the students will have
something in common, and we heard about that from Morgan Friedman
this afternoon. Well, what happens if everybody is out on the
Web doing their own thing? The other side of that is can we preserve
diversity? Is there a danger that if everybody is looking for
the greatest multi-media presentation, that it won't be affordable
to make many of them? Arn deWhiter, the head of research at Elsevier,
once said to me, "Look," he said, "we publish a
dozen elementary college physics textbooks. No problem. Do
it all the time. We may be able to fund one really good CD ROM
with animation and everything else. Now maybe you don't care
if we only have one college physics CD ROM. Do you really want
there to be only one American history CD ROM? One philosophy
CD ROM? Maybe that's dangerous."
Equality of access. How do we get the Web out everywhere? You
know, a trivial matter: eight percent of the men in this country
are color blind. There are all sorts of things you'd like to
make work, and a big thing is recognition. How do we reward people
for what they do on the Web? Any of you work at places where
the tenure committees actually value on-line publication? I guess
the man from Glasgow, Professor Davies, can say that you do because
the research assessment exercise requires you to. But it's not
that common.
Finally, I listened to Doug Van Houweling talking about distance
learning and we're going to help out these small universities,
and I worry about the small university president who says, "Oh,
I can't afford faculty members to teach Russian, and I don't have
enough students who want it. I'll buy it from the University
of Michigan." Well, next year it'll be math. The year after
that it'll be--you know, why have a history--why have a library
or a faculty at all? Let's just buy everything in from big universities.
And that points--there was a comment by Susan Fuhrman that there
wasn't much work on educational research. That's a very serious
problem. There's very little good development of courseware;
we're not studying it. In theory, self-paced instruction should
be a big win. Everybody believes that self-paced instruction
should produce an enormous saving. But the reality is we haven't
achieved it. We can do math and language drill, and other than
that we're looking at a lot of failures with programmed instruction
and research, and then the balance of research--
And of course getting out to diversity. Actually, speaking of
diversity, there's a whole lot of discussion about gender access
to computers. Anybody know what's the best selling CD ROM in
the country? The best non-software CD ROM: Barbie Fashion Designer.
First one. And all your universities have problems. "There prevails
today an extensive and wasteful competitive duplication of plant
and personnel among American universities, particularly in the
graduate schools." Thorstein Veblen in 1918. So anyway.
I want to wind up with telling you what universities should do.
All right? O.K. One comment is, you want to use the Web. You
all have university presses. Well, some of you have probably
abolished them recently. The Web is the alternative. You have
to get out there and say, "Look. We're going to have--we're
going to make the University of Pennsylvania Web page a place
that people are proud to publish. Maybe we'll have all sorts
of student Web pages with people giving SEPTA schedules or something
like that, but in addition we're going to have some area which
will be as prestigious as the university press and can economically
survive when the university press can't." And that means
giving awards, encouraging people to develop tools, and encouraging
bonding through the--we've heard some wonderful stories from Myra
Lotto about students bonding through local coursework. We need
more stories like that. We're going to find that students won't
care where they were physically located. In fact, maybe professors
won't care. If you can get the whole Stanford
library on-line, maybe you can do all the research without being
at Stanford. And it's much easier to get jobs at other universities.
We need to teach people how to find and evaluate things. The
Web does have this huge pile of junk. You do need some skills.
We have a tradition in libraries of teaching people about imprints
and binding and, you know, where did something come from. We
need the Web equivalent. So how do you look at something and
decide whether it's any good? And you're not going to be able
to rely on whether it's on decent paper or newsprint.
And another thing I think we don't do enough of is we don't support
the recognition of new forms of creativity. We still have an
academic system in which the written English essay is all important,
and people in art and music have been complaining for generations
that their paintings, their compositions didn't get equal attention.
Well, this is now going to come with a vengeance on software,
on what do we do to say to students, "If you do really
exciting art or music--if you develop a good tool that lets people
who are not professional musicians do something useful with music--"
this is something that should be heavily rewarded.
Also collaboration techniques. We don't have a large supply
of people who are good at both writing and art and music.
You know, as I think back through history, well we've got Blake
and we've got Rossetti and maybe you can come up with one or two
more but it's not common. But how do we see that we can encourage
people to collaborate, so that a student who knows art and a student
who--a student who can draw and a student who can write can work
together? And I think we need to know how to do this and how
to reward it, and the universities have to work in this area or
you won't make it. Other people will come in.
O.K. That's my argument as to what should be done.