
World eBook Library

Google's library plan 'a huge help'
By Jefferson Graham, USA TODAY
PALO ALTO, Calif. — Stanford University senior Will Oremus puts in a full day as a student and editor-in-chief of the school newspaper. He doesn't get around to studying until after midnight, when the campus library is closed.

The University of Michigan's Buhr Library.
By Morris Richardson II, AP

So consider Oremus thrilled that Stanford has partnered with Internet search giant Google (GOOG) to digitize library books, which eventually will be searchable from dorm rooms.

"If it was possible to access things online without leaving my room, that would be a huge help," he says. "For research papers, there's lots of books you can only get at the library. It would be really cool to get them on your computer."

That is the plan, but it's expected to take five to 10 years for it to become a reality.

Google is picking up the estimated $150 million tab to have employees on-site at the Harvard, Stanford, Oxford and University of Michigan libraries, plus the New York Public Library, begin scanning books, page by page.

Google Library

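As has been reported quite widely, Google has begun a massive digitization project with five libraries: Harvard, Stanford, the University of Michigan, the University of Oxford and the New York Public Library.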

The total covered by existing agreements is said to be 15 million volumes. Each volume is estimated to cost $10 to scan. Stanford's scanning unit is said to be able to do 100,000 pages a day. Oxford's scanning unit is said to be able to do 10,000 books per week. If all five libraries work at that speed, then by my math it will take a little over five years to scan them all. Similarly, the University of Michigan says the project will take six years.
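The back-of-the-envelope math above can be checked with a short sketch. The figures are the ones quoted in the text; treating all five libraries as scanning at Oxford's reported rate of 10,000 books per week is an assumption made here for illustration, not a confirmed fact about the project.

```python
# Figures quoted in the text. Applying Oxford's weekly rate to every
# library is an assumption for illustration only.
TOTAL_BOOKS = 15_000_000
COST_PER_BOOK_USD = 10
LIBRARIES = 5
BOOKS_PER_WEEK_PER_LIBRARY = 10_000
WEEKS_PER_YEAR = 52


def total_cost_usd(books=TOTAL_BOOKS, cost_per_book=COST_PER_BOOK_USD):
    """Total scanning cost if every book costs the same to scan."""
    return books * cost_per_book


def years_to_scan(books=TOTAL_BOOKS, libraries=LIBRARIES,
                  rate=BOOKS_PER_WEEK_PER_LIBRARY):
    """Years needed if every library scans `rate` books per week."""
    weeks = books / (libraries * rate)
    return weeks / WEEKS_PER_YEAR


if __name__ == "__main__":
    print(total_cost_usd())            # 150000000
    print(round(years_to_scan(), 1))   # 5.8
```

Under these assumptions the cost comes to $150 million, matching the estimate reported by USA TODAY, and the scanning takes about 5.8 years.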

How digitizing process works

Digitizing a book can be almost as tough as writing one. Here's how it's done:

1: Convert each page into a digital image by scanning or photographing it.
This process is often done by hand, usually by low-wage workers in India and China. These workers place pages onto an ordinary computer scanner, one page at a time. At top speed, they can scan about 100 pages an hour.
Automated scanning machines, much like copy machines, do exist. But most require the pages to be cut out, a poor choice for rare books. Recently, start-ups such as Kirtas Technologies of Victor, N.Y., have introduced automated book-scanning machines that can turn pages. Kirtas' book scanner uses a robotic arm to flip pages past a 16-megapixel Canon digital camera.

2: Clean and crop the image
A computer program processes the image. It crops it to the proper size, centers the text on the page, and removes smudges and other errors.

3: Run word-recognition software
Before this step, each page of the book is stored as a picture. The picture can be posted online, but searching through the text is impossible. To do that, the computer must "read" the book using optical character recognition software. This software looks for letters in the image, and turns them into words. When the process works, the text of the book can be turned into an ordinary word-processing document that can be searched, edited, copied and pasted.

4: Store and post the book
The book is now in its electronic form, and must be stored just like any other computer file. It can be put on a CD or stored on a hard drive.
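Steps 2 and 3 can be illustrated with a toy sketch. It does not reproduce any scanning vendor's software: the page is modeled as a small grid of grayscale values (0 is black, 255 is white), the function names are hypothetical, and the text passed to the index stands in for what real optical character recognition software would produce. The first function crops a page to the area that contains ink. The second and third show why recognized text matters: once a page is text rather than a picture, it can be indexed and searched by word.

```python
import re
from collections import defaultdict


def crop_to_ink(page, threshold=128):
    """Crop a grayscale page (list of rows) to the box containing dark pixels."""
    rows = [r for r, row in enumerate(page) if any(p < threshold for p in row)]
    if not rows:
        return page  # blank page: nothing to crop
    width = len(page[0])
    cols = [c for c in range(width) if any(row[c] < threshold for row in page)]
    return [row[cols[0]:cols[-1] + 1] for row in page[rows[0]:rows[-1] + 1]]


def build_index(pages):
    """Map each lowercase word to the set of page numbers that contain it."""
    index = defaultdict(set)
    for number, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(number)
    return index


def search(index, word):
    """Return the sorted page numbers on which `word` appears."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    page = [
        [255, 255, 255, 255, 255],
        [255, 255,   0, 255, 255],
        [255,   0,   0, 255, 255],
        [255, 255, 255, 255, 255],
    ]
    print(crop_to_ink(page))  # [[255, 0], [0, 0]]

    # Hypothetical OCR output for two pages.
    ocr_text = {1: "The library opens at nine.",
                2: "The Library closes at midnight."}
    index = build_index(ocr_text)
    print(search(index, "library"))   # [1, 2]
    print(search(index, "midnight"))  # [2]
```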

Contributing: Gregg Toppo

Deals with Google to accelerate library digitization projects for Stanford, others


In December, Stanford announced that it is one of five libraries cooperating with Google Inc. in a project to make millions of books from their collections available electronically to readers worldwide without charge. Along with Harvard University, the University of Michigan, the University of Oxford and the New York Public Library, Stanford will loan books to Google to be added to an electronic repository that could become the world's largest digital library.

"This is a great leap forward," said Michael A. Keller, university librarian and publisher of the Stanford University Press and the HighWire Press, Stanford's online co-publishing service for scholarly journals. For years, Stanford has been digitizing texts to make them more accessible and, as of January 2005, Highwire Press has helped publish more than 800,000 free full-text journal articles. But in the case of books, the university's efforts have been limited for technical and financial reasons, Keller said. "The Google arrangement catapults our effective digital output from the boutique scale to the truly industrial."

Google was founded in 1998 by Stanford doctoral students Larry Page and Sergey Brin. Google and Stanford have been talking with one another about the project since very early in the development of the idea, Keller said.

Both Stanford and Google are committed to respecting the rights of publishers and copyright holders of the books scanned, he said. Users will be able to browse the full text of works in the public domain. According to a Google press release, library books that are still in copyright will show up in Google search results, but users will see only bibliographic information and a few small text snippets unless permission is granted from publishers to show more.

The project's unveiling last month made national and world headlines and prompted speculation about the effect it might have on the future of libraries and publishing. Keller talked recently about what the project will mean for Stanford.

When will the actual scanning of books begin?

Lots of logistical issues remain to be worked out—things like transport, selection, physical control, sorting, etc. Google staff are working with Stanford University Libraries to develop detailed initial plans for the project.

How many of Stanford's more than 7.5 million books will be digitized?

Stanford has great hope of digitizing all its books eventually, so that each one can be made as accessible and addressable as possible. That said, the process will take quite a few years and we really do not know how Google's grand ambitions will play out over time. For that reason, we left the question open as to how many Stanford books Google would handle. The agreement with Google neither calls out specific collections nor specifies a minimum or maximum number of books to be digitized. At this point, we're not really worried about digitizing the last book.

How will they be selected?

That is still being determined. We most likely will begin with a few hundred thousand books that were not converted from the Dewey Decimal cataloging system when the libraries began using the Library of Congress classification system. That is intrinsically an older collection, so more of the books are likely to be in the public domain. We also will factor in such things as current location and condition, and will try to create as little disruption for our readers as possible.

How will the books be digitized?

Stanford will loan books from its library collections to Google, which will scan them at their Mountain View headquarters. Once digitized, the books will be returned to Stanford and re-shelved. We'll require that books be turned around fairly quickly. Google has promised not to damage the books, and we are taking them at their word. We are strongly committed to getting as much work done as possible, without disrupting the services we provide to our readers.

How do you expect this project to affect the library and its mission?

I have been committed to finding ways of digitizing information for years. The Google plan allows us to accelerate our digitization schemes by orders of magnitude. I intend that our eventual use on campus of the digitized book files will be a tremendous asset to the Stanford community.

Some people seem to believe the effect will be to make the physical books redundant—that we can simply discard the books and convert our book stacks to offices and labs. I disagree strongly. In fact, I believe having books in digital form will actually increase the use of the physical books. The digital files will be great for searching and targeting material for study, but many of us prefer the hard copy original in hand for careful reading. So, in my opinion, it is not an either-or proposition; the book provides a valuable reading experience different from the valuable searching/scanning/excerpting work with the digital version.

Now the downside of the Google plan from the library's operational point of view is the work at our end: selecting books, protecting fragile or damaged books before they go off for digitizing, re-sorting, etc. Physically handling hundreds of thousands or millions of books is labor-intensive by its nature.

When will Stanford materials appear in Google?

As of this writing, no timeline has been set.

Can faculty ask that certain books be digitized?

We already have a process for targeting books for digitization, and such needs should be communicated through the librarians of specific collections.

The Google library project reportedly was nicknamed Project Ocean. How do you respond to the criticism that the Internet is already a sea of information that's difficult to intelligently navigate?

There is obviously a huge amount of information on the web. However, information is not quite a generic commodity: Having millions of pages available online is of no immediate value if the information you need is represented only in a book on a shelf to which you do not have access. Further, not all information is of equal validity, integrity, accuracy, legitimacy, etc. The Google book digitization project will unlock a very large amount of relatively high-quality information of known, traceable origin, with proper bibliographic references. And, of course, that information will be searchable through Google. So I would say its net effect is to improve the chances that web users can obtain legitimate representations of the information they seek, thus improving the value and maybe even decreasing the chaotic quality of the web.

I also expect that the existing tools for extracting information will improve with the large-scale availability of full-text material. The tools that are emerging now will give us the ability to extract ideas from online content, rather than simply perform keyword searches.

Michael Keller

Google library project named as one of ten most important emerging technologies for humanity by futurist Mike Adams

The Google library project -- an ambitious effort to digitize hundreds of thousands of texts from prestigious libraries -- has been named the single most important emerging technology for humanity by futurist Mike Adams in his free downloadable ebook, "The Ten Most Important Emerging Technologies For Humanity." In the book, Adams cites the Global Electronic Library as the #1 technology needed to uplift humanity due to its ability to enhance the accessibility of knowledge.

Google Press Release

Google Checks Out Library Books

The Libraries of Harvard, Stanford, the University of Michigan, the University of Oxford, and The New York Public Library Join with Google to Digitally Scan Library Books and Make Them Searchable Online

MOUNTAIN VIEW, Calif. - December 14, 2004 - As part of its effort to make offline information searchable online, Google Inc. (NASDAQ: GOOG) today announced that it is working with the libraries of Harvard, Stanford, the University of Michigan, and the University of Oxford as well as The New York Public Library to digitally scan books from their collections so that users worldwide can search them in Google.

"Even before we started Google, we dreamed of making the incredible breadth of information that librarians so lovingly organize searchable online," said Larry Page, Google co-founder and president of Products. "Today we're pleased to announce this program to digitize the collections of these amazing libraries so that every Google user can search them instantly.

"Our work with libraries further enhances the existing Google Print program, which enables users to find matches within the full text of books, while publishers and authors monetize that information," Page added. "Google's mission is to organize the world's information, and we're excited to be working with libraries to help make this mission a reality."

Today's announcement is an expansion of the Google Print™ program, which assists publishers in making books and other offline information searchable online. Google is now working with libraries to digitally scan books from their collections, and over time will integrate this content into the Google index, to make it searchable for users worldwide.

"We believe passionately that such universal access to the world's printed treasures is mission-critical for today's great public university," said Mary Sue Coleman, President of the University of Michigan.

For publishers and authors, this expansion of the Google Print program will increase the visibility of in-print and out-of-print books, and generate book sales via "Buy this Book" links and advertising. For users, Google's library program will make it possible to search across library collections, including out-of-print books and titles that weren't previously available anywhere but on a library shelf.

Users searching with Google will see links in their search results page when there are books relevant to their query. Clicking on a title delivers a Google Print page where users can browse the full text of public domain works and brief excerpts and/or bibliographic data of copyrighted material. Library content will be displayed in keeping with copyright law.

Source: USA TODAY research

University of California-Berkeley professor John Battelle, who runs the influential Searchblog, says Google's library project has huge implications. "The idea that the world's knowledge, as held through books and libraries, is opening up to all via a Web browser cannot be understated," he says. "People will find books they never knew existed." He thinks it will take a year before university and public library books begin to show up in Google's index in a meaningful way.

In October, Google began working with publishers to make portions of their books searchable in the Google index. Several have signed up for the service, including John Wiley & Sons, Hyperion and Scholastic.

Rob Enderle, an independent analyst with The Enderle Group, predicts that Google's library program will motivate the publishing industry to get serious about having more searchable online books. "The object is to get the books read, and search engines will be where more readers will find out about books than at the bookstore," he says.

At the school level, the popularity of the Internet and easier access to information have made teachers concerned about a rise in student plagiarism.

But Rutgers professor Donald McCabe says having complete books online could make plagiarism easier to detect: "It'll provide ... a greater possibility of being caught."

The idea of bringing the local library to the world, and making out-of-print books available, sounds great to author Avery Corman. His latest novel, A Perfect Divorce, was just released. But six prior books, including Oh, God! and Kramer vs. Kramer, have been out of print for years. When a book is no longer in print, "It's like it's disappeared into a black hole," he says. "The only place anybody can read them is at the library. If this helps get the book to more people, I'm all for it."



© World eBook Library, 2003, All rights reserved world wide.