The Xerox prototype scanning system provided a number of important features for capturing this diverse material. Technicians used multiple threshold settings, filters, line art and halftone definitions, autosegmentation, windowing, and software-editing programs to optimize image capture. At the same time, this project focused on production. The goal was to make scanning as affordable and acceptable as photocopying and microfilming for preservation reformatting. A time-and-cost study conducted during the last three months of this project confirmed the economic viability of digital scanning, and these findings will be discussed here.
From the outset, the Cornell Xerox Project was predicated on the use of nonproprietary standards and the use of common protocols when standards did not exist. Digital files were created as TIFF images which were compressed prior to storage using Group 4 CCITT compression. The Xerox software is MS-DOS based and utilizes off-the-shelf programs such as Microsoft Windows and Wang Image Wizard. The digital library is designed to be hardware-independent and to provide interchangeability with other institutions through network connections. Access to the digital files themselves is two-tiered: bibliographic records for the computer files are created in RLIN and Cornell's local system, and access into the actual digital images comprising a book is provided through a document control structure and a networked image file-server, both of which will be described.
The presentation will conclude with a discussion of some of the issues surrounding the use of this technology as a preservation tool (storage, refreshing, backup).
Pamela ANDRE and Judith ZIDAR
The National Agricultural Library (NAL) has had extensive experience with raster scanning of printed materials. Since 1987, the Library has participated in the National Agricultural Text Digitizing Project (NATDP), a cooperative effort between NAL and forty-five land grant university libraries. An overview of the project will be presented, giving its history and NAL's strategy for the future.
An in-depth discussion of NATDP will follow, including a description of the scanning process, from the gathering of the printed materials to the archiving of the electronic pages. The type of equipment required for a stand-alone scanning workstation and the importance of file management software will be discussed. Issues concerning the images themselves will be addressed briefly, such as image format; black and white versus color; gray scale versus dithering; and resolution.
Also described will be a study currently in progress by NAL to evaluate the usefulness of converting microfilm to electronic images in order to improve access. With the cooperation of Tuskegee University, NAL has selected three reels of microfilm from a collection of sixty-seven reels containing the papers, letters, and drawings of George Washington Carver. The three reels were converted into 3,500 electronic images using a specialized microfilm scanner. The selection, filming, and indexing of this material will be discussed.
Donald WATERS
Project Open Book, the Yale University Library's effort to convert 10,000 books from microfilm to digital imagery, is currently in an advanced state of planning and organization. The Yale Library has selected a major vendor to serve as a partner in the project and as systems integrator. In its proposal, the successful vendor helped isolate areas of risk and uncertainty as well as key issues to be addressed during the life of the project. The Yale Library is now poised to decide what material it will convert to digital image form and to seek funding, initially for the first phase and then for the entire project.
The proposal that Yale accepted for the implementation of Project Open Book will provide at the end of three phases a conversion subsystem, browsing stations distributed on the campus network within the Yale Library, a subsystem for storing 10,000 books at 200 and 600 dots per inch, and network access to the image printers. Pricing for the system implementation assumes the existence of Yale's campus ethernet network and its high-speed image printers, and includes other requisite hardware and software, as well as system integration services. Proposed operating costs include hardware and software maintenance, but do not include estimates for the facilities management of the storage devices and image servers.
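The effect of storing images at two resolutions can be illustrated with simple arithmetic: the number of pixels on a page grows with the square of the scanning resolution. The sketch below is only an illustration; the 6 x 9 inch page size, the one-bit-per-pixel depth, and the function name are assumptions and are not taken from the Yale proposal, and the figures are uncompressed sizes.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Raw size in bytes of one scanned page before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bits_per_pixel + 7) // 8


# Assumed page size: 6 x 9 inches, scanned as a two-toned (1-bit) image.
low = uncompressed_page_bytes(6, 9, 200)    # 270,000 bytes
high = uncompressed_page_bytes(6, 9, 600)   # 2,430,000 bytes
ratio = high / low                          # 9.0: tripling dpi gives nine times the pixels
```

Under these assumptions a 600 dpi page needs nine times the raw storage of a 200 dpi page; compression changes the absolute sizes.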
Yale selected its vendor partner in a formal process, partly funded by the Commission for Preservation and Access. Following a request for proposal, the Yale Library selected two vendors as finalists to work with Yale staff to generate a detailed analysis of requirements for Project Open Book. Each vendor used the results of the requirements analysis to generate and submit a formal proposal for the entire project. This competitive process not only enabled the Yale Library to select its primary vendor partner but also revealed much about the state of the imaging industry, about the varying corporate commitments to the markets for imaging technology, and about the varying organizational dynamics through which major companies are responding to and seeking to develop these markets.
Project Open Book is focused specifically on the conversion of images from microfilm to digital form. The technology for scanning microfilm is readily available but is changing rapidly. In its project requirements, the Yale Library emphasized features of the technology that affect the technical quality of digital image production and the costs of creating and storing the image library: What levels of digital resolution can be achieved by scanning microfilm? How does variation in the quality of microfilm, particularly in film produced to preservation standards, affect the quality of the digital images? What technologies can an operator effectively and economically apply when scanning film to separate two-up images and to control for and correct image imperfections? How can quality control best be integrated into digitizing work flow that includes document indexing and storage?
The actual and expected uses of digital images--storage, browsing, printing, and OCR--help determine the standards for measuring their quality. Browsing is especially important, but the facilities available for readers to browse image documents are perhaps the weakest aspect of imaging technology and most in need of development. As it defined its requirements, the Yale Library concentrated on some fundamental aspects of usability for image documents: Does the system have sufficient flexibility to handle the full range of document types, including monographs, multi-part and multivolume sets, and serials, as well as manuscript collections? What conventions are necessary to identify a document uniquely for storage and retrieval? Where is the database of record for storing bibliographic information about the image document? How are basic internal structures of documents, such as pagination, made accessible to the reader? How are the image documents physically presented on the screen to the reader?
The Yale Library designed Project Open Book on the assumption that microfilm is more than adequate as a medium for preserving the content of deteriorated library materials. As planning in the project has advanced, it is increasingly clear that the challenge of digital image technology and the key to the success of efforts like Project Open Book is to provide a means of both preserving and improving access to those deteriorated materials.
SESSION IV-B
George THOMA
In the use of electronic imaging for document preservation, there are several issues to consider, such as: ensuring adequate image quality, maintaining substantial conversion rates (through-put), providing unique identification for automated access and retrieval, and accommodating bound volumes and fragile material.
To maintain high image quality, image processing functions are required to correct the deficiencies in the scanned image. Some commercially available systems include these functions, while some do not. The scanned raw image must be processed to correct contrast deficiencies--both poor overall contrast resulting from light print and/or dark background, and variable contrast resulting from stains and bleed-through. Furthermore, the scan density must be adequate to allow legibility of print and sufficient fidelity in the pseudo-halftoned gray material. Borders or page-edge effects must be removed for both compactibility and aesthetics. Page skew must be corrected for aesthetic reasons and to enable accurate character recognition if desired. Compound images consisting of both two-toned text and gray-scale illustrations must be processed appropriately to retain the quality of each.
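One of the corrections named above, poor overall contrast from light print, can be sketched in a few lines. This is a minimal illustration only, assuming an 8-bit gray-scale page held as a list of rows; the function names and the threshold of 128 are hypothetical and do not describe any system presented at the workshop. A global stretch of this kind does not address variable contrast from stains or bleed-through, which needs locally adaptive processing.

```python
def stretch_contrast(page):
    """Rescale gray values so the darkest pixel becomes 0 and the lightest 255."""
    lo = min(min(row) for row in page)
    hi = max(max(row) for row in page)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in page]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in page]


def to_bitonal(page, threshold=128):
    """Map each gray value to 1 (ink) if darker than the threshold, else 0 (paper)."""
    return [[1 if v < threshold else 0 for v in row] for row in page]


# Light print: the ink (140, 145) is only slightly darker than the paper (200).
faded = [
    [200, 200, 200, 200],
    [200, 140, 145, 200],
    [200, 200, 200, 200],
]
lost = to_bitonal(faded)                          # print disappears entirely
recovered = to_bitonal(stretch_contrast(faded))   # print survives thresholding
```

Thresholding the raw scan loses the faint print, while stretching the contrast first keeps the two ink pixels.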
SESSION IV-C
Jean BARONAS
Standards publications being developed by scientists, engineers, and business managers in Association for Information and Image Management (AIIM) standards committees can be applied to electronic image management (EIM) processes including: document (image) transfer, retrieval and evaluation; optical disk and document scanning; and document design and conversion. When combined with EIM system planning and operations, standards can assist in generating image databases that are interchangeable among a variety of systems. The applications of different approaches for image-tagging, indexing, compression, and transfer often cause uncertainty concerning EIM system compatibility, calibration, performance, and upward compatibility, until standard implementation parameters are established. The AIIM standards that are being developed for these applications can be used to decrease the uncertainty, successfully integrate imaging processes, and promote "open systems." AIIM is an accredited American National Standards Institute (ANSI) standards developer with more than twenty committees comprised of 300 volunteers representing users, vendors, and manufacturers. The standards publications that are developed in these committees have national acceptance and provide the basis for international harmonization in the development of new International Organization for Standardization (ISO) standards.
This presentation describes the development of AIIM's EIM standards and a new effort at AIIM, a database on standards projects in a wide framework of imaging industries including capture, recording, processing, duplication, distribution, display, evaluation, and preservation. The AIIM Imagery Database will cover imaging standards being developed by many organizations in many different countries. It will contain standards publications' dates, origins, related national and international projects, status, key words, and abstracts. The ANSI Image Technology Standards Board requested that such a database be established, as did the ISO/International Electrotechnical Commission Joint Task Force on Imagery. AIIM will take on the leadership role for the database and coordinate its development with several standards developers.
Patricia BATTIN
Characteristics of standards for digital imagery:
* Nature of digital technology implies continuing volatility.
* Precipitous standard-setting not possible and probably not desirable.
* Standards are a complex issue involving the medium, the hardware, the software, and the technical capacity for reproductive fidelity and clarity.
* The prognosis for reliable archival standards (as defined by librarians) in the foreseeable future is poor.
Significant potential and attractiveness of digital technology as a preservation medium and access mechanism.
Productive use of digital imagery for preservation requires a reconceptualizing of preservation principles in a volatile, standardless world.
Concept of managing continuing access in the digital environment rather than focusing on the permanence of the medium and long-term archival standards developed for the analog world.
Transition period: How long and what to do?
* Redefine "archival."
* Remove the burden of "archival copy" from paper artifacts.
* Use digital technology for storage, develop management strategies for refreshing medium, hardware and software.
* Create acid-free paper copies for transition period backup until we develop reliable procedures for ensuring continuing access to digital files.
SESSION IV-D
Stuart WEIBEL The Role of SGML Markup in the CORE Project (6)
The emergence of high-speed telecommunications networks as a basic feature of the scholarly workplace is driving the demand for electronic document delivery. Three distinct categories of electronic publishing/republishing are necessary to support access demands in this emerging environment:
1.) Conversion of paper or microfilm archives to electronic format
2.) Conversion of electronic files to formats tailored to electronic retrieval and display
3.) Primary electronic publishing (materials for which the electronic version is the primary format)
OCLC has experimental or product development activities in each of these areas. Among the challenges that lie ahead is the integration of these three types of information stores in coherent distributed systems.
The CORE (Chemistry Online Retrieval Experiment) Project is a model for the conversion of large text and graphics collections for which electronic typesetting files are available (category 2). The American Chemical Society has made available computer typography files dating from 1980 for its twenty journals. This collection of some 250 journal-years is being converted to an electronic format that will be accessible through several end-user applications.
The use of Standard Generalized Markup Language (SGML) offers the means to capture the structural richness of the original articles in a way that will support a variety of retrieval, navigation, and display options necessary to navigate effectively in very large text databases.
An SGML document consists of text that is marked up with descriptive tags that specify the function of a given element within the document. As a formal language construct, an SGML document can be parsed against a document-type definition (DTD) that unambiguously defines what elements are allowed and where in the document they can (or must) occur. This formalized map of article structure allows the user interface design to be uncoupled from the underlying database system, an important step toward interoperability. Demonstration of this separability is a part of the CORE project, wherein user interface designs born of very different philosophies will access the same database.
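The idea of checking a marked-up document against a formal definition of its allowed structure can be sketched as follows. This is a simplified illustration, not an SGML parser: the element names, the ALLOWED_CHILDREN table standing in for a DTD, and the nested-tuple representation of a document are all hypothetical, and a real DTD also constrains the order and repetition of elements.

```python
# Hypothetical stand-in for a DTD: each element maps to the set of child
# elements it may contain.
ALLOWED_CHILDREN = {
    "article": {"title", "abstract", "section"},
    "title": set(),
    "abstract": {"para"},
    "section": {"title", "para"},
    "para": set(),
}


def validate(element, path=""):
    """Return a list of structural errors for an element given as (name, [children])."""
    name, children = element
    here = f"{path}/{name}"
    if name not in ALLOWED_CHILDREN:
        return [f"{here}: unknown element"]
    errors = []
    for child in children:
        # A known element in the wrong place is reported here; an unknown
        # element is reported by the recursive call below.
        if child[0] in ALLOWED_CHILDREN and child[0] not in ALLOWED_CHILDREN[name]:
            errors.append(f"{here}: <{child[0]}> not allowed here")
        errors.extend(validate(child, here))
    return errors


good = ("article", [("title", []),
                    ("abstract", [("para", [])]),
                    ("section", [("title", []), ("para", [])])])
bad = ("article", [("para", []), ("section", [("figure", [])])])

good_errors = validate(good)
bad_errors = validate(bad)
```

Because the structural rules live in one table, any interface that reads validated documents can rely on the same element structure, which is the separation of interface from database described above.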
NOTES: (6) The CORE project is a collaboration among Cornell University's Mann Library, Bell Communications Research (Bellcore), the American Chemical Society (ACS), the Chemical Abstracts Service (CAS), and OCLC.
Michael LESK The CORE Electronic Chemistry Library
A major on-line file of chemical journal literature complete with graphics is being developed to test the usability of fully electronic access to documents, as a joint project of Cornell University, the American Chemical Society, the Chemical Abstracts Service, OCLC, and Bellcore (with additional support from Sun Microsystems, Springer-Verlag, Digital Equipment Corporation, Sony Corporation of America, and Apple Computers). Our file contains the American Chemical Society's on-line journals, supplemented with the graphics from the paper publication. The indexing of the articles from Chemical Abstracts Documents is available in both image and text format, and several different interfaces can be used. Our goals are (1) to assess the effectiveness and acceptability of electronic access to primary journals as compared with paper, and (2) to identify the most desirable functions of the user interface to an electronic system of journals, including in particular a comparison of page-image display with ASCII display interfaces. Early experiments with chemistry students on a variety of tasks suggest that searching tasks are completed much faster with any electronic system than with paper, but that for reading all versions of the articles are roughly equivalent.
Pamela ANDRE and Judith ZIDAR
Text conversion is far more expensive and time-consuming than image capture alone. NAL's experience with optical character recognition (OCR) will be related and compared with the experience of having text rekeyed. What factors affect OCR accuracy? How accurate does full text have to be in order to be useful? How do different users react to imperfect text? These are questions that will be explored. For many, a service bureau may be a better solution than performing the work inhouse; this will also be discussed.
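One common way to put a number on OCR accuracy is to compare the OCR output with a rekeyed or proofread reference and count the character edits needed to turn one into the other. The sketch below assumes that approach; the function names and sample strings are illustrative and are not drawn from NAL's experience.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_text):
    """Share of reference characters recovered correctly, floored at 0."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1 - errors / len(reference))


reference = "soil moisture"
ocr_text = "soi1 moisturc"   # two misread characters
accuracy = character_accuracy(reference, ocr_text)   # 11/13, about 0.85
```

A per-character figure like this is only one possible measure; whether a given level is useful depends on the application, which is the question the abstract raises.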
SESSION VI
Marybeth PETERS
Copyright law protects creative works. Protection granted by the law to authors and disseminators of works includes the right to do or authorize the following: reproduce the work, prepare derivative works, distribute the work to the public, and publicly perform or display the work. In addition, copyright owners of sound recordings and computer programs have the right to control rental of their works. These rights are not unlimited; there are a number of exceptions and limitations.
An electronic environment places strains on the copyright system. Copyright owners want to control uses of their work and be paid for any use; the public wants quick and easy access at little or no cost. The marketplace is working in this area. Contracts, guidelines on electronic use, and collective licensing are in use and being refined.
Issues concerning the ability to change works without detection are more difficult to deal with. Questions concerning the integrity of the work and the status of the changed version under the copyright law are to be addressed. These are public policy issues which require informed dialogue.
*** *** *** ****** *** *** ***
Appendix III: DIRECTORY OF PARTICIPANTS
PRESENTERS:
Pamela Q.J. Andre Associate Director, Automation National Agricultural Library 10301 Baltimore Boulevard Beltsville, MD 20705-2351 Phone: (301) 504-6813 Fax: (301) 504-7473 E-mail: INTERNET: PANDRE@ASRR.ARSUSDA.GOV
Jean Baronas, Senior Manager Department of Standards and Technology Association for Information and Image Management (AIIM) 1100 Wayne Avenue, Suite 1100 Silver Spring, MD 20910 Phone: (301) 587-8202 Fax: (301) 587-2711
Patricia Battin, President The Commission on Preservation and Access 1400 16th Street, N.W. Suite 740 Washington, DC 20036-2217 Phone: (202) 939-3400 Fax: (202) 939-3407 E-mail: CPA@GWUVM.BITNET
Howard Besser Centre Canadien d'Architecture (Canadian Center for Architecture) 1920, rue Baile Montreal, Quebec H3H 2S6 CANADA Phone: (514) 939-7001 Fax: (514) 939-7020 E-mail: howard@lis.pitt.edu
Edwin B. Brownrigg, Executive Director Memex Research Institute 422 Bonita Avenue Roseville, CA 95678 Phone: (916) 784-2298 Fax: (916) 786-7559 E-mail: BITNET: MEMEX@CALSTATE.2
Eric M. Calaluca, Vice President Chadwyck-Healey, Inc. 1101 King Street Alexandria, VA 22314 Phone: (800) 752-0515 Fax: (703) 683-7589
James Daly 4015 Deepwood Road Baltimore, MD 21218-1404 Phone: (410) 235-0763
Ricky Erway, Associate Coordinator American Memory Library of Congress Phone: (202) 707-6233 Fax: (202) 707-3764
Carl Fleischhauer, Coordinator American Memory Library of Congress Phone: (202) 707-6233 Fax: (202) 707-3764
Joanne Freeman 2000 Jefferson Park Avenue, No. 7 Charlottesville, VA 22903
Prosser Gifford Director for Scholarly Programs Library of Congress Phone: (202) 707-1517 Fax: (202) 707-9898 E-mail: pgif@seq1.loc.gov
Jacqueline Hess, Director National Demonstration Laboratory for Interactive Information Technologies Library of Congress Phone: (202) 707-4157 Fax: (202) 707-2829
Susan Hockey, Director Center for Electronic Texts in the Humanities (CETH) Alexander Library Rutgers University 169 College Avenue New Brunswick, NJ 08903 Phone: (908) 932-1384 Fax: (908) 932-1386 E-mail: hockey@zodiac.rutgers.edu
William L. Hooton, Vice President Business & Technical Development Imaging & Information Systems Group I-NET 6430 Rockledge Drive, Suite 400 Bethesda, MD 20817 Phone: (301) 564-6750 Fax: (513) 564-6867
Anne R. Kenney, Associate Director Department of Preservation and Conservation 701 Olin Library Cornell University Ithaca, NY 14853 Phone: (607) 255-6875 Fax: (607) 255-9346 E-mail: LYDY@CORNELLA.BITNET
Ronald L. Larsen Associate Director for Information Technology University of Maryland at College Park Room B0224, McKeldin Library College Park, MD 20742-7011 Phone: (301) 405-9194 Fax: (301) 314-9865 E-mail: rlarsen@libr.umd.edu
Maria L. Lebron, Managing Editor The Online Journal of Current Clinical Trials 1333 H Street, N.W. Washington, DC 20005 Phone: (202) 326-6735 Fax: (202) 842-2868 E-mail: PUBSAAAS@GWUVM.BITNET
Michael Lesk, Executive Director Computer Science Research Bell Communications Research, Inc. Rm 2A-385 445 South Street Morristown, NJ 07960-1910 Phone: (201) 829-4070 Fax: (201) 829-5981 E-mail: lesk@bellcore.com (Internet) or bellcore!lesk (uucp)
Clifford A. Lynch Director, Library Automation University of California, Office of the President 300 Lakeside Drive, 8th Floor Oakland, CA 94612-3350 Phone: (510) 987-0522 Fax: (510) 839-3573 E-mail: calur@uccmvsa
Avra Michelson National Archives and Records Administration NSZ Rm. 14N 7th & Pennsylvania, N.W. Washington, D.C. 20408 Phone: (202) 501-5544 Fax: (202) 501-5533 E-mail: tmi@cu.nih.gov
Elli Mylonas, Managing Editor Perseus Project Department of the Classics Harvard University 319 Boylston Hall Cambridge, MA 02138 Phone: (617) 495-9025, (617) 495-0456 (direct) Fax: (617) 496-8886 E-mail: Elli@IKAROS.Harvard.EDU or elli@wjh12.harvard.edu
David Woodley Packard Packard Humanities Institute 300 Second Street, Suite 201 Los Altos, CA 94002 Phone: (415) 948-0150 (PHI) Fax: (415) 948-5793
Lynne K. Personius, Assistant Director Cornell Information Technologies for Scholarly Information Sources 502 Olin Library Cornell University Ithaca, NY 14853 Phone: (607) 255-3393 Fax: (607) 255-9346 E-mail: JRN@CORNELLC.BITNET
Marybeth Peters Policy Planning Adviser to the Register of Copyrights Library of Congress Office LM 403 Phone: (202) 707-8350 Fax: (202) 707-8366
C. Michael Sperberg-McQueen Editor, Text Encoding Initiative Computer Center (M/C 135) University of Illinois at Chicago Box 6998 Chicago, IL 60680 Phone: (312) 413-0317 Fax: (312) 996-6834 E-mail: u35395@uicvm.cc.uic.edu or u35395@uicvm.bitnet
George R. Thoma, Chief Communications Engineering Branch National Library of Medicine 8600 Rockville Pike Bethesda, MD 20894 Phone: (301) 496-4496 Fax: (301) 402-0341 E-mail: thoma@lhc.nlm.nih.gov
Dorothy Twohig, Editor The Papers of George Washington 504 Alderman Library University of Virginia Charlottesville, VA 22903-2498 Phone: (804) 924-0523 Fax: (804) 924-4337
Susan H. Veccia, Team leader American Memory, User Evaluation Library of Congress American Memory Evaluation Project Phone: (202) 707-9104 Fax: (202) 707-3764 E-mail: svec@seq1.loc.gov
Donald J. Waters, Head Systems Office Yale University Library New Haven, CT 06520 Phone: (203) 432-4889 Fax: (203) 432-7231 E-mail: DWATERS@YALEVM.BITNET or DWATERS@YALEVM.YCC.YALE.EDU
Stuart Weibel, Senior Research Scientist OCLC 6565 Frantz Road Dublin, OH 43017 Phone: (614) 764-6081 Fax: (614) 764-2344 E-mail: INTERNET: Stu@rsch.oclc.org
Robert G. Zich Special Assistant to the Associate Librarian for Special Projects Library of Congress Phone: (202) 707-6233 Fax: (202) 707-3764 E-mail: rzic@seq1.loc.gov
Judith A. Zidar, Coordinator National Agricultural Text Digitizing Program Information Systems Division National Agricultural Library 10301 Baltimore Boulevard Beltsville, MD 20705-2351 Phone: (301) 504-6813 or 504-5853 Fax: (301) 504-7473 E-mail: INTERNET: JZIDAR@ASRR.ARSUSDA.GOV
OBSERVERS:
Helen Aguera, Program Officer Division of Research Room 318 National Endowment for the Humanities 1100 Pennsylvania Avenue, N.W. Washington, D.C. 20506 Phone: (202) 786-0358 Fax: (202) 786-0243
M. Ellyn Blanton, Deputy Director National Demonstration Laboratory for Interactive Information Technologies Library of Congress Phone: (202) 707-4157 Fax: (202) 707-2829
Charles M. Dollar National Archives and Records Administration NSZ Rm. 14N 7th & Pennsylvania, N.W. Washington, DC 20408 Phone: (202) 501-5532 Fax: (202) 501-5512
Jeffrey Field, Deputy to the Director Division of Preservation and Access Room 802 National Endowment for the Humanities 1100 Pennsylvania Avenue, N.W. Washington, DC 20506 Phone: (202) 786-0570 Fax: (202) 786-0243
Lorrin Garson American Chemical Society Research and Development Department 1155 16th Street, N.W. Washington, D.C. 20036 Phone: (202) 872-4541 Fax: E-mail: INTERNET: LRG96@ACS.ORG
William M. Holmes, Jr. National Archives and Records Administration NSZ Rm. 14N 7th & Pennsylvania, N.W. Washington, DC 20408 Phone: (202) 501-5540 Fax: (202) 501-5512 E-mail: WHOLMES@AMERICAN.EDU
Sperling Martin Information Resource Management 20030 Doolittle Street Gaithersburg, MD 20879 Phone: (301) 924-1803
Michael Neuman, Director The Center for Text and Technology Academic Computing Center 238 Reiss Science Building Georgetown University Washington, DC 20057 Phone: (202) 687-6096 Fax: (202) 687-6003 E-mail: neuman@guvax.bitnet, neuman@guvax.georgetown.edu
Barbara Paulson, Program Officer Division of Preservation and Access Room 802 National Endowment for the Humanities 1100 Pennsylvania Avenue, N.W. Washington, DC 20506 Phone: (202) 786-0577 Fax: (202) 786-0243
Allen H. Renear Senior Academic Planning Analyst Brown University Computing and Information Services 115 Waterman Street Campus Box 1885 Providence, R.I. 02912 Phone: (401) 863-7312 Fax: (401) 863-7329 E-mail: BITNET: Allen@BROWNVM or INTERNET: Allen@brownvm.brown.edu
Susan M. Severtson, President Chadwyck-Healey, Inc. 1101 King Street Alexandria, VA 22314 Phone: (800) 752-0515 Fax: (703) 683-7589
Frank Withrow U.S. Department of Education 555 New Jersey Avenue, N.W. Washington, DC 20208-5644 Phone: (202) 219-2200 Fax: (202) 219-2106
(LC STAFF)
Linda L. Arret Machine-Readable Collections Reading Room LJ 132 (202) 707-1490
John D. Byrum, Jr. Descriptive Cataloging Division LM 540 (202) 707-5194
Mary Jane Cavallo Science and Technology Division LA 5210 (202) 707-1219
Susan Thea David Congressional Research Service LM 226 (202) 707-7169
Robert Dierker Senior Adviser for Multimedia Activities LM 608 (202) 707-6151
William W. Ellis Associate Librarian for Science and Technology LM 611 (202) 707-6928
Ronald Gephart Manuscript Division LM 102 (202) 707-5097
James Graber Information Technology Services LM G51 (202) 707-9628
Rich Greenfield American Memory LM 603 (202) 707-6233
Rebecca Guenther Network Development LM 639 (202) 707-5092
Kenneth E. Harris Preservation LM G21 (202) 707-5213
Staley Hitchcock Manuscript Division LM 102 (202) 707-5383
Bohdan Kantor Office of Special Projects LM 612 (202) 707-0180
John W. Kimball, Jr Machine-Readable Collections Reading Room LJ 132 (202) 707-6560
Basil Manns Information Technology Services LM G51 (202) 707-8345
Sally Hart McCallum Network Development LM 639 (202) 707-6237
Dana J. Pratt Publishing Office LM 602 (202) 707-6027
Jane Riefenhauser American Memory LM 603 (202) 707-6233
William Z. Schenck Collections Development LM 650 (202) 707-7706
Chandru J. Shahani Preservation Research and Testing Office (R&T) LM G38 (202) 707-5607
William J. Sittig Collections Development LM 650 (202) 707-7050
Paul Smith Manuscript Division LM 102 (202) 707-5097
James L. Stevens Information Technology Services LM G51 (202) 707-9688
Karen Stuart Manuscript Division LM 130 (202) 707-5389
Tamara Swora Preservation Microfilming Office LM G05 (202) 707-6293
Sarah Thomas Collections Cataloging LM 642 (202) 707-5333
END *************************************************************
Note: This file has been edited for use on computer networks. This editing required the removal of diacritics, underlining, and fonts such as italics and bold.
kde 11/92