The OSU Library
Electronic Publishing Center

The Vision...

The OSU Library participates in the creation and maintenance of the emerging global digital library by digitizing and sharing electronic information.

The OSU Library Electronic Publishing Center, founded in 1996, will pursue this vision by expanding and enhancing access to published and unpublished materials of potential interest to the academic community and general public, especially those unique to OSU or the State of Oklahoma.

The Process...

Why Digitize?
There is a wealth of information in older printed materials and in special collections documents such as letters and diaries. However, many of these documents are in poor condition, and are too fragile for frequent use. It is important to capture a digital copy of these works before they deteriorate completely. Researchers can use a digitized version for most purposes, saving wear and tear on the original. By digitizing the unique documents in special collections and archives, we make them available to a far wider audience. Researchers no longer have to travel to the place where the document is held; the document can come to them. People who would never be allowed to handle rare documents - schoolchildren, college students, casual researchers, hobbyists - can actually use these historical artifacts in their studies.

Digitization is a long and complicated process. There are many steps involved, as illustrated in the flowchart below. Every project is different, but the four basic stages are:

Stage 1. Select Material

Stage 2. Convert normal text into electronic text

Stage 3. Format electronic text for the Internet

Stage 4. Create website for access and navigation

[Flowchart: Digitization Process Flow Chart. Step 1: Selection; Step 2: Copyright Clearance; Step 3: Preservation of Originals; Step 4: Scanning and OCR or Rekeying; Step 5: XML Mark-up.]

Stage 1

In order to be considered for digitization, materials must go through a selection process. To be eligible, materials should meet the following criteria:

  • Meet the research needs of faculty, students, and scholars within and beyond the OSU Community. In assessing what material meets the needs of our constituency, consideration should be given to the scholarly content of the material; the uniqueness of the material; and the demand for the material.
  • Benefit from increased access and contribute to the Library's service and collection development missions. Materials that are difficult to access in their original formats or that would benefit from increased speed or depth of access via electronic delivery formats should be given priority.
  • Have clear ownership and copyright clearance. Before a digitization project is undertaken, the Library needs to secure sound legal advice about the ownership and rights to reproduce or publish materials electronically.
  • Be of interest to potential partners. Materials that would be of interest to campus and outside partners, both collaborators on the content and potential sources of funding and other support, should be given strong consideration.

Also, before materials are selected, their preservation is considered from two perspectives: a) Items should not be digitized if the scanning process would damage the item itself; b) Items that receive heavy patron use and are quickly deteriorating should be selected for imaging in order to preserve the original. Although data migration is an ongoing concern, digital editions will not be considered preservation-quality reformatting of original editions until technological issues are resolved and standards are widely accepted.

A specific checklist of attributes, access, infrastructure and preservation concerns is included in the "Suggested Collections/Materials to be Digitized" form, available on the Library's web site or from the Suggestions link on the navigation bar. The Collection Development Committee will decide which suggested materials will be chosen for digitization, using established collection development criteria and policies. Selection for digitization requires that materials have enduring value and be available in sufficient quantity that they form a significant and unique research corpus. Further, the decision to digitize must take into account many factors, as evidenced by the criteria on the "Suggested Collections/Materials to be Digitized" form.

In selecting materials, the OSU Library will actively seek out partners, both collaborators on specific projects and supporting partners to supply funding or technical assistance. Institutions such as the Oklahoma Department of Libraries, the State Historical Society, other academic libraries, and other organizations in Oklahoma or out of state will be approached for long-range planning on digitization projects. Foundations and/or corporate sponsors will be approached, and the Director of Library Development and Outreach will facilitate the Library's efforts to prepare grants and solicit monies from funding agencies and corporations. In addition, the Library respects cultural traditions of different ethnic and racial groups in preparing its digital collections; consultation with tribal or other interested organizations will be conducted prior to digitizing potentially sensitive materials.

Copyright: The #1 Concern
Securing copyright permission is an overriding concern with all projects. The most immediate problem involving copyright and digitization is identifying what collections or parts of collections can be legally mounted on our web server. The rigor of establishing copyright clearance is not grounds for automatic dismissal of potential projects; however, ease of establishing permission will influence the priority of projects. Digitization projects with clear rights or easily obtained rights should be undertaken first. While these projects are undertaken, rights can be sought for subsequent projects.

Many of the materials to be digitized will be in a deteriorating state. We will perform all necessary repairs to the original materials before beginning digitization. Preservation of the original is our primary concern, and we will take every precaution to protect the originals from damage. While digitization of fragile materials can prevent wear and tear on the original and can thus act as a preservation tool, it is in no way a substitute for the original material.

Stage 2

To Scan or Re-key?
The condition of the materials will determine how they are converted to electronic form. Very fragile materials, anything printed before 1940, and any manuscripts will have to be re-typed, because the optical character recognition ("OCR") software used to convert a scanned image to text will be unable to recognize the textual characters. We use an overhead scanning device that is less damaging to books than a flatbed scanner. If the print is clear enough to OCR, the documents will be scanned, OCR'd, and saved as text files. Whether scanned and OCR'd or re-keyed, all text will be proofread. Our goal is 99.95% accuracy.
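To make the 99.95% goal concrete: it allows 5 wrong characters in every 10,000. The Python sketch below is only an illustration and is not the Center's proofreading tool. The function names are hypothetical, the sample strings are invented, and the accuracy measure (characters matched in order by the standard library's difflib) is an assumed approximation, since the page does not say how accuracy is measured.

```python
from difflib import SequenceMatcher


def character_accuracy(reference, transcription):
    """Share of the reference characters that the transcription reproduces in order.

    Assumption: accuracy is measured per character against a proofread reference.
    """
    if not reference:
        return 1.0
    matcher = SequenceMatcher(None, reference, transcription, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def max_errors(total_characters, errors_per_10k=5):
    """Most wrong characters a text may contain and still meet the goal.

    99.95% accuracy is 5 errors per 10,000 characters; integer arithmetic
    avoids floating-point rounding surprises.
    """
    return total_characters * errors_per_10k // 10_000


# Invented example: one wrong letter in a 20-character line.
reference = "Stillwater, Oklahoma"
ocr_output = "Stillwater, Oklaboma"
print(character_accuracy(reference, ocr_output))
print(max_errors(2_000))
```

On these assumptions, a single wrong letter in a 20-character line is far below the goal, which is why every text is proofread regardless of how it was converted.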

Stage 3

Web Design and XML
Standards for metadata, scanning and storage developed by the Colorado Digitization Project (now a part of the BCR Collaborative Digitization Program) will be utilized. The BCR CDP Best Practices & Publications are available online. It is most desirable to employ non-application-specific encoding, such as XML, as this is the standard used by the major digitization projects internationally. XML (Extensible Markup Language) is an application-, platform- and vendor-independent format that allows you to mark up a text's structure rather than just specify the layout and appearance as we do in HTML. By using XML, we achieve several goals:

  • The structural mark-up indicates the major divisions of the text (e.g., "chapter", "section", "verse") AND various characteristics of the text (names of people and places, dates, spelling irregularities).
  • The file is in an archival format that will easily migrate to new platforms as they emerge.
  • XML is emerging as the new standard on the Web. We anticipate that there will be affordable software available in the near future that will allow us to take advantage of XML's structural nature (e.g., fielded searching).
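The goals above can be illustrated with a small Python sketch using only the standard library. The sample text is invented for illustration and is not taken from any OSU collection. The element names (div, head, p, persName, placeName, date) follow TEI conventions, but the fragment is simplified and is not a complete, valid TEI document. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: structure (chapters) and characteristics (names, places, dates)
# are marked up; nothing here says how the text should look on screen.
SAMPLE = """\
<text>
  <body>
    <div type="chapter" n="1">
      <head>Early Days</head>
      <p>In <date>1891</date> <persName>Jane Doe</persName>
         arrived in <placeName>Stillwater</placeName>.</p>
    </div>
    <div type="chapter" n="2">
      <head>The College</head>
      <p><persName>John Roe</persName> taught in <placeName>Stillwater</placeName>.</p>
    </div>
  </body>
</text>
"""


def field_values(xml_text, tag):
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]


def chapters_mentioning(xml_text, tag, value):
    """Fielded search: headings of chapters where `value` is marked up as `tag`."""
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        if any("".join(el.itertext()).strip() == value for el in div.iter(tag)):
            hits.append(div.findtext("head", default=""))
    return hits


print(field_values(SAMPLE, "persName"))
print(chapters_mentioning(SAMPLE, "persName", "John Roe"))
```

Because the names are tagged as names, a search for a person does not have to guess from plain words, which is the kind of fielded searching the last bullet anticipates.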

Once we have scanned, OCR'd, or re-keyed the text, it will be saved as a plain text file. We will then encode it in XML using the TEI-Lite DTD. A DTD, or Document Type Definition, sets the rules for an XML document. The TEI-Lite DTD was developed specifically for text encoding in humanities disciplines by the Text Encoding Initiative. Encoding is one of the most time-consuming aspects of any project. Finally, we will index the XML text with an indexing program so that it can be searched.
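The page does not name the indexing program, so the following Python sketch only illustrates the general idea of indexing encoded text for searching. It builds a simple inverted index that maps each word to the sections containing it. The sample document, the use of the n attribute as a section identifier, and the function names are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample text; each div is treated as one searchable section.
DOCUMENT = """<text><body>
<div n="1"><head>Arrival</head><p>The family reached the territory by wagon.</p></div>
<div n="2"><head>Harvest</head><p>The wagon carried wheat to the territory market.</p></div>
</body></text>"""


def build_index(xml_text):
    """Map each lower-cased word to the set of section numbers where it occurs."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for div in root.iter("div"):
        section = div.get("n")
        text = " ".join(div.itertext())
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(section)
    return index


def search(index, query):
    """Return the sorted section numbers that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(DOCUMENT)
print(search(index, "wagon"))
print(search(index, "wheat wagon"))
```

A production indexing program would also handle stemming, ranking, and much larger collections; the sketch shows only why the index is built once, ahead of time, rather than scanning every file for each query.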

Stage 4

Finished Product
In order to display the XML files on the web, we must prepare stylesheets that will tell browsers how to display the files. The staff will design the website, and we are then ready to present the collection on the web. Depending on the size and complexity of a project, and because of our dedication to preservation and accuracy, it can take several months to complete a project. Once a project is finished, however, the final product may be used and enjoyed by countless people for years to come. Visit our collections to view the results of our efforts.

The OSU Library Electronic Publishing Center is located at
103 Oklahoma State University Library Annex
Stillwater, OK 74078