The OSU Library
Electronic Publishing Center
The OSU Library participates in the creation and maintenance
of the emerging global digital library by digitizing and sharing
The OSU Library Electronic Publishing Center, founded in 1996,
will pursue this vision by expanding and enhancing access to published
and unpublished materials of potential interest to the academic
community and general public, especially those unique to OSU or
the State of Oklahoma.
There is a wealth of information in older printed materials and
in special collections documents such as letters and diaries.
However, many of these documents are in poor condition, and are
too fragile for frequent use. It is important to capture a digital
copy of these works before they deteriorate completely. Researchers
can use a digitized version for most purposes, saving wear and
tear on the original. By digitizing the unique documents in special
collections and archives, we make them available to a far wider
audience. Researchers no longer have to travel to the place where
the document is held; the document can come to them. People who
would never be allowed to handle rare documents - schoolchildren,
college students, casual researchers, hobbyists - can actually
use these historical artifacts in their studies.
Digitization is a long and complicated process. There are
many steps involved, as illustrated in the flowchart below.
Every project is different, but the four basic stages are:
Stage 1. Select Material
Stage 2. Convert printed or handwritten text into electronic text
Stage 3. Format electronic text for the Internet
Stage 4. Create website for access and navigation
To be considered for digitization, materials must go
through a selection process. To be eligible, materials
should fulfill the following criteria:
- Meet the research needs of faculty, students, and scholars
within and beyond the OSU Community. In assessing what material
meets the needs of our constituency, consideration should be
given to the scholarly content of the material; the uniqueness
of the material; and the demand for the material.
- Benefit from increased access and contribute to the
Library's service and collection development missions. Materials
that are difficult to access in their original formats or that
would benefit from increased speed or depth of access via electronic
delivery formats should be given priority.
- Have clear ownership and copyright clearance. Before a digitization
project is undertaken, the Library needs to secure sound legal
advice about the ownership and rights to reproduce or publish
- Be of interest to potential partners. Materials that would
be of interest to campus and outside partners, both collaborators
on the content and potential sources of funding and other support,
should be given strong consideration.
Also, before materials are selected, their preservation is considered
from the following perspectives: a) Items should not be
digitized if the scanning process would be detrimental to the item
itself; b) Items that receive heavy patron use and are quickly
deteriorating should be selected for imaging in order to preserve
the original. Although data migration is an ongoing concern, digital
editions will not be considered preservation quality reformatting
for original editions until technological issues are resolved
and standards are widely accepted.
A specific checklist of attributes, access, infrastructure and
preservation concerns is included in the "Suggested Collections/Materials
to be Digitized" form, available on the Library's web site
or from the Suggestions link on the navigation bar. The
Collection Development Committee will make decisions as to which
suggested materials will be chosen for digitization. Established
collection development criteria and policies will be utilized.
Selection for digitization requires that materials have enduring
value and be available in a sufficient number or quantity that
they form a significant and unique research corpus. Further, the
decision to digitize must take into account many factors, as evidenced
by the criteria on the "Suggested
Collections/Materials to be Digitized" form.
In selecting materials, the OSU Library will actively seek out
partners, both collaborators on specific projects and supporting
partners to supply funding or technical assistance. Institutions
such as the Oklahoma Department of Libraries, the State Historical
Society, other academic libraries, and other organizations in
Oklahoma or out of state will be approached for long-range planning
on digitization projects. Foundations and/or corporate sponsors
will be approached, and the Director of Library Development and
Outreach will facilitate the Library's efforts to prepare grants
and solicit monies from funding agencies and corporations. In
addition, the Library respects cultural traditions of different
ethnic and racial groups in preparing its digital collections;
consultation with tribal or other interested organizations will
be conducted prior to digitizing potentially sensitive materials.
Copyright: The #1 Concern
Securing copyright permission is an overriding concern
with all projects. The most immediate problem involving copyright
and digitization is identifying what collections or parts of collections
can be legally mounted on our web server. The rigor of establishing
copyright clearance is not grounds for automatic dismissal of
potential projects; however, ease of establishing permission will
influence the priority of projects. Digitization projects with
clear rights or easily obtained rights should be undertaken first.
While these projects are undertaken, rights can be sought for
Many of the materials to be digitized will be in a deteriorating
state. We will perform all necessary repairs to the original materials
before beginning digitization. Preservation of the original is
our primary concern, and we will take every precaution to protect
the originals from damage. While digitization of fragile materials
can prevent wear and tear on the original and can thus act as
a preservation tool, it is in no way a substitute for the original
To Scan or Re-key?
The condition of the materials will determine how they are
converted to electronic form. Very fragile materials, anything
printed before 1940, and any manuscripts will have to be re-typed,
because the optical character recognition ("OCR") software
used to convert a scanned image to text will be unable to recognize
the textual characters. We use an overhead scanning device that
is less damaging to books than a flatbed scanner. If the print
is clear enough to OCR, the documents will be scanned, OCR'd,
and saved as text files. Whether scanned and OCR'd or re-keyed,
all text will be proofread. Our goal is 99.95% accuracy.
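As a rough illustration of how an accuracy goal like this might be checked, the sketch below compares a proofread reference transcription with an OCR'd or re-keyed version and reports the share of matching characters. It assumes accuracy is measured per character, which the text above does not specify; the function names and the use of Python's difflib are choices made for this example, not a description of the Center's actual workflow.

```python
from difflib import SequenceMatcher

# The 99.95% goal stated above, expressed as a fraction.
TARGET_ACCURACY = 0.9995


def character_accuracy(reference: str, candidate: str) -> float:
    """Share of characters that line up between the two texts.

    Assumption for this sketch: accuracy is counted per character and
    divided by the longer of the two texts, so that both missing and
    extra characters lower the score.
    """
    if not reference and not candidate:
        return 1.0
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(reference), len(candidate))


def meets_goal(reference: str, candidate: str) -> bool:
    """True if the candidate text reaches the target accuracy."""
    return character_accuracy(reference, candidate) >= TARGET_ACCURACY


if __name__ == "__main__":
    proofread = "Stillwater"
    ocr_output = "Sti1lwater"  # the digit 1 misread for the letter l
    print(character_accuracy(proofread, ocr_output))
    print(meets_goal(proofread, ocr_output))
```

At this threshold, a text may contain at most one wrong character in every 2,000, which is why proofreading is applied to both scanned and re-keyed text.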
Web Design and XML
Standards for metadata, scanning and storage developed by
the Colorado Digitization Project (now a part of the BCR Collaborative Digitization Program) will be utilized. The BCR CDP Best Practices & Publications are available at
It is most desirable to employ non-application specific encoding,
such as XML, as this is the standard used by the major digitization
projects internationally. XML (Extensible Markup Language) is an
application-, platform- and vendor-independent format that allows
you to mark up a text's structure rather than just specify the
layout and appearance as we do in HTML. By using XML, we achieve
- The structural mark-up indicates the major divisions of the
text (e.g., "chapter", "section", "verse")
AND various characteristics of the text (names of people and
places, dates, spelling irregularities).
- The file is in an archival format that will easily migrate
to new platforms as they emerge.
- XML is emerging as the new standard on the Web. We anticipate
that there will be affordable software available in the near
future that will allow us to take advantage of XML's structural
nature (e.g., fielded searching).
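To make the idea of structural mark-up and fielded searching more concrete, here is a small sketch. The XML fragment is invented for illustration (the people and events are not from any OSU collection), and its element names are simplified, TEI-like tags that have not been validated against the TEI-Lite DTD. The Python code uses only the standard library and is an assumption about how such a search could work, not the software the Center uses.

```python
import xml.etree.ElementTree as ET

# Invented sample text with TEI-like tags (illustration only).
SAMPLE = """\
<text>
  <body>
    <div type="chapter" n="1">
      <head>Arrival</head>
      <p>In the fall, <name type="person">Mary Smith</name> reached
         <name type="place">Stillwater</name>.</p>
    </div>
    <div type="chapter" n="2">
      <head>Settlement</head>
      <p><name type="person">John Doe</name> filed a claim near
         <name type="place">Perry</name>.</p>
    </div>
  </body>
</text>
"""


def _clean(element):
    """Return the element's text with whitespace runs collapsed."""
    return " ".join("".join(element.itertext()).split())


def names_by_type(xml_text, name_type):
    """List every tagged name of one type, in document order."""
    root = ET.fromstring(xml_text)
    return [_clean(el) for el in root.iter("name")
            if el.get("type") == name_type]


def chapters_mentioning(xml_text, place):
    """Return (chapter number, heading) for chapters tagging this place."""
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        places = {_clean(el) for el in div.iter("name")
                  if el.get("type") == "place"}
        if place in places:
            head = div.find("head")
            hits.append((div.get("n"),
                         _clean(head) if head is not None else ""))
    return hits


if __name__ == "__main__":
    print(names_by_type(SAMPLE, "place"))
    print(chapters_mentioning(SAMPLE, "Perry"))
```

Because names and chapters are marked up explicitly, the search can be limited to places only, or to a given chapter, which a plain text or HTML file cannot support reliably.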
Once we have scanned, OCR'd, or re-keyed the text, it will be
saved as a plain text file. We will then encode it in XML using
the TEI-Lite DTD. A DTD, or Document Type Definition, sets the
rules for an XML document. The TEI-Lite DTD was developed specifically
for text encoding in humanities disciplines by the
Text Encoding Initiative. This
is one of the most time-consuming aspects of any project. We will
index the XML text for searching using an indexing program.
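The first, mechanical part of the encoding step can be sketched as follows: a plain text file is split into paragraphs at blank lines and each paragraph is wrapped in a p element inside text and body. This is only a skeleton under stated assumptions. The function name is hypothetical, the required TEI header is omitted, and the detailed mark-up of names, dates, and divisions that makes encoding time-consuming would still be added by hand.

```python
import re
import xml.etree.ElementTree as ET


def plain_text_to_skeleton(plain: str) -> str:
    """Wrap blank-line-separated paragraphs in <text><body><p> elements.

    Sketch only: the output has no TEI header and is not validated
    against the TEI-Lite DTD.
    """
    root = ET.Element("text")
    body = ET.SubElement(root, "body")
    for chunk in re.split(r"\n\s*\n", plain):
        if not chunk.strip():
            continue
        paragraph = ET.SubElement(body, "p")
        # Join hard-wrapped lines into one run of text.
        paragraph.text = " ".join(chunk.split())
    # ElementTree escapes &, < and > in text when serializing.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "First line\nwrapped.\n\nFish & chips <1940>.\n"
    print(plain_text_to_skeleton(sample))
```

Building the tree with a library rather than by string concatenation means characters such as & and < in the source text are escaped automatically, so the result stays well-formed.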
In order to display the XML files on the web, we must
prepare stylesheets that will tell browsers how to display the
files. The staff will design the website, and we are then ready
to present the collection on the web. Depending on the size
and complexity of a project, and because of our dedication to
preservation and accuracy, it can take several months to complete
a project. Once a project is finished, however, the final product
may be used and enjoyed by countless people for years to come.
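The role of a stylesheet, mentioned above, is to map structural elements to a display form. The sketch below imitates that mapping in Python so the idea can be tried without extra software. It is not a stylesheet language and not the Center's stylesheets; the tag mapping and function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from structural tags to HTML display tags.
TAG_MAP = {"div": "section", "head": "h2", "p": "p"}


def render(element) -> str:
    """Recursively convert an XML element to an HTML string.

    Elements without an entry in TAG_MAP contribute only their text,
    much as a stylesheet leaves unstyled elements inline.
    """
    inner = escape(element.text or "")
    for child in element:
        inner += render(child) + escape(child.tail or "")
    tag = TAG_MAP.get(element.tag)
    if tag is None:
        return inner
    return f"<{tag}>{inner}</{tag}>"


if __name__ == "__main__":
    source = "<div><head>Arrival</head><p>A <name>B</name> &amp; C.</p></div>"
    print(render(ET.fromstring(source)))
```

Keeping the display rules separate from the encoded text means the same XML file can be restyled later without re-editing the text itself.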
collections to view the results
of our efforts.
The OSU Library Electronic Publishing Center is located at
103 Oklahoma State University Library Annex
Stillwater, OK 74078