ALCTS - Association for Library Collections & Technical Services

CC:DA/TF/Position paper/4
Reissued as: CC:DA/TF/TEI/3

May 30, 1995

Call for CC:DA Action on the TEI Header
A Position Paper

Prepared by
Sherry Kelley, UCLA Library
Bradford Eden, NEEDS Cataloger

In Consultation With
Edward Gaynor, University of Virginia Library,
Cataloging Services Department


Please note that the purpose of this document is to facilitate the work of the Committee and to provide a means for outreach to both library and non-library cataloging communities. This document is intended for the exclusive use of CC:DA and its cataloging constituencies, and is presented for discussion in the ongoing process of rule revision. Under no circumstances should the information here be copied or re-transmitted without prior consultation with the current Chair of CC:DA.


The CC:DA charge: "To make a continuing assessment of the state of the art and suggest the direction of change in the field of descriptive cataloging; to recommend solutions to problems relating not only to bibliographic description but also to choice and form of access points, other than subject access; to initiate proposals for additions to and revisions of the cataloging code currently adopted by ALA; … to develop official ALA positions on such proposals in consultation with other appropriate ALA units and organizations in the U.S.A. … " (ALA Handbook of Organization, 1994/1995)

An important new direction for the field of descriptive cataloging is the growth of electronic "publishing" and the increase in electronic documents as a percentage of materials collected by libraries. Electronic documents are structured in many ways, from flat files such as ASCII to those encoded following international standards such as Standard Generalized Markup Language (SGML) and the Text Encoding Initiative (TEI) guidelines. We are concerned with the latter category of document in this position paper but wish to point out that the creation of TEI-conformant documents and their accompanying TEI headers is just one of many new challenges for bibliographic control and description created by electronic publishing. The Committee on Cataloging: Description and Access (CC:DA) should join, even lead, the dialogue going on in MARBI, CONSER, and various OCLC projects, including the OCLC Internet Resources Cataloging Project, concerning these materials.

It is appropriate that CC:DA do so on two counts. First, catalogers are struggling to describe electronic documents with a patchwork of rules from the Anglo-American Cataloguing Rules, 2nd ed., 1988 revision (AACR2r), especially Chapter 9, Computer Files. These rules are inadequate for a number of reasons, chief of which is their strong orientation toward print and commercial publishers. Second, the number of electronic text projects that convert printed texts to TEI-conformant documents is increasing. Each TEI-conformant document carries its own "bibliographic record" in the form of a TEI header. Libraries will be the chief users of these headers as surrogates for title pages, as potential access records in their OPACs, and as source records for descriptive catalogers. CC:DA can and should help standardize the creation and use of these headers.

Text Encoding Initiative

The Text Encoding Initiative is a major international academic effort to establish guidelines for the encoding and interchange of electronic texts. (E. van Herwijnen, Practical SGML, 2nd ed., 1994. p. 53)

The encoding scheme used by the TEI guidelines is an application of the Standard Generalized Markup Language (SGML), an international standard (ISO 8879) for the description of marked-up electronic text. SGML defines methods of representing text in electronic form through the use of coding conventions, and TEI uses a subset of SGML for its encoding scheme. By way of comparison, MARC is a markup language, as are the word processing coding conventions that indicate how a document will display in print format. (TEI P3, Cumulative Draft. Chapter 2. A Gentle Introduction to SGML, p. 19.)

The TEI Header

Every TEI-conformant text has an encoded set of descriptions prefixed to it. This is known as the TEI header and consists of four major parts: file description, encoding description, text profile, and revision history. The file description is mandatory and is our chief concern in this position paper because it contains "a full bibliographical description of the computer file from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive." (TEI P3, Chapter 5) In documenting information about the text, its source, its encoding and its revisions, headers "provide an analogue to the title page attached to a printed work." (Ibid.)

Seven data elements may appear in the file description (<fileDesc> element of the <teiHeader> element), three of which are mandatory. The mandatory elements are:

    title statement (<titleStmt>): information about the title of a work and those responsible for its intellectual content.

    publication statement (<publicationStmt>): information about the publication or distribution of an electronic or other text.

    source description (<sourceDesc>): bibliographic description of the copy text(s) from which an electronic text was derived or generated.

Optional elements are:

    edition statement (<editionStmt>): information about one edition of a text.

    extent (<extent>): information about the approximate size of the electronic text as stored on some carrier medium, specified in any convenient units.

    series statement (<seriesStmt>): information about the series, if any, to which a publication belongs.

    notes statement (<notesStmt>): any notes providing information about a text additional to that recorded in other parts of the bibliographic description. (Guidelines for Electronic Text Encoding and Interchange, version P3, 1994. Chapter 24.)

These elements clearly parallel descriptive cataloging areas 1-7 of AACR2r. Collaboration between the authors of the TEI and the descriptive cataloging community, as represented by CC:DA, in the further development of header elements would be mutually beneficial, since headers serve both as bibliographic records and as title page equivalents. Some collaboration has already occurred, as indicated by references in the TEI documentation to the use of AACR2 in formulating data element definitions.
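The structure of the file description outlined above can be illustrated with a short Python sketch. It is offered as an illustration only and rests on several assumptions: the sample header is written as well-formed XML rather than full SGML so that a standard parser can read it, its values are placeholders borrowed from the template appended to this paper, and the inner element names (<title>, <author>, <publisher>, <pubPlace>) and the function name check_file_desc are not taken from the text of this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample header with placeholder values; written as
# well-formed XML (an assumption) so the standard library can parse it.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The work's title [a machine-readable transcription]</title>
      <author>The work's author, last name first</author>
    </titleStmt>
    <extent>ca. XXX kilobytes</extent>
    <publicationStmt>
      <publisher>University of Virginia Library</publisher>
      <pubPlace>Charlottesville, Va.</pubPlace>
    </publicationStmt>
    <sourceDesc>
      <p>Description of the printed source</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

# The three mandatory and four optional elements listed in this paper.
MANDATORY = ("titleStmt", "publicationStmt", "sourceDesc")
OPTIONAL = ("editionStmt", "extent", "seriesStmt", "notesStmt")


def check_file_desc(header_xml):
    """Return (missing mandatory names, optional names present)."""
    root = ET.fromstring(header_xml)
    file_desc = root if root.tag == "fileDesc" else root.find("fileDesc")
    if file_desc is None:
        raise ValueError("header has no <fileDesc> element")
    present = {child.tag for child in file_desc}
    missing = [name for name in MANDATORY if name not in present]
    optional = [name for name in OPTIONAL if name in present]
    return missing, optional


missing, optional = check_file_desc(SAMPLE_HEADER)
print("missing mandatory elements:", missing)
print("optional elements present:", optional)
```

A check of this kind addresses only the presence of elements. Whether the data recorded in them follows AACR2r is a question of content that no parser can answer.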

University of Virginia Library, Cataloging Services Department Experience

The University of Virginia Library's Electronic Text Center provides a good illustration of the possible interconnections between staff creating TEI-conformant documents and catalogers. The University of Virginia Library was among the first to create electronic texts and headers following TEI guidelines and to produce MARC records for its online public access catalog. Early in the project, cataloging department staff created headers following TEI Chapter 24, AACR2r, and their own local policies. A set of procedures grew out of this process that provided sufficient guidance for staff from the Electronic Text Center to assume responsibility for the creation of headers. Once created, headers are stored in an online file to be reviewed by cataloging staff. Separate MARC records are created from the headers, and the headers themselves are edited to conform to AACR2r as appropriate. For example, names are entered in the header in their authorized form. (Cataloging Procedures Manual, Chapter 12, Part B: Electronic Texts. University of Virginia Library, Cataloging Services Dept.)

A useful byproduct from this project for the cataloging community was the incorporation of suggestions made by University of Virginia Library catalogers into the TEI, third edition.


Many electronic text projects use SGML. Examples include the Center for Electronic Texts in the Humanities, the Berkeley Finding Aid Project, and the University of Virginia Library's Electronic Text Center; internationally, notable examples include the WEBDOC and RIDDLE projects. Bibliographic records in the form of headers are being created to accompany texts, and to serve as surrogate catalog records in separate hypermedia databases, with links to the texts. These databases are built on multiple and sometimes proprietary platforms. CC:DA should strongly support efforts to record data in headers according to AACR2 as a means of standardizing their data content, in anticipation of a time when the header can be used as a record that will appear in Online Public Access Catalogues (OPACs), either through links that are transparent to the user, or through SGML/MARC reversible mapping programs.

The potential for direct integration into the OPAC makes the TEI header a candidate for CC:DA review. Records are being created that consist of descriptive elements that must be AACR2-conformant in order to integrate, virtually or physically, with AACR2/MARC records. Not only must these records be AACR2-conformant, but AACR2 must be revised to guide cataloging staff in the preparation and use of headers for electronic texts. As stated in Chapter 5.7 of the TEI Guide, Note for Library Cataloguers: "The (TEI) file header is not a library catalogue record, and so will not make all of the distinctions essential in standard library work … It is the intention of the developers, however, to ensure that the information required for a catalogue record be retrievable from the TEI file header, and moreover that the mapping from one to the other be as simple and straightforward as possible."
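The kind of mapping the TEI developers describe can be sketched in Python as follows. This is a minimal illustration and not the procedure of any project named in this paper. The field tags and punctuation are modeled on the OCLC workform appended to this paper, the header is assumed to be well-formed XML, and the inner element names, the sample values, and the function names text_of and header_to_marc are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical header with placeholder values, written as well-formed XML.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The work's title</title>
      <author>The work's author, last name first</author>
    </titleStmt>
    <extent>ca. XXX kilobytes</extent>
    <publicationStmt>
      <publisher>University of Virginia Library</publisher>
      <pubPlace>Charlottesville, Va.</pubPlace>
      <date>Current year</date>
    </publicationStmt>
    <sourceDesc><p>Description of the printed source</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""


def text_of(parent, path):
    """Return whitespace-normalized text at path, or None if absent."""
    node = parent.find(path) if parent is not None else None
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def header_to_marc(header_xml):
    """Map a few <fileDesc> elements to MARC-style (tag, value) pairs."""
    # Assumes the root element is <teiHeader> with a <fileDesc> child.
    file_desc = ET.fromstring(header_xml).find("fileDesc")
    title = text_of(file_desc, "titleStmt/title")
    author = text_of(file_desc, "titleStmt/author")
    extent = text_of(file_desc, "extent")
    place = text_of(file_desc, "publicationStmt/pubPlace")
    publisher = text_of(file_desc, "publicationStmt/publisher")
    date = text_of(file_desc, "publicationStmt/date")

    fields = []
    if author:
        fields.append(("100", author))
    if title:
        statement = title + " |h [computer file]"
        if author:
            statement += " / |c " + author
        fields.append(("245", statement))
    if extent:
        fields.append(("256", "Computer data (1 file : " + extent + ")"))
    if place and publisher and date:
        fields.append(("260", place + " : |b " + publisher + ", |c " + date + "."))
    # Record the source of the title, as the workform does.
    fields.append(("500", "Title from TEI header."))
    return fields


for tag, value in header_to_marc(SAMPLE_HEADER):
    print(tag, value)
```

The sketch produces usable fields only if the header data were recorded consistently in the first place, for example with names in their authorized form. That is the point at which AACR2 conformance matters.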


We recommend that CC:DA form a Task Force charged with, but not limited to, the following:
  1. Investigate ways CC:DA and the editors of the TEI might collaborate to inform each other on implementation and development of header data elements.
  2. Consider amending AACR2 to instruct catalogers in the use of TEI headers as title page substitutes.
  3. Investigate possible collaboration with MARBI and other organizations to standardize encoding conventions in support of reversible mapping between MARC and SGML, or to support seamless integration of TEI-conformant headers into MARC databases. CC:DA would be concerned with the appropriate definition of data content, not data encoding.
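What "reversible mapping" in the third charge would require can be suggested by a small Python sketch. It assumes a hypothetical one-to-one table between header element paths and MARC tag and subfield pairs. The table, the sample values, and the function names are illustrative assumptions, not an existing standard or program.

```python
# Hypothetical one-to-one table: header element path <-> (MARC tag, subfield).
PATH_TO_FIELD = {
    "titleStmt/author": ("100", "a"),
    "titleStmt/title": ("245", "a"),
    "publicationStmt/pubPlace": ("260", "a"),
    "publicationStmt/publisher": ("260", "b"),
    "publicationStmt/date": ("260", "c"),
}

# Inverting the table is safe only because no two paths share a field.
FIELD_TO_PATH = {field: path for path, field in PATH_TO_FIELD.items()}
assert len(FIELD_TO_PATH) == len(PATH_TO_FIELD)


def header_to_fields(header_values):
    """Re-key header data by MARC tag and subfield."""
    return {PATH_TO_FIELD[path]: value for path, value in header_values.items()}


def fields_to_header(field_values):
    """Re-key MARC data by header element path."""
    return {FIELD_TO_PATH[field]: value for field, value in field_values.items()}


# Placeholder values taken from the template appended to this paper.
header = {
    "titleStmt/title": "The work's title",
    "titleStmt/author": "The work's author, last name first",
    "publicationStmt/pubPlace": "Charlottesville, Va.",
    "publicationStmt/publisher": "University of Virginia Library",
    "publicationStmt/date": "Current year",
}

round_trip = fields_to_header(header_to_fields(header))
print("round trip preserved the data:", round_trip == header)
```

The round trip succeeds here only because each element corresponds to exactly one subfield and the data values pass through unchanged. Agreeing on the table is an encoding question for bodies such as MARBI, while agreeing on how the values themselves are recorded is the data content question that falls to CC:DA.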


To be added.

Examples from the Cataloging Procedures Manual,
Chapter 12, Part B: Electronic Texts.
University of Virginia Library, Cataloging Services Dept.

TEI header template:

<!DOCTYPE TEI.2 system 'teilite.dtd'>
  <teiHeader type=aacr2>
          The work's title [a machine-readable transcription]
          The work's author, last name first
            Creation of machine-readable version:
            creator of electronic version
            Conversion to TEI.2-conformant markup:
            University of Virginia Library Electronic
            Text Center
        ca. XXX kilobytes
          University of Virginia Library
          Charlottesville, Va.
        <idno type="ETC">
          collection and ID, e.g. Modern English, AusEmma
        </idno>
          <p>Place where text can be found,
             e.g. Available from: Oxford Text Archive</p>
          <p>Available commercially from:</p>
          Current year
        <p>Name of electronic series, if any</p>
          Illustrations have been included from the
            print version.
          Any other notes.
               The work's title
               The author's name, first name first
                 e.g. Editor / Translator / Annotator
             <p>Edition information, e.g. 1st ed.</p>
               place of publication
               date of publication
             <p>Name of print series</p>
        <p>Prepared for the University of Virginia
           Library Electronic Text Center</p>
        <p>All quotation marks retained as data</p>
        <p>Spell-check and verification made against printed
           text using WordPerfect spell checker</p>
        <p>All unambiguous end-of-line hyphens have been
           removed, and the trailing part of a word has been
           joined to the preceding line.</p>
        <p id=ETC>Keywords in the header are a local
           Electronic Text Center scheme to aid in
           establishing analytical groupings</p>
        <p>ID elements are given for each page element and
           are composed of the text's unique cryptogram and
           the given page number, as in AusEmma1 for page
           one of Jane Austen's Emma.</p>
          First published date
          languages used in the text
            fiction or non-fiction; poetry, prose, or drama
          date of changes
          who made the changes
          what was done
  <text id=xxxxxxx>
      <div0 type="xxx">

OCLC Workform for Electronic Texts:

Type:  m  Bib lvl:    m  Source:  d  Lang: eng
File: d Enc lvl: I Govt pub: s Ctry: vau
Audience:   Mod rec:   Frequen: n Regulr:    
Desc: a Dat tp: s Dates:  ____ , ____

040 VA@ |c VA@
049 VA@@
099 |a Modern |a English
1__ _ <author>
245 1_   <title> |h [computer file] / |c <author>
256 Computer data (1 file : ca. kilobytes)
260 Charlottesville, Va. : |b University of Virginia Library, |c <date>.
516 Text (SGML)
538 Mode of access: Internet. Host:
546 Text in French and English.
500 Title from TEI header.
500 Prepared for the University of Virginia Library Electronic Text Center.
508 <Credits note>
516 Conversion to TEI.2-conformant markup.
516 Tagging checked and parsed against teilite.dtd.
516 All quotation marks retained as data. All unambiguous end-of-line hyphens have been removed, and the trailing part of the word has been joined to the preceding line.
516 ID elements are given for each page element and are composed of the text's unique cryptogram and the given page number, as in AusEmma1 for page one of Jane Austen's Emma.
500 Available (commercially) from:
530 Also available as ASCII text via campus gopher GWIS.
504 Includes bibliographical references.
534 |p Transcribed from: |a author. |t title / author. ed. |c place : publisher, date. |e p. : ill. ; cm. |f (series)
534 |p Transcribed from: |n Source unknown.
6__ __  
7__ __  
710 20 University of Virginia. |b Library. |b Electronic Text Center.
856 0 Virginia.EDU |m The Electronic Text Center, Alderman Library, University of Virginia, Charlottesville, VA 22903 (804) 924-3230 |u mailto://
856 7 |u browse.html |2 http