A Brief Guide to Canonical Text Services

What is CTS?

Canonical Text Services identify and retrieve passages of text cited by canonical reference. Citations are expressed as CTS URNs. Text passages are structured in XML that can be validated against some schema or DTD.

Where CTS URNs define a permanent notation for citing texts, independent of any technology, Canonical Text Services provide a network service that can equate XML documents with the work referred to by a CTS URN, and can retrieve a well-formed XML fragment for a passage referred to in a CTS URN.

The CTS architecture and design goals

The Canonical Text Services protocol defines the interaction between a client and a server program over HTTP: clients submit requests with their parameters passed as HTTP GET parameters, and the CTS replies with XML that validates against the CTS reply schemas. A user could therefore interact directly with a CTS by pointing a web browser at URLs formed according to the CTS specification, but the purpose of the service is to serve software that recognizes CTS URNs.

The vocabulary of requests (highlights summarized below) allows a client to discover metadata about the collection of texts served by a specific CTS instance, as well as to retrieve passages of text.

The server's metadata catalog, called a "text inventory," identifies a means (such as a Relax NG schema) for validating the XML realization of a document, and describes how the canonical citation scheme of the CTS URN maps onto the XML representation.

Version 3 of CTS introduced three important changes. First, in CTS 3, documents may validate against any standard method chosen by the service's administrator, such as Relax NG schemas, XML schemas, or DTDs. As part of this change, CTS 3 now supports XML namespaces. Second, different parts of a document may be cited using different citation schemes. (E.g., a preface might be cited differently from the main body of a work.) Third, an optional extension, which implementations may choose to support or ignore, deals with the topological relations of URNs. (For more information, see URN topology.)

Interacting with a CTS: the principal requests

Programs (and the programmers who write them) can interact with a CTS using any of the nine defined requests. The request name is always included in an HTTP parameter named request; for all requests except the metadata request GetCapabilities, a CTS URN is always included in an HTTP parameter named urn. Consider this possible series of exchanges between a client program interested in hexameter poetry, and a CTS at the address http://machine/service.
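
As a rough illustration of how such requests might be formed, the following Python sketch builds request URLs carrying the request and urn parameters described above. The helper name cts_request_url, the request name GetPassage, and the example URN value are assumptions made for illustration; only GetCapabilities, the two parameter names, and the address http://machine/service come from the text.

```python
from urllib.parse import urlencode

# Service address used as the example in the text.
BASE_URL = "http://machine/service"


def cts_request_url(request, urn=None, base_url=BASE_URL):
    """Build a CTS request URL with `request` (and `urn`) as HTTP GET parameters.

    Hypothetical helper: every request except GetCapabilities needs a urn.
    """
    if urn is None and request != "GetCapabilities":
        raise ValueError(f"{request} requires a urn parameter")
    params = {"request": request}
    if urn is not None:
        params["urn"] = urn
    # urlencode percent-encodes reserved characters such as ':' in the URN.
    return f"{base_url}?{urlencode(params)}"


# Metadata request: no urn parameter.
capabilities_url = cts_request_url("GetCapabilities")

# Passage request: the request name and URN value are illustrative assumptions.
passage_url = cts_request_url("GetPassage", "urn:cts:greekLit:tlg0012.tlg001:1.1")
```

A client would then issue an HTTP GET for each URL and parse the XML reply.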

Managing a CTS: the TextInventory

A CTS implementation might manage the service's metadata in any way it chooses. It might store the data in a database with a form-based user interface, for example. But the metadata is presented to client applications as XML validating against the CTS TextInventory schema, so we will survey the main components of the TextInventory as they appear serialized to XML.

The TextInventory includes three main parts: a list of standard citation schemes; a list of the individual TextGroups, Works, Editions, Translations, and Exemplars of documents known to the server; and a list of organizational units called Collections. The list of groups, works, etc., is a hierarchical organization used to identify works uniquely, according to some familiar, well-established convention. The collections, on the other hand, allow the administrator of a CTS to group sets of works together for any purpose.

Of these three sections, the most important is the list of groups and works. It is organized as follows.

The Text Inventory: Groups and Works

The list of works contains a list of…

So, for example, a TextInventory entry for the Homeric Hymn to Athena could contain the following information:

TextGroup: tlg0013 (Homeric Hymns)

  Work: tlg011 (Hymn to Athena)

    Edition: chs01 (CHS electronic edition based on readings in Allen's OCT edition)

      Online: local document reference = tlg0013/tlg0013.tlg011.chs01.xml

    Translation: chs02 (English translation by Hugh Evelyn-White, now in the public domain)

&c.

Each Online element, whether for an edition, translation, or exemplar, contains three elements: the first identifies how the XML document can be validated, the second identifies the citation scheme with an identifier from the list of citation schemes used in this service, and the third contains a recursive list of citation elements mapping each level of the citation scheme to part of the XML document. Our example of the Homeric Hymn to Athena cites by a single level, the poetic line, and could be documented like this:

<online docname="tlg0013/tlg0013.tlg011.chs01.xml" srcid="OCT">
  <validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
  <citationScheme schemaId="poeticline" canonical="yes"/>
  <citationMapping defaultNSAbbr="tei">
    <citation label="line" xpath="/l[@n = '?']" scope="/TEI/text/body"/>
  </citationMapping>
</online>
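
To suggest how a client might read such an entry, here is a minimal Python sketch that parses the online element shown above with the standard library and collects its three components. The function name describe_online and the shape of the returned dictionary are assumptions for illustration; a real TextInventory may place these elements in an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The single-level example from the text, reproduced as a string.
ONLINE_XML = """<online docname="tlg0013/tlg0013.tlg011.chs01.xml" srcid="OCT">
  <validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
  <citationScheme schemaId="poeticline" canonical="yes"/>
  <citationMapping defaultNSAbbr="tei">
    <citation label="line" xpath="/l[@n = '?']" scope="/TEI/text/body"/>
  </citationMapping>
</online>"""


def describe_online(xml_text):
    """Summarize an online element (hypothetical helper, no namespace handling)."""
    online = ET.fromstring(xml_text)
    levels = []
    # citation elements nest recursively, one per level of the citation scheme.
    node = online.find("citationMapping/citation")
    while node is not None:
        levels.append({
            "label": node.get("label"),
            "xpath": node.get("xpath"),
            "scope": node.get("scope"),
        })
        node = node.find("citation")
    return {
        "docname": online.get("docname"),
        "schema": online.find("validate").get("schema"),
        "schemeId": online.find("citationScheme").get("schemaId"),
        "levels": levels,
    }


info = describe_online(ONLINE_XML)
```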

An online element for the two-tiered citation of the Iliad illustrates the usage of the citation element's scope and xpath attributes. Both provide templates for XPath expressions, in which question marks (?) can be replaced by the value of one level of a citation. The xpath attribute identifies an XML unit corresponding to a level of the citation scheme; the scope attribute identifies the context in the document where this xpath applies. (The two are distinct because a document might include other markup between levels of the citation scheme.)

<online docname="tlg0012/tlg0012.tlg001.hmt-msA.xml" srcid="OCT">
  <validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
  <citationScheme schemaId="bookAndPoeticline" canonical="yes"/>
  <citationMapping defaultNSAbbr="tei">
    <citation label="book" xpath="/div[@type='book' and @n = '?']" scope="/TEI/text/body">
      <citation label="line" xpath="/l[@n = '?']" scope="/TEI/text/body/div[@type = 'book' and @n = '?']"/>
    </citation>
  </citationMapping>
</online>
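
The template substitution can be sketched in Python as follows. This is one plausible reading rather than the specified algorithm: it assumes that citation levels are separated by periods, that the question marks in a scope template are filled from the enclosing levels in order, and that the question mark in the xpath template takes the value of the level being cited. The names fill and passage_xpath are hypothetical, and the book-level template is written with "and" joining its two predicate conditions. Namespace prefixes (defaultNSAbbr) and escaping of quotation marks in values are not handled.

```python
def fill(template, values):
    """Replace each '?' in the template, left to right, with successive values."""
    parts = template.split("?")
    if len(parts) - 1 != len(values):
        raise ValueError("number of '?' placeholders and values differ")
    result = parts[0]
    for value, rest in zip(values, parts[1:]):
        result += str(value) + rest
    return result


def passage_xpath(levels, citation):
    """Build an XPath for a citation such as '2.10'.

    levels: (scope, xpath) template pairs, outermost citation level first.
    """
    values = citation.split(".")  # assumption: '.' separates citation levels
    if not 1 <= len(values) <= len(levels):
        raise ValueError("citation depth does not match the citation scheme")
    depth = len(values)
    scope, xpath = levels[depth - 1]
    # Enclosing levels fill the scope; the cited level fills the xpath.
    return fill(scope, values[:depth - 1]) + fill(xpath, values[depth - 1:])


# Templates taken from the two-tiered Iliad example.
ILIAD_LEVELS = [
    ("/TEI/text/body", "/div[@type='book' and @n = '?']"),
    ("/TEI/text/body/div[@type = 'book' and @n = '?']", "/l[@n = '?']"),
]

book_path = passage_xpath(ILIAD_LEVELS, "1")
line_path = passage_xpath(ILIAD_LEVELS, "2.10")
```

Under these assumptions, citing a whole book uses only the first pair of templates, while citing a line uses the second pair with the book value filling the scope.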

Further information

More detailed information about version 3 of CTS is currently in preparation; links will be posted here when it becomes available from the project's SourceForge site.