Canonical Text Services identify and retrieve passages of text cited by canonical reference. Citations are expressed as CTS URNs. Text passages are structured in XML that can be validated against some schema or DTD.
Where CTS URNs define a permanent notation for citing texts, independent of any technology, Canonical Text Services provide a network service that can equate XML documents with the work referred to by a CTS URN, and can retrieve a well-formed XML fragment for a passage referred to in a CTS URN.
The Canonical Text Services protocol defines interaction between a client and a server program over HTTP: clients submit requests, with parameters included as HTTP GET parameters; the CTS response is structured in XML validating against the CTS reply schemas. While a user could therefore interact directly with a CTS by pointing a web browser at URLs formed according to the CTS specification, the service is intended primarily for software that recognizes CTS URNs.
The vocabulary of requests (highlights summarized below) allows a client to discover metadata about the collection of texts served by a specific CTS instance, as well as to retrieve passages of text.
The server's metadata catalog, called a "text inventory," identifies a means (such as a Relax NG schema) for validating the XML realization of a document, and describes how the canonical citation scheme of the CTS URN maps on to the XML representation.
Version 3 of CTS introduced three important changes. First, in CTS 3, documents may be validated by any standard method chosen by the service's administrator, such as Relax NG schemas, XML schemas, or DTDs. As part of this change, CTS 3 now supports XML namespaces. Second, different parts of a document may be cited using different citation schemes (e.g., a preface might be cited differently from the main body of a work). Third, an optional extension, which implementations may choose either to support or to ignore, deals with the topological relation of URNs. (For more information, see URN topology.)
Programs (and the programmers who write them) can interact with a CTS
using any of the nine defined requests. The request name is
always included in an HTTP parameter named
request; for all
requests
except the metadata request GetCapabilities,
a CTS URN is always included in an HTTP parameter named
urn. Consider this possible series of
exchanges between a client program interested in
hexameter poetry, and a CTS
at the address http://machine/service.
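The request and urn parameters described above can be assembled mechanically. The following is a minimal Python sketch, not part of the CTS specification: the function name cts_request_url and the constant BASE_URL are hypothetical, and the base address is simply the example address used in this overview.

```python
from urllib.parse import urlencode

# Hypothetical base address: the example service address used in this overview.
BASE_URL = "http://machine/service"


def cts_request_url(request, urn=None, base_url=BASE_URL, **extra):
    """Build a CTS request URL using HTTP GET parameters.

    `request` is the CTS request name. `urn` is required for every
    request except GetCapabilities. `extra` carries optional
    parameters such as `level`.
    """
    if urn is None and request != "GetCapabilities":
        raise ValueError(f"{request} requires a urn parameter")
    params = {"request": request}
    if urn is not None:
        params["urn"] = urn
    params.update(extra)
    # Leave ":" unescaped so the URN reads as it does in the examples here.
    return f"{base_url}?{urlencode(params, safe=':')}"


print(cts_request_url("GetCapabilities"))
print(cts_request_url("GetValidReff", "urn:cts:greekLit:tlg0012.tlg001", level=1))
```

The safe=':' argument is a readability choice for this sketch; percent-encoding the colons would also produce a valid query string.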
GetCapabilities: What texts does the
service include, and how do I cite them?
http://machine/service?request=GetCapabilities
The GetCapabilities request takes no
further parameters. The reply includes the complete
TextInventory, or metadata catalog, for the service.
From this information,
a client can determine everything the service
has to offer: which texts are online, what their
citation schemes look like, and whether the service supports
optional features such as URN topology.
(For more on the information included in a TextInventory,
see below.)
The following entry for an edition of the Homeric
Hymn to Athena
includes the information that
the Homeric Hymns are text group tlg0013
in the greekLit CTS namespace, and
that tlg011 is the short Homeric Hymn
to Athena. (We could therefore identify this work succinctly with the CTS URN
urn:cts:greekLit:tlg0013.tlg011.)
It further tells us that the Hymn to Athena
is cited by poetic line, and that citation values
for poetic lines are encoded on the @n
attribute of the TEI schema's l element.
But how do we determine what line numbers are valid
references? For that, we can use the
GetValidReff request.
<textgroup projid="greekLit:tlg0013">
<groupname xml:lang="eng">Homeric Hymns</groupname>
<work xml:lang="grc-c" projid="greekLit:tlg011">
<title xml:lang="eng">Hymn to Athena</title>
<edition label="chs" projid="greekLit:chs01">
<online docname="tlg0013/tlg0013.tlg011.xml" srcid="OCT">
<validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
<citationScheme canonical="yes" schemaId="poeticline"/>
<citationMapping defaultNSAbbr="tei">
<citation scope="/TEI/text/body" xpath="/l[@n = '?']" label="line"/>
</citationMapping>
</online>
</edition>
</work>
</textgroup>
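A client could derive the work-level URN from such an entry by combining the projid values of the textgroup and work elements. The sketch below uses Python's standard ElementTree module on an abbreviated copy of the entry. It assumes, for simplicity, that the elements carry no XML namespace; a real TextInventory may declare one. The function name work_urns is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the entry above (edition details omitted).
ENTRY = """<textgroup projid="greekLit:tlg0013">
  <groupname xml:lang="eng">Homeric Hymns</groupname>
  <work xml:lang="grc-c" projid="greekLit:tlg011">
    <title xml:lang="eng">Hymn to Athena</title>
  </work>
</textgroup>"""


def work_urns(textgroup_xml):
    """Map each work title in a textgroup entry to its work-level CTS URN."""
    group = ET.fromstring(textgroup_xml)
    # projid has the form "namespace:identifier".
    namespace, group_id = group.get("projid").split(":", 1)
    urns = {}
    for work in group.findall("work"):
        _, work_id = work.get("projid").split(":", 1)
        urns[work.findtext("title")] = f"urn:cts:{namespace}:{group_id}.{work_id}"
    return urns


print(work_urns(ENTRY))
```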
GetValidReff: What citation values are valid?
http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0013.tlg011
The urn parameter to this request
identifies the Homeric Hymn to Athena. The body of the reply
includes a complete list of every CTS URN that is valid for this
very short poem, in the order in which they appear in the text,
and could look like this:
<reply>
<reff>
<urn>urn:cts:greekLit:tlg0013.tlg011:1</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:2</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:3</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:4</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:5</urn>
</reff>
</reply>
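A client might turn such a reply into an ordered list of URNs as in the following Python sketch. It parses the sample reply shown above and assumes the un-namespaced element names of that sample; the helper names valid_reff and passage_reference are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample reply body from above.
REPLY = """<reply>
<reff>
<urn>urn:cts:greekLit:tlg0013.tlg011:1</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:2</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:3</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:4</urn>
<urn>urn:cts:greekLit:tlg0013.tlg011:5</urn>
</reff>
</reply>"""


def valid_reff(reply_xml):
    """Return the URNs of a GetValidReff reply, in document order."""
    root = ET.fromstring(reply_xml)
    # iter() walks the tree in document order, which is the order of the text.
    return [el.text.strip() for el in root.iter("urn")]


def passage_reference(urn):
    """Return the passage reference: the part after the final colon."""
    return urn.rsplit(":", 1)[1]


urns = valid_reff(REPLY)
print([passage_reference(u) for u in urns])
```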
Optionally, GetValidReff requests may include
a level parameter, defining the depth of the citation
scheme to consider. For a work with a single level of citation, such
as a poem cited by lines, that option is irrelevant,
but if we wanted to discover valid references for books of the
Iliad
(rather than lines)
included in a CTS, we could submit a GetValidReff request
with a value of 1 for the level parameter.
If our GetCapabilities
reply tells us that the Iliad is work tlg001
in text group tlg0012 in the
greekLit namespace, the request would be:
http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0012.tlg001&level=1
The reply would include only 24 URNs (one for each book of the Iliad), resolved only to the first level (books) of the citation hierarchy, not the second level of individual lines.
If we subsequently wanted to discover what line numbers
are valid within book 10 of the Iliad, we could submit
a urn limited to that book:
http://machine/service?request=GetValidReff&urn=urn:cts:greekLit:tlg0012.tlg001:10
GetPassage: What is the text of this passage?
http://machine/service?request=GetPassage&urn=urn:cts:greekLit:tlg0013.tlg011:1
Applications might choose to batch process and store metadata about texts, and even lists of valid reference values, but the heart of the interaction between a CTS and client programs is retrieving passages of text for a given URN. The body of the reply contains a well-formed XML fragment with the requested passage of text framed by all its parent elements. The sample request above asks for line 1 of the Homeric Hymn to Athena; the body of a reply could look like this if the text were marked up in TEI-conformant XML:
<reply>
<TEI>
<text>
<body>
<l n="1">Παλλάδ' Ἀθηναίην ἐρυσίπτολιν ἄρχομ' ἀείδειν</l>
</body>
</text>
</TEI>
</reply>
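Because the passage arrives framed by its parent elements, a client has to look inside that frame for the cited units. The Python sketch below extracts the citation value and text of each l element from the sample reply above. It assumes the un-namespaced sample markup; a TEI document in the TEI namespace would need namespace-qualified element names. The function name passage_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample reply body from above.
REPLY = """<reply>
<TEI>
<text>
<body>
<l n="1">Παλλάδ' Ἀθηναίην ἐρυσίπτολιν ἄρχομ' ἀείδειν</l>
</body>
</text>
</TEI>
</reply>"""


def passage_lines(reply_xml):
    """Return (citation value, text) pairs for every l element in a reply."""
    root = ET.fromstring(reply_xml)
    # itertext() also collects text nested in child elements of the line.
    return [(l.get("n"), "".join(l.itertext()).strip()) for l in root.iter("l")]


for number, text in passage_lines(REPLY):
    print(number, text)
```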
GetPrevNextUrn: What is the following (or
preceding) passage?
http://machine/service?request=GetPrevNextUrn&urn=urn:cts:greekLit:tlg0013.tlg011:2
The string making up the reference component of a URN is arbitrary (e.g., it is perfectly
legitimate for a line labelled "320" to precede a line labelled "319"), but URNs have
an inherent order: the document order of the text units they refer to.
While applications can parse the results of a GetValidReff
to determine what URNs precede or follow a given URN, it is also possible to request
this information directly. The example asks for the URNs preceding and following
line 2 of the Homeric Hymn to Athena. The body of the reply would be:
<reply>
<prevnext>
<prev>urn:cts:greekLit:tlg0013.tlg011:1</prev>
<next>urn:cts:greekLit:tlg0013.tlg011:3</next>
</prevnext>
</reply>
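The alternative mentioned above, deriving neighbors from a GetValidReff list, can be sketched in a few lines of Python. The function name prev_next is hypothetical. Position in the list, not the reference label, determines the neighbors, which reflects the point that URN order is document order.

```python
def prev_next(urns, urn):
    """Return (previous, next) for `urn` within an ordered list of URNs.

    `urns` is assumed to be in document order, as in a GetValidReff reply.
    None marks the start or the end of the text.
    """
    i = urns.index(urn)  # raises ValueError if the URN is not in the list
    prev_urn = urns[i - 1] if i > 0 else None
    next_urn = urns[i + 1] if i + 1 < len(urns) else None
    return prev_urn, next_urn


WORK = "urn:cts:greekLit:tlg0013.tlg011"
urns = [f"{WORK}:{n}" for n in range(1, 6)]
print(prev_next(urns, f"{WORK}:2"))
```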
GetPassagePlus: Can we simplify this
exchange?
Applications supporting navigation of a text regularly need to
submit GetPassage and GetPrevNextUrn
in tandem. To simplify this (and cut in half the number of client/server
round trips needed to navigate a text), the GetPassagePlus
request works exactly like the GetPassage request,
except that it packages in the reply both the XML of the requested passage,
and the prevnext element of a GetPrevNextUrn
request.
A CTS implementation might manage the service's metadata in any way it chooses. It might store the data in a database with a form-based user interface, for example. But the metadata is presented to client applications as XML validating against the CTS TextInventory schema, so we will survey the main components of the TextInventory as they appear serialized to XML.
The TextInventory includes three main parts: a list of standard citation schemes; a list of the individual TextGroups, Works, Editions, Translations, and Exemplars of documents known to the server; and a list of organizational units called Collections. The list of groups, works, etc., is a hierarchical organization used to identify works uniquely, according to some familiar, well-established convention. The collections, on the other hand, allow the administrator of a CTS to group sets of works together for any purpose.
Of these three sections, the most important is the list of groups and works. It is organized as follows:
The list of works contains a list of…
…one or more TextGroup elements (e.g. “Homer,” “Aristotle,” “inscriptions from a given site”).
Textgroups are traditional, convenient groupings of texts such as “authors” for literary works, or corpus collections for epigraphic or papyrological texts. Each TextGroup has a unique identifier, one or more titles (allowing titles in different languages), and consists of…
…one or more Work elements (e.g. “Iliad,” “Ἀθηναίων Πολιτεία”)
Works are notional entities, each with an identifier unique within this TextGroup. Each work includes one or more titles, and, optionally, may be instantiated in…
…zero or more Edition elements and/or Translation elements
Editions and translations are specific versions of a notional work, which may be represented by multiple physical copies. Each has an identifier unique within the Work. The TextInventory may list bibliographic information here, since the Canonical Text Services protocol allows editors to work with information about texts that are online and texts that are not. Further, an Edition or Translation may optionally contain…
…zero or more Exemplar elements.
Exemplars are specific physical copies of an Edition or Translation. Each has an identifier unique within its containing Edition or Translation. Documenting individual exemplars can be particularly important for early print editions, but would also allow an epigraphic editor the option of treating multiple copies of a single inscription as exemplars of an edition.
If the server can deliver an electronic version at the level of the Edition element, the Translation element or one of their Exemplars, that element will contain…
…one Online element
The Online element contains information about the citation scheme of that electronic text. (See details below.) It also includes information that a server implementation could use to translate the abstract reference into terms used for local retrieval, such as a filename or database lookup.
So, for example, a TextInventory entry for the Homeric Hymn to Athena could contain the following information:
TextGroup: tlg0013 (Homeric Hymns)
Work: tlg011 (Hymn to Athena)
Edition: chs01 (CHS electronic edition based on readings in Allen's OCT edition)
Online: local document reference =
tlg0013/tlg0013.tlg011.chs01.xml
Translation: chs02 (English translation by Hugh Evelyn-White now in the public domain)
&c.
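The hierarchy described above can be modelled as nested records. The Python sketch below is one possible in-memory representation, not something CTS prescribes: the class and field names are hypothetical. It assumes, for illustration only, that the edition is online and the translation is not. It also assumes that a version-level URN is formed by appending the version identifier to the work-level URN, a form this overview does not itself show.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """An Edition or Translation of a Work."""
    kind: str                      # "edition" or "translation"
    identifier: str                # unique within the Work
    description: str
    docname: Optional[str] = None  # local document reference, if online


@dataclass
class Work:
    identifier: str                # unique within the TextGroup
    title: str
    versions: list = field(default_factory=list)


@dataclass
class TextGroup:
    namespace: str
    identifier: str
    name: str
    works: list = field(default_factory=list)


def online_version_urns(group):
    """List a URN for each version that has a local document reference."""
    return [
        # Assumed URN form: namespace:group.work.version
        f"urn:cts:{group.namespace}:{group.identifier}.{work.identifier}.{v.identifier}"
        for work in group.works
        for v in work.versions
        if v.docname is not None
    ]


hymns = TextGroup("greekLit", "tlg0013", "Homeric Hymns", [
    Work("tlg011", "Hymn to Athena", [
        # docname as given in the online element documented for this edition
        Version("edition", "chs01", "CHS electronic edition",
                docname="tlg0013/tlg0013.tlg011.chs01.xml"),
        Version("translation", "chs02", "English translation"),
    ]),
])
print(online_version_urns(hymns))
```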
Each Online element, whether it belongs to an edition, translation, or exemplar,
contains three elements: the first identifies how the XML document can be validated,
the second identifies the citation scheme with an identifier from the
list of citation schemes used in this service, and the third contains a recursive list
of citation
elements mapping each level of the citation scheme to part of the XML document.
Our example of the Homeric Hymn to Athena cites by a single level, the poetic line,
and could be documented like this:
<online docname="tlg0013/tlg0013.tlg011.chs01.xml" srcid="OCT">
<validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
<citationScheme schemaId="poeticline" canonical="yes"/>
<citationMapping defaultNSAbbr="tei">
<citation label="line" xpath="/l[@n = '?']" scope="/TEI/text/body"/>
</citationMapping>
</online>
An online element for the two-tiered citation of the Iliad
illustrates the usage of the citation element's
scope and xpath attributes.
Each provides a template for an XPath expression, in which question marks
(?) can be replaced by the value of one level of a
citation.
The xpath attribute
identifies an XML unit corresponding to a level of the citation scheme;
the scope attribute
identifies a context in the document
where this xpath applies.
(The two are distinct because a document might include other elements between
levels of the citation scheme.)
<online docname="tlg0012/tlg0012.tlg001.hmt-msA.xml" srcid="OCT">
<validate schema="http://katoptron.holycross.edu/schemas/teip5/teip5core.rng"/>
<citationScheme schemaId="bookAndPoeticline" canonical="yes"/>
<citationMapping defaultNSAbbr="tei">
<citation label="book" xpath="/div[@type = 'book' and @n = '?']" scope="/TEI/text/body">
<citation label="line" xpath="/l[@n = '?']" scope="/TEI/text/body/div[@type = 'book' and @n = '?']"/>
</citation>
</citationMapping>
</online>
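To make the template mechanism concrete, the Python sketch below fills the question marks of a scope and xpath pair with citation values, taken left to right. It is one plausible reading of the mapping, not a reference implementation: the function names fill_template and resolve_citation are hypothetical, and the sketch ignores the namespace abbreviation given by defaultNSAbbr.

```python
def fill_template(template, values):
    """Replace each '?' in an XPath template with the next citation value."""
    parts = template.split("?")
    if len(parts) - 1 != len(values):
        raise ValueError(
            f"template has {len(parts) - 1} placeholders, got {len(values)} values"
        )
    result = parts[0]
    for value, rest in zip(values, parts[1:]):
        result += value + rest
    return result


def resolve_citation(scope, xpath, values):
    """Join scope and xpath, then fill in the citation values in order."""
    return fill_template(scope + xpath, values)


# Line level of the two-tiered Iliad mapping: book 10, line 5.
print(resolve_citation(
    "/TEI/text/body/div[@type = 'book' and @n = '?']",
    "/l[@n = '?']",
    ["10", "5"],
))

# Single-level mapping of the Hymn to Athena: line 1.
print(resolve_citation("/TEI/text/body", "/l[@n = '?']", ["1"]))
```

Splitting on the question mark, rather than calling replace repeatedly, keeps a citation value that happens to contain a question mark from being treated as a placeholder.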
More detailed information about version 3 of CTS is currently in preparation; links will be posted here when it becomes available from the project's SourceForge site.