5.3 Structural Tags

A feature tag identifies a unit of text for what it is. A structural tag is a feature tag which links that unit to another unit or units, either by subsuming or nesting the former in the latter, or by doing the reverse (so that the former always contains the latter whenever the latter occurs), or by identifying a sequence in the appearance of these units. Feature tags say something about the unit in itself. Structural tags relate the unit to other units.

COCOA has to register the features and the structural relations of any unit implicitly with divisional or x-type tags. In SGML, any feature tag can contain information about its relationship to any other tag.

5.3.1 SGML Structural Tags

A very useful feature of SGML is the ability to define tags, within an attached Document-Type Definition (DTD), as hierarchical children or parents of one or more other tags. RET texts are distributed without a DTD. If you wish to use a RET text within an SGML editor, you should write a DTD that best corresponds to your interpretation of the text. See Appendix 2 for a discussion of this type of command file.

A TEI-conformant scheme for the folio text of King Lear might look as follows. Note that font is declared either in the rend ("rendition") attribute of a tag or in the <em> ("emphasis") tag, and that special characters like long-s are declared with entities.

               <div0 type="play-tragedy">
               <lb><fw type="page"
               <lb><head rend="rl">THE TRAGEDIE OF</lb>
               <lb>KING LEAR.</lb></head>
                  <lb><div1 type="act" n="1" 
                    lang="l">A&ctlig;us Primus.
                    <div2 type="scene" n="1"
                        <lb><fw type="col" 
                          type="entrance" rend="i">Enter 
                          Kent, Glouce&longstlig;er, and 
                       <lb><sp rend="r"><speaker 
                       <lb>&blockI; Thought the King had more
                          affe&ctlig;ed the</lb>
                       <lb>Duke of <em
                          then<em rend="i"> Cornwall.</em>
                       <p><lb><sp rend="r"><speaker 
                          rend="i">Glou.</speaker> It 
                          did alwayes &longs;eeme &longs;o to vs : 
                    ... </sp>

The DTD for this scheme might specify a hierarchy eight levels deep: <sp> (speeches) can occur only within <div2> (scenes), <div2> only within <div1> (acts), <div1> only within <div0> (plays), <div0> only within <body> (the main part of the text, as distinguished from <front> [front matter or preliminaries] and <back> [back matter]), <body> only within <text>, <text> only within <group>, and <group> only within a higher <text>. A ninth level may also be said to occur: the <speaker> tag occurs only inside the <sp> (speech) tag. The <group> tag is used because Lear is part of a collection of play-texts.
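As an illustration only (this is not DTD syntax, and the names ALLOWED_PARENTS and valid_path are hypothetical), the containment rules just listed can be held in a lookup table that maps each element to the elements it may occur directly within, and then used to check whether a path of nested elements is permitted. Free-floating tags such as <lb> are deliberately left out of the table, so any path containing them is rejected, which foreshadows the difficulty such tags cause.

```python
# Hypothetical containment table modelled on the hierarchy described above:
# each element maps to the set of elements it may occur directly within.
# None stands for "no parent", i.e. the outermost <text>.
ALLOWED_PARENTS = {
    "text": {"group", None},
    "group": {"text"},
    "body": {"text"},
    "div0": {"body"},
    "div1": {"div0"},
    "div2": {"div1"},
    "sp": {"div2"},
    "speaker": {"sp"},
}


def valid_path(path: list[str]) -> bool:
    """Check that each element in a root-to-leaf path may occur inside the one before it."""
    parent = None
    for element in path:
        # Elements missing from the table (e.g. free-floating <lb>) always fail here.
        if parent not in ALLOWED_PARENTS.get(element, set()):
            return False
        parent = element
    return True


full_path = ["text", "group", "text", "body", "div0", "div1", "div2", "sp", "speaker"]
print(valid_path(full_path))                          # True: all nine levels nest correctly
print(valid_path(["text", "body", "div0", "div2"]))   # False: a scene outside any act
```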

Each level in the hierarchy can have a number of attributes; for example, each tag can be numbered with the n attribute. Most important, the lower levels of the hierarchy, sometimes called the children, inherit the characteristics of the levels above them, sometimes called the parents. Text-retrieval software could therefore set boundaries during word-selection, for example by asking for all words spoken by Cordelia only in Act 3. SGML software might likewise report automatically, for any spoken word cited from a play, the hierarchy of elements characterizing that word (e.g., play.act.scene.speaker.speech.line). These two functions matter most for searching within large distributed libraries of online texts, and less for individual scholarly workstations. The range of correlations that can be made among words, and among the features of all words in the text, is greater in SGML than in COCOA.
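To make inheritance concrete, the following Python sketch (a simplification, not SGML software) walks a small hypothetical tree of play, act, scene, and speech nodes, passing each node's value down to the words it contains. The select function then restricts a search by any combination of inherited values, and the same inherited context yields a dotted reference for a word. The sample words come from the Lear opening quoted in this section; the node layout, element names, and functions are assumptions for illustration.

```python
# A node is (element, value, children); children are nodes or plain strings of text.
# The element names and the sample tree are a hypothetical simplification, not TEI.
play = ("play", "Lear", [
    ("act", "1", [
        ("scene", "1", [
            ("sp", "Kent", ["I Thought the King had more affected the"]),
            ("sp", "Gloucester", ["It did alwayes seeme so to vs"]),
        ]),
    ]),
])


def words_in_context(node, context=()):
    """Yield (word, context) pairs, where context lists every ancestor's (element, value)."""
    element, value, children = node
    context = context + ((element, value),)  # children inherit everything above them
    for child in children:
        if isinstance(child, str):
            for word in child.split():
                yield word, context
        else:
            yield from words_in_context(child, context)


def select(tree, **wanted):
    """Return the words whose inherited context matches every element=value constraint."""
    return [word for word, context in words_in_context(tree)
            if all((element, value) in context for element, value in wanted.items())]


print(select(play, act="1", sp="Gloucester"))
# ['It', 'did', 'alwayes', 'seeme', 'so', 'to', 'vs']

word, context = next(words_in_context(play))
print(word, ".".join(f"{element}={value}" for element, value in context))
# I play=Lear.act=1.scene=1.sp=Kent
```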

There are, however, some relatively free-floating TEI tags, such as <lb> (line-break) and <fw> (forme-work), which belong under no structural tag but are found everywhere, and these create difficulties for the tagging of early texts. The <fw> tag (II, 995-96), which handles bibliographical information in a text, is a flat tag without hierarchical relations and with only three attributes: type (e.g., header, footer, pagination, signature, and catchword), place (its position on the page), and rend (e.g., roman, italic). This is unsatisfactory for transcribing the bibliographical structure of Renaissance printed books and manuscripts. While the TEI guidelines allow for multiple hierarchies, noting that "in any kind of text, the encoder may wish to record the physical structure of the volume, page, column, and line, as well as the formal or logical structure of chapters and paragraphs or acts and scenes, etc." (751), they argue that these hierarchies should be logical, content-oriented ones, not physical ones.

RET editions are tagged to suggest simultaneous structures, but the tagging does not specify exactly how these structures are to be handled within SGML software. The nested structures generally anticipated for RET texts are the following six:

  1. Bibliographical structure is tagged by nested <bkdv1> (gathering or quire), <bkdv2> (page), <bkdv3> (column), and <bkdv4> (line) tags. This hierarchy might be declared in a DTD.

  2. Content structure is tagged by <--dv1>, <--dv2>, <--dv3>, and <--dv4> tags (where "--" stands for a prefix such as tt or ply), although in a DTD these would not be nested. For example, a dictionary might be encoded with a four-level structure: the dictionary itself, divided into alphabetic letters, each letter containing headed word-entries, and each entry containing definitions of a word or a phrase:
       <ttdv1 type="dictionary">
          <ttdv2 type="alpha">
             <ttdv3 type="entry"> 
                <ttdv4 type="lemma"> ...
                <ttdv4 type="phrase"> ...

  3. Press and textual variants are encoded as <app><rdg resp="">...</rdg></app>, where the resp attribute indicates the source of the variant reading enclosed in the <app> tags.

  4. Quotations with bibliographical references might be encoded as
          <quot> ...</quot>
             <author> ... </author>
             <title> ...</title>
    Bibliographical elements are included only when they appear in the text.

  5. The title page is represented as
          <docTitle> ... </docTitle>
             <pubPlace> ... </pubPlace>
             <publish> ... </publish>
             <docDate> ... </docDate>
          <imprimat> ... </imprimat>
    Other bibliographical elements are used as needed.

  6. Speeches in dramatic texts are encoded <sp who="..."><speaker> ... </speaker> ... </sp>, assuming that speech prefixes precede the lines of text. Stage directions may be nested within both <sp> and <speaker> tags as needed.

All other tags may occur anywhere in the text.

Encoded in this way, the opening of King Lear reads as follows. Note that font is declared in an <f> (font) tag and that special characters like long-s are declared within braces.

      <bkdv1 type="gathering" format="folio" n="65">
        <bkdv2 type="page" n="283" sig="qq2r"
            side="outer" forme="2">
          <bkdv3 type="col" n="0">
              <bkdv4 type="line" n="1"><f 
              <bkdv4 type="line" n="2"><head><f 
                 type="rl">THE TRAGEDIE OF
              <bkdv4 type="line" n="3">KING
              <bkdv4 type="line" n="4"><plydv1 
                 type="act" n="1"><f
                 type="i"><lang type="l"><head>
                 A{ct}us Primus. <plydv2 type="scene"
                 n="1"> Sc{oe}na
                 Prima.</head><f type="r"></lang>
          <bkdv3 type="col" n="1">
             <bkdv4 type="line" n="5"><stage
                 type="entrance">Enter Kent, Glouce{{s}t}er, and
             <bkdv4 type="line" n="6"><plydv3 
                 n="1"><speaker who="Kent">
                 <f type="i">Kent.</speaker></bkdv4>
             <bkdv4 type="line" n="7"><f 
                 type="r"> Thought the King had more a{ff}e{ct}ed
             <bkdv4 type="line" n="8">Duke of <f 
                 <f type="i">Cornwall.</plydv3></bkdv4>
             <bkdv4 type="line" n="9"><plydv3 
                 n="2"><speaker who="Gloucester">
                 <p><f type="i">Glou.<f 
                 type="r"> </speaker> 
                 It did alwayes {s}eeme {s}o to vs : But</bkdv4>
       ... </plydv3>
        </bkdv2> ....
  </plydv2> ....
</plydv1> ...

The word a{ff}e{ct}ed in Kent's first speech, for example, is tagged

  1. as on book-line 7 (bkdv4) of column 1 (bkdv3) on page 283 (bkdv2) of gathering 65 (bkdv1),

  2. as in speech 1 (plydv3) of scene 1 (plydv2) of act 1 (plydv1),

  3. in a speech by Kent (speaker), and

  4. in roman type (f).

The page number 283, by contrast, has no place in the play structure. It lies only on book-line 1 (bkdv4) of column 0 (bkdv3) on page 283 (bkdv2) of gathering 65 (bkdv1). (Column 0 holds text not set in any column, such as the whole of a page without columns or, as here, the headline and title above the columns.) Headings, stage directions, and speech prefixes, however, belong in both the book and the play structures, and are tagged as what they are so that they cannot be confused with speeches. Every piece of text has precise coordinates in at least one structure, for reference purposes.
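The double membership just described can be sketched in Python. The function coordinates below is hypothetical and not part of any RET or TACT tool: it scans RET-style markup, records the most recent value of each numbered tag together with the current speaker and font, and, when it reaches a target word, reports those prevailing values, which together place the word in both the book and the play structures. It treats every tag as setting a value that lasts until the next tag of the same name and ignores closing tags, so it is a simplification rather than an SGML parser. The sample is abridged from the example above, with its truncated lines left out.

```python
import re

# Opening tags only: closing tags such as </bkdv4> start with "/" and never match.
OPEN_TAG = re.compile(r'<(\w+)([^>]*)>')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def coordinates(markup, target):
    """Return the prevailing tag values at the first occurrence of target, or None."""
    current = {}  # tag name -> most recent value seen
    pos = 0
    for match in OPEN_TAG.finditer(markup):
        # Check the text between the previous tag and this one.
        if target in markup[pos:match.start()]:
            return dict(current)
        name, attrs = match.group(1), dict(ATTRIBUTE.findall(match.group(2)))
        if name == "f":
            current["f"] = attrs.get("type")        # font persists until the next <f>
        elif name == "speaker":
            current["speaker"] = attrs.get("who")
        elif "n" in attrs:
            current[name] = attrs["n"]              # bkdv1-4 and plydv1-3 are numbered
        pos = match.end()
    # Check the text after the last tag.
    return dict(current) if target in markup[pos:] else None


sample = '''<bkdv1 type="gathering" format="folio" n="65">
<bkdv2 type="page" n="283" sig="qq2r" side="outer" forme="2">
<bkdv3 type="col" n="1"><plydv1 type="act" n="1"><plydv2 type="scene" n="1">
<bkdv4 type="line" n="6"><plydv3 n="1"><speaker who="Kent"><f type="i">Kent.</speaker></bkdv4>
<bkdv4 type="line" n="7"><f type="r"> Thought the King had more a{ff}e{ct}ed the</bkdv4>'''

print(coordinates(sample, "a{ff}e{ct}ed"))
```

For the word a{ff}e{ct}ed the result combines gathering 65, page 283, column 1, and line 7 with act 1, scene 1, speech 1, speaker Kent, and roman type, mirroring the four-part description above.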

The book-structure codes identify elements in the making of the book itself: the line of type (bkdv4), the column in which it appears (bkdv3), the page and forme in which it is set (bkdv2), and the gathering in which it is bundled (bkdv1). These four are truly a nested hierarchy. Folio gatherings consist of sheets of two pages on each side of the paper (printed in what are called outer and inner formes, one for each side of each sheet), and lines always fall within columns, and columns within pages.

Note that SGML, unfortunately, cannot represent the way in which pages were originally nested in an inner or an outer forme (i.e., side of a forme), and in which these sides are nested within a single forme. For example, the First Folio consists largely of folio sheets gathered in sixes (2° in 6s). Each sheet held two pages on each side. Three such sheets were placed on top of one another, to produce six sides, and then were all folded, to produce the twelve pages in the gathering. Assuming that the printer imposes the formes in the order of the signatures (the one containing page 1, or signature A1r, first), and that he always prints the outer forme before the inner forme, the book as structured during its making can be represented as follows:

Page   Signature           Forme Side       Forme
No.    (right/left page    (inner/outer)    (folio sheet)
       on forme side)

1      A1r                 outer            1
12     A6v                 outer            1
2      A1v                 inner            1
11     A6r                 inner            1
3      A2r                 outer            2
10     A5v                 outer            2
4      A2v                 inner            2
9      A5r                 inner            2
5      A3r                 outer            3
8      A4v                 outer            3
6      A3v                 inner            3
7      A4r                 inner            3

The electronic text of course adopts the order of the sheets as they are gathered and folded, not as they were mounted on the press originally. Only if we re-sorted the electronic text by forme, then by side, and finally by left/right position on side, could we recognize this order within SGML; and then, of course, we could not represent the order of the pages as we read them.
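The imposition in the table follows arithmetically from the fold: in a gathering of n folio sheets, sheet i (counted from the outside) carries leaves i and 2n + 1 - i. The Python sketch below regenerates the table under the same assumptions as the text (formes imposed in signature order, outer before inner); the names Page and press_order are hypothetical. Sorting the same records by page number recovers reading order, which illustrates the point just made: a single linear sequence can follow press order or reading order, but not both at once.

```python
from typing import NamedTuple


class Page(NamedTuple):
    number: int      # page number in reading order within the gathering
    signature: str   # e.g. "A1r"
    side: str        # "outer" or "inner" forme side
    forme: int       # sheet number, counted from the outside of the gathering


def press_order(sheets: int = 3, letter: str = "A") -> list[Page]:
    """Pages of a folio gathering in the assumed press order described above."""
    leaves = 2 * sheets
    order = []
    for sheet in range(1, sheets + 1):
        first, last = sheet, leaves + 1 - sheet  # the two conjugate leaves of this sheet
        order += [
            Page(2 * first - 1, f"{letter}{first}r", "outer", sheet),
            Page(2 * last, f"{letter}{last}v", "outer", sheet),
            Page(2 * first, f"{letter}{first}v", "inner", sheet),
            Page(2 * last - 1, f"{letter}{last}r", "inner", sheet),
        ]
    return order


made = press_order()
read = sorted(made, key=lambda page: page.number)
print([page.number for page in made])     # [1, 12, 2, 11, 3, 10, 4, 9, 5, 8, 6, 7]
print([page.signature for page in read])
# ['A1r', 'A1v', 'A2r', 'A2v', 'A3r', 'A3v', 'A4r', 'A4v', 'A5r', 'A5v', 'A6r', 'A6v']
```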

5.3.2 COCOA Structural Tags

Two groups of COCOA feature tags simulate some aspects of text structure and thus bring some order to tagging complexity. The first consists of the two x-type series: by assuming that any <tt> tag inherits the prevailing value of the last <bk> tag but one, a hierarchy is set up. It is then possible to create x-type series further down in this hierarchy. For example, novels might be divided into first-person narrative, third-person narrative, and omniscient-author passages (e.g., nt 1st, nt 3rd, etc.). The second group consists of the book-, play-, poem-, and text-division tags. Each numbered division tag (e.g., <plydv3>) inherits the current values of the preceding play-division tags (i.e., plydv1, plydv2). In this way, text tags of increasing specificity inherit the values of the preceding, more general <tt> and <bkt> tags.

With TACT, this inheritance is managed by creating citation references that string these tags together in sequences such as $bkt $plydv1 $plydv2 $plydv3.
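The effect of such a citation reference can be imitated in a few lines of Python. This sketch is not TACT and does not use TACT's reference syntax; the input format <name value>, the dotted output, and the function cite_words are assumptions for illustration. Each tag simply sets the current value of its category, and every word is cited by concatenating the current values of the chosen categories in the order the reference lists them.

```python
import re

# Assumed COCOA-style syntax: "<name value>" sets the current value of a category.
COCOA_TAG = re.compile(r"<(\w+) ([^>]*)>")


def cite_words(text, reference):
    """Pair every word with a citation built from the current values of the reference tags."""
    current = {}  # category -> most recent value; nothing links one category to another
    cited = []
    pos = 0
    for match in list(COCOA_TAG.finditer(text)) + [None]:  # None marks the end of text
        end = match.start() if match else len(text)
        for word in text[pos:end].split():
            # A category not yet set is shown as "?"
            citation = ".".join(current.get(name, "?") for name in reference)
            cited.append((word, citation))
        if match:
            current[match.group(1)] = match.group(2).strip()
            pos = match.end()
    return cited


sample = "<bkt Lear> <plydv1 1> <plydv2 1> <plydv3 1> I Thought the King <plydv3 2> It did alwayes"
for word, citation in cite_words(sample, ["bkt", "plydv1", "plydv2", "plydv3"]):
    print(citation, word)
```

Because each category just keeps its last value, a new <plydv1> does not reset <plydv2> or <plydv3>; the hierarchy exists only in the order in which the reference strings the tags together.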

Note that COCOA offers us no way to state explicitly and unambiguously the hierarchical relationships ("parent-child") among tags or to manage cross-references among tags.