4.1 Introduction

Nothing is more important in preparing an electronic text than the symbols in which the original book or manuscript is written. These symbols may be called, loosely, the work's character set. Yet the character set is just a collection of distinguishable shapes. Usually, they can be classified unambiguously. Inventorying them is the first step towards deciding how a text may be entered into computer-readable form. Individually or in combination, however, characters take on symbolic value when they become the alphabet out of which words are made, the boundary forms that separate those words, and the abbreviations that fuse several alphabetic symbols into one unit. The alphabet consists of an ordered group of alphabetic letters and numbers. The sequence of these letters is called the collation sequence or alphanumeric order. The characters in the alphabet combine to make up alphanumeric strings that we call words or integers. However, words and integers can only be recognized when we have additional characters: first boundary characters, and second abbreviations. We use boundary characters like word-separators, punctuation marks and diacritics to specify where a word starts and stops. We use abbreviations to reduce sequences of letters or numbers into single symbols.

Thus a character is any single unit in a writing system, that is, any unit that on any occasion in a text stands alone in a distinctive way. In English and other languages using the Roman alphabet, these characters include letters, numbers, boundary symbols (such as space, tab, carriage-return/line-feed, punctuation marks, brackets such as parentheses and braces, quotation marks, the apostrophe, accents, and a host of special units like the asterisk) and abbreviations like the ampersand. The Renaissance compositor's typebox might be considered the basic character-set for most RET books, but hand-written scripts such as Anglicana, secretary, bastard, and italic hands offer many additional characters.

Characters fall into several symbol sets, each one of which has a function that may be stated by giving it a symbolic name. There are at least five such symbols: alphabet (letter-number); word separator, diacritic, and punctuation (all boundary characters); and abbreviation (which includes both alphabetic and boundary characters).

Note that some Renaissance characters may bear more than one symbolic value. For example, the comma may serve simultaneously as word-separator and punctuation mark, and double crossed-l may be ambiguously both letter-number and abbreviation. Today, boundary characters sometimes form words -- e.g., the exasperated cry, "!!##?!" -- and there will always be disagreement whether a compound hyphen is a true diacritic. Yet there is also no agreement in the English Renaissance on which characters are and are not letter-number symbols.
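The five symbol sets, and the fact that one character may belong to more than one of them, can be sketched as a small data structure. This is only an illustration: the names Role, ROLES, has_role and ambiguous are hypothetical, and the role assignments simply restate the examples given above (the hyphen's status as a diacritic is, as noted, contested).

```python
from enum import Flag, auto


class Role(Flag):
    """The five symbolic functions named in the text."""
    LETTER_NUMBER = auto()
    WORD_SEPARATOR = auto()
    DIACRITIC = auto()
    PUNCTUATION = auto()
    ABBREVIATION = auto()


# Hypothetical inventory: each character maps to every role it may bear.
ROLES = {
    "a": Role.LETTER_NUMBER,
    " ": Role.WORD_SEPARATOR,
    ",": Role.WORD_SEPARATOR | Role.PUNCTUATION,  # both roles at once
    "&": Role.ABBREVIATION,
    "-": Role.DIACRITIC,  # assumption: a compound hyphen counted as a diacritic
}


def has_role(char, role):
    """True if the inventory allows `char` to carry `role`."""
    return bool(ROLES.get(char, Role(0)) & role)


def ambiguous(inventory):
    """Return the characters that carry more than one role."""
    return sorted(ch for ch, r in inventory.items()
                  if bin(r.value).count("1") > 1)
```

Because roles are combined rather than exclusive, a character such as the comma need not be forced into a single category before the evidence is in.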

For example, A Newe Booke of Copies (1574) gives many plates of what appear to be ordered English alphabets for secretary, italic, and other hands of the period, but letter-forms and letter-combinations are treated ambiguously in these plates. Plate 19, a secretary hand, has 53 single or combinatory forms from a to &. The modern letter j only occurs in ij, and u does not appear, unless it is identical with the form that appears where we expect the letter n. Richard Mulcaster, in The First Part of the Elementarie (1582), appears to recognize only 24 letters in his alphabet, to judge by his "Generall Table" of English words, which has no sections for words beginning with either j or u. Yet both characters exist in his book, the former in terminal position within roman numbers, and the latter in medial position (e.g., iij and iustify at the bottom of p. 197). It is not clear whether the writer of the 1574 manual on handwriting, and Mulcaster, regarded j and u as alternate characters for the same two letter-numbers i and v, or as letter-symbols in their own right that simply never occurred initially and thus had no special section in Mulcaster's word-index.

Other Renaissance writers do not help clarify this point. Martin Billingsley in The Pens Excellencie (1618) lists the two forms in an alphabet of lowercase letters and ligatures, but not among their uppercase equivalents. Yet Joseph Moxon's Mechanick Exercises, vol. 2 (1683), in giving "the true Shape of Christophel Van Dijcks aforesaid Letters" in seven plates, lists both j and v in capital (and thus initial) and lowercase forms in the alphabet of only four plates (11-12, 14, 17). Plate 13 gives only capital I and V, plate 15 only capital J and U, and plate 16 (labelled 26) only capital I or J (it is unclear which) and U.

The Renaissance appears to have recognized character sets and to have organized them into hands (e.g., italic and secretary) but may not have understood that the unit letter-forms in these hands could be represented symbolically as a single alphabet. This issue arises in the treatment of j and u.

Printed books and manuscripts of this period have many more characters than writing-masters and grammarians recognize in their lists. There is no contemporary reference work that lists them, let alone explains what kinds of symbolic function these characters assumed. The Renaissance character set, then, remains both partial (we have yet to collect the data) and indeterminate (we interpret the symbolic roles of the letters, since we have no clear guidance in this from the period itself). We do not know which characters are letter-numbers, diacritics, or word-separators, singly or in combination (for example, was the letter now called a digraph ({ae}) regarded as two conflated characters, an a and an e, or as a ligature like ct, which compositors and some writing masters treated as a single unit?). Thus the boundaries of word-integers are uncertain. We also do not know certainly how to expand Renaissance abbreviations, whether shortened words, brevigraphs, or elisions; and anyway some forms that RET regards as abbreviated--for example, swash-es--may well have been regarded then as letters.

The same kinds of doubt apply to the Renaissance collation sequence, the order in which letter-numbers are arranged so as to form the alphabet. In the Renaissance, the student's abecedarium or the writing master's book gave interpretations of this collating sequence, but they sometimes disagreed. They agreed that the letter-forms i/j and u/v were mutually exclusive, i and v being used mainly as the first letter in a word, and j and u in post-initial positions. This would imply that they occupied the same place in the alphabetical sequence, j and u being sorted, in or at the end of words, after h and t, respectively. Yet u and v appear together, side-by-side but in different orders, depending on whether the script is secretary or italic, according to a writing book published in 1574 and to Moxon's Exercises in 1683. Copies (1574) lists u before v in plates 4, 6 and 8 (italic), and u after v in plates 5, 9, and 10 (secretary). Billingsley preserves this secretary-hand order in 1618. Moxon's seven plates also order the 26-letter aA-zZ range alternately with v before u (plates 11-12, 17) and with u before v (plate 14).
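The practical effect of these competing sequences can be seen in a minimal sketch that sorts the same words under three orders: one in which i/j and u/v share a place, one with u before v, and one with v before u. The names make_key, SHARED_PLACE, U_BEFORE_V and V_BEFORE_U are hypothetical, the sample words are modern and chosen only for illustration, and placing j directly after i in the 26-letter orders is an assumption rather than something the plates are said to show.

```python
def make_key(sequence):
    """Build a sort key from a collation sequence.

    `sequence` is a list of strings; all characters within one string
    share a single place in the order. Characters missing from the
    sequence sort after everything else.
    """
    rank = {ch: i for i, group in enumerate(sequence) for ch in group}
    last = len(sequence)

    def key(word):
        return [rank.get(ch, last) for ch in word.lower()]

    return key


# i/j and u/v treated as alternate forms occupying one place each.
SHARED_PLACE = (list("abcdefgh") + ["ij"] + list("klmnopqrst")
                + ["uv"] + list("wxyz"))
# 26 separate places, u before v (the italic plates of 1574).
U_BEFORE_V = list("abcdefghijklmnopqrstuvwxyz")
# 26 separate places, v before u (the secretary plates of 1574).
V_BEFORE_U = list("abcdefghijklmnopqrstvuwxyz")

words = ["vow", "use", "ivy", "joy"]
by_shared = sorted(words, key=make_key(SHARED_PLACE))
by_italic = sorted(words, key=make_key(U_BEFORE_V))
by_secretary = sorted(words, key=make_key(V_BEFORE_U))
```

Here by_shared is joy, ivy, vow, use; by_italic is ivy, joy, use, vow; and by_secretary is ivy, joy, vow, use. An index built from one sequence would thus not match an index built from another.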

It is possible, then, that people in the English Renaissance thought that the collating sequence varied by script or hand (secretary, italic) as well as by language. If so, the Renaissance did not understand the difference between a physical set of characters and a symbolic alphabet. Even today, of course, ISO has yet to define a standard collating sequence for the Roman alphabet. It is little wonder that the Renaissance English did not do so.

Words or integers, that is, combinations of letters or numbers, are also ambiguous. These are the smallest units of language that may be said to have meaning. Formally, they consist of letter-numbers, each containing at least one character, and they have boundary characters called word-separators before and after. Word-integers may also include diacritics, which are characters that form part of a word but do not necessarily affect its sorting (in other words, diacritics in this sense are invisible). Because meaning is itself symbolic, the word being a sign for other things, word-integers are themselves symbolic. They cannot be defined as simply as "character-string between boundary characters."

Because it is debatable whether the Renaissance recognized the existence of diacritics, its concept of word-integers is obscure. To approach an answer, we would have to know if More, Udall, Spenser, Sidney, Shakespeare, Bacon, and Hooker, for example, regarded our possessive God's as one word or two (e.g., God his). If there was no standard way of describing possessive forms like this, then the word itself was also indeterminate.
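How much turns on this decision can be shown with a small sketch in which the apostrophe is classed first as a diacritic and then as a word-separator. The functions split_words and sort_form are hypothetical names, and the sample phrase is invented for illustration; nothing here claims to describe how any Renaissance writer actually divided words.

```python
def split_words(text, separators):
    """Split `text` into words at any character found in `separators`."""
    words, current = [], []
    for ch in text:
        if ch in separators:
            if current:  # close the word in progress, if any
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def sort_form(word, diacritics):
    """Return the form used for sorting: diacritics are invisible."""
    return "".join(ch for ch in word.lower() if ch not in diacritics)


phrase = "for God's sake"

# Apostrophe as diacritic: part of the word, ignored in sorting.
one_word = split_words(phrase, {" "})
# Apostrophe as word-separator: the possessive becomes two units.
two_words = split_words(phrase, {" ", "'"})
```

With the apostrophe as a diacritic, one_word contains the single unit God's, whose sort_form is gods; with the apostrophe as a separator, two_words contains God and s as distinct units. The word count and the index entries both change with the classification.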

These uncertainties all suggest that electronic editions of Renaissance texts cannot assume anything about the functions or symbolic roles of the characters in their character sets. The tagging system should uniquely identify characters while postponing demands that they be declared to take on any one role.

The TEI Guidelines relegate hands, scripts, and typeface to the status of accidental or procedural attributes of content elements. For TEI, such accidentals are matters of "rendition." For example, Francis Bacon begins his essay on truth with a sentence that moves among small capitals, italics, and roman type.

WH<f type="sc">a<f type="r">t <f type="i"> is Truth <f type="r"> ; |said ie{|st}ing <f type="i"> Pilate <f type="r"> ; And would not {|st}ay for an An|swer .

TEI would use, not the RET <f type=""> tag -- it identifies a means to an end (font) rather than the end itself (emphasis, if both "is Truth" and "Pilate" are in fact emphasized by virtue of being italicized) -- but a <em type=""> tag. However, if the Renaissance treats secretary or italic ambiguously, then clearly a tagging system accountable to this period must be less interpretative, and more granular, than TEI's. TEI argues that, in exchanging electronic texts, we must put a standard way of defining functionality and content before what it calls rendition, of which font and layout are key features, but in the Renaissance it appears that font and script must be interpreted carefully. That is, their roles are substantive questions.
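A minimal sketch shows how font tagging of this kind records what the page shows without deciding what it means. It assumes, from the Bacon example alone, that each font tag switches the typeface for all text up to the next such tag, and that the typeface before the first tag is unknown. The names TAG and font_runs are hypothetical and are not part of any RET or TEI software.

```python
import re

# Matches a font tag such as <f type="i"> and captures the type value.
TAG = re.compile(r'<f type="([^"]*)">')


def font_runs(line, initial=None):
    """Split a tagged line into (font, text) runs.

    Assumption: a tag changes the current font until the next tag.
    `initial` is the font in force before the first tag (unknown here).
    """
    runs = []
    font, pos = initial, 0
    for match in TAG.finditer(line):
        if match.start() > pos:
            runs.append((font, line[pos:match.start()]))
        font = match.group(1)
        pos = match.end()
    if pos < len(line):
        runs.append((font, line[pos:]))
    return runs


bacon = ('WH<f type="sc">a<f type="r">t <f type="i"> is Truth '
         '<f type="r"> ; |said ie{|st}ing <f type="i"> Pilate '
         '<f type="r"> ; And would not {|st}ay for an An|swer .')

runs = font_runs(bacon)
italic = [text.strip() for font, text in runs if font == "i"]
```

The italic runs recovered are "is Truth" and "Pilate". Whether they are emphasized, quoted, or marked as a proper name is left for the reader or a later layer of tagging to decide, which is the sense in which font is treated here as substantive rather than accidental.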

Accordingly, RET editions do procedural markup as if it were substantive, not accidental.