The elements described here constitute a simple subset of the mechanisms for encoding such information described in full in chapter 11, Representation of Primary Sources, and should be adequate for most commonly encountered situations. These include names section 3. In the same way, the following section section 3. The full story may be found in chapter 16, Linking, Segmentation, and Alignment; the tags presented here are intended to be usable for a wide variety of simple applications.
Sections 3. These may appear either within chunk-level elements such as paragraphs, or between them. Lists of several kinds, of arbitrary complexity, are catered for. The section on notes discusses both notes found in the source and simple mechanisms for adding interpretive annotations during encoding; again, only a subset of the facilities described in full elsewhere (specifically, in chapter 17, Simple Analytic Mechanisms) is discussed.
Next, section 3. Some reference systems have attained canonical authority and should be recorded to make the text usable in normal work; in other cases, a convenient reference system should be devised by the creator or analyst of an electronic text. Like lists and notes, the bibliographic citations discussed in section 3. A range of possibilities is presented for the encoding of bibliographic citations or references, which may be treated as simple phrases within a running text, or as highly structured components suitable for inclusion in a bibliographic database.
Additional elements for the encoding of passages of verse or drama (whether prose or verse) are discussed in section 3. The chapter concludes with a technical overview of the structure and organization of the module described here.
The paragraph is the fundamental organizational unit for all prose texts, being the smallest regular unit into which prose can be divided. Prose can appear in all TEI texts, even those that are primarily of another genre e. Paragraphs can contain any of the other elements described within this chapter, as well as some other elements which are specific to individual text types.
We distinguish phrase-level elements, which must be entirely contained within a paragraph or similar structure and cannot appear except within one, from chunks, which can appear between, but not within, paragraphs, and from inter-level elements, which can appear either within a single paragraph or between paragraphs. The class of phrases includes emphasized or quoted phrases, names, dates, etc. The class of inter-level elements includes bibliographic citations, notes, lists, etc. The class of chunks includes the paragraph itself, and other elements which have similar structural properties, notably the ab (anonymous block) element.
Because paragraphs may appear in different base or additional tag sets, their possible contents may differ in different kinds of documents.
In particular, additional elements not listed in this chapter may appear in paragraphs in certain kinds of text. However, the elements described in this chapter are available by default in all kinds of text. The paragraph is marked using the p element.
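As a minimal sketch, two consecutive paragraphs might be encoded as follows. The sample sentences are invented for illustration and are not taken from any particular source.

```xml
<!-- Two consecutive prose paragraphs, each wrapped in its own p element -->
<p>The first paragraph of the chapter introduces the narrator.</p>
<p>The second paragraph begins where the source shows a new indented line.</p>
```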
More usually, however, paragraphs have no firm internal structure, but contain prose encoded as a mix of characters, entity references, phrases marked as described in the rest of this chapter, and embedded elements like lists, figures, or tables. Since paragraphs are usually explicitly marked in Western texts, typically by indentation, the application of the p tag usually presents few problems.
Punctuation marks cause two distinct classes of problem for text markup: the marks may not be available in the character set used, and they may be significantly ambiguous. To some extent, the availability of the Unicode character set addresses the first of these problems, since it provides specific code points for most punctuation marks, and also the second, to the extent that it distinguishes glyphs such as stop, comma, and hyphen which are used with different functions. Where punctuation itself is the subject of study, the pc (punctuation character) element may be used to mark it explicitly. Where the character used for a punctuation mark is not available in Unicode, the g element and other facilities described in chapter 5, Characters, Glyphs, and Writing Modes, may also be used to mark its presence.
Punctuation is itself a form of markup, historically introduced to provide the reader with an indication about how the text should be read. As such, it is unsurprising that encoders will often wish to encode directly the purpose for which punctuation was provided, as well as, or even instead of, the punctuation itself. We discuss some typical cases below.
The full stop (period) may mark orthographic sentence boundaries, abbreviations, decimal points, or serve as a visual aid in printing numbers. These usages can be distinguished by tagging S-units, abbreviations, and numbers. However, there are independent reasons for tagging these, whether or not they are marked by full stops, and the polysemy of the full stop itself is perhaps no different from that of any other character in the writing system.
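The distinctions just listed can be sketched as follows, assuming that the s element for S-units is available in the schema in use alongside abbr and num. The sentences are invented, and the attribute values shown are only one plausible choice.

```xml
<p>
  <!-- each s element holds one orthographic sentence (S-unit) -->
  <s>The parcel from <abbr>Messrs.</abbr> Brown weighed
     <num type="cardinal" value="2.5">2.5</num> kilograms.</s>
  <s>It arrived late.</s>
</p>
```

Here the full stop after Messrs is covered by abbr, the decimal point by num, and the sentence-final stops by the boundaries of the s elements, so none of the three uses needs to be tagged on the character itself.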
The question mark and exclamation mark usually mark the end of orthographic sentences, but may also be used as a mid-sentence comment by the author! Such usages may be distinguished by marking S-units, in which case the mid-sentence uses of these punctuation marks may be left unmarked, or tagged using the pc element.
Dashes are used for a variety of purposes: as a mark of omission, insertion, or interruption; to show where a new speaker takes over in dialogue; or to introduce a list item.
In the latter two cases particularly, it is clearly desirable to mark the function as well as its rendition using the elements q or item, on which see section 3. Quotation marks may be removed from text contained by q or quote elements on editorial grounds, or they may be marked in a variety of ways; see the discussion of quotation and related features in section 3. Apostrophes should be distinguished from single quote marks.
As with hyphens, this disambiguation is best performed by selecting the appropriate Unicode character, though it may also be represented by using appropriate XML markup for quotations as suggested above. However, apostrophes have a variety of uses. In English they mark contractions, genitive forms, and occasionally plural forms. Full disambiguation of these uses belongs to the level of linguistic analysis and interpretation.
Parentheses and other marks of suspension such as dashes or ellipses are often used to signal information about the syntactic structure of a text fragment.
Full disambiguation of their uses also belongs to the level of linguistic analysis and interpretation, and will therefore need to use the mechanisms discussed in chapter 17, Simple Analytic Mechanisms. Where punctuation marks are disambiguated by tagging their assumed function in the text (for example, quotation), it may be debated whether they should be excluded or left as part of the text.
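For quotation, the two options might look roughly like this. The rend value is an illustrative assumption rather than a prescribed vocabulary, and the sentence is invented.

```xml
<!-- Option 1: quotation marks removed from the text; the function is tagged
     and the original rendition is recorded on the element -->
<p>She said <q rend="double-quotes">I will come tomorrow</q> and left.</p>

<!-- Option 2: quotation marks left in place as part of the text -->
<p>She said <q>“I will come tomorrow”</q> and left.</p>
```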
In the case of quotation marks, it may be more convenient to distinguish opening from closing marks simply by using the appropriate Unicode character than to use the q element, with or without an indication of rendition. Where segmentation of a text is performed automatically, the accuracy of the result may be considerably enhanced by a first pass in which the function of different punctuation characters is explicitly marked.
This need not be done for all cases, but only where the structural function of the punctuation (for example, as a word or phrase delimiter) is ambiguous. Thus, dots indicating abbreviation might be distinguished from dots indicating sentence end, and exclamation or question marks internal to a sentence distinguished from those which terminate one.
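A first pass of this kind could mark only the ambiguous full stops, for instance with pc and its force attribute. This is a sketch under assumptions: using strong for sentence-final stops and weak for abbreviation dots is one possible convention, and the sentences are invented.

```xml
<p>He wrote to the Rev<pc force="weak">.</pc> Mr Hale on Monday<pc force="strong">.</pc>
   No reply came<pc force="strong">.</pc></p>
```

A later segmentation step could then split S-units only at the stops marked strong.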
Furthermore, when encoding historical materials, it may be considered essential to retain the original punctuation, whether by using an appropriate character code if one is available (or the g element where it is not), or by an explicit encoding using pc. The particular method adopted will vary depending upon the feature concerned and upon the purpose of the project.
Hyphenation as a phenomenon is generally of most concern when producing formatted text for display in print or on screen: different languages and systems have developed quite sophisticated sets of rules about where hyphens may be introduced and for what reason. These generally do not concern the text encoder, since they belong to the domain of formatting and will generally be handled by the rendition software in use.
In this section, we discuss issues arising from the appearance of hyphens in pre-existing formatted texts which are being re-encoded for analysis or other processing. Historically, the hard hyphen has been used in printed or manuscript documents for two distinct purposes. In many languages, it is used between words to show that they function as a single syntactic or lexical unit. For example, in French, est-ce que; in English, body-snatcher, tea-party, etc.
It may also have an important role in disambiguation (for example, by distinguishing a man-eating fish from a man eating fish). Such usages, although possibly problematic when a linguistic analysis is undertaken, are not generally of concern to text encoders: the hyphen character is usually retained in the text, because it may be regarded as part of the way a compound or other lexical item is spelled.
Deciding whether a compound is to be decomposed into its constituent parts, and if so how, is a different question, involving consideration of many other phenomena in addition to the simple presence of a hyphen. When it appears at the end of a printed or written line, however, the hard hyphen generally indicates that, contrary to what might be expected, a word is not yet complete but continues on the next line (or over the next page, column, or other boundary).
The hyphen character is not, in this case, part of the word, but just a signal that the word continues over the break. Unfortunately, few languages distinguish these two cases visually, which necessarily poses a problem for text encoders. Suppose, for example, that we wish to investigate a diachronic English corpus for occurrences of tea-pot and teapot, to find evidence for the point at which this compound becomes lexicalized. Any case where the word is hyphenated across a line break is then ambiguous between the two spellings.
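A hypothetical instance of such a line-end hyphen, transcribed with its line break left untagged (the sentence is invented):

```xml
<!-- The source line ends after the hyphen; the word continues on the next line -->
<p>She lifted the lid of the tea-
pot and peered inside.</p>
```

From this transcription alone, a processor cannot tell whether the spelling intended in the source was tea-pot or teapot.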
A similar range of possibilities applies equally to the representation of other common punctuation marks, notably quotation marks, as discussed in 3. The ambiguity of the end-of-line hyphen also causes problems in the way a processor identifies such tokens in the absence of explicit markup. If token boundaries are not explicitly marked (for example, using the seg or w elements), for most languages a processor will rely on character class information to determine where they are to be found: some punctuation characters are considered to be word-breaking, while others are not.
In XML, the newline character in text data is a kind of whitespace, and is therefore word breaking. However, it is generally unsafe to assume that whitespace adjacent to markup tags will always be preserved, and it is decidedly unsafe to assume that markup tags themselves are equivalent to whitespace. The lb , pb , and cb elements are notable exceptions to this general rule, since their function is precisely to represent or replace line, page, or column breaks, which, as noted above, are generally considered to be equivalent to whitespace.
These elements provide a more reliable way of preserving the lineation, pagination, etc. of a source document, since the encoder should not assume that untagged line breaks and the like will be preserved. To control the intended tokenization, the encoder may use the break attribute on such elements to indicate whether or not the element is to be regarded as equivalent to whitespace.
This attribute can take the values yes or no to indicate whether or not the element corresponds with a token boundary. The value maybe is also available, for cases where the encoder does not wish or is unable to determine whether the orthographic token concerned is broken by the line ending.
As a final complication, it should be noted that in some languages, particularly German and Dutch, the spelling of a word may be altered in the presence of end-of-line hyphenation. For example, in Dutch, the word opaatje (granddad), occurring at the end of a line, may be hyphenated as opa-tje, with a single letter a.
An encoder wishing to preserve the original form of this orthographic token in a printed text while at the same time facilitating its recognition as the word opaatje will therefore need to rely on a more sophisticated process than simply removing the hyphen. This is however essentially the same as any other form of normalization accompanying the recognition of variations in spelling or morphology: as such it may be encoded using the choice element discussed in 3.
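The break attribute and the normalization just described might be combined roughly as follows. This is a sketch: the surrounding sentences are invented, and pairing orig with reg inside choice is one possible way of recording the regularized form.

```xml
<!-- break="no": the line break falls inside a single orthographic token;
     the hyphen is transcribed as it appears in the source -->
<p>She lifted the lid of the tea-<lb break="no"/>pot and peered inside.</p>

<!-- Dutch: the hyphenated form of the source is preserved in orig,
     and the full spelling is supplied in reg -->
<p>Het <choice>
    <orig>opa-<lb break="no"/>tje</orig>
    <reg>opaatje</reg>
  </choice> lachte.</p>
```

With break="no" a processor can join the two halves of the token, but only the reg form supplies the second letter a that the hyphenation removed.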
This section deals with a variety of textual features, all of which have in common that they are frequently realized in conventional printing practice by the use of such features as underlining, italic fonts, or quotation marks, collectively referred to here as highlighting. After an initial discussion of this phenomenon and alternate approaches to encoding it, this section describes ways of encoding several textual features, all of which are conventionally rendered using some kind of highlighting. By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.).
In conventionally printed modern texts, highlighting is often employed to identify words or phrases which are regarded as distinct in some way from the surrounding text. The textual functions indicated by highlighting may not be rendered consistently in different parts of a text or in different texts. For example, a foreign word may appear in italics if the surrounding text is in roman, but in roman if the surrounding text is in italics. For this reason, these Guidelines distinguish between the encoding of rendering itself and the encoding of the underlying feature expressed by it.
Highlighting as such may be encoded by using one of the global attributes rend, rendition, or style; see further 1. This allows the encoder both to specify the function of a highlighted phrase or word, by selecting the appropriate element described here or elsewhere in these Guidelines, and to further describe the way in which it is highlighted, by means of an attribute.
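For instance, an element can name the function while an attribute records the rendering. In this sketch the sentences are invented and the rend values are free-text assumptions.

```xml
<!-- function: foreign-language phrase; rendering: italic in the source -->
<p>He acted, as lawyers say, <foreign xml:lang="la" rend="italic">pro bono</foreign>.</p>

<!-- function: emphasis; same rendering, different underlying feature -->
<p>You <emph rend="italic">must</emph> reply today.</p>
```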
If the encoder wishes to offer no interpretation of the feature underlying the use of highlighting in the source text, then the hi element may be used, which indicates only that the text so tagged was highlighted in some way. The hi element is provided by the model. The possible values carried by the rend attribute are not formally defined in this version of the Guidelines.
It may be used to document any peculiarity of the way a given segment of text was rendered in the original source text, and may thus express a very large range of typographic or other features, by no means restricted to typeface, type size, etc. The style attribute, by contrast, defines the way the source text was rendered using a formally defined style language, such as the W3C standard Cascading Stylesheet Language Lie and Bos eds.
The complementary rendition attribute is used to point to one or more fragments expressed using such a language which have been predefined in the TEI header using the rendition element discussed in section 2.
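The three attributes might be used along these lines. This is a sketch: the identifier sc, the rend value, and the sample text are all hypothetical, and the rendition element is assumed to sit in the tagsDecl part of the TEI header.

```xml
<!-- In the TEI header: a predefined fragment expressed in CSS -->
<rendition xml:id="sc" scheme="css">font-variant: small-caps;</rendition>

<!-- In the text: three ways of recording the same appearance -->
<!-- rend: free-text description whose values are not formally defined -->
<hi rend="smallcaps">Chapter One</hi>
<!-- style: the rendering written inline in a formal style language -->
<hi style="font-variant: small-caps;">Chapter One</hi>
<!-- rendition: a pointer to the fragment predefined in the header -->
<hi rendition="#sc">Chapter One</hi>
```

Pointing to a shared rendition definition avoids repeating the same style declaration on every highlighted phrase.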