OSIS Bibles

From CrossWire Bible Society
Revision as of 20:38, 13 December 2009 by David Haslam (talk | contribs) (Supplying alternative quotation marks: There is further information about English quotation marks and their usage in [http://en.wikipedia.org/wiki/Quotation_marks].)

OSIS

OSIS is an XML Schema definition for Bibles and other Biblical research texts, which enables ministries and other organizations to collaborate more easily. Traditionally, these organizations have stored their documents in disparate, proprietary markups, making it difficult when they wish to share in service with each other. OSIS provides a common markup for multiple visions.

CrossWire is committed to supporting the OSIS initiative. We have developed OSIS import and export tools which work with our SWORD engine, making OSIS documents available to all of our SWORD software.

The latest OSIS Schema definition and supporting information are available at: http://www.bibletechnologies.net

Introduction

This page is for practical examples of how to encode a Bible in OSIS 2.1.1 for building a SWORD module with osis2mod. It represents CrossWire's experience and best practices in creating modules.

Every OSIS SWORD module must be created from a well-formed and valid OSIS 2.1.1 document. While it is a desirable goal for any such document to be acceptable, SWORD has some particular requirements which are discussed here.

The schema for OSIS 2.1.1 can be found at http://www.bibletechnologies.net/osisCore.2.1.1.xsd.

The March 2006 version of the OSIS Manual may be found here (PDF).

A good example of an OSIS document can be found at http://www.crosswire.org/~dmsmith/kjv2006.

See also OSIS Book Name Abbreviations.

General structure

An OSIS document is a well-formed XML document, valid according to the OSIS schema. You can find the full normative description on the OSIS Website [1].

To produce a Bible, you can use this template:

<?xml version="1.0" encoding="UTF-8"?>
<osis
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace"
	xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
	xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
	<osisText osisIDWork="{NAME}" osisRefWork="bible" xml:lang="{LANG}" canonical="true">
		<header>
			{HEADER}
		</header>
		<div type="bookGroup">
			{BODY}
		</div>
	</osisText>
</osis>

With the following values:

{NAME}
Normalized name of the Bible version (Usually 3 letters for language, 3 for translation)
{LANG}
IETF language code. ISO 639-1 codes are preferred; use an ISO 639-3 code when no ISO 639-1 code exists for the given language. See [2] for a list of codes.
{HEADER}
Description of the included text; see below
{BODY}
Text; see below

For text without any character outside ASCII (usually English text), you can use US-ASCII encoding. For every other language, please use UTF-8 and NFC. See the tools section if you need to convert.
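
As a minimal sketch of the NFC requirement, the following Python uses only the standard unicodedata module. The function names to_nfc_utf8 and is_nfc are hypothetical helpers chosen for this illustration; they are not part of any SWORD tool.

```python
import unicodedata


def to_nfc_utf8(text: str) -> bytes:
    """Normalize text to Unicode NFC, then encode it as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def is_nfc(text: str) -> bool:
    """Report whether text is already in NFC (normalizing changes nothing)."""
    return unicodedata.normalize("NFC", text) == text
```

For example, an "e" followed by a combining acute accent (U+0301) is not in NFC; after normalization it becomes the single precomposed character U+00E9.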

Header

<header>  
  <work osisWork="{NAME}"/>
</header>

Body

Here is the general structure of the body content:

<div type="bookGroup">
	<title>Old Testament</title>
	<div type="book" osisID="Gen" canonical="true">
		<title type="main" short="Genesis">Genesis</title>
		<chapter osisID="Gen.1" n="1">
			<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>In the
			beginning...<verse eID="Gen.1.1"/>
			<verse sID="Gen.1.2" osisID="Gen.1.2" n="2"/>The earth was
			formless and void...<verse eID="Gen.1.2"/>
			...
		</chapter>
	</div>
</div>

Note that canonical defaults to false on any <div>. You need to set it to true on elements representing the structure of the original text. Also, a quirk of the SWORD compilation process is that titles are the only kind of content that reliably displays outside of <verse> elements.

OSIS Milestones

OSIS allows for two potentially overlapping structures: Document structure (BSP) and verse structure (BCV).

Document structure is dominated by books, sections and paragraphs (BSP), along with titles, quotes and poetic material, while verse structure is indicated by book, chapter and verse numbers (BCV). Although a SWORD module requires verse structure, the best way to encode a module with deep markup is with document structure. The osis2mod tool is responsible for transforming document structure into verse structure.

Because these two systems can overlap and because XML does not allow for overlapping elements, OSIS defines a milestone mechanism for both document and verse structure elements.

For:

<X  ... attribute list ...>
...
</X>

the milestoned form is:

<X sID="g1" ... attribute list .../>
...
<X eID="g1"/>

According to the OSIS manual, for any given element X that defines a milestoneable form, all the instances of X in the document must use one form or the other and may not use both. The value of the sID must be unique within the document.

An XML validator cannot validate whether milestones are used properly. It cannot validate:

  • that an element is consistently either milestoned or not.
  • that each element with an sID has a paired element with an eID.
  • that the sID and eID of each pair have the same attribute value.
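
Since a schema validator cannot catch these problems, a small script can. The following Python is a rough sketch, not an official CrossWire tool: check_milestones is a hypothetical name, and the checks cover only sID uniqueness and sID/eID pairing, not whether an element consistently uses one form.

```python
import xml.etree.ElementTree as ET


def check_milestones(xml_text):
    """Return a list of human-readable problems with sID/eID milestone pairs."""
    problems = []
    open_starts = {}      # sID value -> tag of the start milestone still open
    seen_starts = set()   # every sID value met so far, to detect duplicates
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() walks the tree in document order
        sid = elem.get("sID")
        eid = elem.get("eID")
        if sid is not None:
            if sid in seen_starts:
                problems.append(f"duplicate sID {sid!r}")
            seen_starts.add(sid)
            open_starts[sid] = elem.tag
        if eid is not None:
            tag = open_starts.pop(eid, None)
            if tag is None:
                problems.append(f"eID {eid!r} has no preceding open sID")
            elif tag != elem.tag:
                problems.append(f"eID {eid!r} closes a different element type")
    for sid in open_starts:
        problems.append(f"sID {sid!r} is never closed")
    return problems
```

An empty list means that every start milestone has a unique sID and a later end milestone of the same element type with the same value.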

Some notes regarding OSIS:

  • For an OSIS document to be valid it must use the non-milestoned version of <div> and <lg>.
  • There is no milestoned version of the <p> element. From a practical perspective, this means that the milestoned version of <verse> should be used when paragraphs are used.
  • The milestoned version of <chapter> must be used when a paragraph spans a chapter boundary.

Recommended Approach

  • For chapters, use <chapter>...</chapter> container elements (except in the rare case that other container elements cross chapter boundaries)
  • For verses, use milestone elements (unless container elements will suffice)
  • For paragraphs, use the <p>...</p> container element
  • For poetry, use container elements <lg>...</lg> to indicate stanzas (or other types of line groups) and <l>...</l> to indicate lines
  • For quoted text, use the <q>...</q> container element
  • For translation changes, use the <transChange>...</transChange> container element

Examples

Marking Paragraphs

There is no milestoned version of the <p> element. Typically paragraphs surround whole verses. That is, they start and end between verses. If a paragraph begins or ends in a verse and extends beyond that verse, then the whole document must use the milestoned version of <verse>.

<div type="book" osisID="Gen" canonical="true">
	<title type="main">LE PREMIER LIVRE DE MOÏSE dit LA GENÈSE</title>
	<chapter osisID="Gen.1" chapterTitle="Chapitre 1"><title type="chapter">Chapitre 1</title>
		<p>
			<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement Dieu créa les 
			cieux et la terre.<verse eID="Gen.1.1"/>
		</p>
		<p>
			<verse sID="Gen.1.2" osisID="Gen.1.2" n="2"/>Et la terre était désolation et
			vide, et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu
			planait sur la face des eaux.<verse eID="Gen.1.2"/>
		</p>
		<p>
			<verse sID="Gen.1.3" osisID="Gen.1.3" n="3"/>Et Dieu dit : Que la lumière
			soit. Et la lumière fut.<verse eID="Gen.1.3"/>
			<verse sID="Gen.1.4" osisID="Gen.1.4" n="4"/>Et Dieu vit la lumière, qu'elle
			était bonne ; et Dieu sépara la lumière d'avec les ténèbres.
			<verse eID="Gen.1.4"/>
			<verse sID="Gen.1.5" osisID="Gen.1.5" n="5"/>Et Dieu appela la lumière Jour ;
			et les ténèbres, il les appela Nuit. Et il y eut soir, et il y eut matin : 
			&#8212; premier jour.<verse eID="Gen.1.5"/>
		</p>
...
Result

(1) Au commencement Dieu créa les cieux et la terre.

(2) Et la terre était désolation et vide, et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.

(3) Et Dieu dit : Que la lumière soit. Et la lumière fut. (4) Et Dieu vit la lumière, qu'elle était bonne ; et Dieu sépara la lumière d'avec les ténèbres. (5) Et Dieu appela la lumière Jour ; et les ténèbres, il les appela Nuit. Et il y eut soir, et il y eut matin : — premier jour.

Note: osis2mod converts a paragraph start into <div type="paragraph" sID="genX"/> and a paragraph end into <div type="paragraph" eID="genX"/>.

Marking Quotations

Most of the SWORD applications will show a chapter at a time and some will show isolated verses. This means that all of the SWORD applications show partial quotations, such as the Sermon on the Mount which begins in Matt 5 and ends in Matt 7.

Default Quotation Marks

By default, SWORD will use " for quotations. The following discusses various ways to influence this.

Indicating the nesting of a quote

When a quote is contained in a quote, it is customary to set the level attribute to indicate the depth of the nesting. For example, Jeremiah 23:38 is part of a larger quote and has a back and forth dialog of nested quotes:

But if you say,
<q level="2" sID="1"/>
	The burden of the Lord,
<q level="2" eID="1"/>
thus says the Lord,
<q level="2" sID="3"/>
	Because you have said these words,
	<q level="3" sID="4"/>
		The burden of the Lord,
	<q level="3" eID="4"/>
	when I sent to you, saying,
	<q level="3" sID="5"/>
		You shall not say,
		<q level="4" sID="6"/>
			The burden of the Lord,
		<q level="4" eID="6"/>
	<q level="3" eID="5"/>

A couple of things to note about this verse. First, the level attribute is on both the sID and the eID pair, matching in value. Second, this is an example of a verse that has a quote that starts in the middle and finishes in another verse.

In this case, SWORD will use the level to determine whether to use " or ' for quotes. Odd levels will use " and even levels will use '.

Supplying alternative quotation marks

The quote element has a marker attribute that can be used to control the quotation marks. SWORD applications will always use this value when rendering the quote. When the marker attribute is present but empty, it will render no quotation mark at all.
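
The rules described so far (straight double quote by default, alternation by level, and an overriding marker attribute) can be summarized in a short Python sketch. This only restates the behavior described on this page and is not SWORD's actual code; quote_mark is a hypothetical name, and treating a missing level as level 1 is an assumption.

```python
def quote_mark(level=1, marker=None):
    """Pick the quotation mark for a <q> element.

    marker: value of the marker attribute, or None when it is absent.
    level:  value of the level attribute (assumed to be 1 when absent).
    """
    if marker is not None:
        # A marker that is present always wins; an empty one renders nothing.
        return marker
    # Odd levels use the double quote, even levels the single quote.
    return '"' if level % 2 == 1 else "'"
```

With this sketch, quote_mark(2) gives a single quote, while quote_mark(2, marker="") gives an empty string, so no quotation mark is rendered.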

To specify "curly" quotes you can use the following values:

Description            Char   HTML Entity   Unicode
Opening double quote   “      &#8220;       U+201C
Closing double quote   ”      &#8221;       U+201D
Opening single quote   ‘      &#8216;       U+2018
Closing single quote   ’      &#8217;       U+2019

To use different marks to start and end a quote, use the milestoned version of the quote.

<q marker="“" sID="qN"/> ... <q marker="”" eID="qN"/>

Quotation marks have a variety of forms in different languages and in different media. See Quotation mark, non-English usage.

There is further information about English quotation marks and their usage in [3].

Continuation Quotation Marks

The <milestone type="cQuote"/> element can be used to indicate the presence of a continued quote. If the marker attribute is present, its value is used; otherwise a straight double quote, ", is used. Since there is no level attribute on the milestone element, it is best to specify the marker attribute.

Marking the Words of Christ

To indicate that a quote is something that Jesus said, use who="Jesus".

	<verse osisID="Luke.22.35" sID="Luke.22.35"/>
	Then Jesus asked them, <q who="Jesus" marker="">When I sent you without purse,
	bag or sandals, did you lack anything?</q>
	<verse eID="Luke.22.35"/>
Result

Then Jesus asked them, When I sent you without purse, bag or sandals, did you lack anything?

Marking poetic material

Poetry is marked up with <lg>, line group, and <l>, line, elements. The line element supports indentation with the level attribute. When the level attribute is not present or it is level="1", this should be interpreted as the first level of the line group. When level="2" it is indented relative to level="1". The same is true for each subsequent level.

While OSIS defines a milestoned version of the <lg> element, the use of it will not produce a valid XML document.

The level attribute is used to indicate indentation. A value of 1 means no indentation, the same as not specifying a level attribute. A value of 2 indents the line by one step, and so forth.

  <chapter osisID="Exod.15" chapterTitle="Chapter 15"><title type="chapter">Chapter 15</title>
    <p>
      <verse sID="Exod.15.1" osisID="Exod.15.1" n="1"/>
      Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,
    </p>
    <lg>
      <l level="1">I will sing unto the LORD, for he hath triumphed gloriously:</l>
      <l level="2">the horse and his rider hath he thrown into the sea.</l>
      <verse eID="Exod.15.1"/>

      <verse sID="Exod.15.2" osisID="Exod.15.2" n="2"/>
      <l level="1">The LORD is my strength and song, and he is become my salvation:</l>
      <l level="2">he is my God, and I will prepare him an habitation;</l>
      <l level="2">my father's God, and I will exalt him.</l>
      <verse eID="Exod.15.2"/>

      <verse sID="Exod.15.3" osisID="Exod.15.3" n="3"/>
      <l level="1">The LORD is a man of war:</l>
      <l level="2">the LORD is his name.</l>
      <verse eID="Exod.15.3"/>

      <verse sID="Exod.15.4" osisID="Exod.15.4" n="4"/>
      <l level="1">Pharaoh's chariots and his host hath he cast into the sea:</l>
      <l level="2">his chosen captains also are drowned in the Red sea.</l>
      <verse eID="Exod.15.4"/>

      <verse sID="Exod.15.5" osisID="Exod.15.5" n="5"/>
      <l level="1">The depths have covered them:</l>
      <l level="2">they sank into the bottom as a stone.</l>
      <verse eID="Exod.15.5"/>
...
Result

(1) Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,

I will sing unto the LORD, for he hath triumphed gloriously:
the horse and his rider hath he thrown into the sea.
(2) The LORD is my strength and song, and he is become my salvation:
he is my God, and I will prepare him an habitation;
my father's God, and I will exalt him.
(3) The LORD is a man of war:
the LORD is his name.
(4) Pharaoh's chariots and his host hath he cast into the sea:
his chosen captains also are drowned in the Red sea.
(5) The depths have covered them:
they sank into the bottom as a stone.

Marking with Strong's Numbers

To mark up Strong's numbers, you first need to declare a workID in the header of the OSIS document:

  <header>
    ...
    <work osisWork="strong">
      <refSystem>Dict.Strongs</refSystem>
    </work>
    ...
  </header>

SWORD does not actually use this declaration, but it is required to have a proper OSIS document.

And while OSIS allows arbitrary workIDs, SWORD can only handle "strong" and a few variants.

<w lemma="strong:H0853 strong:H03045">knew</w>

The <w> element is used to surround the text that is represented by the Strong's number. It may be that the text is a phrase and it may be that more than one Strong's number defines the text.

When more than one Strong's number defines the text, each must be prefixed with a workID and must be separated from each other by a space. (While OSIS allows for the defining of default workIDs, SWORD requires that the workIDs be used.)

The actual Strong's number should indicate whether it is Hebrew (H) or Greek (G), followed by the number. The number can be zero-padded to up to 5 digits, as in H00001.
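
To illustrate the format, here is a minimal Python sketch that reads the Strong's numbers out of a lemma attribute value. The name parse_strongs and the regular expression are assumptions made for this example, based only on the description above; tokens with any other prefix are simply skipped.

```python
import re

# workID "strong", then H (Hebrew) or G (Greek), then 1 to 5 digits.
STRONG_RE = re.compile(r"^strong:([HG])([0-9]{1,5})$")


def parse_strongs(lemma_attr):
    """Return (language, number) pairs for the strong: tokens in a lemma value."""
    result = []
    for token in lemma_attr.split():  # values are separated by spaces
        match = STRONG_RE.match(token)
        if match:
            # int() drops any zero padding, so H00001 and H1 compare equal.
            result.append((match.group(1), int(match.group(2))))
    return result
```

Applied to the value "strong:H0853 strong:H03045" from the example above, this yields the pairs ("H", 853) and ("H", 3045).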

Marking with Morphology

In a similar manner to marking with Strong's numbers, morphology can also be noted. Since morphology regards the original language, Strong's numbers will be shown at the same time.

As with Strong's numbers, a workID needs to be defined. Here we are defining one for Robinson's Morphology Codes. And while SWORD will ignore this declaration, "robinson" is hard-coded into SWORD for Greek morphology codes.

  <header>
    ...
    <work osisWork="robinson">
      <refSystem>Dict.Robinson</refSystem>
    </work>
    ...
  </header>

Example markup of Robinson's Morphology Codes in the KJV module:

<w lemma="strong:G3588 strong:G80" morph="robinson:T-APM robinson:N-APM" src="7 8">his brethren</w>

In this example, lemma, morph and src form parallel arrays: the first strong: value maps to the first robinson: value and the first src value, and so on.

The workID should be the name of a current, future, or potential lexicon module in which the morphology code could be looked up. For example, morph="packard:D" represents a reference to morphology code "D" in a module named Packard, whether or not a Packard module has been created or released. (Currently, SWORD offers lexicon modules named Robinson and Packard, both for Greek morphology.)

The src attribute is used here to indicate the word position in the original Greek.
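
The parallel-array reading of lemma, morph and src can be sketched in Python as follows. This is an illustration only: align_word is a hypothetical helper, and it assumes a <w> element without a namespace prefix whose three attributes hold the same number of space-separated values.

```python
import xml.etree.ElementTree as ET


def align_word(w_xml):
    """Pair up the nth lemma, morph and src values of a single <w> element."""
    w = ET.fromstring(w_xml)
    lemmas = w.get("lemma", "").split()
    morphs = w.get("morph", "").split()
    srcs = w.get("src", "").split()
    # zip() stops at the shortest list, so unequal lengths are silently cut.
    return list(zip(lemmas, morphs, srcs))
```

For the KJV example above, the first tuple is ("strong:G3588", "robinson:T-APM", "7") and the second is ("strong:G80", "robinson:N-APM", "8").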

Marking with Other Lemmas

The lemma attribute of the <w> element can contain any number of other lemmas. Like Strong's numbers and morphology codes, these need to have a workID declared in the header. SWORD presumes that these lemma workIDs all start with "lemma." (note the final period). The portion of the workID following "lemma." should be the name of a current, future, or potential lexicon module in which the lemma could be looked up. For example, lemma="lemma.TWOT:271" represents a reference to lemma #271 in a module named TWOT (i.e., the Theological Wordbook of the Old Testament), whether or not a TWOT module has been created or released. As far as SWORD is concerned, there can be any number of these space-delimited values in a lemma attribute and they can be in any order, even interspersed among the "strong:" lemmas.

SWORD has the ability to show or hide non-Strong's lemmas as a group.

Marking the Divine Name

The <divineName> is reserved for translations of YHWH. These occur in the Old Testament as Lord, God and Yah. Not every Lord or God is a translation of this.

The content of the divineName element is the word Lord, God or Yah, not in all upper case (i.e. not LORD, GOD, or YAH). SWORD will either convert it to small-caps or uppercase.

Note that if the use is possessive, it is permissible to have the following:

   <divineName>Lord's</divineName>

When also marking with Strong's numbers you will need to do it one of two ways:

   <divineName><w lemma="strong:H3068">Lord's</w></divineName>
or
   <w lemma="strong:H3068">of the <seg><divineName>Lord</divineName></seg></w>

The latter form uses a hack to allow the embedding of <divineName> in a <w>, since OSIS does not allow for this, but does allow for <seg> to be in a <w> and to contain <divineName>.

Marking Sections and Titles

A section is marked with:

<div type="section">
...
</div>

In OSIS the <title> element is used to provide general headings. Titles should be placed at the top of the container that they title, not before.

<div type="book">
   <title>A book title</title>
   <chapter>
       <title type="chapter">A chapter title</title>
       <div type="section">
            <title>A section title</title>
            ...
       </div>
       ...
   </chapter>
</div>

Using type="chapter" or type="main" is needed by osis2mod to distinguish chapter titles from verse titles. When SWORD stores an OSIS document it does so as an index of verses. It has special indexes for book and chapter titles. SWORD does not store the <verse> tags. So when it comes to storing a title in the following verse, osis2mod generates special markup to indicate that the title stands before the verse. SWORD uses this to place the verse number.

Note: The <head> element is used to provide headings for tables, lists and cast groups. Some examples in the OSIS 2.1.1 manual use <head> incorrectly.

Marking Notes

	<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement 
	Dieu<note osisRef="Gen.1.1" osisID="Gen.1.1!1" n="1"><hi type="italic">en hébreu</hi> : Élohim,
	(<hi type="italic">pluriel d</hi>'Éloah, le Dieu suprême), la Déité, <hi type="italic">dans
	le sens absolu</hi>.</note> créa les cieux et la terre.<verse eID="Gen.1.1"/>

	<verse sID="Gen.1.2" osisID="Gen.1.2" n="2"/>Et la terre était désolation et 
	vide<note osisRef="Gen.1.2" osisID="Gen.1.2!1" n="2">le vide.</note>, et il y avait des ténèbres
	sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.<verse eID="Gen.1.2"/>
Result
  1. Au commencement Dieu¹ créa les cieux et la terre.
  2. Et la terre était désolation et vide², et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.
  • ¹ en hébreu: Élohim, (pluriel d'Éloah, le Dieu suprême), la Déité, dans le sens absolu.
  • ² le vide.

The note should be attached to what it refers to, either after (as is the case here) or before. There should be no additional space surrounding the note, but only what is in the text.

These notes can have any type other than crossReference.

Marking Cross-Reference Notes

SWORD provides the ability for a user to show or hide cross-references. To achieve this you embed one or more <reference> elements in a <note type="crossReference">...</note>. If this is not done, then the cross-references will always show inline in the text.

<note type="crossReference" n="t" osisID="Jer.24.7!crossReference.t" osisRef="Jer.24.7">
  <reference osisRef="Jer.32.39">ch. 32:39</reference>; 
  <reference osisRef="Deut.30.6">Deut. 30:6</reference>; 
  <reference osisRef="Ezek.11.19">Ezek. 11:19</reference>; 
  <reference osisRef="Ezek.36.26-Ezek.36.27">36:26, 27</reference>
</note>

Here is a breakdown.

Regarding the <note> element:

type="crossReference" is one of the predefined OSIS note types. SWORD looks for this value to show/hide cross-references.

n="t" provides the author's desired footnote marker for the note. A couple of SWORD applications use this, but most manufacture their own marker.

The osisID is given based upon the location of the note. In order to not conflict with the verse's osisID and to construct a unique id, the ! (extension mark) is used to further qualify. This is followed by the note's type and n value, separated by a dot.

This note pertains to a single verse and it is given in osisRef.
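
The osisID convention just described is easy to apply mechanically. The following Python is a small sketch with hypothetical helper names (note_osis_id and split_note_osis_id), assuming the verse osisID itself contains no "!" and the note type contains no dot.

```python
def note_osis_id(verse_id, note_type, n):
    """Build a note osisID: verse osisID, '!', then note type and n joined by a dot."""
    return f"{verse_id}!{note_type}.{n}"


def split_note_osis_id(osis_id):
    """Split a note osisID back into (verse osisID, note type, n)."""
    verse_id, _, extension = osis_id.partition("!")
    note_type, _, n = extension.partition(".")
    return verse_id, note_type, n
```

Here note_osis_id("Jer.24.7", "crossReference", "t") reproduces the osisID used in the example above.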


Regarding the <reference> elements:

The <reference> element is replaced by SWORD with a link to the reference with the text of the element being shown as link text.

While the osisRef can point to multiple verses, most SWORD applications cannot handle a link that goes to more than one verse or a contiguous range of verses. Here we see that each reference is separated by punctuation.

Marking Variants

The text of the Bible <seg type="x-variant" subType="x-1">may</seg>
<seg type="x-variant" subType="x-2">can</seg> contain variant readings.
Use the seg element <seg type="x-variant" subType="x-1">in order </seg>to mark them.

SWORD will recognize the <seg> element with type="x-variant" as marking variants present in different versions of a text. The subType attribute should be added, with a value of x-1 or x-2, to indicate whether the reading is the primary or secondary variant. At present, SWORD supports only 2 different readings per text.
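
To show what selecting a reading means in practice, here is a Python sketch that flattens a marked-up fragment to plain text while keeping only one variant. It illustrates the markup and is not how SWORD is implemented; render_reading is a hypothetical name, and the fragment is assumed to have a single root element and no namespace prefix on <seg>.

```python
import xml.etree.ElementTree as ET


def render_reading(elem, reading):
    """Return the text of elem, keeping only variant segments x-<reading>."""
    parts = [elem.text or ""]
    for child in elem:
        is_variant = child.tag == "seg" and child.get("type") == "x-variant"
        # Keep ordinary children, and variant segments of the chosen reading.
        if not is_variant or child.get("subType") == f"x-{reading}":
            parts.append(render_reading(child, reading))
        # Text following a child belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)
```

Calling render_reading(ET.fromstring(fragment), 1) gives the primary reading, and passing 2 gives the secondary one.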

Tools

Charset conversion

Padma

Padma is a system for transforming Indic text between various public and proprietary formats. This extension applies the technology to Mozilla based applications. Padma is available as an extension for Firefox, Thunderbird, Netscape, Mozilla suite and SeaMonkey platforms. Padma can automatically transform web pages that use dynamic font schemes to Unicode.

Padma can be customised to include a user supplied conversion. This implies that its use is not restricted to Indic texts. See [4].

Valid OSIS test

A valid XML document is one that is well-formed and conforms to the formal definition provided in a schema (or DTD). A document cannot have elements, attributes, or entities not defined in the schema. A schema can also define how elements may be nested, the possible values of attributes, etc.

Many programs capable of schema validation exist. Most XML editors (XML Copy Editor, Oxygen, XMLSpy, Topologi, etc.) support some sort of XML schema validation. The Windows based text editor Notepad++ supports Unicode and has an XML Tools plugin which can perform validation.

xmllint

libxml2, available for Linux, Windows, and MacOS, includes a command-line validator called xmllint. To check that a document is valid against the OSIS schema, use the following command. (You need Internet access to validate your document.)

$ xmllint --noout --schema http://www.bibletechnologies.net/osisCore.2.1.1.xsd myfile.osis.xml

To install xmllint, simply install libxml2 via your distribution's standard package management system in Linux or download the Windows binary from our mirror.

Online XML Validators

The external links section of http://en.wikipedia.org/wiki/XML_Validation lists at least three online validators. Some or all of these can validate against external XML schema.

Creating a SWORD Module

Use osis2mod to create the module.