Difference between revisions of "OSIS Bibles"

From CrossWire Bible Society
m
(Recommended approach: if those attributes are omitted for the '''seg''' element as a simpler kludge.)
 
(345 intermediate revisions by 11 users not shown)
Line 1: Line 1:
=Introduction=
+
==OSIS==
This page is for practical examples of how to encode a Bible in OSIS 2.1.1 for building a Sword module with osis2mod. It represents CrossWire's experience and best practices in creating modules.
+
OSIS is an XML Schema definition for Bibles and other Biblical research texts, which enables ministries and other organizations to collaborate more easily.  Traditionally, these organizations have stored their documents in disparate, proprietary markups, making it difficult when they wish to share in service with each other.  OSIS provides a common markup for multiple visions.
  
Every OSIS Sword module must be created from a well-formed and valid OSIS 2.1.1 document. While it is a desirable goal for any such document to be acceptable, Sword has some particular requirements which are discussed here.
+
CrossWire is committed to supporting the OSIS initiative.  We have developed OSIS import and export tools which work with our SWORD engine, making OSIS documents available to all of our SWORD software.
  
The schema for OSIS 2.1.1 can be found at http://www.bibletechnologies.net/osisCore.2.1.1.xsd.
+
The latest OSIS Schema definition and supporting information was once available at: [http://www.bibletechnologies.net/].<BR> However, the '''BTG''' no longer exists. See http://ebible.org/osis/
  
The March 2006 version of the OSIS Manual may be found at http://www.bibletechnologies.net/utilities/fmtdocview.cfm?id=28871A67-D5F5-4381-B22EC4947601628B.
+
==Introduction==
 +
This page is for practical examples of how to encode a Bible in OSIS 2.1.1 for building a SWORD module with [[osis2mod]]. It represents CrossWire's experience and best practices in creating modules.
 +
 
 +
Every OSIS SWORD module must be created from a well-formed and valid OSIS 2.1.1 document. While it is a desirable goal for any such document to be acceptable, SWORD has some particular requirements which are discussed here.
 +
 
 +
The schema for '''OSIS 2.1.1''' that was formerly at [http://www.bibletechnologies.net/osisCore.2.1.1.xsd] is preserved at http://crosswire.org/osis/osisCore.2.1.1.xsd.
 +
<br />We are maintaining an updated schema at: http://www.crosswire.org/~dmsmith/osis/osisCore.2.1.1-cw-latest.xsd that may be used in place of the official BibleTechnologies URL for validating OSIS files.
 +
<br />See also [[OSIS_211_CR#CrossWire_updated_schema| CrossWire updated schema]].
 +
 
 +
The March 2006 version of the OSIS Manual may be found [http://www.crosswire.org/OSIS/OSIS%202.1.1%20User%20Manual%2006March2006.pdf here] (PDF).
  
 
A good example of an OSIS document can be found at http://www.crosswire.org/~dmsmith/kjv2006.
 
 +
<BR>The latest releases are found under http://www.crosswire.org/~dmsmith/kjv2011/
  
=General structure=
+
See also [[OSIS Book Abbreviations|OSIS Book Name Abbreviations]].
 +
 
 +
==General structure==
  
 
An OSIS document is a ''well-formed XML'' document, valid according to the ''OSIS schema''.
 
You can find the full normative description on the OSIS Website [http://www.bibletechnologies.net/].
+
You can find the most current version [http://www.crosswire.org/~dmsmith/osis/osisCore.2.1.1-cw-latest.xsd here].<ref>See also [[OSIS Tutorial#The_Root_Element|root element]]</ref>
  
 
To produce a Bible, you can use this template:
 
Line 24: Line 36:
 
xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
 
xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
 
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
 
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
<osisText osisIDWork="{NAME}" osisRefWork="bible" xml:lang="{LANG}" canonical="true">
+
<osisText osisIDWork="{NAME}" osisRefWork="Bible" xml:lang="{LANG}" canonical="true">
 
<header>
 
<header>
 
{HEADER}
 
{HEADER}
Line 37: Line 49:
 
With the following values:
 
 
; {NAME}: Normalized name of the Bible version (Usually 3 letters for language, 3 for translation)
 
; {LANG}: ISO-639 [http://lcweb.loc.gov/standards/iso639-2/langhome.html] language code
+
; {LANG}: IETF language code. ISO 639-1 codes are preferred; where no ISO 639-1 code exists for the given language, use its ISO 639-3 code. See [http://www.sil.org/iso639-3/codes.asp] for a list of codes.
 
; {HEADER}: Description of the included text; see below
 
 
; {BODY}: Text; see below
 
  
For text without any character outside ASCII, you can use US-ASCII encoding (usually for english text). For every other language, please use UTF-8 and NFC. See the ''tools'' section if you need to convert.
+
For text containing only ASCII characters (usually English text), you can use US-ASCII encoding. For every other language, please use UTF-8 with NFC normalization. See the [[OSIS Bibles#Tools|tools]] section if you need to convert.
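
A minimal sketch of how the encoding is declared, assuming the file starts with a standard XML declaration (for an ASCII-only text, <tt>US-ASCII</tt> would replace <tt>UTF-8</tt>). Unicode normalization (NFC) is a property of the text itself and cannot be declared here.
<pre>
<?xml version="1.0" encoding="UTF-8"?>
</pre>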
  
==Header==
+
<references />
 +
===Header===
 +
This is the minimum content of the header required for validation. It is also what <tt>usfm2osis.pl</tt> produces. A valid OSIS header may include more than this.
 +
<header> 
 +
  <work osisWork="{NAME}"/>
 +
</header>
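
As a hedged illustration of a header that goes beyond the minimum, the sketch below adds a title, an identifier and a reference system to the work declaration. <tt>{TITLE}</tt> is a hypothetical placeholder and the identifier value is only illustrative; consult the OSIS manual for the full list of permitted child elements and their order.
<pre>
<header>
  <work osisWork="{NAME}">
    <!-- {TITLE} is a hypothetical placeholder for the human-readable title -->
    <title>{TITLE}</title>
    <!-- Illustrative identifier; the exact value is an assumption -->
    <identifier type="OSIS">Bible.{NAME}</identifier>
    <!-- Declares that the work uses the Bible reference system -->
    <refSystem>Bible</refSystem>
  </work>
</header>
</pre>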
  
 
+
===Body===
 
 
==Body==
 
  
 
Here is the general structure of the body content:
 
 
 
<pre>
 
<pre>
 
<div type="bookGroup">
 
<div type="bookGroup">
Line 56: Line 70:
 
<div type="book" osisID="Gen" canonical="true">
 
<div type="book" osisID="Gen" canonical="true">
 
<title type="main" short="Genesis">Genesis</title>
 
<title type="main" short="Genesis">Genesis</title>
<chapter osisID="Gen.1" n="1">
+
<chapter osisID="Gen.1" chapterTitle="CHAPTER 1.">
<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>In the
+
<title type="chapter">CHAPTER 1.</title>
beginning...<verse eID="Gen.1.1"/>
+
<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...
<verse sID="Gen.1.1" osisID="Gen.1.2" n="2"/>The earth was
+
<verse eID="Gen.1.1"/>
formless and void...<verse eID="Gen.1.2"/>
+
<verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth was without form ...
 +
<verse eID="Gen.1.2"/>
 
...
 
...
 
</chapter>
 
</chapter>
Line 66: Line 81:
 
</div>
 
</div>
 
</pre>
 
</pre>
 +
'''Notes:'''
 +
# The top level bookGroup division is not mandatory.
 +
# If they are used, the bookGroup division for the New Testament should have a similar structure.
 +
# Any '''div''' element defaults canonical to false. You need to set it to true on elements representing the structure of the original text.
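
To illustrate notes 2 and 3, here is a sketch of a New Testament bookGroup that mirrors the body example above. The bookGroup title is an assumption; the <tt>canonical="true"</tt> attribute is set explicitly on the book division because it would otherwise default to false.
<pre>
<div type="bookGroup">
  <!-- Assumed title for the group; not required -->
  <title type="main">New Testament</title>
  <div type="book" osisID="Matt" canonical="true">
    <title type="main" short="Matthew">Matthew</title>
    <chapter osisID="Matt.1">
      <verse sID="Matt.1.1" osisID="Matt.1.1"/>The book of the generation of Jesus Christ ...
      <verse eID="Matt.1.1"/>
      ...
    </chapter>
    ...
  </div>
</div>
</pre>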
  
Note any <tt>&lt;div&gt;</tt> defaults canonical to false. You need to set it to true on elements representing the structure of the original text.
+
===OSIS Milestones===
 +
OSIS allows for two potentially overlapping structures: Document structure (BSP) and verse structure (BCV).
  
=Examples=
+
Document structure is dominated by book, sections and paragraphs (BSP), along with titles, quotes and poetic material, whereas verse structure is indicated by book, chapter and verse numbers (BCV). Although a SWORD module requires verse structure, the best way to encode a module with deep markup is with document structure. [[Osis2mod]] is responsible for transforming document structure into verse structure.
==Marking Paragraphs==
 
  
 +
Because these two systems can overlap and because XML does not allow for overlapping elements, OSIS defines a milestone mechanism for both document and verse structure elements.
 +
 +
For:
 +
&lt;X  ... attribute list ...>
 +
...
 +
&lt;/X>
 +
the milestoned form is:
 +
&lt;X sID="g1" ... attribute list .../>
 +
...
 +
&lt;X eID="g1"/>
 +
 +
According to the OSIS manual, for any given element X that defines a milestoneable form, all the instances of X in the document must use one form or the other and may not use both. The value of each sID attribute must be unique within the document.
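
As a concrete instance of the generic pattern, the same verse is shown below in both forms. This is only a sketch: a real document would use one form or the other for all of its verses, never both.
<pre>
<!-- Container form: the verse text is the element content -->
<verse osisID="Gen.1.1">In the beginning ...</verse>

<!-- Milestoned form: two empty elements paired by matching sID/eID values -->
<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...<verse eID="Gen.1.1"/>
</pre>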
 +
 +
Verse milestone sID/eID attributes can even have values that denote a verse range. This is purely for convenience to human readers. 
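
A sketch of what such a range might look like for two verses that the translation combines into one unit. The sID/eID value shown is merely a readable label chosen for this example; the assumption here is that the verses actually covered are listed, space-separated, in the osisID attribute.
<pre>
<!-- sID/eID only need to match each other; osisID lists the verses covered -->
<verse sID="Gen.1.1-Gen.1.2" osisID="Gen.1.1 Gen.1.2"/>In the beginning ... without form ...
<verse eID="Gen.1.1-Gen.1.2"/>
</pre>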
 +
 +
It is allowable to use milestone elements for verses alone, or for both verses and chapters. The body example above is for the former.
 +
 +
Although the order of the sID and osisID attributes within a milestone element is insignificant (as is the case for XML attributes in general), it helps human readability to put the sID attribute first, so that it aligns with the corresponding eID attribute, as in the body example above.
 +
 +
==== Limitations of XML validators ====
 +
 +
An [[OSIS Bibles#Valid_OSIS_test|XML validator]] cannot validate whether OSIS milestones are used properly. It cannot validate:
 +
* that an element is consistently either milestoned or not.
 +
* that each element with an sID has a paired element with an eID.<ref>osis2mod does not crash if the eID milestones are all missing, but the resulting module may appear to be void of text.</ref>
 +
* that each paired sID/eID has the same attribute value.
 +
* that different sID/eID pairs of the same element type do not overlap.<ref>If they weren't milestones, one would say they should be properly nested.</ref>
 +
* that the values of the osisID attributes are valid and correspond to the text demarcated by the verse milestones, etc.
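
For illustration, the fragment below contains two of the mistakes listed above. Following the limitations just described, a schema validator would presumably accept it, because each element is well-formed on its own.
<pre>
<!-- Mistake 1: the eID value does not match the sID value of its partner -->
<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...
<verse eID="Gen.1.2"/>

<!-- Mistake 2: the osisID does not correspond to the verse being marked -->
<verse sID="Gen.1.3" osisID="Gen.1.4"/>And God said ...
<verse eID="Gen.1.3"/>
</pre>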
 +
 +
'''Notes:'''
 +
<references />
 +
 +
==== Notes about OSIS elements ====
 +
* For an OSIS document to be valid it must use the non-milestoned &lt;div> and &lt;lg> elements.
 +
* There is no milestoned version of the &lt;p> element. From a practical perspective, this means that the milestoned verse element should be used when paragraphs are used.
 +
* The milestoned chapter element must be used when a paragraph spans a chapter boundary.
 +
* The SWORD engine cannot handle sub-identifiers separated by ! in an osisID, so osis2mod strips these off from the osisIDs for verses. They are only of use for the osisIDs for notes.
 +
 +
===Recommended approach===
 +
* For chapters, use &lt;chapter>...&lt;/chapter> container elements (except in the rare case that other container elements cross chapter boundaries)<ref>Conversion scripts such as [[Converting SFM Bibles to OSIS#usfm2osis.py|usfm2osis.py]] generally produce the milestone elements for chapters.</ref>
 +
* For verses, use milestone elements (unless container elements will suffice) &ndash; see [[OSIS Bibles/BSPExample]].
 +
* For paragraphs, use the &lt;p>...&lt;/p> container element
 +
* For poetry, use container elements &lt;lg>...&lt;/lg> to indicate stanzas (or other types of line groups) and &lt;l>...&lt;/l> to indicate lines
 +
* For quoted text, use the &lt;q>...&lt;/q> container element
 +
* For translation changes, use the &lt;transChange>...&lt;/transChange> container element<ref>Except where the text is within a &lt;w> which is not allowed by OSIS.<BR>For these cases use the alternative &lt;seg subType="x-added" type="x-transChange">...&lt;/seg><BR>The '''transChange''' element is still required to render the word[s] in italics if those attributes are omitted for the '''seg''' element as a simpler kludge.</ref>
 +
 +
<references />
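
The recommendations above can be sketched together in one short fragment: a chapter container, a paragraph container, milestoned verses and a translation change. The indentation is for readability on this page only; the placement of whitespace in a real file matters, as discussed under pretty printing.
<pre>
<chapter osisID="Gen.1">
  <p>
    <!-- Verses are milestones, so they may start and end anywhere in the paragraph -->
    <verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...
    <verse eID="Gen.1.1"/>
    <verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth was without form, and void;
    and darkness <transChange type="added">was</transChange> upon the face of the deep ...
    <verse eID="Gen.1.2"/>
  </p>
</chapter>
</pre>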
 +
 +
====Pretty print?====
 +
Several popular XML editors and tools provide an option to '''pretty print''' the XML. We strongly advise against using this for OSIS Bibles. This is because '''pretty print''' generally does not preserve '''significant whitespace''', even if <code>xml:space="preserve"</code> is applied. It can cause the spaces between words to go missing and/or extra spaces to be inserted where they are unwarranted.
 +
 +
==Examples==
 +
===Marking Paragraphs===
 +
There is no milestoned version of the <tt>&lt;p></tt> element. Typically paragraphs surround whole verses. That is, they start and end between verses. If a paragraph begins or ends in a verse and extends beyond that verse, then the whole document must use the milestoned version of <tt>&lt;verse></tt>.
 
<pre>
 
<pre>
 
<div type="book" osisID="Gen" canonical="true">
 
<div type="book" osisID="Gen" canonical="true">
Line 108: Line 180:
 
|}
 
|}
  
Note: osis2mod converts a paragraph start into <tt>&lt;lb type="x-begin-paragraph"/></tt> and a paragraph end into <tt>&lt;lb type="x-end-paragraph"/></tt>. If these are between verses they are appended to the prior verse.
+
Note: [[osis2mod]] converts a paragraph start into <tt>&lt;div type="paragraph" sID="genX"/></tt> and a paragraph end into <tt>&lt;div type="paragraph" eID="genX"/></tt>.
  
==Marking Quotations==
+
===Marking Quotations===
Most of the Sword applications will show a chapter at a time and some will show isolated verses. This means that all of the Sword applications show partial quotations, such as the Sermon on the Mount which begins in Matt 5 and ends in Matt 7. For this reason, the milestoned version of the <tt>&lt;q&gt;</tt> should be used.
+
Most of the SWORD front-end applications can show a chapter at a time and some can show isolated verses. This means that all of the SWORD applications can show partial quotations, such as the Sermon on the Mount which begins in Matt 5 and ends in Matt 7.
  
===Default Quotation Marks===
+
====Default quotation marks====
By default, Sword will use <tt>"</tt> for quotations. The following discusses various ways to influence this.
+
By default, SWORD will use <tt>"</tt> for quotations. The following describes various ways to influence this.
  
===Indicating the nesting of a quote===
+
====Indicating the nesting of a quote====
 
When a quote is contained in a quote, it is customary to set the level attribute to indicate the depth of the nesting. For example, Jeremiah 23:38 is part of a larger quote and has a back and forth dialog of nested quotes:
 
  
Line 140: Line 212:
 
A couple of things to note about this verse. First, the level attribute is on both the sID and the eID pair, matching in value. Second, this is an example of a verse that has a quote that starts in the middle and finishes in another verse.
 
  
In this case, Sword will use the level to determine whether to use " or ' for quotes. Odd levels will use " and even levels will use '.
+
In this case, SWORD will use the level to determine whether to use <tt>"</tt> or <tt>'</tt> for quotes. Odd levels will use <tt>"</tt> and even levels will use <tt>'</tt>. This is in accordance with American English usage, which is the opposite of British English usage. Nesting levels up to five can be found in the Bible.<ref>Jeremiah 27:1-11; 29:1-28, 30-32; 34:1-5; and Ezekiel 1-36</ref>
  
===Supplying alternative quotation marks===
+
<references />
The quote element has a marker attribute that can be used to control the quotation marks. Sword applications will always use this value when rendering the quote. When the marker attribute is present but empty, it will render no quotation mark at all.
+
 
 +
====Supplying alternative quotation marks====
 +
The quote element has a marker attribute that can be used to control the quotation marks. SWORD applications will always use this value when rendering the quote. When the marker attribute is the null string, it will render no quotation mark at all.<ref>e.g. The KJV module has <tt>marker=""</tt> because the text of the KJV Bible does not use any quotation marks.</ref>
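
A minimal sketch of a quote that renders without any quotation marks, using the container form of the quote element for brevity:
<pre>
<!-- marker="" suppresses the quotation marks that would otherwise be rendered -->
<verse sID="Matt.4.19" osisID="Matt.4.19"/>And he saith unto them,
<q marker="">Follow me, and I will make you fishers of men.</q>
<verse eID="Matt.4.19"/>
</pre>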
  
 
To specify "curly" quotes you can use the following values:
 
 
{|border=1
 
{|border=1
!Description!!Char!!HTML Entity
+
!Description!!Char!!HTML Entity!!Unicode
 
|-align=center
 
|-align=center
|Opening double quote||&#8220;||&amp;#8220;
+
|Opening double quote||&#8220;||&amp;#8220;||U+201C
 
|-align=center
 
|-align=center
|Closing double quote||&#8221;||&amp;#8221;
+
|Closing double quote||&#8221;||&amp;#8221;||U+201D
 
|-align=center
 
|-align=center
|Opening single quote||&#8216;||&amp;#8216;
+
|Opening single quote||&#8216;||&amp;#8216;||U+2018
 
|-align=center
 
|-align=center
|Closing single quote||&#8217;||&amp;#8217;
+
|Closing single quote||&#8217;||&amp;#8217;||U+2019
 
|}
 
|}
  
===Continuation Quotation Marks===
+
To use different marks to start and end a quote, use the milestoned version of the quote.
 +
&lt;q marker="&#8220;" sID="qN"/> ... &lt;q marker="&#8221;" eID="qN"/>
 +
 
 +
There is further information about English quotation marks and their usage in [http://en.wikipedia.org/wiki/Quotation_marks].
 +
 
 +
Quotation marks have a variety of forms in different languages and in different media. See [http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage Quotation mark, non-English usage].<ref>When modules are being converted from digitized source text used in other Bible software, it may be the case that quotation marks in the text source differ from those in the original published edition, whether due to inherent constraints of the other software, or for other causes.</ref>
 +
 
 +
'''Note:'''
 +
<references />
 +
 
 +
====Continuation quotation marks====
 
The <tt>&lt;milestone type="cQuote"/&gt;</tt> can be used to indicate the presence of a continued quote. If the marker attribute is present, its value is used; otherwise a straight double quote, <tt>"</tt>, is used. Since there is no level attribute on the milestone element, it is best to specify the marker attribute.
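
A sketch of a quote that runs across two paragraphs, with a continuation mark at the start of the second one. The paragraph text and the <tt>q1</tt> identifier are made up for illustration.
<pre>
<p><q sID="q1" marker="“"/>First paragraph of a long quotation ...</p>
<!-- The continuation mark repeats the opening mark; the quote itself is still open -->
<p><milestone type="cQuote" marker="“"/>Second paragraph of the same quotation
...<q eID="q1" marker="”"/></p>
</pre>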
  
==Marking the Words of Christ==
+
====Marking the Words of Christ====
 +
To indicate that a quote is something that Jesus said<ref>http://en.wikipedia.org/wiki/Red_letter_edition</ref>, use the attribute <tt>who="Jesus"</tt>.
 +
 
 
<pre>
 
<pre>
 
<verse osisID="Luke.22.35" sID="Luke.22.35"/>
 
<verse osisID="Luke.22.35" sID="Luke.22.35"/>
Line 175: Line 261:
 
|}
 
|}
  
==Marking poetic material==
+
'''Note:'''
 +
<references />
 +
 
 +
===Marking poetic material===
 +
Poetry is marked up with &lt;lg&gt;, line group, and &lt;l&gt;, line, elements. The line element supports indentation with the level attribute. When the level attribute is not present or it is level="1", this should be interpreted as the first level of the line group. When level="2" it is indented relative to level="1". The same is true for each subsequent level. 
 +
 
 +
The level attribute is used to indicate indentation. A value of 1 means no indentation, the same as not specifying a level attribute. A value of 2 means one level of indentation, and so forth.
  
 
<pre>
 
<pre>
<chapter osisID="Exod.15" chapterTitle="Chapitre 15"><title type="chapter">Chapter 15</title>
+
  <chapter osisID="Exod.15" chapterTitle="Chapter 15"><title type="chapter">Chapter 15</title>
<p>
+
    <p>
<verse sID="Exod.15.1" osisID="Exod.15.1" n="1"/>Then sang Moses and the children of
+
      <verse sID="Exod.15.1" osisID="Exod.15.1" n="1"/>
Israel this song unto the LORD, and spake, saying,
+
      Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,
</p>
+
    </p>
<lg>
+
    <lg>
<l>I will sing unto the LORD, for he hath triumphed gloriously: the horse and his rider
+
      <l level="1">I will sing unto the LORD, for he hath triumphed gloriously:</l>
hath he thrown into the sea.</l><verse eID="Exod.15.1"/>
+
      <l level="2">the horse and his rider hath he thrown into the sea.</l>
 +
      <verse eID="Exod.15.1"/>
  
<verse sID="Exod.15.2" osisID="Exod.15.2" n="2"/><l>The LORD is my strength and song,
+
      <verse sID="Exod.15.2" osisID="Exod.15.2" n="2"/>
and he is become my salvation: he is my God, and I will prepare him an habitation; my
+
      <l level="1">The LORD is my strength and song, and he is become my salvation:</l>
father's God, and I will exalt him.</l><verse eID="Exod.15.2"/>
+
      <l level="2">he is my God, and I will prepare him an habitation;</l>
 +
      <l level="2">my father's God, and I will exalt him.</l>
 +
      <verse eID="Exod.15.2"/>
  
<verse sID="Exod.15.3" osisID="Exod.15.3" n="3"/><l>The LORD is a man of war: the LORD
+
      <verse sID="Exod.15.3" osisID="Exod.15.3" n="3"/>
is his name.</l><verse eID="Exod.15.3"/>
+
      <l level="1">The LORD is a man of war:</l>
 +
      <l level="2">the LORD is his name.</l>
 +
      <verse eID="Exod.15.3"/>
  
<verse sID="Exod.15.4" osisID="Exod.15.4" n="4"/><l>Pharaoh's chariots and his host
+
      <verse sID="Exod.15.4" osisID="Exod.15.4" n="4"/>
hath he cast into the sea: his chosen captains also are drowned in the Red sea.</l>
+
      <l level="1">Pharaoh's chariots and his host hath he cast into the sea:</l>
<verse eID="Exod.15.4"/>
+
      <l level="2">his chosen captains also are drowned in the Red sea.</l>
 +
      <verse eID="Exod.15.4"/>
  
<verse sID="Exod.15.5" osisID="Exod.15.5" n="5"/><l>The depths have covered them: they  
+
      <verse sID="Exod.15.5" osisID="Exod.15.5" n="5"/>
sank into the bottom as a stone.</l><verse eID="Exod.15.5"/>
+
      <l level="1">The depths have covered them:</l>
 +
      <l level="2">they sank into the bottom as a stone.</l>
 +
      <verse eID="Exod.15.5"/>
 
...
 
...
 
</pre>
 
</pre>
Line 208: Line 308:
 
<sup>'''(1)'''</sup> Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,
 
  
::''I will sing unto the LORD, for he hath triumphed gloriously: the horse and his rider hath he thrown into the sea.''
+
::''I will sing unto the LORD, for he hath triumphed gloriously:''
 +
::::''the horse and his rider hath he thrown into the sea.''
  
::<sup>'''(2)'''</sup> ''The LORD is my strength and song, and he is become my salvation: he is my God, and I will repare him an habitation; my father's God, and I will exalt him.''
+
::<sup>'''(2)'''</sup> ''The LORD is my strength and song, and he is become my salvation:''
 +
::::''he is my God, and I will prepare him an habitation;''
 +
::::''my father's God, and I will exalt him.''
  
::<sup>'''(3)'''</sup> ''The LORD is a man of war: the LORD is his name.''
+
::<sup>'''(3)'''</sup> ''The LORD is a man of war:''
 +
::::''the LORD is his name.''
  
::<sup>'''(4)'''</sup> ''Pharaoh's chariots and his host hath he cast into the sea: his chosen captains also are drowned in the Red sea.''
+
::<sup>'''(4)'''</sup> ''Pharaoh's chariots and his host hath he cast into the sea:''
 +
::::''his chosen captains also are drowned in the Red sea.''
  
::<sup>'''(5)'''</sup> ''The depths have covered them: they sank into the bottom as a stone.''
+
::<sup>'''(5)'''</sup> ''The depths have covered them:''
 +
::::''they sank into the bottom as a stone.''
 
|}
 
|}
  
==Marking with Strong's Numbers==
+
'''Note:'''
To mark up Strong's numbers, you first need to declare a workID in the header of the OSIS document:
+
 
 +
# While OSIS defines a milestoned version of the <tt><lg></tt> element, its use (rather than the container version) will not produce a valid XML document. The <tt><l></tt> element can only occur within an <tt><lg></tt> container, so use of <tt><lg/></tt> milestones prevents use of <tt><l></tt> elements.
 +
 
 +
==== Marking acrostic poetry headings ====
 +
Use <tt>type="acrostic"</tt> as the title attribute for the stanza headings in acrostic passages such as Psalm 119. Whether or not this Psalm uses a poetry line group for each stanza, each title element should be placed before its related stanza.
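
A minimal sketch of the first stanza of Psalm 119, assuming a line group is used for the stanza and that verses are milestoned as in the poetry example above:
<pre>
<!-- The acrostic heading precedes the stanza it belongs to -->
<title type="acrostic">ALEPH.</title>
<lg>
  <verse sID="Ps.119.1" osisID="Ps.119.1"/>
  <l>Blessed are the undefiled in the way, who walk in the law of the LORD.</l>
  <verse eID="Ps.119.1"/>
  ...
</lg>
</pre>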
 +
 
 +
=== Marking lemmas & morphology ===
 +
====Marking Strong's numbers====
 +
To mark up [http://en.wikipedia.org/wiki/Strong%27s_Concordance Strong's numbers], you first need to declare a '''workID''' in the header of the OSIS document:
 
<pre>
 
<pre>
 
   <header>
 
   <header>
Line 231: Line 345:
 
</pre>
 
</pre>
  
Sword does not actually use this declaration, but it is required to have a proper OSIS document.
+
SWORD does not actually use this declaration, but it is required to have a proper OSIS document.
  
And while OSIS allows arbitrary workIDs, Sword can only handle "strong" and a few variants.
+
And while OSIS allows arbitrary workIDs, SWORD can only handle "strong" and a few variants.
  
 
<pre>
 
<pre>
Line 241: Line 355:
 
The &lt;w&gt; element is used to surround the text that is represented by the Strong's number. It may be that the text is a phrase and it may be that more than one Strong's number defines the text.
 
  
When more than one Strong's number defines the text, each must be prefixed with a workID and must be separated from each other by a space. (While OSIS allows for the defining of default workIDs, Sword requires that the workIDs be used.)
+
When more than one Strong's number defines the text, each must be prefixed with a '''workID''' and must be separated from each other by a space. (While OSIS allows for the defining of default workIDs, SWORD requires that the workIDs be used.)
 +
 
 +
The actual Strong's Number should indicate whether it is Hebrew (H) or Greek (G) followed by the number. The number may be 0 padded up to 5 digits as in H00001.
  
The actual Strong's Number should indicate whether it is Hebrew (H) or Greek (G) followed by the number. The number can be 0 filled up to 5 digits as in H00001.
+
=====Marking Strong's splits =====
 +
Sometimes a single word in the Hebrew or Greek gets split in the Bible translation such that there's another word in between the two words that correspond to the one in the original language. Such instances can be marked up using the attribute <code>type="x-split-##"</code> where <code>##</code> is a serial number unique to each instance in the OSIS Bible.
 +
 
 +
Example 1: (in '''Exod.23.25''' for the KJV)
 +
<pre>
 +
<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">and I will take</w>
 +
<w lemma="strong:H04245">sickness</w>
 +
<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">away</w>
 +
</pre>
 +
Example 2: (in '''Mark.14.44''' for the KJV)
 +
<pre>
 +
<w src="17" lemma="strong:G520 lemma.TR:απαγαγετε" morph="robinson:V-2AAM-2P" type="x-split-1227">lead</w>
 +
<transChange type="added">him</transChange>
 +
<w src="17" lemma="strong:G520 lemma.TR:απαγαγετε" morph="robinson:V-2AAM-2P" type="x-split-1227">away</w>
 +
</pre>
  
==Marking with Morphology==
+
====Marking morphology====
In a similar manner to marking with Strong's numbers, morphology can also be noted. Since morphology regards the original language, Strong's numbers will be shown at the same time.
+
In a similar manner to marking with Strong's numbers, morphology can also be marked. Since morphology regards the original language, Strong's numbers will be shown at the same time.
  
As with Strong's numbers, a workID needs to be defined. Here we are defining one for Robinson's Morphology Codes. And while Sword will ignore this declaration, "robinson" is hard-coded into Sword for Greek morphology codes.
+
As with Strong's numbers, a '''workID''' needs to be defined. Here we are defining one for Robinson's Morphology Codes. And while SWORD will ignore this declaration, "robinson" is hard-coded into SWORD for Greek morphology codes.
 
<pre>
 
<pre>
 
   <header>
 
   <header>
Line 263: Line 393:
 
<w lemma="strong:G3588 strong:G80" morph="robinson:T-APM robinson:N-APM" src="7 8">his brethren</w>
 
<w lemma="strong:G3588 strong:G80" morph="robinson:T-APM robinson:N-APM" src="7 8">his brethren</w>
 
</pre>
 
</pre>
In this example, lemma, morph and src form parallel arrays. The first strong: mapping to the first robinson: and the first src value.
+
In this example, '''lemma''', '''morph''' and '''src''' form parallel arrays: the first '''strong:''' value maps to the first '''robinson:''' value and the first '''src''' value, and so on.<ref>There should never be more items in the '''morph''' attribute than there are Strong's numbers in the '''lemma''' attribute!</ref>
 +
 
 +
The '''workID''' should be the name of a current, future, or potential lexicon module in which the morphology code could be looked up. For example, morph="packard:D" represents a reference to morphology code "D" in a module named Packard, whether or not a Packard module has been created or released. (Currently, SWORD offers lexicon modules named Robinson and Packard, both for Greek morphology.)
 +
 
 +
The '''src''' attribute is used here to indicate the word position in the original Greek.
 +
 
 +
'''Note:'''
 +
 
 +
<references />
 +
 
 +
====Marking other lemmas====
 +
The '''lemma''' attribute of the <tt>&lt;w&gt;</tt> element can contain any number of other lemmas. Like Strong's numbers and morphology codes, these need to have a '''workID''' declared in the header. SWORD presumes that these lemma workIDs all start with "lemma." (note the final period). The portion of the workID following "lemma." should be the name of a current, future, or potential lexicon module in which the lemma could be looked up. For example, lemma="lemma.TWOT:271" represents a reference to lemma #271 in a module named TWOT (i.e. the Theological Wordbook of the Old Testament), whether or not a TWOT module has been created or released. As far as SWORD is concerned, there can be any number of these space-delimited values in a lemma attribute and they can be in any order, even interspersed among the "strong:" lemmas.
 +
 
 +
Example of a '''lemma''' markup for the Greek words from the [https://en.wikipedia.org/wiki/Textus_Receptus TR] in the KJV module:
 +
<w lemma="strong:G976 lemma.TR:βιβλος" morph="robinson:N-NSF" src="1">The book</w>
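
A hedged sketch of the corresponding header declarations, assuming the minimal self-closing form of the work element is sufficient for these workIDs (a real header may well carry more detail for each work):
<pre>
<header>
  ...
  <!-- Assumed minimal declarations, one per lemma workID used in the text -->
  <work osisWork="lemma.TR"/>
  <work osisWork="lemma.TWOT"/>
  ...
</header>
</pre>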
 +
 
 +
SWORD has the ability to show or hide non-Strong's lemmas as a group. See [[DevTools:conf Files#Elements_required_for_proper_rendering|GlobalOptionFilter=OSISLemma]] &ndash; for OSIS texts having lemmas.
 +
 
 +
==== Marking glosses ====
 +
Gloss markup uses the <tt>gloss</tt> attribute of the <tt><w></tt> element. The syntax is illustrated by this line exported from the module JapMeiji.
 +
<pre>
 +
<w gloss="はじめ">元始</w>に<w gloss="かみ">神</w><w gloss="てんち">天地</w>を<w gloss="つくり">創造</w>たまへり
 +
</pre>
 +
Display of glosses can be toggled (in [[Choosing a SWORD program#Module_Support|compatible front-ends]]) by having this line in the [[DevTools:conf Files#Elements_required_for_proper_rendering|.conf file]]:
 +
GlobalOptionFilter=OSISGlosses
 +
==== Marking encoded transliterations ====
 +
:''Not to be confused with automatic transliteration using '''libICU''' that some SWORD front-ends support''.
 +
Transliteration markup uses the <tt>xlit</tt> attribute of the <tt><w></tt> element. The syntax is illustrated by this line exported from the module SP.
 +
<w gloss="in_beginnings" lemma="strong:H7225" morph="ב_ראשית" n="1" xlit="Latn:b_raShit">בראשית</w>
 +
Display of encoded transliterations can be toggled (in [[Choosing a SWORD program#Module_Support|compatible front-ends]]) by having this line in the [[DevTools:conf Files#Elements_required_for_proper_rendering|.conf file]]:
 +
GlobalOptionFilter=OSISXlit
  
The workID should be name of a current, future, or potential lexicon module in which the morphology code could be looked up. For example, morph="packard:D" represents a reference to morphology code "D" in a module named Packard, whether or not a Packard module has been created or released. (Currently, Sword offers lexicon modules named Robinson and Packard, both for Greek morphology.)
+
==== Marking enumerated words ====
 +
Enumerated words markup uses the '''n''' attribute in the '''<w>''' element. The syntax is illustrated in the previous subsection.
  
The src attribute is used here to indicate the word position in the original Greek.
+
Display of enumerated words can be toggled (in [[Choosing a SWORD program#Module_Support|compatible front-ends]]) by having this line in the [[DevTools:conf Files#Elements_required_for_proper_rendering|.conf file]]:
 +
GlobalOptionFilter=OSISEnum
  
==Marking with Other Lemma==
+
==== Marking morpheme segmentation ====
The lemma attribute of the <tt>&lt;w&gt;</tt> element can contain any number of other lemmas. Like Strong's numbers and morphology codes, these need to have a workID declared in the header. Sword presumes that these lemma workIDs all start with "lemma." (note the final period). The portion of the workID following "lemma." should be the name of a current, future, or potential lexicon module in which the lemma could be looked up. For example, lemma="lemma.TWOT:271" represents a reference to lemma #271 in a module named TWOT (i.e. the Theological Wordbook of the Old Testament), whether or not a TWOT module has been created or released. As far as Sword is concerned, there can be any number of these space-delimited values in a lemma attribute and they can be in any order, even interspersed among the "strong:" lemmas.
 
  
Sword has the ability to show or hide non-Strong's lemmas as a group.
+
In languages such as Biblical Hebrew, parts of words may be split into semantic segments using the XML '''seg''' element, thus:
  
==Marking the Divine Name==
+
<w><seg type="x-morph">וַ</seg><seg type="x-morph">יִּקְרָ֨א</seg></w>
The <tt>&lt;divineName&gt;</tt> is reserved for translations of YHWH. These occur in the Old Testament as Lord, God and Yah. Not every Lord or God is a translation of this.
 
  
The content of the divineName element is the word Lord, God or Yah, not in all upper case (i.e. not LORD, GOD, or YAH). Sword will either convert it to small-caps or uppercase.
+
Display of morpheme segmentation<ref>Currently, only some JSword based front-ends seem to support this feature. The SWORD engine has the switch available, but no change in output is effected.</ref> can be toggled (in [[Choosing a SWORD program#Module_Support|compatible front-ends]]<ref>e.g. STEP Bible uses these structures to provide colour coding. It just uses 2 colours to show different parts, alternating between the two.</ref>) by having this line in the [[DevTools:conf Files#Elements_required_for_proper_rendering|.conf file]]:
 +
GlobalOptionFilter=OSISMorphSegmentation
 +
 
 +
'''Notes:'''
 +
<references />
 +
 
 +
===Marking the divine name===
 +
The <tt>&lt;divineName&gt;</tt> tag is reserved for representations of the tetragrammaton יהוה (YHWH). These occur in the Old Testament as <span style="font-variant: small-caps;">Lord</span>, <span style="font-variant: small-caps;">God</span> and <span style="font-variant: small-caps;">Yah</span>. Not every instance of Lord or God is a translation of this.
 +
 
 +
The content of the divineName element is the word Lord, God or Yah, not in all upper case (i.e. not LORD, GOD, or YAH). SWORD will either convert it to small-caps or uppercase.
  
 
Note, if the use is possessive, it is permissible to have the following:
 
Note, if the use is possessive, it is permissible to have the following:
 
<pre>
 
<pre>
 
   <divineName>Lord's</divineName>
 
   <divineName>Lord's</divineName>
 +
</pre>
 +
 +
However, it is inadvisable to include any other punctuation within the tag pair. Thus the following is '''not''' good practice (the quotation mark should ''precede'' the start tag):
 +
<pre>
 +
  <divineName>“God </divineName> .....”
 
</pre>
 
</pre>
  
Line 288: Line 462:
 
   <divineName><w lemma="strong:H3068">Lord's</w></divineName>
 
   <divineName><w lemma="strong:H3068">Lord's</w></divineName>
 
or
 
or
   <w lemma="H3068">of the <seg><divineName>Lord</divineName></seg></w>
+
   <w lemma="strong:H3068">of the <seg><divineName>Lord</divineName></seg></w>
 +
</pre>
 +
 
 +
The latter form uses a workaround to allow the embedding of <tt>&lt;divineName&gt;</tt> in a <tt>&lt;w&gt;</tt>, since OSIS does not allow for this, but does allow for <tt>&lt;seg&gt;</tt> to be in a <tt>&lt;w&gt;</tt> and to contain <tt>&lt;divineName&gt;</tt>.
 +
 
 +
'''Note:''' See also [[OSIS 211 CR#Allow_.3CdivineName.3E_within_.3Cw.3E|OSIS change requests: Allow <divineName> within <w>]].
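The guidance on <tt>&lt;divineName&gt;</tt> content lends itself to an automated check. The Python sketch below is one possible way to flag suspect content; <tt>check_divine_name</tt> is a hypothetical helper, and the rule it applies (letters only, with an optional possessive ending) is an assumption drawn from the advice in this section, not a rule enforced by the OSIS schema or by osis2mod.

```python
import re

# Letters of any script, optionally followed by a possessive 's or ’s.
# Assumption: anything else (quotation marks, spaces, full stops) is flagged.
_ALLOWED = re.compile(r"^[^\W\d_]+(?:['’]s)?$")


def check_divine_name(content):
    """Return a list of problems found in the text of a <divineName> element."""
    problems = []
    if content.isupper():
        problems.append("all upper case")
    if not _ALLOWED.match(content):
        problems.append("contains punctuation or whitespace")
    return problems


for sample in ["Lord", "Lord's", "LORD", "“God "]:
    print(repr(sample), check_divine_name(sample))
```

An empty list means the content follows the advice above; the last two samples are each reported with one problem.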
 +
 
 +
===Marking sections and titles===
 +
A section is marked with:
 +
<pre>
 +
<div type="section">
 +
...
 +
</div>
 +
</pre>
 +
 
 +
In OSIS the <tt>&lt;title&gt;</tt> element is used to provide general headings. Titles should be placed at the top of the container that they title, not before.
 +
<pre>
 +
<div type="book">
 +
  <title>A book title</title>
 +
  <chapter>
 +
      <title type="chapter">A title chapter</title>
 +
      <div type="section">
 +
            <title>A section title</title>
 +
            ...
 +
      </div>
 +
      ...
 +
      </chapter>
 +
</div>
 
</pre>
 
</pre>
 +
Using <tt>type="chapter"</tt> or <tt>type="main"</tt> is needed by [[osis2mod]] to distinguish chapter titles from verse titles. When SWORD stores an OSIS document it does so as an index of verses. It has special indexes for book and chapter titles. SWORD does not store the <tt>&lt;verse></tt> tags. So when it comes to storing a title in the following verse, [[osis2mod]] generates special markup to indicate that the title stands before the verse. SWORD uses this to place the verse number.
 +
 +
It is recommended that chapter labels (converted from the USFM tag \cl) be coded like this (Malayalam) example:
 +
<title type="chapter" subType="chapterLabel">൧. അദ്ധ്യായം.</title>
 +
This ensures that these labels are not treated differently to other chapter titles.
  
The latter form uses a hack to allow the embedding of <tt>&lt;divineName&gt;</tt> in a <tt>&lt;w&gt;</tt>, since OSIS does not allow for this, but does allow for <tt>&lt;seg&gt;</tt> to be in a <tt>&lt;w&gt;</tt> and to contain <tt>&lt;divineName&gt;</tt>.
+
'''Note:'''
 +
# The <tt>&lt;head&gt;</tt> element is used to provide headings for tables, lists and cast groups. There are errors in the OSIS 2.1.1 manual that use the <tt>&lt;head&gt;</tt> incorrectly.
  
==Marking Section Headings==
+
==== Marking pre-verse titles ====
In OSIS the <tt>&lt;title&gt;</tt> element is used to provide general headings. The <tt>&lt;head&gt;</tt> element is used to provide headings for tables, lists and cast groups. There are errors in the OSIS 2.1.1 manual that use the <tt>&lt;head&gt;</tt> incorrectly.
+
There is no special markup for pre-verse titles. Provided the above advice is followed, '''osis2mod''' will determine which titles belong to the book, which to the chapter, and which to the following verse.
  
If the <tt>&lt;title type="main"&gt;...&lt;/title&gt;</tt> or <tt>&lt;title type="chapter"&gt;...&lt;/title&gt;</tt> it is treated by Sword specially. These are treated by Sword as book and chapter titles. Otherwise, the title is attached to the immediately following verse.
+
'''Note:'''
 +
# See also [[OSIS pre-verse titles]].
  
When Sword stores an OSIS document it does so as an index of verses. Sword does not store the <tt>&lt;verse></tt> tags. So when it comes to storing a title in the following verse, osis2mod generates special markup to indicate that the title stands before the verse. Sword uses this to place the verse number.
+
====Marking parallel passage headings====
 +
These may be marked in a similar manner to [[OSIS Bibles#Marking_cross-references_notes|cross-reference notes]]. Example: (Welsh Beibl for Matt.3)
 +
<pre>
 +
<title type="parallel">(
 +
  <reference type="parallel" osisRef="Mark.1.1-Mark.1.8">Marc 1:1-8</reference>;
 +
  <reference type="parallel" osisRef="Luke.3.1-Luke.3.18">Luc 3:1-18</reference>;  
 +
  <reference type="parallel" osisRef="John.1.19-John.1.28">Ioan 1:19-28</reference>)
 +
</title>
 +
</pre>
 +
'''Notes:'''
 +
# Each parallel passage has a separate reference element.
 +
# Each reference element may have the optional attribute type="parallel".
 +
# Each reference element has a valid osisRef attribute. Typically this is for a contiguous range of verses.
 +
# The displayed reference for each passage uses the locale for the language.
 +
# Reference elements are typically separated by a semicolon and space.
 +
# The displayed title is typically enclosed within parentheses, which are not part of any reference element.
 +
# Localized book names may or may not be abbreviated. If abbreviations are used, ideally they should be used consistently throughout the whole Bible.
 +
# Abbreviated localized book names may or may not end with a full stop.
 +
# The localized chapter verse separator may be other than a colon.
 +
# The range specifier is typically a [http://en.wikipedia.org/wiki/Hyphen-minus hyphen-minus] but alternatively may be an [http://en.wikipedia.org/wiki/Dash#En_dash en dash].
 +
# Special care is required when:
 +
:*There is more than one passage listed for the same book, when typically the book name is given only for the first passage.
 +
:*There is more than one passage listed for the same chapter, when typically the chapter number is given only for the first passage.
 +
:*The parallel passage is in the same book, when typically the book name is omitted in the displayed reference.
 +
:*The parallel passage is one or more whole chapters (or psalms), when typically the verse numbers are omitted.
 +
:*The parallel passage is a range of verses in a "single chapter" book, when typically the chapter number is omitted.<br>Example: Jude 3-24 is sometimes used as parallel with 2 Peter 2.
 +
:*The parallel passage is a range that spans a chapter division, and where the range separator might even be an [http://en.wikipedia.org/wiki/Dash#Em_dash em dash] or an [http://en.wikipedia.org/wiki/Dash#En_dash en dash] rather than a hyphen-minus. Example: Exodus 35:30—36:1
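Whatever the displayed reference omits, the '''osisRef''' value spells out both ends of the range in full, as in the Welsh example above. A minimal Python sketch of that idea follows; <tt>osis_ref</tt> is a hypothetical helper, and it assumes the OSIS book abbreviation and the chapter and verse numbers have already been worked out from the localized text.

```python
def osis_ref(book, start, end=None):
    """Build an osisRef from (chapter, verse) pairs.

    `start` and `end` are (chapter, verse) tuples; omit `end` for a single verse.
    """
    first = f"{book}.{start[0]}.{start[1]}"
    if end is None or end == start:
        return first
    # Both ends of the range are written out in full, including the book.
    return f"{first}-{book}.{end[0]}.{end[1]}"


print(osis_ref("Mark", (1, 1), (1, 8)))     # same chapter
print(osis_ref("Exod", (35, 30), (36, 1)))  # range spanning a chapter division
print(osis_ref("Jude", (1, 3), (1, 24)))    # "single chapter" book
```

The hard part in practice is the step this sketch leaves out: recovering the book, chapter, and verse numbers from displayed text in which some of them have been omitted.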
  
To further complicate the matter, newline producing elements such as <tt>&lt;div&gt;</tt> and <tt>&lt;p&gt;</tt> that are between verses are generally placed with the preceding verse, even if they follow the title in the OSIS document. This is to prevent newline between the verse number and the verse text.
+
===Marking notes===
 +
The note element can appear in any element that can contain text ''outside'' of the '''header''' element.
  
==Marking Notes==
+
NB. The examples in this section include the use of sub-identifiers in '''osisID'''s for notes.
 
<pre>
 
<pre>
 
<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement  
 
<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement  
Line 324: Line 559:
 
|}
 
|}
  
The note should be attached to what it refers to, either after (as is the case here) or before. There should be no additional space surrounding the note, but only what is in the text.
+
The note should be attached to what it refers to, either ''after'' (as is the case here) or ''before''. There should be no additional space surrounding the note, but only what is in the text.
 +
 
 +
These notes can have any type other than '''crossReference'''.
 +
 
 +
For more detailed information about the '''note''' element, please refer to sections '''8.3''' to '''8.6''' of the '''OSIS Reference Manual'''.
 +
 
 +
====Notes with an annotation reference====
 +
An OSIS file converted correctly from USFM files may contain notes with an annotation reference. The syntax should be like this:
 +
<note placement="foot"><reference type="annotateRef" osisRef="Matt.1.19">1:19</reference> The footnote informational text is here.</note>
 +
 
 +
An annotation reference is usually generated when converting from USFM for notes containing the tag <tt>\fr </tt> or <tt>\xo </tt>.<ref>The human readable reference is typically just the chapter and verse number, as in this example.</ref>
  
These notes can have any type other than crossReference.
+
This example is for a note tagged for a word in Matt.1.19, and it illustrates that the reference element has two attributes.
 +
* <tt>type="annotateRef"</tt> &ndash; this defines the note as having an annotation reference
 +
* <tt>osisRef="Matt.1.19"</tt> &ndash; this specifies the origin of the note, and makes the "1:19" display as a link in the footnote panel.
 +
The space at the start of the footnote text is required in order to separate the reference from the text.
  
==Marking Cross-References Notes==
+
'''Note:'''
Sword provides the ability for a user to show or hide cross-references. To achieve this you embed one or more <tt>&lt;reference&gt;</tt> elements in a <tt>&lt;note type="crossReference"&gt;...&lt;/note&gt;</tt>. If this is not done, then the cross-references will always show inline in the text.
+
<references />
 +
 
 +
====Marking cross-references notes====
 +
SWORD provides the ability for a user to show or hide cross-references. To achieve this you embed one or more <tt>&lt;reference&gt;</tt> elements in a <tt>&lt;note type="crossReference"&gt;...&lt;/note&gt;</tt>. If this is not done, then the cross-references will always show inline in the text.
  
 
<pre>
 
<pre>
<note type="crossReference" n="t" osisID="Jer.24.7!crossReference.t" osisRef="Jer.24.7">
+
<note type="crossReference" n="t" osisID="Jer.24.7!crossReference.t">
 
   <reference osisRef="Jer.32.39">ch. 32:39</reference>;  
 
   <reference osisRef="Jer.32.39">ch. 32:39</reference>;  
 
   <reference osisRef="Deut.30.6">Deut. 30:6</reference>;  
 
   <reference osisRef="Deut.30.6">Deut. 30:6</reference>;  
 
   <reference osisRef="Ezek.11.19">Ezek. 11:19</reference>;  
 
   <reference osisRef="Ezek.11.19">Ezek. 11:19</reference>;  
   <reference osisRef="Ezek.36.26-Ezek.36.27">36:26, 27</reference>
+
   <reference osisRef="Ezek.36.26-Ezek.36.27">36:26, 27</reference>.
 
</note>
 
</note>
 
</pre>
 
</pre>
Line 344: Line 595:
 
Regarding the <tt>&lt;note&gt;</tt> element:
 
Regarding the <tt>&lt;note&gt;</tt> element:
  
<tt>type="crossReference"</tt> is one of the predefined OSIS note types. Sword looks for this value to show/hide cross-references.
+
:<tt>type="crossReference"</tt> is one of the predefined OSIS note types. SWORD looks for this value to show/hide cross-references.
  
<tt>n="t"</tt> provides the author's desired footnote marker for the note. A couple of Sword applications use this, but most manufacture their own marker.
+
:<tt>n="t"</tt> is the marker for the cross-reference note. This can be either of the following:
 +
:# A serially allocated index letter in the range <tt>a-z</tt>.
 +
:# A custom xref note marker symbol (or scheme) as may be specified by the author/translator.
 +
:btw. Source text [[Converting SFM Bibles to OSIS|converted from USFM]] usually adopts method 1.
  
The osisID is given based upon the location of the note. In order to not conflict with the verse's osisID and to construct a unique id, the ! (extension mark) is used to further qualify. This is followed by the note's type and n value, separated by a dot.
+
:The given '''osisID''' is based upon the location of the note. In order to not conflict with the verse's osisID and to construct a unique id, the ! (extension mark, ''aka'' sub-identifier) syntax is used. This is further qualified by the note's '''type''' and '''n''' value, separated by a dot.
  
This note pertains to a single verse and it is given in osisRef.
+
:Observe the punctuation marks between references and (optionally) after the last reference. There is typically a space after each semicolon.
  
 +
:This note pertains to a single verse and it is given an '''osisRef'''.
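The construction of a note's '''osisID''' described above can be sketched in Python as follows. Both function names are hypothetical; the sketch only shows how the verse osisID, the extension mark, the note type, and the '''n''' value are joined and taken apart again.

```python
def note_osis_id(verse_osis_id, note_type, n):
    """Build a note osisID such as "Jer.24.7!crossReference.t"."""
    return f"{verse_osis_id}!{note_type}.{n}"


def split_note_osis_id(osis_id):
    """Split a note osisID back into (verse osisID, note type, n)."""
    # The extension mark separates the verse osisID from the sub-identifier.
    verse, _, extension = osis_id.partition("!")
    # Within the sub-identifier, the first dot separates type from n.
    note_type, _, n = extension.partition(".")
    return verse, note_type, n


print(note_osis_id("Jer.24.7", "crossReference", "t"))
print(split_note_osis_id("Jer.24.7!crossReference.t"))
```

Splitting on the extension mark first matters, because the verse osisID itself contains dots.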
  
 
Regarding the <tt>&lt;reference&gt;</tt> elements:
 
Regarding the <tt>&lt;reference&gt;</tt> elements:
  
The <tt>&lt;reference&gt;</tt> element is replaced by Sword with a link to the reference with the text of the element being shown as link text.
+
:The <tt>&lt;reference&gt;</tt> element is replaced by SWORD with a link to the reference with the text of the element being shown as link text.<ref>Some front-ends (e.g. Xiphos, PocketSword) never display the raw cross-reference text (the original text wrapped within each '''reference''' element). Instead they display a preview of each cross-referenced linked verse.</ref>
 +
 
 +
:While the '''osisRef''' can point to multiple verses, most SWORD applications cannot handle a link that goes to more than one verse or a contiguous range of verses. Xiphos (for example) just generates a verse list in the side panel.
 +
 
 +
:Here we see that each '''reference''' element is separated by punctuation (the semicolon at the end of each line).
 +
 
 +
Some of the notes given under [[OSIS Bibles#Marking_parallel_passage_headings|Marking parallel passage headings]] are also applicable here.
 +
 
 +
For non-English Bibles, the punctuation marks used as separators in displayed Scripture references can be different. Even with English, there can be variations between Bible versions.
  
While the osisRef can point to multiple verses, most Sword applications cannot handle a link that goes to more than one verse or a contiguous range of verses. Here we see that each reference is separated by punctuation.
+
Special care is required when the cross-reference includes additional prose text which is not a scripture reference. This often arises in [[Converting SFM Bibles to OSIS|Bibles converted from SFM]]. Unless and until this issue is fixed, it becomes almost impossible to parse the reference elements such that the OSIS references can be added automatically.
  
=osis2mod usage (> 1.5.9)=
+
Further care is often required to deal with minor errors of punctuation that translators are prone to make in footnotes with cross-references.
It is always best to use the most recent version of osis2mod and compiling it from SVN is best.
 
  
After the Sword 1.5.9 release, osis2mod was changed to take flags rather than positional arguments.
+
'''Note:'''
<pre>
+
<references />
usage: ./osis2mod <output/path> <osisDoc> [OPTIONS]
 
  -a                    augment module if exists (default is to create new)
 
  -z                    use ZIP compression (default no compression)
 
  -Z                    use LZSS compression (default no compression)
 
  -b <2|3|4>            compression block size (default 4):
 
                                2 - verse; 3 - chapter; 4 - book
 
  -c <cipher_key>        encipher module using supplied key
 
                                (default no enciphering)
 
</pre>
 
<b>&lt;output/path&gt;</b><br/>
 
This a path to any existing directory. It is best for it to be empty.
 
  
<b>&lt;osisDoc&gt;</b><br/>
+
====Marking references in Right to Left scripts====
This is a single, well-formed, valid OSIS document.
+
To ensure that the ''human readable'' reference is correctly displayed in Bibles with a '''Right to Left''' script, the translator[s] may have made judicious use of the special Unicode character '''RIGHT TO LEFT MARK''' ['''RLM'''] (U+200F).
  
<b>-a</b><br/>
+
The RLM being invisible, its presence may easily go unnoticed, yet script developers need to be aware of it. The procedure to convert the ''human readable'' reference to the ''machine readable'' '''osisRef''' value must ensure that the RLM is deleted from the latter.
Osis2mod can create a Bible all at once or incrementally, depending on the presence of the -a flag. This
 
provides for two abilities,
 
<ol>
 
<li>Assembling a Bible from book files:<br/>
 
<pre>
 
mkdir /tmp/mymodule
 
osis2mod /tmp/mymodule  matt.xml
 
osis2mod /tmp/mymodule -a mark.xml
 
...
 
osis2mod /tmp/mymodule -a rev.xml
 
</pre>
 
<b>Note:</b> The book files can be in any order. Sword will order them correctly in the index.
 
<li>Adding corrections to a Bible:<br/>
 
<pre>
 
osis2mod /tmp/mymodule -a fixes.xml
 
</pre>
 
<b>Note:</b> When fixes are put into the module they are appended to the data file and do not actually replace the verses. The index file is adjusted to point to the new place in the data file.
 
</ol>
 
  
<b>-z|-Z</b><br/>
+
=====Technical details=====
A Sword Bible can be compressed with Zip (-z) '''or''' LZSS (-Z). All of Sword's Bible modules are compressed with Zip. This saves significant space over an uncompressed module. Uncompressed modules are useful for debugging.
+
The key to understanding this is the exact placements of the '''RLM''' in references in a RtoL script.
  
<b>-b 2|3|4</b>
+
* before the colon separator between chapter and verse
This setting is only useful for a compressed module. The choice as to whether to use Verse (2), Chapter (3) or Book (4, the default) level compression depends upon the amount of data in the block. A typical Bible is best compressed book by book. A commentary, chapter by chapter. If the commentary is very robust and the amount of text per verse is really huge, then verse compression might make sense.
+
* before the hyphen/minus (or endash) used as the verse range separator
 +
* before an ''ordinary'' comma between verse numbers in a list (not required when the ''Arabic'' comma is used)
 +
* before the chapter number in the caller reference, because that becomes the start of the displayed footnote in SWORD apps.
  
All of Sword's compressed Bible modules are compressed by book. Basically, all of the verses in a block are compressed and appended to the data file. For this reason, the datafile cannot be uncompressed by anything other than the Sword and JSword libraries.
+
'''Example:'''
  
When creating the module by appending it is important to do so by whole compression block. That is, if blockType is Chapter, then the osisDoc needs to contain one or more whole chapters.
+
Here's a '''note''' element in the OSIS source of the UrduGeo module:
  
<b>-c cipherKey</b>
+
<code>
This is typically 16 characters in length, having no leading or trailing spaces, consisting of alternating sets of 4 alpha and 4 numeric characters, such as Aduf0274PjNq0328.
+
<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">‏12‏:4</reference> <catchWord>مردم شماری کے ٹیکس: </catchWord>دیکھئے <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">خروج 11‏:16‏-30</reference></seg> </note>
 +
</code>
  
=osis2mod usage (<= 1.5.9)=
+
Here's the same Unicode text converted to PCRE.
osis2mod is used to create Bible and Commentary modules. At this point in time,
 
the commentary needs to be (incorrectly) encoded as if it were a Bible, but with the verse
 
contents being the commentary on the verse.
 
  
<pre>
+
<code>
usage: osis2mod &lt;output/path&gt; &lt;osisDoc&gt; [createMod] [compressType [blockType [cipherKey]]]
+
<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">\x{200F}12\x{200F}:4</reference> <catchWord>\x{0645}\x{0631}\x{062F}\x{0645} \x{0634}\x{0645}\x{0627}\x{0631}\x{06CC} \x{06A9}\x{06D2} \x{0679}\x{06CC}\x{06A9}\x{0633}: </catchWord>\x{062F}\x{06CC}\x{06A9}\x{06BE}\x{0626}\x{06D2} <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">\x{062E}\x{0631}\x{0648}\x{062C} 11\x{200F}:16\x{200F}-30</reference></seg> </note>
  createMod  : (default 0): 0 - create  1 - augment
+
</code>
  compressType: (default 0): 0 - no compression  1 - LZSS    2 - Zip
 
  blockType  : (default 4): 2 - verses  3 - chapters  4 - books
 
  cipherkey  : ascii string for module encryption
 
</pre>
 
  
<b>&lt;output/path&gt;</b><br/>
+
Observe the '''four''' instances of <code>\x{200F}</code> which is the '''RLM''' described above.
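Removing the '''RLM''' before the human readable reference is parsed can be as simple as the following Python sketch. It is only an illustration, not the procedure used by any particular conversion script; <tt>strip_rlm</tt> is a hypothetical name, and the sample strings are the caller reference and the chapter and verse part of the nested reference from the UrduGeo example.

```python
RLM = "\u200f"  # RIGHT TO LEFT MARK (U+200F)


def strip_rlm(text):
    """Remove every RIGHT TO LEFT MARK from a human readable reference."""
    return text.replace(RLM, "")


caller = "\u200f12\u200f:4"        # caller reference of the note
nested = "11\u200f:16\u200f-30"    # chapter and verses of the nested reference

for displayed in (caller, nested):
    # Report how many RLM characters are present, then the cleaned text.
    print(displayed.count(RLM), strip_rlm(displayed))
```

The cleaned text, not the displayed text, is what should be parsed when generating the '''osisRef''' value; the displayed text inside the '''reference''' element keeps its RLM characters.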
This a path to any existing directory. It is best for it to be empty.
 
  
<b>&lt;osisDoc&gt;</b><br/>
+
:''Further details to be added to illustrate the footnote as displayed in Xiphos.''
This is a single, well-formed, valid OSIS document.
 
  
<b>createMod</b><br/>
+
===Marking variants===
Osis2mod can create a Bible all at once or incrementally, depending on the setting of createMod. This
+
SWORD recognizes the element '''seg''' with '''type="x-variant"''' as marking variants present in different versions of a text<ref>This feature requires the front-end to have been compiled with SWORD version 1.7 or later.</ref>. The attribute '''subType''' should be added, with a value of '''"x-1"''' or '''"x-2"''' to indicate whether the reading is the primary or secondary variant. At present, SWORD supports only 2 different readings per text. The method is illustrated below:
provides for two abilities,
 
<ol>
 
<li>Assembling a Bible from book files:<br/>
 
 
<pre>
 
<pre>
mkdir /tmp/mymodule
+
The text of the Bible
osis2mod /tmp/mymodule 0 matt.xml
+
<seg type="x-variant" subType="x-1">may </seg>
osis2mod /tmp/mymodule 1 mark.xml
+
<seg type="x-variant" subType="x-2">can </seg>
...
+
contain variant readings.
osis2mod /tmp/mymodule 1 rev.xml
 
 
</pre>
 
</pre>
<b>Note:</b> The book files can be in any order. Sword will order them correctly in the index.
+
This illustrates a primary reading "may " and a secondary reading "can ". Observe the space included in both seg elements. If these spaces were omitted, the variant words would be joined when displaying all readings.
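The effect of choosing one reading or all readings can be imitated with a small Python sketch. This is not the SWORD filter itself; <tt>render</tt> is a hypothetical function, the sample is wrapped in an assumed <tt>&lt;p&gt;</tt> element and written on one line, and only variants that are direct children of that element are handled.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<p>The text of the Bible '
    '<seg type="x-variant" subType="x-1">may </seg>'
    '<seg type="x-variant" subType="x-2">can </seg>'
    'contain variant readings.</p>'
)


def render(xml_text, reading="all"):
    """Return the text with the "x-1", the "x-2", or all variant readings."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        is_variant = child.tag == "seg" and child.get("type") == "x-variant"
        # Keep non-variant elements always; keep variants only when selected.
        if not is_variant or reading == "all" or child.get("subType") == reading:
            parts.append("".join(child.itertext()))
        # Text following the element belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


for choice in ("x-1", "x-2", "all"):
    print(render(SAMPLE, choice))
```

With "all" selected, the output reads "may can contain", which shows why the trailing space inside each '''seg''' element matters.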
<li>Adding corrections to a Bible:<br/>
+
 
<pre>
+
==== Filter ====
osis2mod /tmp/mymodule 1 fixes.xml
+
Variant readings in OSIS modules may be switched by the SWORD engine when the module conf file includes:
</pre>
+
GlobalOptionFilter=OSISVariants
<b>Note:</b> When fixes are put into the module they are appended to the data file and do not actually replace the verses. The index file is adjusted to point to the new place in the data file.
+
'''Note:'''
</ol>
+
<references/>
  
<b>compressType</b><br/>
+
==== Examples ====
All of Sword's Bible modules are compressed with Zip. This saves significant space over an uncompressed module. Uncompressed modules are useful for debugging.
+
* The TR module contains 246 locations where two such variants are marked.
 +
* The WHNU module contains 1473 locations where two variants are marked.
  
<b>blockType</b>
+
==== SWORD implementation ====
This setting is only useful for a compressed module. The choice as to whether to use Verse, Chapter or Book level compression depends upon the amount of data in the block. A typical Bible is best compressed book by book. A commentary, chapter by chapter. If the commentary is very robust and the amount of text per verse is really huge, then verse compression might make sense.
+
The SWORD API provides for these three choices:
 +
[Primary Reading|Secondary Reading|All Readings]
 +
Example: In Xiphos version 3.1.6 or later, the module context menu provides these three options for any module that has variants.
  
All of Sword's compressed Bible modules are compressed by book. Basically, all of the verses in a block are compressed and appended to the data file. For this reason, the datafile cannot be uncompressed by anything other than the Sword and JSword libraries.
+
SWORD does not supply its own delimiters to distinguish between variants. When displaying text that contains variants, it may not be obvious where these are located.
  
When creating the module by appending it is important to do so by whole compression block. That is, if blockType is Chapter, then the osisDoc needs to contain one or more whole chapters.
+
==Tools==
  
<b>cipherKey</b>
+
===Bible Technologies Group===
This is typically 16 characters in length, having no leading or trailing spaces, consisting of alternating sets of 4 alpha and 4 numeric characters, such as Aduf0274PjNq0328.
+
The '''BTG''' that sponsored the OSIS committee and hosted the OSIS schema no longer exists.
 +
References below that use the domain '''www.bibletechnologies.net''' will no longer work.
 +
The schema location therefore now needs to be for a local copy on your computer or to a copy hosted by CrossWire or elsewhere.
  
=Tools=
+
For more up to date details, see [[OSIS 211 CR]] which includes CrossWire's own updated schema.
==Charset conversion==
 
  
==XML well-formed test==
+
===Valid OSIS test===
 +
A valid XML document is one that is well-formed and conforms to the formal definition provided in a schema (or DTD). A document cannot have elements, attributes, or entities not defined in the schema. A schema can also define ''how'' entities may be nested, the possible values of attributes, etc.
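Well-formedness, the first half of that definition, can be checked with the Python standard library alone. The sketch below does only that; it does not validate against the OSIS schema, which still requires a schema-aware tool. The function name <tt>well_formed_error</tt> is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing failed.
        line, column = err.position
        return line, column, str(err)
    return None


print(well_formed_error("<osis><div/></osis>"))      # well-formed
print(well_formed_error("<osis><div></osis>"))       # mismatched tag
```

A document that passes this check may still be invalid OSIS, for example if it uses an element in a place the schema does not allow.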
  
==Validate OSIS test==
+
Many programs capable of schema validation exist. Most [http://en.wikipedia.org/wiki/XML_editor XML editors] ([http://xml-copy-editor.sourceforge.net/ XML Copy Editor], [http://en.wikipedia.org/wiki/Oxygen_XML_Editor Oxygen], [http://en.wikipedia.org/wiki/XMLSpy XMLSpy], [http://www.topologi.com/ Topologi], etc.) support some sort of XML schema validation.  The Windows based text editor [http://en.wikipedia.org/wiki/Notepad%2B%2B Notepad++] supports Unicode and has an [https://github.com/morbac/xmltools XML Tools] plugin which can perform syntax checking and validation.
  
===GNU/Linux===
+
There are also some online facilities for XML validation, e.g. [http://www.freeformatter.com/xml-validator-xsd.html].
To check if a document is well formed, use <tt>xmlwf</tt> usually included in expat package; if your document is wrong, it will output error, and position (filename:row:col):
 
  
 +
====xmllint====
 +
libxml2, available for Linux, Windows, & MacOS, includes a command-line validator called xmllint. To check that a document is valid against the OSIS schema, use the following command.
 
<pre>
 
<pre>
$ xmlwf goodfile.osis.xml
+
$ xmllint --noout --schema http://crosswire.org/osis/osisCore.2.1.1.xsd myfile.osis.xml
$ xmlwf badfile.xml
 
badfile.xml:11:0: no element found
 
$
 
 
</pre>
 
</pre>
  
To check if a document is valid against OSIS schema; use <tt>xmllint</tt> usually included in libxml2 package; you need Internet access to validate your document.
+
'''Notes:'''
 +
# Internet access is required to validate your document using a remote schema.
 +
# Even if you use a local copy of the OSIS schema, it calls on [http://www.w3.org/XML/1998/namespace] via HTTP.
 +
# Having a VPN connection may sometimes interfere with the use of XML validation tools.
 +
# Some firewalls or proxies also prevent or interfere with the use of these tools.
 +
 
 +
To install xmllint, simply install libxml2 via your distribution's standard package management system in Linux or download the Windows binary from our [http://www.crosswire.org/ftpmirror/pub/sword/utils/win32/ mirror]. You may also need to install "libxml2-utils", the package that contains the "xmllint" program.
 +
 
 +
====lxml (Python)====
 +
[http://lxml.de/ lxml] is a toolkit that can be used within a Python script to validate an XML file.
 +
 
 +
"The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt."
 +
 
 +
====Online XML Validators====
 +
The external links section of http://en.wikipedia.org/wiki/XML_Validation lists at least three online validators. Some or all of these can validate against external XML schema.
 +
 
 +
==Creating a SWORD Module==
 +
Use [[osis2mod]] to create the module.
 +
 
 +
== See also ==
 +
*[[OSIS]] &ndash; a partial list of other OSIS related pages and external links.
  
*[[Converting SFM Bibles to OSIS]] &ndash; describes how to prepare a single OSIS XML file for a Biblical text supplied as several USFM files.

== External links ==
* [http://en.wikipedia.org/wiki/List_of_Bible_verses_not_included_in_modern_translations List of Bible verses not included in modern translations]
* [http://xml.coverpages.org/DeRoseEML2004.pdf Markup Overlap: A Review and a Horse] &ndash; Steven DeRose (2004) Bible Technologies Group.

[[Category:Guides|OSIS Bibles]]
[[Category:OSIS|OSIS Bibles]]
[[Category:Morphology|OSIS Bibles]]


Latest revision as of 07:21, 23 July 2023

OSIS

OSIS is an XML Schema definition for Bibles and other Biblical research texts, which enables ministries and other organizations to collaborate more easily. Traditionally, these organizations have stored their documents in disparate, proprietary markups, making it difficult when they wish to share in service with each other. OSIS provides a common markup for multiple visions.

CrossWire is committed to supporting the OSIS initiative. We have developed OSIS import and export tools which work with our SWORD engine, making OSIS documents available to all of our SWORD software.

The latest OSIS Schema definition and supporting information was once available at http://www.bibletechnologies.net/.
However, the Bible Technologies Group (BTG) no longer exists. See http://ebible.org/osis/

Introduction

This page is for practical examples of how to encode a Bible in OSIS 2.1.1 for building a SWORD module with osis2mod. It represents CrossWire's experience and best practices in creating modules.

Every OSIS SWORD module must be created from a well-formed and valid OSIS 2.1.1 document. While it is a desirable goal for any such document to be acceptable, SWORD has some particular requirements which are discussed here.

The schema for OSIS 2.1.1 that was formerly at [2] is preserved at http://crosswire.org/osis/osisCore.2.1.1.xsd.
We are maintaining an updated schema at: http://www.crosswire.org/~dmsmith/osis/osisCore.2.1.1-cw-latest.xsd that may be used in place of the official BibleTechnologies URL for validating OSIS files.
See also CrossWire updated schema.

The March 2006 version of the OSIS Manual may be found here (PDF).

A good example of an OSIS document can be found at http://www.crosswire.org/~dmsmith/kjv2006.
The latest releases are found under http://www.crosswire.org/~dmsmith/kjv2011/

See also OSIS Book Name Abbreviations.

General structure

An OSIS document is a well-formed XML document, valid according to the OSIS schema. You can find the most current version here.[1]

To produce a Bible, you can use this template:

<?xml version="1.0" encoding="UTF-8"?>
<osis
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace"
	xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
	xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
	<osisText osisIDWork="{NAME}" osisRefWork="Bible" xml:lang="{LANG}" canonical="true">
		<header>
			{HEADER}
		</header>
		<div type="bookGroup">
			{BODY}
		</div>
	</osisText>
</osis>

With the following values:

{NAME}
Normalized name of the Bible version (Usually 3 letters for language, 3 for translation)
{LANG}
IETF language code. ISO 639-1 codes are preferred; use an ISO 639-3 code only when no ISO 639-1 code exists for the given language. See [3] for a list of codes.
{HEADER}
Description of the included text; see below
{BODY}
Text; see below
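
As a rough illustration, the template above can be filled in by a short Python script and checked for well-formedness with the standard library alone. The work name "EngDemo", the language code "en" and the helper name build_skeleton are hypothetical values chosen for this sketch. It checks well-formedness only, not validity against the OSIS schema.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Same skeleton as the template above, with Python format placeholders.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<osis
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="{ns}"
	xmlns:osis="{ns}"
	xsi:schemaLocation="{ns} http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
	<osisText osisIDWork="{name}" osisRefWork="Bible" xml:lang="{lang}" canonical="true">
		<header>
			<work osisWork="{name}"/>
		</header>
		<div type="bookGroup">
			{body}
		</div>
	</osisText>
</osis>
"""


def build_skeleton(name, lang, body=""):
    """Return the OSIS skeleton as a string (hypothetical helper)."""
    return TEMPLATE.format(ns=OSIS_NS, name=name, lang=lang, body=body)


doc = build_skeleton("EngDemo", "en")
# Parsing raises xml.etree.ElementTree.ParseError if the text is not well-formed.
root = ET.fromstring(doc.encode("utf-8"))
osis_text = root.find("{%s}osisText" % OSIS_NS)
print(osis_text.get("osisIDWork"))
```

The body text passed in must itself be well-formed OSIS markup, since it is inserted verbatim.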

For text without any character outside ASCII, you can use US-ASCII encoding (usually for English text). For every other language, please use UTF-8 and NFC. See the tools section if you need to convert.
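
A minimal sketch of NFC normalization in Python, using the standard unicodedata module. The helper name to_nfc and the sample word are illustrative choices, not part of any CrossWire tool.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# 'i' followed by a combining diaeresis (U+0308): a decomposed form of "Moïse".
decomposed = "Moi\u0308se"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))   # 6 5
print(composed == "Mo\u00efse")         # True
print(composed.isascii())               # False, so the file needs UTF-8
```

Normalizing the whole source text once, before it is saved as UTF-8, is usually simpler than normalizing individual strings.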

  1. See also root element

Header

This is the minimum content of the header required for validation. It is also what usfm2osis.pl produces. A valid OSIS header may include more than this.

<header>
  <work osisWork="{NAME}"/>
</header>

Body

Here is the general structure of the body content:

<div type="bookGroup">
	<title>Old Testament</title>
	<div type="book" osisID="Gen" canonical="true">
		<title type="main" short="Genesis">Genesis</title>
		<chapter osisID="Gen.1" chapterTitle="CHAPTER 1.">
			<title type="chapter">CHAPTER 1.</title>
			<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...
			<verse eID="Gen.1.1"/>
			<verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth was without form ...
			<verse eID="Gen.1.2"/>
			...
		</chapter>
	</div>
</div>

Notes:

  1. The top level bookGroup division is not mandatory.
  2. If they are used, the bookGroup division for the New Testament should have a similar structure.
  3. Any div element defaults canonical to false. You need to set it to true on elements representing the structure of the original text.

OSIS Milestones

OSIS allows for two potentially overlapping structures: Document structure (BSP) and verse structure (BCV).

Document structure is dominated by books, sections and paragraphs (BSP), along with titles, quotes and poetic material, whereas verse structure is indicated by book, chapter and verse numbers (BCV). While a SWORD module requires verse structure, the best way to encode a module with deep markup is with document structure; osis2mod is responsible for transforming document structure into verse structure.

Because these two systems can overlap and because XML does not allow for overlapping elements, OSIS defines a milestone mechanism for both document and verse structure elements.

For:

<X  ... attribute list ...>
...
</X>

the milestoned form is:

<X sID="g1" ... attribute list .../>
...
<X eID="g1"/>

According to the OSIS manual, for any given element X that defines a milestoneable form, all the instances of X in the document must use one form or the other and may not use both. The value of each sID attribute must be unique within the document.

Verse milestone sID/eID attributes can even have values that denote a verse range. This is purely for convenience to human readers.

It is allowable to use milestone elements for verses alone, or for both verses and chapters. The body example above is for the former.

Although the order of the sID and osisID attributes within a milestone element is insignificant (as is the case for XML attributes in general), it helps human readability to put the sID attribute first, so that it lines up with the corresponding eID attribute, as in the body example above.

Limitations of XML validators

An XML validator cannot validate whether OSIS milestones are used properly. It cannot validate:

  • that an element is consistently either milestoned or not.
  • that for each element with an sID that there is a paired element with an eID.[1]
  • that each paired sID/eID have the same attribute value.
  • that different sID/eID pairs of the same element type do not overlap.[2]
  • that the values of the osisID attributes are valid and correspond to the text demarcated by the verse milestones, etc.

Notes:

  1. osis2mod does not crash if the eID milestones are all missing, but the resulting module may appear to be void of text.
  2. If they weren't milestones, one would say they should be properly nested.
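
Because a schema validator cannot check these points, a small script can help. The following is a sketch only, using Python's standard library; the function name check_milestones and the wording of its messages are assumptions of this example, and it covers only the pairing, nesting and consistency checks listed above, not the validity of osisID values.

```python
import xml.etree.ElementTree as ET


def check_milestones(xml_text, tag="verse"):
    """Return a list of problems in sID/eID milestone usage for one element type."""
    problems = []
    stack = []       # sIDs opened but not yet closed, in document order
    seen = set()     # every sID value met so far
    forms = set()    # which forms of the element occur: container, milestone
    for elem in ET.fromstring(xml_text).iter():
        # Compare local names so that namespaced documents also work.
        if elem.tag.rsplit("}", 1)[-1] != tag:
            continue
        sid, eid = elem.get("sID"), elem.get("eID")
        if sid is None and eid is None:
            forms.add("container")
            continue
        forms.add("milestone")
        if sid is not None:
            if sid in seen:
                problems.append("duplicate sID: " + sid)
            seen.add(sid)
            stack.append(sid)
        elif eid in stack:
            if stack[-1] != eid:
                problems.append("overlapping pair: " + eid)
            stack.remove(eid)
        else:
            problems.append("eID without sID: " + eid)
    problems.extend("sID without eID: " + sid for sid in stack)
    if len(forms) > 1:
        problems.append("mixed container and milestone forms of <%s>" % tag)
    return problems


sample = """<chapter osisID="Gen.1">
<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning ...<verse eID="Gen.1.1"/>
<verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth ...
</chapter>"""
print(check_milestones(sample))   # ['sID without eID: Gen.1.2']
```

Passing tag="q" or tag="chapter" runs the same checks for other milestoneable elements.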

Notes about OSIS elements

  • For an OSIS document to be valid it must use the non-milestoned <div> and <lg> elements.
  • There is no milestoned version of the <p> element. From a practical perspective, this means that the milestoned verse element should be used when paragraphs are used.
  • The milestoned chapter element must be used when a paragraph spans a chapter boundary.
  • The SWORD engine cannot handle sub-identifiers separated by ! in an osisID, so osis2mod strips these off from the osisIDs for verses. They are only of use for the osisIDs for notes.
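
The effect of stripping a sub-identifier can be pictured with a one-line Python function. This is only an illustration of the behaviour described in the last point, not osis2mod's actual code, and the function name is hypothetical.

```python
def strip_sub_identifier(osis_id):
    """Drop everything from the first '!' onward."""
    return osis_id.split("!", 1)[0]


print(strip_sub_identifier("Gen.1.1!1"))                  # Gen.1.1
print(strip_sub_identifier("Jer.24.7!crossReference.t"))  # Jer.24.7
print(strip_sub_identifier("Gen.1.1"))                    # Gen.1.1
```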

Recommended approach

  • For chapters, use <chapter>...</chapter> container elements (except in the rare case that other container elements cross chapter boundaries)[1]
  • For verses, use milestone elements (unless container elements will suffice) – see OSIS Bibles/BSPExample.
  • For paragraphs, use the <p>...</p> container element
  • For poetry, use container elements <lg>...</lg> to indicate stanzas (or other types of line groups) and <l>...</l> to indicate lines
  • For quoted text, use the <q>...</q> container element
  • For translation changes, use the <transChange>...</transChange> container element[2]
  1. Conversion scripts such as usfm2osis.py generally produce the milestone elements for chapters.
  2. Except where the text is within a <w> which is not allowed by OSIS.
    For these cases use the alternative <seg subType="x-added" type="x-transChange">...</seg>
    As a simpler kludge, those attributes may be omitted from the seg element, but the transChange element is then still required to render the word[s] in italics.

Pretty print?

Several popular XML editors and tools provide an option to pretty print the XML. We strongly advise against using this for OSIS Bibles. This is because pretty print generally does not preserve significant whitespace, even if xml:space="preserve" is applied. It can cause the spaces between words to go missing and/or extra spaces to be inserted where they are unwarranted.
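
The problem can be reproduced with Python's standard library, taking xml.dom.minidom as one example of a pretty printer (other tools differ in detail). The sample paragraph and the helper name text_of are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

original = '<p>In the <hi type="italic">beginning</hi> God created</p>'

# Pretty print the element with two-space indentation.
pretty = minidom.parseString(original).documentElement.toprettyxml(indent="  ")


def text_of(xml_text):
    """Concatenate all text content of an XML fragment."""
    return "".join(ET.fromstring(xml_text).itertext())


print(repr(text_of(original)))
print(repr(text_of(pretty)))
print(text_of(original) == text_of(pretty))   # False: whitespace was inserted
```

The second line of output shows line breaks and indentation inside the running text, which is exactly the kind of change that alters word spacing in a module.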

Examples

Marking Paragraphs

There is no milestoned version of the <p> element. Typically paragraphs surround whole verses. That is, they start and end between verses. If a paragraph begins or ends in a verse and extends beyond that verse, then the whole document must use the milestoned version of <verse>.

<div type="book" osisID="Gen" canonical="true">
	<title type="main">LE PREMIER LIVRE DE MOÏSE dit LA GENÈSE</title>
	<chapter osisID="Gen.1" chapterTitle="Chapitre 1"><title type="chapter">Chapitre 1</title>
		<p>
			<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement Dieu créa les 
			cieux et la terre.<verse eID="Gen.1.1"/>
		</p>
		<p>
			<verse sID="Gen.1.2" osisID="Gen.1.2" n="2"/>Et la terre était désolation et
			vide, et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu
			planait sur la face des eaux.<verse eID="Gen.1.2"/>
		</p>
		<p>
			<verse sID="Gen.1.3" osisID="Gen.1.3" n="3"/>Et Dieu dit : Que la lumière
			soit. Et la lumière fut.<verse eID="Gen.1.3"/>
			<verse sID="Gen.1.4" osisID="Gen.1.4" n="4"/>Et Dieu vit la lumière, qu'elle
			était bonne ; et Dieu sépara la lumière d'avec les ténèbres.
			<verse eID="Gen.1.4"/>
			<verse sID="Gen.1.5" osisID="Gen.1.5" n="5"/>Et Dieu appela la lumière Jour ;
			et les ténèbres, il les appela Nuit. Et il y eut soir, et il y eut matin : 
			&#8212; premier jour.<verse eID="Gen.1.5"/>
		</p>
...
Result

(1) Au commencement Dieu créa les cieux et la terre.

(2) Et la terre était désolation et vide, et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.

(3) Et Dieu dit : Que la lumière soit. Et la lumière fut. (4) Et Dieu vit la lumière, qu'elle était bonne ; et Dieu sépara la lumière d'avec les ténèbres. (5) Et Dieu appela la lumière Jour ; et les ténèbres, il les appela Nuit. Et il y eut soir, et il y eut matin : — premier jour.

Note: osis2mod converts a paragraph start into <div type="paragraph" sID="genX"/> and a paragraph end into <div type="paragraph" eID="genX"/>.

Marking Quotations

Most of the SWORD front-end applications can show a chapter at a time and some can show isolated verses. This means that all of the SWORD applications can show partial quotations, such as the Sermon on the Mount which begins in Matt 5 and ends in Matt 7.

Default quotation marks

By default, SWORD will use " for quotations. The following describes various ways to influence this.

Indicating the nesting of a quote

When a quote is contained in a quote, it is customary to set the level attribute to indicate the depth of the nesting. For example, Jeremiah 23:38 is part of a larger quote and has a back and forth dialog of nested quotes:

But if you say,
<q level="2" sID="1"/>
	The burden of the Lord,
<q level="2" eID="1"/>
thus says the Lord,
<q level="2" sID="3"/>
	Because you have said these words,
	<q level="3" sID="4"/>
		The burden of the Lord,
	<q level="3" eID="4"/>
	when I sent to you, saying,
	<q level="3" sID="5"/>
		You shall not say,
		<q level="4" sID="6"/>
			The burden of the Lord,
		<q level="4" eID="6"/>
	<q level="3" eID="5"/>

A couple of things to note about this verse. First, the level attribute is on both the sID and the eID pair, matching in value. Second, this is an example of a verse that has a quote that starts in the middle and finishes in another verse.

In this case, SWORD will use the level to determine whether to use " or ' for quotes. Odd levels will use " and even levels will use '. This is in accordance with American English usage, which is the opposite of British English usage. Nesting levels up to five can be found in the Bible.[1]

  1. Jeremiah 27:1-11; 29:1-28, 30-32; 34:1-5; and Ezekiel 1-36
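
The odd/even rule described above can be written as a small Python function. This is a sketch of the rule as stated here, not code taken from the SWORD engine, and the function name is hypothetical.

```python
def default_quote_mark(level=1):
    """Odd nesting levels get a double quote, even levels a single quote."""
    return '"' if level % 2 == 1 else "'"


for level in range(1, 6):
    print(level, default_quote_mark(level))
```

The loop prints a double quote for levels 1, 3 and 5 and a single quote for levels 2 and 4.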

Supplying alternative quotation marks

The quote element has a marker attribute that can be used to control the quotation marks. SWORD applications will always use this value when rendering the quote. When the marker attribute is the null string, it will render no quotation mark at all.[1]

To specify "curly" quotes you can use the following values:

Description             Char    HTML Entity    Unicode
Opening double quote    “       &#8220;        U+201C
Closing double quote    ”       &#8221;        U+201D
Opening single quote    ‘       &#8216;        U+2018
Closing single quote    ’       &#8217;        U+2019

To use different marks to start and end a quote, use the milestoned version of the quote.

<q marker="“" sID="qN"/> ... <q marker="”" eID="qN"/>

There is further information about English quotation marks and their usage in [4].

Quotation marks have a variety of forms in different languages and in different media. See Quotation mark, non-English usage.[2]

Note:

  1. e.g. The KJV module has marker="" because the text of the KJV Bible does not use any quotation marks.
  2. When modules are being converted from digitized source text used in other Bible software, it may be the case that quotation marks in the text source differ from those in the original published edition, whether due to inherent constraints of the other software, or for other causes.

Continuation quotation marks

The <milestone type="cQuote"/> can be used to indicate the presence of a continued quote. If the marker attribute is present, SWORD will use it; otherwise it will use a straight double quote, ". Since there is no level attribute on the milestone element, it is best to specify the marker attribute.

Marking the Words of Christ

To indicate that a quote is something that Jesus said[1], use the attribute who="Jesus".

	<verse osisID="Luke.22.35" sID="Luke.22.35"/>
	Then Jesus asked them, <q who="Jesus" marker="">When I sent you without purse,
	bag or sandals, did you lack anything?</q>
	<verse eID="Luke.22.35"/>
Result

Then Jesus asked them, When I sent you without purse, bag or sandals, did you lack anything?

Note:

  1. http://en.wikipedia.org/wiki/Red_letter_edition

Marking poetic material

Poetry is marked up with <lg> (line group) and <l> (line) elements. The level attribute of the line element indicates indentation. A value of 1, or no level attribute at all, means no indentation: the line sits at the first level of the line group. A value of 2 indents the line one step relative to level 1, and so on for each subsequent level.

  <chapter osisID="Exod.15" chapterTitle="Chapter 15"><title type="chapter">Chapter 15</title>
    <p>
      <verse sID="Exod.15.1" osisID="Exod.15.1" n="1"/>
      Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,
    </p>
    <lg>
      <l level="1">I will sing unto the LORD, for he hath triumphed gloriously:</l>
      <l level="2">the horse and his rider hath he thrown into the sea.</l>
      <verse eID="Exod.15.1"/>

      <verse sID="Exod.15.2" osisID="Exod.15.2" n="2"/>
      <l level="1">The LORD is my strength and song, and he is become my salvation:</l>
      <l level="2">he is my God, and I will prepare him an habitation;</l>
      <l level="2">my father's God, and I will exalt him.</l>
      <verse eID="Exod.15.2"/>

      <verse sID="Exod.15.3" osisID="Exod.15.3" n="3"/>
      <l level="1">The LORD is a man of war:</l>
      <l level="2">the LORD is his name.</l>
      <verse eID="Exod.15.3"/>

      <verse sID="Exod.15.4" osisID="Exod.15.4" n="4"/>
      <l level="1">Pharaoh's chariots and his host hath he cast into the sea:</l>
      <l level="2">his chosen captains also are drowned in the Red sea.</l>
      <verse eID="Exod.15.4"/>

      <verse sID="Exod.15.5" osisID="Exod.15.5" n="5"/>
      <l level="1">The depths have covered them:</l>
      <l level="2">they sank into the bottom as a stone.</l>
      <verse eID="Exod.15.5"/>
...
Result

(1) Then sang Moses and the children of Israel this song unto the LORD, and spake, saying,

I will sing unto the LORD, for he hath triumphed gloriously:
the horse and his rider hath he thrown into the sea.
(2) The LORD is my strength and song, and he is become my salvation:
he is my God, and I will prepare him an habitation;
my father's God, and I will exalt him.
(3) The LORD is a man of war:
the LORD is his name.
(4) Pharaoh's chariots and his host hath he cast into the sea:
his chosen captains also are drowned in the Red sea.
(5) The depths have covered them:
they sank into the bottom as a stone.

Note:

  1. While OSIS defines a milestoned version of the <lg> element, its use (rather than the container version) will not produce a valid XML document. The <l> element can only occur within an <lg> container, so use of <lg/> milestones prevents use of <l> elements.
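
How a front-end might turn the level attribute into indentation can be sketched in Python. This is an illustration under stated assumptions, not how any SWORD front-end actually renders poetry: the function name render_line_group and the four-space indent step are invented for the example.

```python
import xml.etree.ElementTree as ET


def render_line_group(lg_xml, indent="    "):
    """Render the <l> lines of an <lg> as plain text, indenting by level."""
    lines = []
    for line in ET.fromstring(lg_xml).iter("l"):
        level = int(line.get("level", "1"))   # a missing level counts as 1
        text = "".join(line.itertext()).strip()
        lines.append(indent * (level - 1) + text)
    return "\n".join(lines)


sample = """<lg>
  <l level="1">The LORD is a man of war:</l>
  <l level="2">the LORD is his name.</l>
</lg>"""
print(render_line_group(sample))
```

The second line of the output is indented by one step, matching the rendering shown in the result above.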

Marking acrostic poetry headings

Use type="acrostic" as the title attribute for the stanza headings in acrostic passages such as Psalm 119. Whether or not this Psalm uses a poetry line group for each stanza, each title element should be placed before its related stanza.

Marking lemmas & morphology

Marking Strong's numbers

To mark up Strong's numbers, you first need to declare a workID in the header of the OSIS document:

  <header>
    ...
    <work osisWork="strong">
      <refSystem>Dict.Strongs</refSystem>
    </work>
    ...
  </header>

SWORD does not actually use this declaration, but it is required to have a proper OSIS document.

And while OSIS allows arbitrary workIDs, SWORD can only handle "strong" and a few variants.

<w lemma="strong:H0853 strong:H03045">knew</w>

The <w> element is used to surround the text that is represented by the Strong's number. It may be that the text is a phrase and it may be that more than one Strong's number defines the text.

When more than one Strong's number defines the text, each must be prefixed with a workID and must be separated from each other by a space. (While OSIS allows for the defining of default workIDs, SWORD requires that the workIDs be used.)

The actual Strong's Number should indicate whether it is Hebrew (H) or Greek (G) followed by the number. The number may be 0 padded up to 5 digits as in H00001.
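
The format described above can be checked with a short Python sketch. The regular expression encodes only what this section states (a "strong:" workID, H or G, then one to five digits); the function name parse_strongs is hypothetical, and entries with any other workID are simply skipped.

```python
import re

# "strong:" workID, H (Hebrew) or G (Greek), then 1 to 5 digits (zero padding allowed).
STRONG = re.compile(r"strong:([HG])(\d{1,5})")


def parse_strongs(lemma):
    """Return (language, number) pairs for the strong: entries of a lemma attribute."""
    result = []
    for item in lemma.split():
        match = STRONG.fullmatch(item)
        if match:
            result.append((match.group(1), int(match.group(2))))
    return result


print(parse_strongs("strong:H0853 strong:H03045"))   # [('H', 853), ('H', 3045)]
print(parse_strongs("H0853"))                        # [] because the workID is missing
```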

Marking Strong's splits

Sometimes a single word in the Hebrew or Greek gets split in the Bible translation such that there's another word in between the two words that correspond to the one in the original language. Such instances can be marked up using the attribute type="x-split-##" where ## is a serial number unique to each instance in the OSIS Bible.

Example 1: (in Exod.23.25 for the KJV)

<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">and I will take</w>
<w lemma="strong:H04245">sickness</w>
<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">away</w>

Example 2: (in Mark.14.44 for the KJV)

<w src="17" lemma="strong:G520 lemma.TR:απαγαγετε" morph="robinson:V-2AAM-2P" type="x-split-1227">lead</w>
<transChange type="added">him</transChange>
<w src="17" lemma="strong:G520 lemma.TR:απαγαγετε" morph="robinson:V-2AAM-2P" type="x-split-1227">away</w>
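
To see which <w> elements belong together, the split markers can be grouped by their type value. The following Python sketch uses only the standard library; the function name group_splits and the wrapping <root> element are conveniences of this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_splits(fragment):
    """Map each x-split-## type to the texts of the <w> elements that carry it."""
    groups = defaultdict(list)
    # Wrap the fragment so that it has a single root element.
    root = ET.fromstring("<root>" + fragment + "</root>")
    for w in root.iter("w"):
        kind = w.get("type", "")
        if kind.startswith("x-split-"):
            groups[kind].append("".join(w.itertext()))
    return dict(groups)


sample = (
    '<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">and I will take</w> '
    '<w lemma="strong:H04245">sickness</w> '
    '<w lemma="strong:H05493" morph="strongMorph:TH8689" type="x-split-65">away</w>'
)
print(group_splits(sample))   # {'x-split-65': ['and I will take', 'away']}
```

A group with only one member would suggest a missing or mistyped serial number.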

Marking morphology

In a similar manner to marking with Strong's numbers, morphology can also be marked. Since morphology regards the original language, Strong's numbers will be shown at the same time.

As with Strong's numbers, a workID needs to be defined. Here we are defining one for Robinson's Morphology Codes. And while SWORD will ignore this declaration, "robinson" is hard-coded into SWORD for Greek morphology codes.

  <header>
    ...
    <work osisWork="robinson">
      <refSystem>Dict.Robinson</refSystem>
    </work>
    ...
  </header>

Example markup of Robinson's Morphology Codes in the KJV module:

<w lemma="strong:G3588 strong:G80" morph="robinson:T-APM robinson:N-APM" src="7 8">his brethren</w>

In this example, lemma, morph and src form parallel arrays; the first strong: mapping to the first robinson: and the first src value, etc.[1]

The workID should be the name of a current, future, or potential lexicon module in which the morphology code could be looked up. For example, morph="packard:D" represents a reference to morphology code "D" in a module named Packard, whether or not a Packard module has been created or released. (Currently, SWORD offers lexicon modules named Robinson and Packard, both for Greek morphology.)

The src attribute is used here to indicate the word position in the original Greek.

Note:

  1. There should never be more items in the morph attribute than there are strong numbers in the lemma attribute!
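
The parallel-array relationship between lemma, morph and src can be illustrated in Python. This is a sketch of the convention described above, not SWORD's implementation; the function name is hypothetical, and only "strong:" entries of the lemma attribute are paired.

```python
def pair_lemma_morph(lemma, morph, src):
    """Pair the n-th Strong's number with the n-th morph code and src position."""
    strongs = [item for item in lemma.split() if item.startswith("strong:")]
    morphs = morph.split()
    positions = src.split()
    if len(morphs) > len(strongs):
        raise ValueError("more morph codes than Strong's numbers")
    return list(zip(strongs, morphs, positions))


pairs = pair_lemma_morph(
    "strong:G3588 strong:G80",
    "robinson:T-APM robinson:N-APM",
    "7 8",
)
print(pairs)
# [('strong:G3588', 'robinson:T-APM', '7'), ('strong:G80', 'robinson:N-APM', '8')]
```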

Marking other lemmas

The lemma attribute of the <w> element can contain any number of other lemmas. Like Strong's numbers and morphology codes, these need to have a workID declared in the header. SWORD presumes that these lemma workIDs all start with "lemma." (note the final period). The portion of the workID following "lemma." should be the name of a current, future, or potential lexicon module in which the lemma could be looked up. For example, lemma="lemma.TWOT:271" represents a reference to lemma #271 in a module named TWOT (i.e. the Theological Wordbook of the Old Testament), whether or not a TWOT module has been created or released. As far as SWORD is concerned, there can be any number of these space-delimited values in a lemma attribute and they can be in any order, even interspersed among the "strong:" lemmas.

Example of a lemma markup for the Greek words from the TR in the KJV module:

<w lemma="strong:G976 lemma.TR:βιβλος" morph="robinson:N-NSF" src="1">The book</w>

SWORD has the ability to show or hide non-Strong's lemmas as a group. See GlobalOptionFilter=OSISLemma – for OSIS texts having lemmas.

Marking glosses

Gloss markup uses the gloss attribute of the <w> element. The syntax is illustrated by this line exported from the module JapMeiji.

<w gloss="はじめ">元始</w>に<w gloss="かみ">神</w><w gloss="てんち">天地</w>を<w gloss="つくり">創造</w>たまへり

Display of glosses can be toggled (in compatible front-ends) by having this line in the .conf file:

GlobalOptionFilter=OSISGlosses

Marking encoded transliterations

Not to be confused with automatic transliteration using libICU that some SWORD front-ends support.

Transliteration markup uses the xlit attribute of the <w> element. The syntax is illustrated by this line exported from the module SP.

<w gloss="in_beginnings" lemma="strong:H7225" morph="ב_ראשית" n="1" xlit="Latn:b_raShit">בראשית</w>

Display of encoded transliterations can be toggled (in compatible front-ends) by having this line in the .conf file:

GlobalOptionFilter=OSISXlit

Marking enumerated words

Enumerated words markup uses the n attribute in the <w> element. The syntax is illustrated in the previous subsection.

Display of enumerated words can be toggled (in compatible front-ends) by having this line in the .conf file:

GlobalOptionFilter=OSISEnum

Marking morpheme segmentation

In languages such as Biblical Hebrew, parts of words may be split into semantic segments using the XML seg element, thus:

<w><seg type="x-morph">וַ</seg><seg type="x-morph">יִּקְרָ֨א</seg></w>

Display of morpheme segmentation[1] can be toggled (in compatible front-ends[2]) by having this line in the .conf file:

GlobalOptionFilter=OSISMorphSegmentation

Notes:

  1. Currently, only some JSword based front-ends seem to support this feature. The SWORD engine has the switch available, but no change in output is effected.
  2. e.g. STEP Bible uses these structures to provide colour coding. It just uses 2 colours to show different parts, alternating between the two.

Marking the divine name

The <divineName> tag is reserved for representations of the tetragrammaton יהוה (YHWH). These occur in the Old Testament as Lord, God and Yah. Not every instance of Lord or God is a translation of this.

The content of the divineName element is the word Lord, God or Yah, not in all upper case (i.e. not LORD, GOD, or YAH). SWORD will convert it to either small caps or uppercase.

Note that if the use is possessive, it is permissible to have the following:

   <divineName>Lord's</divineName>

However, it is inadvisable to include any other punctuation within the tag pair. Thus the following is not good practice (the quotation mark should precede the start tag):

   <divineName>“God </divineName> .....”

When also marking with Strong's numbers you will need to do it one of two ways:

   <divineName><w lemma="strong:H3068">Lord's</w></divineName>
or
   <w lemma="strong:H3068">of the <seg><divineName>Lord</divineName></seg></w>

The latter form uses a workaround to allow the embedding of <divineName> in a <w>, since OSIS does not allow for this, but does allow for <seg> to be in a <w> and to contain <divineName>.

Note: See also OSIS change requests: Allow <divineName> within <w>.

Marking sections and titles

A section is marked with:

<div type="section">
...
</div>

In OSIS the <title> element is used to provide general headings. Titles should be placed at the top of the container that they title, not before.

<div type="book">
   <title>A book title</title>
   <chapter>
       <title type="chapter">A chapter title</title>
       <div type="section">
            <title>A section title</title>
            ...
       </div>
       ...
   </chapter>
</div>

Using type="chapter" or type="main" is needed by osis2mod to distinguish chapter titles from verse titles. When SWORD stores an OSIS document it does so as an index of verses. It has special indexes for book and chapter titles. SWORD does not store the <verse> tags. So when it comes to storing a title in the following verse, osis2mod generates special markup to indicate that the title stands before the verse. SWORD uses this to place the verse number.

It is recommended that chapter labels (converted from the USFM tag \cl) be coded like this (Malayalam) example:

<title type="chapter" subType="chapterLabel">൧. അദ്ധ്യായം.</title>

This ensures that these labels are not treated differently to other chapter titles.

Note:

  1. The <head> element is used to provide headings for tables, lists and cast groups. There are errors in the OSIS 2.1.1 manual that use the <head> incorrectly.

Marking pre-verse titles

There is no special markup for pre-verse titles. If the above advice is followed, osis2mod will determine which titles belong to the book, which to the chapter, and which to the verse that follows.

Note:

  1. See also OSIS pre-verse titles.

Marking parallel passage headings

These may be marked in a similar manner to cross-reference notes. Example: (Welsh Beibl for Matt.3)

<title type="parallel">(
  <reference type="parallel" osisRef="Mark.1.1-Mark.1.8">Marc 1:1-8</reference>; 
  <reference type="parallel" osisRef="Luke.3.1-Luke.3.18">Luc 3:1-18</reference>; 
  <reference type="parallel" osisRef="John.1.19-John.1.28">Ioan 1:19-28</reference>)
</title>

Notes:

  1. Each parallel passage has a separate reference element.
  2. Each reference element may have the optional attribute type="parallel".
  3. Each reference element has a valid osisRef attribute. Typically this is for a contiguous range of verses.
  4. The displayed reference for each passage uses the locale for the language.
  5. Reference elements are typically separated by a semicolon and space.
  6. The displayed title is typically enclosed within parentheses, which are not part of any reference element.
  7. Localized book names may or may not be abbreviated. If abbreviations are used, ideally they should be used consistently throughout the whole Bible.
  8. Abbreviated localized book names may or may not end with a full stop.
  9. The localized chapter verse separator may be other than a colon.
  10. The range specifier is typically a hyphen-minus but alternatively may be an en dash.
  11. Special care is required when:
  • There is more than one passage listed for the same book, when typically the book name is given only for the first passage.
  • There is more than one passage listed for the same chapter, when typically the chapter number is given only for the first passage.
  • The parallel passage is in the same book, when typically the book name is omitted in the displayed reference.
  • The parallel passage is one or more whole chapters (or psalms), when typically the verse numbers are omitted.
  • The parallel passage is a range of verses in a "single chapter" book, when typically the chapter number is omitted.
    Example: Jude 3-24 is sometimes used as parallel with 2 Peter 2.
  • The parallel passage is a range that spans a chapter division, and where the range separator might even be an em dash or an en dash rather than a hyphen-minus. Example: Exodus 35:30—36:1

Marking notes

The note element can appear in any element that can contain text outside of the header element.

NB. The examples in this section include the use of sub-identifiers in osisIDs for notes.

	<verse sID="Gen.1.1" osisID="Gen.1.1" n="1"/>Au commencement 
	Dieu<note osisRef="Gen.1.1" osisID="Gen.1.1!1" n="1"><hi type="italic">en hébreu</hi> : Élohim,
	(<hi type="italic">pluriel d</hi>'Éloah, le Dieu suprême), la Déité, <hi type="italic">dans
	le sens absolu</hi>.</note> créa les cieux et la terre.<verse eID="Gen.1.1"/>

	<verse sID="Gen.1.2" osisID="Gen.1.2" n="2"/>Et la terre était désolation et 
	vide<note osisRef="Gen.1.2" osisID="Gen.1.2!1" n="2">le vide.</note>, et il y avait des ténèbres
	sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.<verse eID="Gen.1.2"/>
Result
  1. Au commencement Dieu¹ créa les cieux et la terre.
  2. Et la terre était désolation et vide², et il y avait des ténèbres sur la face de l'abîme. Et l'Esprit de Dieu planait sur la face des eaux.
  • ¹ en hébreu: Élohim, (pluriel d'Éloah, le Dieu suprême), la Déité, dans le sens absolu.
  • ² le vide.

The note should be attached to what it refers to, either after (as is the case here) or before. There should be no additional space surrounding the note, only what is in the text.

These notes can have any type other than crossReference.

For more detailed information about the note element, please refer to sections 8.3 to 8.6 of the OSIS Reference Manual.

Notes with an annotation reference

An OSIS file converted correctly from USFM files may contain notes with an annotation reference. The syntax should be like this:

<note placement="foot"><reference type="annotateRef" osisRef="Matt.1.19">1:19</reference> The footnote informational text is here.</note>

An annotation reference is usually generated when converting from USFM for notes containing the tag \fr or \xo .[1]

This example is for a note attached to a word in Matt.1.19. It illustrates that the reference element has two attributes:

  • type="annotateRef" – this defines the note as having an annotation reference
  • osisRef="Matt.1.19" – this specifies the origin of the note, and makes the "1:19" display as a link in the footnote panel.

The space at the start of the footnote text is required in order to separate the reference from the text.
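
For anyone generating such notes in a script, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name make_annotated_note is hypothetical. The point it illustrates is that the separating space belongs to the text that follows the reference element (its "tail"), not to the reference itself.

```python
import xml.etree.ElementTree as ET


def make_annotated_note(osis_ref: str, display_ref: str, text: str) -> str:
    """Build a footnote whose first child is an annotateRef reference."""
    note = ET.Element("note", {"placement": "foot"})
    ref = ET.SubElement(
        note, "reference", {"type": "annotateRef", "osisRef": osis_ref}
    )
    ref.text = display_ref
    # Exactly one space separates the reference from the footnote text.
    ref.tail = " " + text.lstrip()
    return ET.tostring(note, encoding="unicode")


print(make_annotated_note(
    "Matt.1.19", "1:19", "The footnote informational text is here."
))
```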

Note:

  1. The human readable reference is typically just the chapter and verse number, as in this example.

Marking cross-references notes

SWORD provides the ability for a user to show or hide cross-references. To achieve this you embed one or more <reference> elements in a <note type="crossReference">...</note>. If this is not done, then the cross-references will always show inline in the text.

<note type="crossReference" n="t" osisID="Jer.24.7!crossReference.t">
  <reference osisRef="Jer.32.39">ch. 32:39</reference>; 
  <reference osisRef="Deut.30.6">Deut. 30:6</reference>; 
  <reference osisRef="Ezek.11.19">Ezek. 11:19</reference>; 
  <reference osisRef="Ezek.36.26-Ezek.36.27">36:26, 27</reference>.
</note>

Here is a breakdown.

Regarding the <note> element:

type="crossReference" is one of the predefined OSIS note types. SWORD looks for this value to show/hide cross-references.
n="t" is the marker for the cross-reference note. This can be either of the following:
  1. A serially allocated index letter in the range a-z.
  2. A custom xref note marker symbol (or scheme) as may be specified by the author/translator.
Source text converted from USFM usually adopts method 1.
The given osisID is based upon the location of the note. In order to not conflict with the verse's osisID and to construct a unique id, the ! (extension mark, aka sub-identifier) syntax is used. This is further qualified by the note's type and n value, separated by a dot.
Observe the punctuation marks between references and (optionally) after the last reference. There is typically a space after each semicolon.
This note pertains to a single verse and it is given an osisRef.
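
The construction of the note's osisID and marker can be sketched as follows. The function names are hypothetical, and restarting at "a" after "z" is an assumption for illustration only; the text above does not say what should happen when the letters run out.

```python
from string import ascii_lowercase


def marker_for(index: int) -> str:
    """Serially allocated marker letter in the range a-z.

    Assumption: the sequence restarts at 'a' after 'z'.
    """
    return ascii_lowercase[index % len(ascii_lowercase)]


def note_osis_id(verse_id: str, note_type: str, n: str) -> str:
    """Verse osisID + '!' + note type + '.' + n value."""
    return f"{verse_id}!{note_type}.{n}"


print(note_osis_id("Jer.24.7", "crossReference", marker_for(19)))
# Jer.24.7!crossReference.t
```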

Regarding the <reference> elements:

The <reference> element is replaced by SWORD with a link to the reference with the text of the element being shown as link text.[1]
While the osisRef can point to multiple verses, most SWORD applications cannot handle a link that goes to more than one verse or a contiguous range of verses. Xiphos (for example) just generates a verse list in the side panel.
Here we see that each reference element is separated by punctuation (the semicolon at the end of each line).

Some of the notes given under Marking parallel passage headings are also applicable here.

For non-English Bibles, the punctuation marks used as separators in displayed Scripture references can be different. Even with English, there can be variations between Bible versions.

Special care is required when the cross-reference includes additional prose text which is not a scripture reference. This often arises in Bibles converted from SFM. Unless and until this issue is fixed, it becomes almost impossible to parse the reference elements such that the OSIS references can be added automatically.

Further care is often required to deal with minor errors of punctuation that translators are prone to make in footnotes with cross-references.

Note:

  1. Some front-ends (e.g. Xiphos, PocketSword) never display the raw cross-reference text (the original text wrapped within each reference element). Instead they display a preview of each cross-referenced linked verse.

Marking references in Right to Left scripts

To ensure that the human readable reference is correctly displayed in Bibles with a Right to Left script, the translator[s] may have made judicious use of the special Unicode character RIGHT TO LEFT MARK [RLM] (U+200F).

The RLM being invisible, its presence may easily go unnoticed, yet script developers need to be aware of it. The procedure to convert the human readable reference to the machine readable osisRef value must ensure that the RLM is deleted from the latter.

Technical details

The key to understanding this is the exact placements of the RLM in references in a RtoL script.

  • before the colon separator between chapter and verse
  • before the hyphen-minus (or en dash) used as the verse range separator
  • before an ordinary comma between verse numbers in a list (not required when the Arabic comma is used)
  • before the chapter number in the caller reference, because that becomes the start of the displayed footnote in SWORD apps.

Example:

Here is a note element from the OSIS source of the UrduGeo module:

<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">‏12‏:4</reference> <catchWord>مردم شماری کے ٹیکس: </catchWord>دیکھئے <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">خروج 11‏:16‏-30</reference></seg> </note>

Here's the same Unicode text converted to PCRE.

<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">\x{200F}12\x{200F}:4</reference> <catchWord>\x{0645}\x{0631}\x{062F}\x{0645} \x{0634}\x{0645}\x{0627}\x{0631}\x{06CC} \x{06A9}\x{06D2} \x{0679}\x{06CC}\x{06A9}\x{0633}: </catchWord>\x{062F}\x{06CC}\x{06A9}\x{06BE}\x{0626}\x{06D2} <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">\x{062E}\x{0631}\x{0648}\x{062C} 11\x{200F}:16\x{200F}-30</reference></seg> </note>

Observe the four instances of \x{200F}, which is the RLM described above.
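
A conversion script might deal with the RLM along these lines. This is a minimal sketch with hypothetical helper names; the sample strings are the two human readable references from the example above, written with Python escapes so that the invisible character can be seen.

```python
RLM = "\u200f"  # RIGHT TO LEFT MARK


def count_rlm(text: str) -> int:
    """Number of RLM characters hidden in the text."""
    return text.count(RLM)


def strip_rlm(text: str) -> str:
    """Remove every RLM before deriving the machine readable osisRef."""
    return text.replace(RLM, "")


caller = "\u200f12\u200f:4"          # displayed as 12:4
verses = "11\u200f:16\u200f-30"      # chapter:verse range from the example

print(count_rlm(caller) + count_rlm(verses))  # 4
print(strip_rlm(caller))                      # 12:4
print(strip_rlm(verses))                      # 11:16-30
```

The displayed text of the reference element keeps its RLM characters; only the copy used to build the osisRef value is stripped.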

Further details to be added to illustrate the footnote as displayed in Xiphos.

Marking variants

SWORD recognizes the element seg with type="x-variant" as marking variants present in different versions of a text[1]. The attribute subType should be added, with a value of "x-1" or "x-2" to indicate whether the reading is the primary or secondary variant. At present, SWORD supports only 2 different readings per text. The method is illustrated below:

The text of the Bible
<seg type="x-variant" subType="x-1">may </seg>
<seg type="x-variant" subType="x-2">can </seg>
contain variant readings.

This illustrates a primary reading "may " and a secondary reading "can ". Observe the space included in both seg elements. If these spaces were omitted, the variant words would be joined when displaying all readings.
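
The effect of choosing a reading can be imitated with a few lines of Python. This is not the SWORD filter itself, only a sketch of the idea, and the function name render_reading is hypothetical. The fragment is written on one line so that the only whitespace between the words is the space inside each seg element.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    'The text of the Bible '
    '<seg type="x-variant" subType="x-1">may </seg>'
    '<seg type="x-variant" subType="x-2">can </seg>'
    'contain variant readings.'
)


def render_reading(fragment: str, reading: str = "all") -> str:
    """Render plain text for reading '1', '2' or 'all'.

    Only seg elements that are direct children of the fragment are handled.
    """
    root = ET.fromstring(f"<root>{fragment}</root>")
    parts = [root.text or ""]
    for child in root:
        is_variant = child.tag == "seg" and child.get("type") == "x-variant"
        wanted = reading == "all" or child.get("subType") == f"x-{reading}"
        if not is_variant or wanted:
            parts.append("".join(child.itertext()))
        # Text following the element is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


print(render_reading(FRAGMENT, "1"))    # The text of the Bible may contain variant readings.
print(render_reading(FRAGMENT, "2"))    # The text of the Bible can contain variant readings.
print(render_reading(FRAGMENT, "all"))  # The text of the Bible may can contain variant readings.
```

With the trailing spaces removed from the two seg elements, the last call would produce "maycan", which is the joining described above.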

Filter

Variant readings in OSIS modules may be switched by the SWORD engine when the module conf file includes:

GlobalOptionFilter=OSISVariants

Note:

  1. This feature requires the front-end to have been compiled with SWORD version 1.7 or later.

Examples

  • The TR module contains 246 locations where two such variants are marked.
  • The WHNU module contains 1473 locations where two variants are marked.

SWORD implementation

The SWORD API provides for these three choices:

[Primary Reading|Secondary Reading|All Readings]

Example: In Xiphos version 3.1.6 or later, the module context menu provides these three options for any module that has variants.

SWORD does not supply its own delimiters to distinguish between variants. When displaying text that contains variants, it may not be obvious where these are located.

Tools

Bible Technologies Group

The BTG that sponsored the OSIS committee and hosted the OSIS schema no longer exists. References below that use the domain www.bibletechnologies.net will no longer work. The schema location therefore now needs to point to a local copy on your computer or to a copy hosted by CrossWire or elsewhere.

For more up to date details, see OSIS 211 CR which includes CrossWire's own updated schema.

Valid OSIS test

A valid XML document is one that is well-formed and conforms to the formal definition provided in a schema (or DTD). A document cannot have elements, attributes, or entities not defined in the schema. A schema can also define how entities may be nested, the possible values of attributes, etc.

Many programs capable of schema validation exist. Most XML editors (XML Copy Editor, Oxygen, XMLSpy, Topologi, etc.) support some sort of XML schema validation. The Windows based text editor Notepad++ supports Unicode and has an XML Tools plugin which can perform syntax checking and validation.

There are also some online facilities for XML validation, e.g. [5].

xmllint

libxml2, available for Linux, Windows, and macOS, includes a command-line validator called xmllint. To check that a document is valid against the OSIS schema, use the following command.

$ xmllint --noout --schema http://crosswire.org/osis/osisCore.2.1.1.xsd myfile.osis.xml

Notes:

  1. Internet access is required to validate your document using a remote schema.
  2. Even if you use a local copy of the OSIS schema, it calls on [6] via HTTP.
  3. Having a VPN connection may sometimes interfere with the use of XML validation tools.
  4. Some firewalls or proxies also prevent or interfere with the use of these tools.
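
When several files need checking, the same command can be driven from a script. The following is a minimal sketch, assuming xmllint is on the PATH; the function names are hypothetical, and the schema argument may be replaced by the path to a local copy.

```python
import subprocess

OSIS_SCHEMA = "http://crosswire.org/osis/osisCore.2.1.1.xsd"


def xmllint_command(osis_path: str, schema: str = OSIS_SCHEMA) -> list:
    """Argument list equivalent to the command shown above."""
    return ["xmllint", "--noout", "--schema", schema, osis_path]


def check_osis(osis_path: str, schema: str = OSIS_SCHEMA):
    """Run xmllint and return (ok, messages).

    xmllint exits with status 0 when the document validates.
    """
    result = subprocess.run(
        xmllint_command(osis_path, schema), capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, messages = check_osis("myfile.osis.xml")
    print("valid" if ok else messages)
```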

To install xmllint on Linux, install libxml2 via your distribution's standard package management system. You also need to install the "libxml2-utils" package, which contains the xmllint program. For Windows, download the binary from our mirror.

lxml (Python)

lxml is a toolkit that can be used within a Python script to validate an XML file.
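
A minimal sketch of such a script is given here. It assumes that lxml is installed and that a local copy of the schema is available; the function name validate_osis and the file names in the usage line are hypothetical. As noted for xmllint, the OSIS schema may itself refer to a further remote resource, so how that import is resolved may need attention.

```python
from lxml import etree


def validate_osis(osis_path: str, schema_path: str) -> list:
    """Return a list of validation messages; an empty list means valid."""
    schema = etree.XMLSchema(etree.parse(schema_path))
    document = etree.parse(osis_path)
    if schema.validate(document):
        return []
    # Each error_log entry carries the line number and a message.
    return [f"line {error.line}: {error.message}" for error in schema.error_log]


if __name__ == "__main__":
    for message in validate_osis("myfile.osis.xml", "osisCore.2.1.1.xsd"):
        print(message)
```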

"The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt."

Online XML Validators

The external links section of http://en.wikipedia.org/wiki/XML_Validation lists at least three online validators. Some or all of these can validate against external XML schema.

Creating a SWORD Module

Use osis2mod to create the module.

See also

  • OSIS – a partial list of other OSIS related pages and external links.

External links