Difference between revisions of "OSIS 211 CR"

From CrossWire Bible Society
(OSIS variants: removed "# SWORD should be enhanced to allow the option to display no variants")
m (CrossWire updated schema: removed "private ")
 
(123 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page is for recording potential change requests to the OSIS XML schema, and for defining our own updated schema.
+
This page is for recording potential change requests to the [http://bibletechnologies.net/ OSIS] 2.1.1 XML schema, and for defining our own updated schema.
 +
 
 +
==Bible Technologies Group==
 +
The '''BTG''' that sponsored the OSIS committee and hosted the OSIS schema no longer exists.
 +
References that use the domain '''www.bibletechnologies.net''' will no longer work.
 +
The schema location therefore now needs to be for a local copy on your computer or to a copy hosted by CrossWire or elsewhere<ref>e.g. It is also mirrored at http://eBible.org/osisCore.2.1.1.xsd</ref>.
 +
 
 +
'''Note:'''
 +
<references />
  
 
== OSIS 2.1.1 Change Requests ==
 
== OSIS 2.1.1 Change Requests ==
Line 6: Line 14:
 
== CrossWire updated schema ==
 
== CrossWire updated schema ==
 
As an interim measure, we are maintaining an updated validation schema based on the contents of this page.
 
As an interim measure, we are maintaining an updated validation schema based on the contents of this page.
:''Currently these are looking for a new home but are currently at'':
+
:''Currently these are looking for a new home<ref>A repository has been created on our Society's [https://gitlab.com/crosswire-bible-society/osis-schema GitLab group].</ref> but are currently at'':
  
 
  http://www.crosswire.org/~dmsmith/osis
 
  http://www.crosswire.org/~dmsmith/osis
Line 21: Line 29:
  
 
This URL may be used in place of the official [http://www.bibletechnologies.net/osisCore.2.1.1.xsd BibleTechnologies URL] for validating XML files submitted for modules.
 
This URL may be used in place of the official [http://www.bibletechnologies.net/osisCore.2.1.1.xsd BibleTechnologies URL] for validating XML files submitted for modules.
 +
 +
'''Notes:'''
 +
<references />
  
 
== Bugs ==
 
== Bugs ==
Line 70: Line 81:
 
There's a typo that allows "seq" in <cell> instead of "seg".
 
There's a typo that allows "seq" in <cell> instead of "seg".
 
--[[User:Osk|Osk]] 04:29, 22 February 2014 (MST)
 
--[[User:Osk|Osk]] 04:29, 22 February 2014 (MST)
 +
 +
==== Correct the osisRef syntax for a non-verse-keyed OSIS module ====
 +
 +
The following syntax for such an '''osisRef''' link currently works but does not validate to '''osisCore.2.1.1.xsd'''
 +
 +
Module:Div1/Div2/Div3
 +
 +
The schema expects a period in place of each solidus. See [[OSIS Genbooks#Internal_Links|here]].
 +
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 16:27, 22 June 2020 (UTC)
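As a minimal sketch of the conversion implied above, the following Python helper rewrites the solidus form into the period form that the schema expects. The function name is hypothetical, and it assumes that no division name itself contains a solidus or a period; whether a given front-end resolves the period form is not something this sketch can confirm.

```python
def solidus_to_period(osis_ref: str) -> str:
    """Replace each solidus in the path part of an osisRef with a period."""
    # Split off the optional "Module:" work prefix so that only the path is changed.
    work, sep, path = osis_ref.partition(":")
    if not sep:
        # No work prefix was present: the whole string is the path.
        return osis_ref.replace("/", ".")
    return work + sep + path.replace("/", ".")


print(solidus_to_period("Module:Div1/Div2/Div3"))
```

For the example given above, this prints <tt>Module:Div1.Div2.Div3</tt>.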
  
 
=== Beta testing bugs ===
 
=== Beta testing bugs ===
Line 108: Line 129:
  
 
=== OSIS Validation ===
 
=== OSIS Validation ===
:''List OSIS constructs that currently fail to validate, yet which would be better to allow''.
+
:''List OSIS constructs that currently fail to validate, yet which would be better to allow, or vice versa''.
  
 
==== Allow <divineName> within <w> ====
 
==== Allow <divineName> within <w> ====
Line 115: Line 136:
 
'''Note:''' the most recent release of the KJV assumes that this has been fixed.
 
'''Note:''' the most recent release of the KJV assumes that this has been fixed.
 
--[[User:Dmsmith|Dmsmith]] 07:20, 23 February 2014 (MST)
 
--[[User:Dmsmith|Dmsmith]] 07:20, 23 February 2014 (MST)
 +
::This change would wrongly permit more than one '''divineName''' element within a '''w''' element. Though there are no circumstances where this would ever arise in a genuine Bible translation, it's still a risk that we now have this loophole in the schema. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 01:47, 25 May 2017 (MDT)
 +
:::Even so, the original workaround with the '''seg''' element could also have permitted having more than one '''divineName''' element within a '''w''' element. The risk is not new! [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 11:39, 25 May 2017 (MDT)
  
 
==== Allow <divineName> within <name> ====
 
==== Allow <divineName> within <name> ====
Line 178: Line 201:
 
Note: the most recent release of the KJV assumes that this has been fixed.
 
Note: the most recent release of the KJV assumes that this has been fixed.
 
--[[User:Dmsmith|Dmsmith]] 07:22, 23 February 2014 (MST)
 
--[[User:Dmsmith|Dmsmith]] 07:22, 23 February 2014 (MST)
 +
 +
==== Allow <transChange> within <inscription> ====
 +
Some translations of Rev.17.5 may require this.
 +
<verse osisID="Rev.17.5">in na njenem čelu <transChange type="added">je bilo</transChange> napisano ime: <inscription>SKRIVNOST, VÉLIKA <transChange type="added">[METROPOLA]</transChange> BABILON, MATI POCESTNIC<note type="study">POCESTNIC: ali, PREŠUŠTEV</note> IN OGABNOSTI ZEMLJE</inscription>.</verse>
 +
 +
It can be worked around using the usual '''seg''' kludge. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 22:13, 3 January 2019 (UTC)
  
 
==== Allow &lt;hi> within &lt;abbr> ====
 
==== Allow &lt;hi> within &lt;abbr> ====
Line 207: Line 236:
 
Currently, SWORD supports only 2 variants in the main text, and uses the following [[OSIS Bibles#Marking_variants|syntax]]:
 
Currently, SWORD supports only 2 variants in the main text, and uses the following [[OSIS Bibles#Marking_variants|syntax]]:
 
  <seg type="x-variant" subType="x-1">text </seg><seg type="x-variant" subType="x-2">text </seg>
 
  <seg type="x-variant" subType="x-1">text </seg><seg type="x-variant" subType="x-2">text </seg>
Variants shouldn't have to rely upon user-defined attribute values like this. It would be better to have a new '''variant''' attribute for the '''seg''' element whose value can be the variant number. Thus the proposed new syntax is:
+
Variants shouldn't have to rely upon user-defined attribute values like this. It would be better to have a new '''variant''' attribute for the '''seg''' element whose value can be the variant number. See [[OSIS 211 CR#variant|below]].
<seg variant="1">text </seg><seg variant="2">text </seg><seg variant="3">text </seg>
+
 
'''Requirements:'''
+
==== Grain operator @s ====
# The number of variants shouldn't be limited.
+
Currently, OSIS deems as invalid<ref>The string must ''first'' also match the following regular expression in "<tt>osisCore.2.1.1-cw-latest.xsd</tt>" called '''osisGenRegex''':<BR> <tt><xs:pattern value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_)+)*:)?([^:\s])+)"/></tt><BR>Essentially, this allows only numbers and letters or a low line, so it also excludes diacritics as separate Unicode characters.</ref> a grain operator string that has a '''typographical apostrophe''', or one that is hyphenated, whether by a '''hyphen''' or an '''endash''' as in the '''KJV''' module, e.g.
# Primary and Secondary reading terminology should be dropped in favour of specified variant names
+
<catchWord osisRef="Gen.16.14@s[Beer–lahai–roi]">Beer–lahai–roi</catchWord>
# The name for each numbered variant should be identified in the module .conf file.
+
Other likely punctuation marks (in Latin scripts at least) include period, comma, semicolon, colon, parentheses & the horizontal ellipsis between words.
'''Implementation details:'''
+
 
# SWORD must cope with variant text that is only part of a word as well as multiple words
+
A possible solution would be to extend the OSIS schema to permit the use of XML [https://en.wikipedia.org/wiki/Numeric_character_reference numerical character entities]<ref>These are already valid syntax within XML attributes.</ref> within an '''osisRef''' fine grain string.<ref>We would need to ensure that the module build tools in SWORD utilities do not automatically replace each such entity by the character.</ref><ref>The SWORD API would need to decode such entities before passing the string to the search function.</ref><ref>This would, in theory, also facilitate the inclusion of any number of spaces within the string.</ref>
# The default should be for SWORD to display variant #1, which is assumed to be the base text.
+
 
# Front-ends should have a UI option to select which variants should be displayed. See [https://crosswire.org/wiki/Choosing_a_SWORD_program#Module_Support]
+
The example would then become:
# How variants are to be displayed is at the discretion of the front-end developers.
+
<catchWord osisRef="Gen.16.14@s[Beer&amp;#x2013;lahai&amp;#x2013;roi]">Beer–lahai–roi</catchWord>
# If two or more variants are displayed simultaneously as in-line text, then suitable delimiters are required, unless (e.g.) colour coding is used to distinguish each variant.
+
'''Notes:'''
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
+
<references/>
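The string transformation proposed in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, letters, digits and the low line are left alone (mirroring the note about '''osisGenRegex'''), and every other character inside '''@s[...]''' is replaced by a hexadecimal numeric character reference. How validators and the SWORD tools would treat such references is left open, as in the notes above.

```python
import re

# Matches the bracketed text of a fine grain @s[...] operator.
GRAIN = re.compile(r"@s\[([^\]]*)\]")


def encode_grain(osis_ref: str) -> str:
    """Encode non-alphanumeric characters inside @s[...] as &#xHHHH; references."""

    def encode(match):
        text = match.group(1)
        encoded = "".join(
            ch if ch.isalnum() or ch == "_" else f"&#x{ord(ch):X};"
            for ch in text
        )
        return f"@s[{encoded}]"

    return GRAIN.sub(encode, osis_ref)


# \u2013 is the en dash used in the KJV example.
print(encode_grain("Gen.16.14@s[Beer\u2013lahai\u2013roi]"))
```

Only the text inside the square brackets is changed; the rest of the '''osisRef''' is passed through untouched.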
 +
 
 +
==== osisID for title note ====
 +
The following is invalid:
 +
<note type="study" osisRef="Ps.4.1!title" osisID="Ps.4.1!title!note.a" n="a">
 +
cf. In the '''KJV''' module, 47 of the 116 canonical '''Psalm titles''' have such a note.
 +
 
 +
The recommended solution is to replace the second '''!''' by a '''period''', e.g.
 +
<note type="study" osisRef="Ps.4.1!title" osisID="Ps.4.1!title.note.a" n="a">
 +
One can have any number of periods as a word separator in this part of an osisID attribute value.
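A minimal sketch of this repair, assuming it is to be applied mechanically to existing '''osisID''' values: the first '''!''' is kept and any later one is replaced by a period. The function name is hypothetical.

```python
def fix_title_note_id(osis_id: str) -> str:
    """Keep the first '!' in an osisID and turn any later '!' into a period."""
    head, sep, tail = osis_id.partition("!")
    return head + sep + tail.replace("!", ".")


print(fix_title_note_id("Ps.4.1!title!note.a"))
```

An '''osisID''' with at most one '''!''' is returned unchanged.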
 +
 
 +
==== Allow osisRef attribute in transChange ====
 +
This would permit the following in (e.g.) '''2Sam.23.8''':
 +
<transChange type="added" subType="x-copied-from" osisRef="1Chr.11.11">he lift up his spear</transChange>
 +
cf. The words "''he lift up his spear''" were copied by the KJV translators from the parallel verse in '''1 Chronicles'''.
 +
 
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 13:32, 10 May 2020 (UTC)
 +
 
 +
==== Disallow certain self-closing elements ====
 +
 
 +
===== Note element =====
 +
The following anomaly was discovered almost inadvertently while testing module '''HunUj''' from '''CrossWire Beta'''.
 +
 
 +
$$$Genesis 32:2
 +
Jákób is útnak indult, és találkoztak vele Isten angyalai.<note n="a" osisID="Gen.32.2!crossReference.a" osisRef="Gen.32.2" type="crossReference"/>
 +
 
 +
The '''note''' element was semantically incorrect: being ''self-closing'', it had no content. The mistake was not detected during validation against the OSIS schema, because an empty '''note''' is currently valid in '''OSIS 2.1.1''' and is well-formed XML.
 +
 
 +
The module bug was reported to the developer and has already been fixed in the source text.
 +
 
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 18:11, 20 June 2020 (UTC)
 +
 
 +
===== Other elements =====
 +
By the same token, we should also disallow the following ''self-closing'' elements with no text:
 +
<pre>
 +
<abbr />
 +
<caption />
 +
<catchWord />
 +
<closer />
 +
<divineName />
 +
<foreign />
 +
<inscription />
 +
<list />
 +
<q />
 +
<rdg />
 +
<reference />
 +
<salute />
 +
<seg />
 +
<speaker />
 +
<speech />
 +
<table />
 +
<title />
 +
<transChange />
 +
<w />
 +
</pre>
 +
* ''This list is not exhaustive''.  
 +
* The rule should apply irrespective of whether such an element has attributes.
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 16:21, 23 June 2020 (UTC)
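Until the schema enforces such a rule, a check along these lines could be run on the source text before a module is built. This is a sketch only: the function names are hypothetical, the set of element names is an illustrative subset of the list above, and Python's <tt>xml.etree.ElementTree</tt> cannot tell a self-closing element from an empty start and end tag pair, so both are reported.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the element names listed above; extend as needed.
MUST_HAVE_CONTENT = {
    "note", "abbr", "catchWord", "divineName",
    "seg", "title", "transChange", "w",
}


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def find_empty_elements(xml_text: str) -> list:
    """Return the names of listed elements that have no text and no children."""
    root = ET.fromstring(xml_text)
    empty = []
    for el in root.iter():
        name = local_name(el.tag)
        if name not in MUST_HAVE_CONTENT:
            continue
        has_text = bool((el.text or "").strip())
        if len(el) == 0 and not has_text:
            empty.append(name)
    return empty


sample = (
    '<verse osisID="Gen.32.2">text'
    '<note n="a" osisID="Gen.32.2!crossReference.a" '
    'osisRef="Gen.32.2" type="crossReference"/>'
    '</verse>'
)
print(find_empty_elements(sample))
```

For the sample, which is modelled on the '''HunUj''' case, the empty '''note''' is reported even though it has attributes.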
  
 
=== New Features ===
 
=== New Features ===
Line 232: Line 318:
 
Biblical Hebrew is an area where the usual priority of semantic markup over presentational markup cannot be taken for granted. <BR>[[User:David Haslam|David Haslam]]
 
Biblical Hebrew is an area where the usual priority of semantic markup over presentational markup cannot be taken for granted. <BR>[[User:David Haslam|David Haslam]]
  
These new '''hi''' types should be implemented in a way that retains the comaptibility with search features. A whole word should be wrapped, with the letters to be rendered specified by means of a further attribute value.
+
These new '''hi''' types should be implemented in a way that retains the compatibility with search features. A whole word should be wrapped, with the letters to be rendered specified by means of a further attribute value.
  
 
'''Note:'''
 
'''Note:'''
Line 240: Line 326:
 
See https://en.wikipedia.org/wiki/Qere_and_Ketiv
 
See https://en.wikipedia.org/wiki/Qere_and_Ketiv
  
A ketiv or qere can consist of one or more words, and so need to be grouped and related to one another.  I propose adding <ketiv> with @id, and <qere> with @idref, to contain the content (<w> elements) and allow validation of the connection.  A qere with no ketiv could be marked up without the @idref.
+
A '''ketiv''' or '''qere''' can consist of one or more words, and so need to be grouped and related to one another.  I propose adding <ketiv> with @id, and <qere> with @idref, to contain the content (<w> elements) and allow validation of the connection.  A '''qere''' with no '''ketiv''' could be marked up without the @idref.
  
 
: This sounds like a good application for <seg>. I would recommend named types for <seg> instead: ketiv & qere. --[[User:Osk|Osk]] 00:37, 23 February 2014 (MST)
 
: This sounds like a good application for <seg>. I would recommend named types for <seg> instead: ketiv & qere. --[[User:Osk|Osk]] 00:37, 23 February 2014 (MST)
Line 258: Line 344:
 
* type="Ethiopian"
 
* type="Ethiopian"
 
May be required as and when we support Bibles & Commentaries for the Ethiopian Orthodox Church.
 
May be required as and when we support Bibles & Commentaries for the Ethiopian Orthodox Church.
[[User:David Haslam|David Haslam]] 07:53, 22 January 2016 (MST)
+
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
  
 
==== Quotation types ====
 
==== Quotation types ====
Line 283: Line 369:
 
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
 
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
  
==== name type="ethnic" ====
+
==== New attributes for the &lt;name&gt; element ====
 +
===== name type="ethnic", etc =====
 
Allow <code>type="ethnic"</code> as an attribute of the '''name''' element to identify '''ethnic''' names, etc.<BR>
 
Allow <code>type="ethnic"</code> as an attribute of the '''name''' element to identify '''ethnic''' names, etc.<BR>
 
Allow <code>subType="people-group"</code> as an attribute of the '''name''' element to identify tribes, etc.<BR>
 
Allow <code>subType="people-group"</code> as an attribute of the '''name''' element to identify tribes, etc.<BR>
Line 289: Line 376:
 
Both these '''subType''' values may be used together with either <code>type="ethnic"</code> or <code>type="geographic"</code>. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
 
Both these '''subType''' values may be used together with either <code>type="ethnic"</code> or <code>type="geographic"</code>. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
  
==== name type="book" ====
+
===== name type="book" =====
 
Allow <code>type="book"</code> as an attribute of the '''name''' element to identify [http://en.wikipedia.org/wiki/Non-canonical_books_referenced_in_the_Bible Non-canonical books referenced in the Bible].
 
Allow <code>type="book"</code> as an attribute of the '''name''' element to identify [http://en.wikipedia.org/wiki/Non-canonical_books_referenced_in_the_Bible Non-canonical books referenced in the Bible].
  
==== name type="calendric" ====
+
===== name type="calendric" =====
 
Allow <code>type="calendric"</code> as an attribute of the '''name''' element to identify calendar objects.<BR>
 
Allow <code>type="calendric"</code> as an attribute of the '''name''' element to identify calendar objects.<BR>
 
Allow <code>subType="month"</code> to identify named months in the [https://en.wikipedia.org/wiki/Hebrew_calendar#Names_of_months Hebrew calendar].<BR>
 
Allow <code>subType="month"</code> to identify named months in the [https://en.wikipedia.org/wiki/Hebrew_calendar#Names_of_months Hebrew calendar].<BR>
 
Allow (e.g.) <code>n="9"</code> as the corresponding month number. [[User:David Haslam|David Haslam]]
 
Allow (e.g.) <code>n="9"</code> as the corresponding month number. [[User:David Haslam|David Haslam]]
  
==== name sex="male" and sex="female" ====
+
===== name sex="male" and sex="female" =====
 
Define new attribute '''sex''' for use in the '''name''' element along with <code>type="person"</code>.
 
Define new attribute '''sex''' for use in the '''name''' element along with <code>type="person"</code>.
  
==== divineName type normal ====
+
==== Attributes for the &lt;divineName&gt; element ====
 +
===== divineName type normal =====
 
There are four places in the KJV where the word '''JEHOVAH''' is all uppercase, but not small-caps. The following markup is desirable for these:
 
There are four places in the KJV where the word '''JEHOVAH''' is all uppercase, but not small-caps. The following markup is desirable for these:
 
  <divineName type="normal">JEHOVAH</divineName>
 
  <divineName type="normal">JEHOVAH</divineName>
 
The locations are: Exodus 6:3, Psalms 83:18, Isaiah 12:2, Isaiah 26:4.
 
The locations are: Exodus 6:3, Psalms 83:18, Isaiah 12:2, Isaiah 26:4.
 +
 +
===== divineName type added=====
 +
There are a few places in the KJV where the '''divineName''' element is found within text marked by the '''transChange''' element! It may help computers to find these if the following markup were to be defined:
 +
<divineName type="added">Lord</divineName>
 +
Aside: ''If it was added by the translators, it was not in the original Hebrew, so by definition, it could not have been the tetragrammaton''.
 +
 
[[User:David Haslam|David Haslam]]
 
[[User:David Haslam|David Haslam]]
  
Line 315: Line 409:
 
===== refrain =====
 
===== refrain =====
 
For Bible modules that do not use poetry line elements, this would provide a means to mark up the refrains in such passages as Deut.27.15-25 and Psalm.136
 
For Bible modules that do not use poetry line elements, this would provide a means to mark up the refrains in such passages as Deut.27.15-25 and Psalm.136
 +
 +
===== variant =====
 +
The proposed new syntax is:
 +
<seg variant="1">text </seg><seg variant="2">text </seg><seg variant="3">text </seg>
 +
'''Requirements:'''
 +
# The number of variants shouldn't be limited.
 +
# Primary and Secondary reading terminology should be dropped in favour of specified variant names
 +
# The name for each numbered variant should be identified in the module .conf file.
 +
'''Implementation details:'''
 +
# SWORD must cope with variant text that is only part of a word as well as multiple words
 +
# The default should be for SWORD to display variant #1, which is assumed to be the base text.
 +
# Front-ends should have a UI option to select which variants should be displayed. See [https://crosswire.org/wiki/Choosing_a_SWORD_program#Module_Support]
 +
# How variants are to be displayed is at the discretion of the front-end developers.
 +
# If two or more variants are displayed simultaneously as in-line text, then suitable delimiters are required, unless (e.g.) colour coding is used to distinguish each variant.
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
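To illustrate how a front-end might act on the proposed attribute, here is a rough Python sketch that keeps one numbered variant and drops the others. It is not SWORD code: the function name is hypothetical, the '''variant''' attribute is only a proposal, and the sketch handles only '''seg''' elements at the top level of the fragment.

```python
import xml.etree.ElementTree as ET


def render_variant(fragment: str, wanted: str = "1") -> str:
    """Return the text of a fragment, keeping only the wanted seg variant."""
    # Wrap the fragment so that it parses as a single XML document.
    root = ET.fromstring(f"<root>{fragment}</root>")
    parts = [root.text or ""]
    for el in root:
        variant = el.get("variant")
        # Keep non-variant content, and the seg whose number was requested.
        if el.tag != "seg" or variant is None or variant == wanted:
            parts.append("".join(el.itertext()))
        parts.append(el.tail or "")
    return "".join(parts)


fragment = (
    'In the <seg variant="1">beginning </seg>'
    '<seg variant="2">start </seg>'
    '<seg variant="3">origin </seg>God'
)
print(render_variant(fragment))
print(render_variant(fragment, "3"))
```

The default of "1" follows the implementation detail above that variant #1 is displayed when no other choice is made.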
  
 
==== New hi types ====
 
==== New hi types ====
In addition to these defined type values,
+
In addition to these defined type values<ref>Some of the hi '''type''' values defined in OSIS do not yet have an assigned character style programmed in SWORD.<BR>e.g. <code>type="acrostic"</code> renders as normal text.</ref>,
 
:'''• acrostic • bold • emphasis • illuminated • italic • line-through • normal • small-caps • sub • super • underline'''
 
:'''• acrostic • bold • emphasis • illuminated • italic • line-through • normal • small-caps • sub • super • underline'''
 
it would be useful to add several further types for the '''hi''' element. [[User:David Haslam|David Haslam]]
 
it would be useful to add several further types for the '''hi''' element. [[User:David Haslam|David Haslam]]
 +
 +
===== caps =====
 +
For Bibles (such as the English '''Revised Version''' of 1885) where the printed edition has the first word (or two) of each chapter in uppercase, it would be preferable for the underlying text to be in ordinary sentence case and the first word[s] marked using:
 +
<hi type="caps">...</hi>
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
  
 
===== overline =====
 
===== overline =====
Line 338: Line 452:
 
Many printed Bibles use drop-caps for the first letter in a verse<ref>To maintain compatibility with search features, the whole word should be marked, not just the first letter.<BR>The same goes for <code>type="illuminated"</code>. The style sheet or rendering will determine that it applies only to the first letter.</ref>, usually the first verse in each chapter. To reproduce this in electronic editions, a means to implement this presentational format is required.
 
Many printed Bibles use drop-caps for the first letter in a verse<ref>To maintain compatibility with search features, the whole word should be marked, not just the first letter.<BR>The same goes for <code>type="illuminated"</code>. The style sheet or rendering will determine that it applies only to the first letter.</ref>, usually the first verse in each chapter. To reproduce this in electronic editions, a means to implement this presentational format is required.
  
'''Note:'''
+
'''Notes:'''
 
<references/>
 
<references/>
  
 
==== Grain operator @s ====
 
==== Grain operator @s ====
The '''osisRef''' fine grain string<ref>OSIS User Manual (pp.82, 91, 148). It's uncertain whether this OSIS feature is even supported by SWORD.</ref> operator '''@s[''text'']''' works only for a whole word without spaces. It will also find only the first occurrence of the specified word.
+
The '''osisRef''' fine grain string<ref>OSIS User Manual (pp.82, 91, 148). It's uncertain whether this OSIS feature is even supported by SWORD.</ref> operator '''@s[''text'']''' works only for a whole word without spaces.
  
 
It would be useful to expand this operator to facilitate:
 
It would be useful to expand this operator to facilitate:
* text containing spaces, rather than only a single word
+
* text containing spaces, rather than only a single word<ref>Probably impossible due to the manner in which XML handles spaces, rather than OSIS in particular.</ref>
 
* returning the whole string rather than merely a pointer to its first character
 
* returning the whole string rather than merely a pointer to its first character
* text containing punctuation marks<ref>It's uncertain whether it can cope with a string that has an apostrophe, or one that is hyphenated.</ref>
+
* text containing punctuation marks. See [[OSIS 211 CR#Grain operator .40s|above]].
* a method to find further occurrences of the same word after the first, e.g. for the n<sup>th</sup> instance,
+
* a shorter way of specifying a range of consecutive words within the same osisRef<ref>This would avoid having to repeat the full osisRef for the end of the range. e.g.<BR><tt><catchWord osisRef="Lev.23.40@s[boughs]-Lev.23.40@s[trees]">boughs of goodly trees</catchWord></tt>
@s[word]n
+
</ref>, by such as:
* a shorter way of specifying a range of consecutive words within the same osisRef<ref>This would avoid having to repeat the full osisRef for the end of the range.</ref>, by such as:
 
 
  @s[first]-[last]
 
  @s[first]-[last]
* a way of specifying a comma-separated series of words within the same osisRef, by such as:
 
@s[third],[fifth],[seventh]
 
 
* to allow a method for the user agent (e.g. SWORD) to process a fine grain string ending with "…", the HORIZONTAL ELLIPSIS (U+2026), by returning the match as to just before the next terminating punctuation mark, or the end of the specified osisRef.
 
* to allow a method for the user agent (e.g. SWORD) to process a fine grain string ending with "…", the HORIZONTAL ELLIPSIS (U+2026), by returning the match as to just before the next terminating punctuation mark, or the end of the specified osisRef.
  
Line 359: Line 470:
 
<references/>
 
<references/>
 
[[User:David Haslam|David Haslam]] 27 January 2016 (MST)
 
[[User:David Haslam|David Haslam]] 27 January 2016 (MST)
 +
Updated [[User:David Haslam|David Haslam]] 29 April 2020 (MST)
 +
Updated [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 16:36, 4 May 2020 (UTC)
  
 
==== Table cells ====
 
==== Table cells ====
Line 374: Line 487:
  
 
<references/>
 
<references/>
 +
 +
==== Alternate verse number ====
 +
USFM to OSIS converters generally replace '''\va_#\va*''' (and '''\ca_#\ca*''') by a '''milestone''' element. These remain hidden by SWORD. It would make sense to define a suitable OSIS element for these such that SWORD could display them when an appropriate option is enabled and hide them when it's disabled.
 +
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
 +
 +
==== New XML elements ====
 +
===== The &lt;number&gt; element =====
 +
Define a '''number''' element to mark all natural numbers found in Scripture. This would be particularly useful in Bibles such as the KJV where numbers are expressed in words.
 +
 +
For the number element, allow <tt>type="cardinal"</tt> and use attribute '''n''' to record the decimal integer value. e.g.
 +
<number type="cardinal" n="80">fourscore</number>
 +
<number type="cardinal" n="153">an hundred and fifty and three</number>
 +
 +
Allow <tt>type="ordinal"</tt> for first, second, third, fourth, etc.
 +
 +
Allow <tt>type="fractional"</tt> e.g. for "one half", "a third part", "a tenth", etc.
 +
 +
This could facilitate programmatic enquiries about numbers in the Bible.
 +
 +
Front-end apps might be enhanced to (e.g.) display a tooltip with the numeric value.
 +
 +
NB. Special provision would be required when the number in text encompasses another word, as in Psalm 90:10
 +
<number type="cardinal" n="70">threescore years and ten</number>
 +
In this example, the word 'years' is not part of the numeral.
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
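As an illustration of the programmatic enquiries mentioned above, the following Python sketch lists each marked number with its type, value and wording. The '''number''' element is only a proposal and the function name is hypothetical; the sketch also assumes that '''n''' always holds a decimal integer, which would not suit <tt>type="fractional"</tt>.

```python
import xml.etree.ElementTree as ET


def list_numbers(fragment: str) -> list:
    """Return (type, value, wording) for every number element in a fragment."""
    # Wrap the fragment so that it parses as a single XML document.
    root = ET.fromstring(f"<root>{fragment}</root>")
    return [
        (el.get("type"), int(el.get("n")), "".join(el.itertext()))
        for el in root.iter("number")
    ]


fragment = (
    '<number type="cardinal" n="80">fourscore</number> and '
    '<number type="cardinal" n="153">an hundred and fifty and three</number>'
)
for entry in list_numbers(fragment):
    print(entry)
```

Because the wording is collected with <tt>itertext()</tt>, a number that encloses other markup would still yield its full text.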
 +
 +
===== The &lt;unit&gt; element =====
 +
Define a '''unit''' element to mark all measurable units found in Scripture. The following '''type''' attributes should be included:
 +
* currency
 +
* length
 +
* area
 +
* volume
 +
* weight
 +
* time
 +
Further thought might be given to having an '''equivalent''' attribute, for units that have approximate equivalents in modern (e.g. [https://en.wikipedia.org/wiki/International_System_of_Units S.I.]) units.
 +
 +
Among other things, the '''number''' element should allow the '''unit''' element to be included.
 +
 +
The sole exceptional case in the previous subsection would then become:
 +
 +
<number type="cardinal" n="70">threescore <unit type="time">years</unit> and ten</number>
 +
 +
Another unusual case is when units are implied by the absence of repetition, as in:
 +
 +
forty cubits long and thirty broad
 +
 +
This might become:
 +
 +
<number type="cardinal" n="40">forty</number> <unit type="length">cubits</unit> long and <number type="cardinal" n="30">thirty</number><unit type="length" subType="x-implied" n="cubits"/> broad
 +
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]])
  
 
==== USFM 3.0 support ====
 
==== USFM 3.0 support ====
Line 388: Line 553:
  
 
=== seg type="benediction" ===
 
=== seg type="benediction" ===
This is mentioned as a suggestion in § '''11.1.4''' but '''benediction''' is not a defined value for the '''type''' attribute of '''seg'''.<BR>These are • '''alluded''' • '''keyword''' • '''otPassage''' • '''verseNumber'''. It should therefore either have the "x-" prefix or be defined as an addition to the schema.
+
This is mentioned as a suggestion in § '''11.1.4''' but '''benediction''' is not a defined value for the '''type''' attribute of '''seg'''.<BR>These are • '''alluded''' • '''keyword''' • '''otPassage''' • '''verseNumber'''. It should therefore either have the "x-" prefix or be defined as an addition to the schema. See [[OSIS 211 CR#benediction|above]].
 +
 
 +
=== Right double quotation mark ===
 +
Some XML code samples have attributes wrapped between two '''double right quotation marks''' (U+201D) rather than ''ordinary'' '''double quotation mark''' (U+0022).
 +
 
 +
Instances are found on pages 20, 40, 41, 45, 47, 48, 63, 80, 87.
 +
''Some pages have it multiple times.''
 +
 
 +
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 11:37, 10 December 2017 (MST)
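Anyone copying such samples out of the manual could normalise them with something like the Python sketch below. The function name and the sample string are made up for illustration; the pattern only touches a value that directly follows an equals sign and is wrapped in two U+201D characters, so curly quotes in running text are left alone.

```python
import re

# An attribute value wrapped in two RIGHT DOUBLE QUOTATION MARK characters.
ATTR_CURLY = re.compile("=\u201d([^\u201d\"<>]*)\u201d")


def straighten_attribute_quotes(xml_text: str) -> str:
    """Replace U+201D delimiters around attribute values by U+0022."""
    return ATTR_CURLY.sub(r'="\1"', xml_text)


# Hypothetical sample, not copied from the manual.
damaged = "<milestone type=\u201dpb\u201d n=\u201d3\u201d/>"
print(straighten_attribute_quotes(damaged))
```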
  
 
=== Appendix D.1.1 English Editions (prefix "en:") ===
 
=== Appendix D.1.1 English Editions (prefix "en:") ===
 
* Add further abbreviations for all the English Bibles published since OSIS 2.1.1 was released.
 
* Add further abbreviations for all the English Bibles published since OSIS 2.1.1 was released.
 +
 +
=== <contributor> in <work> ===
 +
If the '''contributor''' element is used within the '''work''' element, it fails to validate if it comes after the '''creator''' element. This is contrary to the example given in '''section 7.1''' on page 23 of the '''OSIS 2.1 User Manual'''.
 +
 +
<work osisWork="EG">
 +
<title>Egyptian Grammar</title>
 +
<creator role="aut">Alan Gardiner</creator>
 +
<contributor role="dte">Francis Llewellyn Griffith</contributor>
 +
<date event="original" type="gregorian">1927</date>
 +
<date event="eversion" type="gregorian">2003</date>
 +
<type type="x-grammar">Grammar</type>
 +
<publisher>Griffith Institute, Ashmolean Museum, Oxford</publisher>
 +
<language type="ISO-639">EN</language>
 +
<language type="Ethnologue">EG-ancient</language>
 +
<identifier type="ISBN">0900416351</identifier>
 +
<identifier type="LCCN">95230980</identifer>
 +
</work>
 +
--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 13:56, 28 December 2017 (MST)
 +
 +
:The user manual is in error.
 +
 +
:The work element defines a strict order (aka sequence) of optional elements.
 +
 +
<xs:sequence>
 +
  <xs:element name="title" type="titleCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="contributor" type="contributorCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="creator" type="creatorCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="subject" type="subjectCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="date" type="dateCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="description" type="descriptionCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="publisher" type="publisherCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="type" type="typeCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="format" type="formatCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="identifier" type="identifierCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="source" type="sourceCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="language" type="languageCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="relation" type="relationCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="coverage" type="coverageCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="rights" type="rightsCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="scope" type="scopeCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="castList" type="castListCT" minOccurs="0" maxOccurs="unbounded"/>
 +
  <xs:element name="teiHeader" type="teiHeaderCT" minOccurs="0"/>
 +
  <xs:element name="refSystem" type="refSystemCT" minOccurs="0" maxOccurs="unbounded"/>
 +
</xs:sequence>
 +
 +
:--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 13:58, 28 December 2017 (MST)
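:Given the sequence quoted above, a header that fails for this reason can be repaired by sorting the children of '''work''' into schema order. The Python sketch below is only an illustration: the function name is hypothetical, it assumes the element names carry no namespace prefix, and it places any unrecognised child last.

```python
import xml.etree.ElementTree as ET

# Element order copied from the xs:sequence quoted above.
WORK_ORDER = [
    "title", "contributor", "creator", "subject", "date", "description",
    "publisher", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights", "scope", "castList", "teiHeader",
    "refSystem",
]


def schema_position(el) -> int:
    """Position of an element in the work sequence; unknown names sort last."""
    return WORK_ORDER.index(el.tag) if el.tag in WORK_ORDER else len(WORK_ORDER)


def reorder_work(work_xml: str) -> str:
    """Return the work element with its children sorted into schema order."""
    work = ET.fromstring(work_xml)
    # sorted() is stable, so repeated elements keep their relative order.
    work[:] = sorted(work, key=schema_position)
    return ET.tostring(work, encoding="unicode")


sample = (
    '<work osisWork="EG">'
    '<title>Egyptian Grammar</title>'
    '<creator role="aut">Alan Gardiner</creator>'
    '<contributor role="dte">Francis Llewellyn Griffith</contributor>'
    '</work>'
)
print(reorder_work(sample))
```

:Going by the same sequence, the full example from the manual would also need '''publisher''' moved ahead of '''type''', and '''identifier''' ahead of '''language'''.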
 +
 +
=== milestone element ===
 +
Section 12 on page 72 includes the following paragraph:
 +
<blockQuote>
 +
When setting the attribute '''n''' on a '''milestone''', it should indicate the number of the unit starting, not the unit ending. For example, '''<milestone type="page" n="3"/>''' indicates the break between pages 2 and 3, not between pages 3 and 4. Numbering does not need to be unique across various types of milestones -- for example, the 24th line on page 5 of a manuscript may be marked simply n="24", rather than n="5.24" or something similar.
 +
</blockQuote>
 +
However, '''page''' is not a predefined '''type''' attribute value! The example should be '''<milestone type="pb" n="3"/>'''
 +
 +
--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 02:36, 3 January 2018 (MST)
 +
 +
=== Mistaken example for the w element ===
 +
Section '''13.17. w''' on page 86 has the following example for the use of the '''w''' element.
 +
 +
<word gloss="s:H325>Ahasuerus</word>
 +
 +
This is a mistake. It should read,
 +
 +
<w gloss="s:H325">Ahasuerus</w>
 +
 +
--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 15:44, 24 March 2020 (UTC)
 +
 +
=== Grain operator @s ===
 +
:''An undocumented feature has just come to light''.
 +
Recent study of the .xsd file for '''OSIS 2.1.1''' has led to the discovery that the following example is valid for the '''osisRef''' fine grain '''@s''' operator.
 +
osisRef="Gen.1.4@s[the][2]"
 +
We assume that, because the bracketed number defaults to [1] when omitted, the example given points to the second occurrence of "the" in the verse text, thus:
 +
Genesis 1:4: And God saw the light, that it was good: and God divided '''the''' light from the darkness.
 +
 +
--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 16:26, 4 May 2020 (UTC)
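The assumed semantics can be sketched in a few lines of Python. This is only an illustration of the assumption stated above, not SWORD or OSIS code; the helper name `find_grain` is hypothetical, and it further assumes that the grain string is matched as a whole word and that occurrences are counted from 1.

```python
import re

def find_grain(verse_text, grain, occurrence=1):
    """Return the character offset of the n-th whole-word match of `grain`.

    Hypothetical helper: assumes @s[text][n] counts whole-word matches
    from 1 and that a missing [n] means the first match.
    """
    pattern = re.compile(r"\b" + re.escape(grain) + r"\b")
    for count, match in enumerate(pattern.finditer(verse_text), start=1):
        if count == occurrence:
            return match.start()
    return None  # fewer than `occurrence` matches in the verse

GEN_1_4 = ("And God saw the light, that it was good: "
           "and God divided the light from the darkness.")
offset = find_grain(GEN_1_4, "the", 2)
```

Under these assumptions, `offset` points at the "the" that follows "divided", which is the word shown in bold in the verse above.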
  
 
== See also ==
 
Line 398: Line 646:
  
 
== External links ==
 
Our friend, [http://eBible.org/ Michael Paul Johnson] maintains his own '''Modified OSIS''' schema. This is used in his [http://haiola.org/ Haila] software.  
+
* Our friend, [http://eBible.org/ Michael Paul Johnson], maintains his own '''Modified OSIS''' schema. This is used in his [http://haiola.org/ Haiola] software. 
 +
 
 +
* [https://github.com/OpenScriptureInformationStandard/ OpenScriptureInformationStandard] has created a new home for the OSIS specification on GitHub. There are two public repositories:
 +
# [https://github.com/OpenScriptureInformationStandard/Reference Reference] - The OSIS spec and related reference documents
 +
# [https://github.com/OpenScriptureInformationStandard/Schemas Schemas] - The actual schema versions and doc for OSIS
  
 
[[Category:OSIS]]
 

Latest revision as of 13:32, 14 June 2023

This page is for recording potential change requests to the OSIS 2.1.1 XML schema, and for defining our own updated schema.


Bible Technologies Group

The BTG that sponsored the OSIS committee and hosted the OSIS schema no longer exists. References that use the domain www.bibletechnologies.net will no longer work. The schema location therefore now needs to be for a local copy on your computer or to a copy hosted by CrossWire or elsewhere[1].

Note:

  1. e.g. It is also mirrored at http://eBible.org/osisCore.2.1.1.xsd

OSIS 2.1.1 Change Requests

Anyone with an outstanding OSIS bug report or feature proposal to be considered for inclusion in an updated OSIS schema, please write a very concise change request on this page, including a motivating use case.

CrossWire updated schema

As an interim measure, we are maintaining an updated validation schema based on the contents of this page.

The schema files are looking for a new home[1] but are currently at:
http://www.crosswire.org/~dmsmith/osis

In that location there are various iterations of the schema:

  • osisCore.2.1.1-orig.xsd (The original schema, with some changes to whitespace).
  • osisCore.2.1.1-cw1.xsd
  • osisCore.2.1.1-cw2.xsd
  • ...
  • osisCore.2.1.1-cwN.xsd (Where N is the highest version number.)
  • osisCore.2.1.1-cw-latest.xsd (The same as osisCore.2.1.1-cwN.xsd)

i.e. The most recent edition will usually be found in the osis directory, with filename osisCore.2.1.1-cw-latest.xsd.

This URL may be used in place of the official BibleTechnologies URL for validating XML files submitted for modules.

Notes:

  1. A repository has been created on our Society's GitLab group.

Bugs

Alpha testing bugs

List bugs in the schema that cause correct OSIS not to validate.

osisGenRegex bug

Currently that regex looks like [1], but it should look like [2]:

[1]     ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
[2]     ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_)+)*:)?([^:\s])+)
                        (missing + right here ^)

So our document with the following element isn't valid because the string "Strong" cannot be more than 1 character long in the current schema: <w morph="robinson:N-NSF" lemma="lemma.Strong:βίβλος">βίβλος</w>

--Osk 19:48, 5 November 2010 (UTC)
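The effect of the missing `+` can be checked with a small Python sketch. Python's `re` module has no `\p{L}` or `\p{N}`, so `\w` is used here as a stand-in for `(\p{L}|\p{N}|_)`; that substitution is an approximation made for this illustration only. XSD patterns must match the whole value, which is why `fullmatch` is used.

```python
import re

# \w stands in for (\p{L}|\p{N}|_), which Python's re module lacks.
CURRENT = r"((((\w)+)(\.(\w))*:)?([^:\s])+)"
FIXED = r"((((\w)+)(\.(\w)+)*:)?([^:\s])+)"

def is_valid(pattern, value):
    # XSD patterns must match the whole value, hence fullmatch.
    return re.fullmatch(pattern, value) is not None

lemma = "lemma.Strong:βίβλος"
current_ok = is_valid(CURRENT, lemma)   # prefix part after "." limited to one character
fixed_ok = is_valid(FIXED, lemma)       # prefix part after "." may be any length
```

With the current pattern the value is rejected, while a one-character part such as `lemma.S:x` is accepted; with the corrected pattern both are accepted.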

milestoned <lg>

Since the <l> element can only occur within an <lg> element, use of milestoned <lg> prevents use of <l> elements (within that <lg>). Since <lg> is milestonable, one would presume that the following snippet would be valid, but it is not, for the above reason:

     <lg sID="eg1"/>
          <l>Poetry line</l>
          <l>Poetry line</l>
     <lg eID="eg1"/>

--Osk 18:18, 31 December 2011 (MST)

The <lg> element does not allow for mixed content. However the use of the milestoned <lg> wrongly allows for it.

   <lg sID="eg2"/>
      text
   <lg eID="eg2"/>

--Dmsmith 16:29, 14 October 2012 (MDT)

<closer> in <verse> container?

According to the OSIS manual (cf. 11.1.3 on p. 58), it should be possible to embed a <closer> element within a <verse> container, but the schema does not allow this. One or the other should be corrected. --Osk 05:56, 6 July 2012 (MDT)

<seg> in <cell>

This was already reported to osis-users, but for the sake of completeness: There's a typo that allows "seq" in <cell> instead of "seg". --Osk 04:29, 22 February 2014 (MST)

Correct the osisRef syntax for a non-verse-keyed OSIS module

The following syntax for such an osisRef link currently works but does not validate to osisCore.2.1.1.xsd

Module:Div1/Div2/Div3

The schema expects a period in place of each solidus. See here.

David Haslam (talk) 16:27, 22 June 2020 (UTC)

Beta testing bugs

List bugs in the schema that allow incorrect OSIS to validate.

rdg

In these lines of the schema:

        <xs:simpleType name="rdgType">
                <xs:union memberTypes="osisRdg attributeExtension xs:string"/>
        </xs:simpleType>
  • osisRdg is a list (alternate, variant).
  • attributeExtension is a regular expression allowing x-….
  • xs:string allows any string expression.

Thus rdg elements with any text value as the type attribute will always validate, even though they should fail for anything other than (alternate, variant, x-userdefined)
David Haslam 07:19, 22 January 2016 (MST)
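Why the union accepts everything can be modelled in Python. This is a sketch, not a schema validator: each member type is represented by a predicate, the `x-` pattern is an assumed simplification of attributeExtension, and the function names are hypothetical.

```python
import re

OSIS_RDG = {"alternate", "variant"}   # enumeration named in the text above
EXTENSION = re.compile(r"x-\S+")      # assumed shape of attributeExtension

def in_osis_rdg(value):
    return value in OSIS_RDG

def in_extension(value):
    return EXTENSION.fullmatch(value) is not None

def in_xs_string(value):
    return True  # xs:string accepts every string

def union_accepts(value, members):
    # A union type is valid if any one member type accepts the value.
    return any(member(value) for member in members)

current = [in_osis_rdg, in_extension, in_xs_string]
proposed = [in_osis_rdg, in_extension]
```

Here `union_accepts("nonsense", current)` is true because of the xs:string member, whereas the same value is rejected once that member is dropped. The same reasoning would apply to lineType and lineGroupType.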

lineType

Similar to above:

	<xs:simpleType name="lineType">
		<xs:union memberTypes="osisLine attributeExtension xs:string"/>
	</xs:simpleType>

David Haslam 07:24, 22 January 2016 (MST)

lineGroup

Similar to above:

	<xs:simpleType name="lineGroupType">
		<xs:union memberTypes="osisLineGroup attributeExtension xs:string"/>
	</xs:simpleType>

David Haslam 07:24, 22 January 2016 (MST)

Feature requests

OSIS Validation

List OSIS constructs that currently fail to validate, yet which would be better to allow, or vice versa.

Allow <divineName> within <w>

Often a Hebrew word is translated into multiple English words. In the case of the Divine name, the tetragrammaton, there are frequent "of the LORD", "to the LORD", "the LORD", .... In OSIS these would properly be represented as: <w lemma="strong:H03068">the <divineName>Lord</divineName></w>. To get around this shortcoming, a hack has to be employed: <divineName> is wrapped in an element that allows it and that is itself allowed within <w>. <seg> is such an element: <w>the <seg><divineName>Lord</divineName></seg></w>.

Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:20, 23 February 2014 (MST)

This change would wrongly permit more than one divineName element within a w element. Though there are no circumstances where this would ever arise in a genuine Bible translation, it's still a risk that we now have this loophole in the schema. David Haslam (talk) 01:47, 25 May 2017 (MDT)
Even so, the original workaround with the seg element could also have permitted more than one divineName element within a w element. The risk is not new! David Haslam (talk) 11:39, 25 May 2017 (MDT)

Allow <divineName> within <name>

The OSIS generated by usfm2osis.py for beibl.net files provided one example [1], viz.

<verse sID="Num.21.14" osisID="Num.21.14"/>Mae
<name type="x-workTitle">Llyfr Rhyfeloedd yr
<divineName>ARGLWYDD</divineName>
</name> yn cyfeirio at y lle fel yma:

A similar hack is required using <seg>, viz.

<verse sID="Num.21.14" osisID="Num.21.14"/>Mae
<name type="x-workTitle">Llyfr Rhyfeloedd yr
<seg><divineName>ARGLWYDD</divineName></seg>
</name> yn cyfeirio at y lle fel yma:

This ought also to apply for any other element that allows seg but not divineName.

Allow <transChange> within <w>

An encoder ought to be allowed to put <transChange> on elements smaller than an orthographic word. If I'm translating an instance of "λόγος", but for some reason I believe that I should translate it as "words", I ought to be able to encode <w>word<transChange>s</transChange></w>. --Osk 19:48, 5 November 2010 (UTC)

Add an element for morphology within <w>

To encode documents like MORPH (WLC + morphology), we need an element to embed within <w> to carry lexical information. I suggest calling it <m> and giving it all of the attributes found on <w>. --Osk 19:48, 5 November 2010 (UTC)

Allow <transChange> within <hi>

A highlighted sentence or part of a sentence is a unit, including any transChange parts of it. At the moment a highlighted sentence with a transChange has to be encoded as in the first line below; the second line shows what the change would allow:

<hi type="bold"> Texttexttext </hi><transChange><hi type="bold"> moreText</hi></transChange><hi type="bold"> TextText</hi>
<hi type="bold"> Texttexttext <transChange>moreText</transChange> TextText</hi>

This would look cleaner and would be also closer to what is meant. refdoc:talk 16:02, 3 August 2011 (MDT)

Allow <catchWord> within <hi>

A highlighted sentence or part of a sentence is a unit, including any catchWord parts of it. At the moment a highlighted sentence with a catchWord has to be encoded as in the first line below; the second line shows what the change would allow:

<hi type="bold"> Texttexttext </hi><catchWord><hi type="bold"> moreText</hi></catchWord><hi type="bold"> TextText</hi>
<hi type="bold"> Texttexttext <catchWord>moreText</catchWord> TextText</hi>

This is identical in form to the <transChange> issue. The problem with both of these is that <transChange> and <catchWord> may reasonably be styled in the same fashion as what is indicated by <hi>. --Dmsmith 16:58, 14 October 2012 (MDT)

The OSIS User Manual recommends that text within a catchWord element be rendered as bold italics. David Haslam (talk)

Allow multiple types for <hi>

It'd really be convenient for

<hi type="bold italic small-caps">text</hi>

rather than

<hi type="bold"><hi type="italic"><hi type="small-caps">text</hi></hi></hi>

--Dmsmith 16:57, 14 October 2012 (MDT)
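Until such a change is made, a converter could expand the proposed space-separated form into the nested form that validates today. The following Python is a minimal sketch of that expansion; `expand_hi` is a hypothetical helper, not part of any existing tool.

```python
from xml.sax.saxutils import escape, quoteattr

def expand_hi(types, text):
    """Build nested <hi> markup from a space-separated type list."""
    markup = escape(text)
    # Wrap from the innermost (last) type outwards.
    for hi_type in reversed(types.split()):
        markup = "<hi type=%s>%s</hi>" % (quoteattr(hi_type), markup)
    return markup

nested = expand_hi("bold italic small-caps", "text")
```

For the example above, `nested` is the three-level nested markup shown, with bold outermost.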

Allow <hi> within <title>

There are some languages for which the earlier orthography used an italicised N (both cases) as a separate letter of the alphabet.
Example: Old Pohnpeian. Allowing <hi type="italic">n</hi> within the text of a title element would obviate the need to use the seg element as a workaround.

David Haslam 13:55, 15 January 2016 (MST)

The use of italics to mark a single character within a word must interfere with the search function of SWORD and JSword. It would have been better if the Old Pohnpeian alphabet had used a separate character such as Ñ (ñ). In the modern orthography, the digraph ng is used for this consonant. David Haslam (talk) 11:27, 14 February 2016 (MST)

Allow <transChange> within <note>

When translating an alternate Greek version of a passage, added words need to be indicated.

Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:22, 23 February 2014 (MST)

Allow <transChange> within <inscription>

Some translations of Rev.17.5 may require this.

<verse osisID="Rev.17.5">in na njenem čelu <transChange type="added">je bilo</transChange> napisano ime: <inscription>SKRIVNOST, VÉLIKA <transChange type="added">[METROPOLA]</transChange> BABILON, MATI POCESTNIC<note type="study">POCESTNIC: ali, PREŠUŠTEV</note> IN OGABNOSTI ZEMLJE</inscription>.</verse>

It can be worked around using the usual seg kludge. David Haslam (talk) 22:13, 3 January 2019 (UTC)

Allow <hi> within <abbr>

To restrict the highlighting to letters and exclude punctuation marks, the abbr element should allow the hi element. This avoids having to use a seg hack to achieve the required markup:

<abbr expansion="Psalm"><hi type="spaced-letters">PSAL</hi>.</abbr>

would become possible, and would obviate the need for the engine to treat some characters differently from others when rendering the special highlighting.

David Haslam

Allow <name> within <name>

There is a requirement to do this for various multi-word names, e.g.

<name type="person" regular="Jesus">Jesus of <name type="geographic">Nazareth</name></name>
<name type="person" regular="Doeg">Doeg the <name type="ethnic">Edomite</name></name>
<name type="geographic" regular="Cana">Cana in <name type="geographic">Galilee</name></name>
<name type="person" regular="Pilate">Pontius <name type="person">Pilate</name></name>

Currently, a hack using the seg element has to be used. David Haslam (talk)

Allow remote header reference

When serving short passages via web services, as valid OSIS documents, a full header is obtrusive. Also, in a collection of related documents, for example separate book files for a Bible, one centralized header would be more maintainable. The simplest approach would probably be to allow @href on the header element, to abstract some or all of the header content. See Troy's related post.

Allow shadow/virtual elements

A second requirement for distributing valid OSIS fragments through web services is a form of virtual, or shadow, element to supply the context of the given fragment. A new global attribute for indicating this virtual status is essential to distinguish them from the actual markup of the document. The ESV API provides this construct via a `virtual` attribute (see the description for `include-virtual-attributes`). See Troy's related post (same as previous).

OSIS variants

Currently, SWORD supports only 2 variants in the main text, and uses the following syntax:

<seg type="x-variant" subType="x-1">text </seg><seg type="x-variant" subType="x-2">text </seg>

Variants shouldn't have to rely upon user-defined attribute values like this. It would be better to have a new variant attribute for the seg element whose value can be the variant number. See below.

Grain operator @s

Currently, OSIS deems as invalid[1] a grain operator string that has a typographical apostrophe, or one that is hyphenated, whether by a hyphen or an endash as in the KJV module, e.g.

<catchWord osisRef="Gen.16.14@s[Beer–lahai–roi]">Beer–lahai–roi</catchWord>

Other likely punctuation marks (in Latin scripts at least) include period, comma, semicolon, colon, parentheses & the horizontal ellipsis between words.

A possible solution would be to extend the OSIS schema to permit the use of XML numerical character entities[2] within an osisRef fine grain string.[3][4][5]

The example would then become:

<catchWord osisRef="Gen.16.14@s[Beer&#x2013;lahai&#x2013;roi]">Beer–lahai–roi</catchWord>

Notes:

  1. The string must first also match the following regular expression in "osisCore.2.1.1-cw-latest.xsd" called osisGenRegex:
    <xs:pattern value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_)+)*:)?([^:\s])+)"/>
    Essentially, this allows only numbers and letters or a low line, so it also excludes diacritics as separate Unicode characters.
  2. These are already valid syntax within XML attributes.
  3. We would need to ensure that the module build tools in SWORD utilities do not automatically replace each such entity by the character.
  4. The SWORD API would need to decode such entities before passing the string to the search function.
  5. This would, in theory, also facilitate the inclusion of any number of spaces within the string.
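
Notes 3 and 4 can be illustrated with Python's standard XML parser, used here only as an example of a conforming parser and not as part of any SWORD tool. A parser resolves numeric character references in attribute values, so the application receives the literal character rather than the entity text.

```python
import xml.etree.ElementTree as ET

# The attribute uses the numeric character reference; the element text
# uses the literal EN DASH (U+2013).
source = ('<catchWord osisRef="Gen.16.14@s[Beer&#x2013;lahai&#x2013;roi]">'
          'Beer\u2013lahai\u2013roi</catchWord>')
element = ET.fromstring(source)
grain_ref = element.get("osisRef")   # the references are already decoded here
```

After parsing, `grain_ref` contains the en dash itself. Any tool that works from the parsed document, a schema validator included, would therefore probably see the decoded character, so whether this proposal avoids the pattern restriction would need to be checked against the validator in use.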

osisID for title note

The following is invalid:

<note type="study" osisRef="Ps.4.1!title" osisID="Ps.4.1!title!note.a" n="a">

cf. In the KJV module, 47 of the 116 canonical Psalm titles have such a note.

The recommended solution is to replace the second ! by a period, e.g.

<note type="study" osisRef="Ps.4.1!title" osisID="Ps.4.1!title.note.a" n="a">

One can have any number of periods as a word separator in this part of an osisID attribute value.

Allow osisRef attribute in transChange

This would permit the following in (e.g.) 2Sam.23.8:

<transChange type="added" subType="x-copied-from" osisRef="1Chr.11.11">he lift up his spear</transChange>

cf. The words "he lift up his spear" were copied by the KJV translators from the parallel verse in 1 Chronicles.

David Haslam (talk) 13:32, 10 May 2020 (UTC)

Disallow certain self-closing elements

Note element

The following anomaly was discovered almost inadvertently while testing module HunUj from CrossWire Beta.

$$$Genesis 32:2
Jákób is útnak indult, és találkoztak vele Isten angyalai.<note n="a" osisID="Gen.32.2!crossReference.a" osisRef="Gen.32.2" type="crossReference"/>

The note element was semantically incorrect, having no content on account of it being self-closing. The mistake was not detected during validation to the OSIS schema. It's currently valid in OSIS 2.1.1 and certainly passes XML syntax check.

The module bug was reported to the developer and has already been fixed in the source text.

David Haslam (talk) 18:11, 20 June 2020 (UTC)

Other elements

By the same token, we should also disallow the following self-closing elements with no text:

 <abbr />
 <caption />
 <catchWord />
 <closer />
 <divineName />
 <foreign />
 <inscription />
 <list />
 <q />
 <rdg />
 <reference />
 <salute />
 <seg />
 <speaker />
 <speech />
 <table />
 <title />
 <transChange />
 <w />
  • This list is not exhaustive.
  • The rule should apply irrespective of whether such an element has attributes.

David Haslam (talk) 16:21, 23 June 2020 (UTC)
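
Until the schema rejects such elements, a module developer could look for them with a short script. The following Python is a sketch under stated assumptions: the element set is only a sample of the list above, the helper names are hypothetical, and elements carrying sID or eID are skipped by default on the assumption that milestone forms are empty by design. Note that the parser cannot tell `<note/>` from `<note></note>`; both are reported as empty.

```python
import xml.etree.ElementTree as ET

CONTENT_ELEMENTS = frozenset({"abbr", "note", "q", "seg", "title", "w"})  # sample subset

def local_name(tag):
    # Drop any "{namespace}" prefix that ElementTree adds.
    return tag.rsplit("}", 1)[-1]

def find_empty(xml_text, names=CONTENT_ELEMENTS, skip_milestones=True):
    """Return local names of listed elements with no text and no children."""
    empty = []
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name not in names:
            continue
        if skip_milestones and ("sID" in element.attrib or "eID" in element.attrib):
            continue  # assumption: milestone forms are empty by design
        has_text = bool((element.text or "").strip())
        if not has_text and len(element) == 0:
            empty.append(name)
    return empty

sample = ('<div><note n="a" type="crossReference"/>'
          '<q sID="q1"/><w>word</w><q eID="q1"/></div>')
report = find_empty(sample)
```

For this sample only the empty note is reported; passing `skip_milestones=False` would also report the two milestone `q` elements.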

New Features

List new features or extensions to existing features here.

Biblical Hebrew

Add further <hi> types to support Biblical Hebrew

The Masoretic Text includes some words whose characters have a different style than the main text. These three styles use "large", "small" and "suspended" letters.[1]
MT scholars would find it beneficial if these special text styles could be properly represented in OSIS XML (and rendered as such in modules).

Provide type attribute values to support small, large and suspended Hebrew glyphs.
This would enable more accurate display of these orthographic peculiarities found in the Tanakh.
Biblical Hebrew is an area where the usual priority of semantic markup over presentational markup cannot be taken for granted.
David Haslam

These new hi types should be implemented in a way that retains the compatibility with search features. A whole word should be wrapped, with the letters to be rendered specified by means of a further attribute value.

Note:

  1. See https://www.win.tue.nl/~aeb/natlang/hebrew/hebrew_bible.html
Improve Ketiv/Qere markup in Biblical Hebrew

See https://en.wikipedia.org/wiki/Qere_and_Ketiv

A ketiv or qere can consist of one or more words, and so need to be grouped and related to one another. I propose adding <ketiv> with @id, and <qere> with @idref, to contain the content (<w> elements) and allow validation of the connection. A qere with no ketiv could be marked up without the @idref.

This sounds like a good application for <seg>. I would recommend named types for <seg> instead: ketiv & qere. --Osk 00:37, 23 February 2014 (MST)
<seg type="qere">...</seg> and <seg type="ketiv">...</seg> is the change request.
<seg type="x-qere">...</seg> and <seg type="x-ketiv">...</seg> could be used interim.

Or we could adapt the new proposal for OSIS variants. David Haslam (talk)

Add peripheral types from USFM to osisDivs

Add the additional USFM peripheral types to osisDivs to maintain feature parity. I believe OSIS 2.1.1 had this feature parity at the time of its release, but USFM has standardized additional peripheral types since then, which should be added as: halfTitlePage, promotionalPage, foreword, alphabeticalContents, tableofAbbreviations, chronology, weightsandMeasures, mapIndex, ntQuotesfromLXX, spine --Osk 01:04, 23 February 2014 (MST)

Calendar types

Add the following calendar system:

  • type="Ethiopian"

May be required as and when we support Bibles & Commentaries for the Ethiopian Orthodox Church. David Haslam (talk)

Quotation types

From the manual (p. 43): "The rendering for quotations marks after an interruption, for example, can be distinguished using the type attribute on this element, with values such as initial, medial, and final." Please make these @type values official: initial, medial, and final.

Milestonable <p>

For documents where the primary structure is book, chapter, verse, like the Authorized Version or the Hebrew Bible, we should be able to mark up paragraphs as milestones. This would allow for equality, rather than making book, section, paragraph a privileged system.

Improve Selah markup

Selah can be represented at the end of a line. The markup of <l type="selah">...</l> does not allow for the text identified as selah to be at the end of the current line. Maybe allow for a separate markup, rather than a type of line.

But see also http://www.crosswire.org/tracker/browse/MODTOOLS-84 David Haslam (talk)

title subType

Add the following attributes for use along with type="chapter" in the title element.

subType="chapterDescription"
subType="chapterLabel"

The former would enable SWORD to be extended to show chapter descriptions in italics and normal font size or smaller.
The latter would enable SWORD to be extended to display the module chapter labels instead of the normal chapter labels programmed in the front-end.

Currently, these are typically done using "x-" prefix in the attribute value, without any SWORD support. David Haslam (talk)

New attributes for the <name> element

name type="ethnic", etc

Allow type="ethnic" as an attribute of the name element to identify ethnic names, etc.
Allow subType="people-group" as an attribute of the name element to identify tribes, etc.
Allow subType="people-group-member" as an attribute of the name element to identify an individual member of a tribe, etc.
Both these subType values may be used together with either type="ethnic" or type="geographic". David Haslam (talk)

name type="book"

Allow type="book" as an attribute of the name element to identify Non-canonical books referenced in the Bible.

name type="calendric"

Allow type="calendric" as an attribute of the name element to identify calendar objects.
Allow subType="month" to identify named months in the Hebrew calendar.
Allow (e.g.) n="9" as the corresponding month number. David Haslam

name sex="male" and sex="female"

Define new attribute sex for use in the name element along with type="person".

Attributes for the <divineName> element

divineName type normal

There are four places in the KJV where the word JEHOVAH is all uppercase, but not small-caps. The following markup is desirable for these:

<divineName type="normal">JEHOVAH</divineName>

The locations are: Exodus 6:3, Psalms 83:18, Isaiah 12:2, Isaiah 26:4.

divineName type added

There are a few places in the KJV where the divineName element is found within text marked by the transChange element! It may help computers to find these if the following markup were to be defined:

<divineName type="added">Lord</divineName>

Aside: If it was added by the translators, it was not in the original Hebrew, so by definition, it could not have been the tetragrammaton.

David Haslam

New seg types

In addition to these defined type values,

• alluded • keyword • otPassage • verseNumber

it would be useful to add several further types for the seg element. David Haslam

benediction

This is already documented in the OSIS Reference § 11.1.4. Benedictions but is not listed in § 13.14. seg.

refrain

For Bible modules that do not use poetry line elements, this would provide a means to mark up the refrains in such passages as Deut.27.15-25 and Ps.136.

variant

The proposed new syntax is:

<seg variant="1">text </seg><seg variant="2">text </seg><seg variant="3">text </seg>

Requirements:

  1. The number of variants shouldn't be limited.
  2. Primary and Secondary reading terminology should be dropped in favour of specified variant names
  3. The name for each numbered variant should be identified in the module .conf file.

Implementation details:

  1. SWORD must cope with variant text that is only part of a word as well as multiple words
  2. The default should be for SWORD to display variant #1, which is assumed to be the base text.
  3. Front-ends should have a UI option to select which variants should be displayed. See [2]
  4. How variants are to be displayed is at the discretion of the front-end developers.
  5. If two or more variants are displayed simultaneously as in-line text, then suitable delimiters are required, unless (e.g.) colour coding is used to distinguish each variant.

David Haslam (talk)
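
How a front-end might select one variant from the proposed markup can be sketched in Python. This is not SWORD code: it assumes the proposed `variant` attribute, flat (non-nested) seg elements, and a hypothetical helper name `render_variant`; the sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET

def render_variant(fragment, wanted="1"):
    """Return the text of `fragment`, keeping only the chosen variant.

    Assumes the proposed <seg variant="N"> syntax and flat (non-nested) segs.
    """
    root = ET.fromstring("<root>%s</root>" % fragment)
    parts = [root.text or ""]
    for child in root:
        variant = child.get("variant")
        if variant is None or variant == wanted:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text after the element is always kept
    return "".join(parts)

sample = ('before <seg variant="1">first </seg>'
          '<seg variant="2">second </seg>after')
base_text = render_variant(sample)          # default: variant 1
other_text = render_variant(sample, "2")
```

Because the default is variant 1, this matches implementation detail 2 above; a UI option as in detail 3 would simply pass a different value for `wanted`.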

New hi types

In addition to these defined type values[1],

• acrostic • bold • emphasis • illuminated • italic • line-through • normal • small-caps • sub • super • underline

it would be useful to add several further types for the hi element. David Haslam

caps

For Bibles (such as the English Revised Version of 1885) where the printed edition has the first word (or two) of each chapter in uppercase, it would be preferable for the underlying text to be in ordinary sentence case and the first word[s] marked using:

<hi type="caps">...</hi>

David Haslam (talk)

overline

SWORD already supports type="overline" for the hi element, despite it not being defined in the schema before.

dotted-underline

Dotted underline is sometimes used in Chinese ideographic script to highlight certain words. Should we provide for this in OSIS?

dashed-underline

This is similar to dotted underline, but the line is dashed rather than dotted.

spaced-letters

Many of the book titles in the Blayney edition contain words in which the letters are spaced. e.g.
The R E V E L A T I O N of S. J O H N the Divine.
It's desirable to have a new highlight type for these, e.g.

<hi type="spaced-letters">REVELATION</hi>

In this way the highlighted text will semantically still be the same word, even though it is displayed differently. As and when this is implemented by SWORD, the letter spacing should be done by intelligent rendering rather than by inserting spaces. This would be especially important for front-end apps that feature text to speech for selected text.

drop-caps

Many printed Bibles use drop-caps for the first letter in a verse[2], usually the first verse in each chapter. To reproduce this in electronic editions, a means to implement this presentational format is required.

Notes:

  1. Some of the hi type values defined in OSIS do not yet have an assigned character style programmed in SWORD.
    e.g. type="acrostic" renders as normal text.
  2. To maintain compatibility with search features, the whole word should be marked, not just the first letter.
    The same goes for type="illuminated". The style sheet or rendering will determine that it applies only to the first letter.

Grain operator @s

The osisRef fine grain string[1] operator @s[text] works only for a whole word without spaces.

It would be useful to expand this operator to facilitate:

  • text containing spaces, rather than only a single word[2]
  • returning the whole string rather than merely a pointer to its first character
  • text containing punctuation marks. See above.
  • a shorter way of specifying a range of consecutive words within the same osisRef[3], by such as:
@s[first]-[last]
  • a method for the user agent (e.g. SWORD) to process a fine grain string ending with "…", the HORIZONTAL ELLIPSIS (U+2026), by extending the match to just before the next terminating punctuation mark, or to the end of the specified osisRef.

Note:

  1. OSIS User Manual (pp.82, 91, 148). It's uncertain whether this OSIS feature is even supported by SWORD.
  2. Probably impossible due to the manner in which XML handles spaces, rather than OSIS in particular.
  3. This would avoid having to repeat the full osisRef for the end of the range. e.g.
    <catchWord osisRef="Lev.23.40@s[boughs]-Lev.23.40@s[trees]">boughs of goodly trees</catchWord>

David Haslam 27 January 2016 (MST) Updated David Haslam 29 April 2020 (MST) Updated David Haslam (talk) 16:36, 4 May 2020 (UTC)
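
The proposed range shorthand could be expanded mechanically into the long form shown in note 3. The following Python is a sketch of that expansion only; the shorthand is a proposal, not valid OSIS, and `expand_range` is a hypothetical helper.

```python
import re

SHORTHAND = re.compile(
    r"^(?P<ref>[^@]+)@s\[(?P<first>[^\]]+)\]-\[(?P<last>[^\]]+)\]$")

def expand_range(osis_ref):
    """Expand the proposed ref@s[first]-[last] form into the current long form."""
    match = SHORTHAND.match(osis_ref)
    if match is None:
        return osis_ref  # not the shorthand; leave untouched
    ref, first, last = match.group("ref", "first", "last")
    return "%s@s[%s]-%s@s[%s]" % (ref, first, ref, last)

expanded = expand_range("Lev.23.40@s[boughs]-[trees]")
```

A value already in the long form does not match the shorthand pattern and is returned unchanged.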

Table cells

The cell element should have the attributes rows and cols to specify the spanning of a cell horizontally and vertically. Using subType is insufficient to communicate two values. Dmsmith

USFM 3.0 defines updated syntax for column spanning in table rows. David Haslam (talk)

Pronunciation help

Although mentioned briefly in the OSIS 2.1.1 User Manual, there is no defined element in OSIS for pronunciation help. Such a feature would be useful for all the proper names in the Bible, many of which are unusual, especially in terms of how a speaker of the translation language might attempt to pronounce them.

Although proper names are a good reason for having an OSIS extension, pronunciation is essentially a word level requirement. I would therefore propose that the OSIS w element be enhanced by the following attribute:

phonetic

A phonetic attribute would necessitate defining which system of phonetic notation[1][2] is to be used in the work. This needs to be defined under the work element in the OSIS header.

  1. e.g. IPA, Arpabet, etc.
  2. See also the BBC Text Spelling Guide

Alternate verse number

USFM to OSIS converters generally replace \va_#\va* (and \ca_#\ca*) by a milestone element. These remain hidden by SWORD. It would make sense to define a suitable OSIS element for these such that SWORD could display them when an appropriate option is enabled and hide them when it's disabled.

David Haslam (talk)

New XML elements

The <number> element

Define a number element to mark all natural numbers found in Scripture. This would be particularly useful in Bibles such as the KJV where numbers are expressed in words.

For the number element, allow type="cardinal" and use attribute n to record the decimal integer value. e.g.

<number type="cardinal" n="80">fourscore</number>
<number type="cardinal" n="153">an hundred and fifty and three</number>

Allow type="ordinal" for first, second, third, fourth, etc.

Allow type="fractional" e.g. for "one half", "a third part", "a tenth", etc.

This could facilitate programmatic enquiries about numbers in the Bible.

Front-end apps might be enhanced to (e.g.) display a tooltip with the numeric value.

NB. Special provision would be required when the number in text encompasses another word, as in Psalm 90:10

<number type="cardinal" n="70">threescore years and ten</number>

In this example, the word 'years' is not part of the numeral. David Haslam (talk)
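
The kind of programmatic enquiry mentioned above could look like the following Python sketch. It assumes the proposed number element and its n attribute exactly as written in the examples; `list_numbers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def list_numbers(fragment):
    """Return (type, value, words) for each proposed <number> element."""
    root = ET.fromstring("<root>%s</root>" % fragment)
    return [
        (number.get("type"), int(number.get("n")), "".join(number.itertext()))
        for number in root.iter("number")
    ]

sample = ('<number type="cardinal" n="80">fourscore</number> and '
          '<number type="cardinal" n="153">an hundred and fifty and three</number>')
numbers = list_numbers(sample)
```

A front-end tooltip would only need the second field of each tuple; the words are kept so that the result can be checked against the text.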

The <unit> element

Define a unit element to mark all measurable units found in Scripture. The following type attributes should be included:

  • currency
  • length
  • area
  • volume
  • weight
  • time

Further thought might be given to having an equivalent attribute, for units that have approximate equivalents in modern (e.g. S.I.) units.

Among other things, the number element should allow the unit element to be included.

The sole exceptional case in the previous subsection would then become:

<number type="cardinal" n="70">threescore <unit type="time">years</unit> and ten</number>

Another unusual case is when units are implied by the absence of repetition, as in:

forty cubits long and thirty broad

This might become:

<number type="cardinal" n="40">forty</number> <unit type="length">cubits</unit> long and <number type="cardinal" n="30">thirty</number><unit type="length" subType="x-implied" n="cubits"/> broad

David Haslam (talk)
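The implied-unit case above can be sketched in Python as follows. This is only an illustration of how software might consume the proposed markup: neither number nor unit exists in OSIS 2.1.1, and the seg wrapper, the function name and the pairing rule (each number is paired with the next unit that follows it) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup copied from the proposal above, wrapped in <seg>
# so that it parses as a single XML fragment.
SAMPLE = (
    '<seg><number type="cardinal" n="40">forty</number> '
    '<unit type="length">cubits</unit> long and '
    '<number type="cardinal" n="30">thirty</number>'
    '<unit type="length" subType="x-implied" n="cubits"/> broad</seg>'
)


def measurements(xml_text):
    """Pair each <number> with the <unit> that follows it.

    Returns a list of (value, unit_name, unit_type) tuples.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    pending = None  # value of the most recent <number> not yet paired
    for el in root:
        if el.tag == "number":
            pending = int(el.get("n"))
        elif el.tag == "unit" and pending is not None:
            # An implied unit has no text, so fall back to its n attribute.
            name = (el.text or "").strip() or el.get("n")
            pairs.append((pending, name, el.get("type")))
            pending = None
    return pairs


if __name__ == "__main__":
    print(measurements(SAMPLE))
```

The sketch only looks at sibling elements, so a unit nested inside a number (as in the Psalm 90:10 case) would need separate handling.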

USFM 3.0 support

There are several new markers in USFM 3.0 as well as some syntax changes. OSIS needs to be enhanced to support suitable equivalents of these new and updated features.

OSIS User Manual (bugs & feature requests)

List here any errors in the OSIS User Manual and any omissions that need rectifying.

head element

The OSIS manual gives the head element as a means of providing for titles. It is not in the schema as a child of div, but it is in the manual.

divineName element

The manual gives type="x-yhwh" in § 11.5.1.2, but it is unnecessary. It also has the content as LORD, but it should be Lord.

seg type="benediction"

This is mentioned as a suggestion in § 11.1.4, but benediction is not a defined value for the type attribute of seg.
The defined values are alluded, keyword, otPassage and verseNumber. It should therefore either have the "x-" prefix or be defined as an addition to the schema. See above.

Right double quotation mark

Some XML code samples have attribute values wrapped between two right double quotation marks (U+201D) rather than ordinary double quotation marks (U+0022).

Instances found in pages 20, 40, 41, 45, 47, 48, 63, 80, 87. Some pages have it multiple times.

David Haslam (talk) 11:37, 10 December 2017 (MST)
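Anyone copying samples out of the manual might screen them with something like the following Python sketch, which reports where U+201D occurs and substitutes U+0022. The function names and the sample string are assumptions made for this example; blanket replacement is only safe for code samples, since U+201D may be legitimate in running text.

```python
RIGHT_DOUBLE_QUOTE = "\u201d"


def find_curly_quotes(text):
    """Return 1-based (line, column) positions of every U+201D in the text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == RIGHT_DOUBLE_QUOTE:
                hits.append((line_no, col))
    return hits


def straighten(text):
    """Replace every U+201D with the ordinary quotation mark U+0022."""
    return text.replace(RIGHT_DOUBLE_QUOTE, '"')


if __name__ == "__main__":
    # Illustrative sample with curly quotes around one attribute value.
    sample = '<milestone type=\u201dpb\u201d n="3"/>'
    print(find_curly_quotes(sample))
    print(straighten(sample))
```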

Appendix D.1.1 English Editions (prefix "en:")

  • Add further abbreviations for all the English Bibles published since OSIS 2.1.1 was released.

<contributor> in <work>

If the contributor element is used within the work element, it fails to validate if it comes after the creator element. This is contrary to the example given in section 7.1 on page 23 of the OSIS 2.1 User Manual.

<work osisWork="EG">
<title>Egyptian Grammar</title>
<creator role="aut">Alan Gardiner</creator>
<contributor role="dte">Francis Llewellyn Griffith</contributor>
<date event="original" type="gregorian">1927</date>
<date event="eversion" type="gregorian">2003</date>
<type type="x-grammar">Grammar</type>
<publisher>Griffith Institute, Ashmolean Museum, Oxford</publisher>
<language type="ISO-639">EN</language>
<language type="Ethnologue">EG-ancient</language>
<identifier type="ISBN">0900416351</identifier>
<identifier type="LCCN">95230980</identifier>
</work>

--David Haslam (talk) 13:56, 28 December 2017 (MST)

The user manual is in error.
The work element defines a strict order (aka sequence) of optional elements, in which contributor must come before creator.
<xs:sequence>
 <xs:element name="title" type="titleCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="contributor" type="contributorCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="creator" type="creatorCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="subject" type="subjectCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="date" type="dateCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="description" type="descriptionCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="publisher" type="publisherCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="type" type="typeCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="format" type="formatCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="identifier" type="identifierCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="source" type="sourceCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="language" type="languageCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="relation" type="relationCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="coverage" type="coverageCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="rights" type="rightsCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="scope" type="scopeCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="castList" type="castListCT" minOccurs="0" maxOccurs="unbounded"/>
 <xs:element name="teiHeader" type="teiHeaderCT" minOccurs="0"/>
 <xs:element name="refSystem" type="refSystemCT" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
--David Haslam (talk) 13:58, 28 December 2017 (MST)
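The ordering rule can be checked without a full schema validator. The Python sketch below hard-codes the child order from the xs:sequence quoted above and reports any child of work that appears after a later-ranked sibling. It is not a substitute for validating against the .xsd; the function name and the two shortened samples are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Child order copied from the xs:sequence for the work element.
WORK_SEQUENCE = [
    "title", "contributor", "creator", "subject", "date", "description",
    "publisher", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights", "scope", "castList", "teiHeader",
    "refSystem",
]
RANK = {name: i for i, name in enumerate(WORK_SEQUENCE)}

# Shortened samples: the order used in the manual, and the order the schema wants.
MANUAL_ORDER = (
    '<work osisWork="EG"><title>Egyptian Grammar</title>'
    '<creator role="aut">Alan Gardiner</creator>'
    '<contributor role="dte">Francis Llewellyn Griffith</contributor></work>'
)
SCHEMA_ORDER = (
    '<work osisWork="EG"><title>Egyptian Grammar</title>'
    '<contributor role="dte">Francis Llewellyn Griffith</contributor>'
    '<creator role="aut">Alan Gardiner</creator></work>'
)


def out_of_order(work_xml):
    """Return the names of children that follow a later-ranked sibling."""
    work = ET.fromstring(work_xml)
    problems = []
    highest = -1
    for child in work:
        name = child.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        rank = RANK.get(name)
        if rank is None:
            continue  # unknown element; left to a real validator
        if rank < highest:
            problems.append(name)
        else:
            highest = rank
    return problems


if __name__ == "__main__":
    print(out_of_order(MANUAL_ORDER))
    print(out_of_order(SCHEMA_ORDER))
```

Judged against the same sequence, the manual's sample also places publisher after type and identifier after language, so moving contributor alone would probably not be enough to make it validate.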

milestone element

Section 12 on page 72 includes the following paragraph:

When setting the attribute n on a milestone, it should indicate the number of the unit starting, not the unit ending. For example, <milestone type="page" n="3"/> indicates the break between pages 2 and 3, not between pages 3 and 4. Numbering does not need to be unique across various types of milestones -- for example, the 24th line on page 5 of a manuscript may be marked simply n="24", rather than n="5.24" or something similar.

However, page is not a predefined type attribute value! The example should be <milestone type="pb" n="3"/>

--David Haslam (talk) 02:36, 3 January 2018 (MST)

Mistaken example for the w element

Section 13.17. w on page 86 has the following example for the use of the w element.

<word gloss="s:H325>Ahasuerus</word>

This is a mistake. It should read,

<w gloss="s:H325">Ahasuerus</w>

--David Haslam (talk) 15:44, 24 March 2020 (UTC)

Grain operator @s

An undocumented feature has just come to light.

Recent study of the .xsd file for OSIS 2.1.1 has led to the discovery that the following example is valid for the osisRef fine grain @s operator.

osisRef="Gen.1.4@s[the][2]"

We are assuming that, as the bracketed number defaults to [1] when omitted, the example given would point to the second occurrence of "the" in the verse text, thus:

Genesis 1:4: And God saw the light, that it was good: and God divided the light from the darkness.

--David Haslam (talk) 16:26, 4 May 2020 (UTC)
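The assumed interpretation can be made concrete with a short Python sketch. It handles only the single form discussed above (a reference followed by @s[text] and an optional [n]) and treats occurrences as plain substrings counted from left to right. Both points are assumptions made for this example rather than documented OSIS behaviour, and the regular expression and function name are hypothetical.

```python
import re

# Matches ref@s[text] with an optional [n]; other grain forms are not handled.
GRAIN = re.compile(r"^(?P<ref>[^@]+)@s\[(?P<text>[^\]]+)\](?:\[(?P<n>\d+)\])?$")

VERSE = (
    "And God saw the light, that it was good: "
    "and God divided the light from the darkness."
)


def locate_grain(osis_ref, verse_text):
    """Return (ref, offset) for the n-th occurrence of the grain text.

    Assumes a missing [n] means [1] and that matching is by substring.
    """
    m = GRAIN.match(osis_ref)
    if m is None:
        raise ValueError(f"not an @s grain reference: {osis_ref!r}")
    wanted = int(m.group("n") or 1)
    start = -1
    for _ in range(wanted):
        start = verse_text.find(m.group("text"), start + 1)
        if start == -1:
            raise ValueError("fewer occurrences than requested")
    return m.group("ref"), start


if __name__ == "__main__":
    ref, offset = locate_grain("Gen.1.4@s[the][2]", VERSE)
    print(ref, offset, VERSE[offset:offset + 9])
```

Substring matching would also count "the" inside longer words such as "there", so a real implementation would need to decide whether the operator is meant to match whole words.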

See also

External links

  1. Reference - The OSIS spec and related reference documents
  2. Schemas - The actual schema versions and doc for OSIS