OSIS 211 CR

From CrossWire Bible Society
Revision as of 20:28, 2 January 2015 by David Haslam (talk | contribs) (Allow within: The OSIS generated by usfm2osis.py for beibl.net files provided one example, viz....)

This page is for recording potential change requests to the OSIS XML schema.

OSIS 2.1.1 Change Requests

Anyone with an outstanding OSIS bug report or feature proposal to be considered for inclusion in an updated OSIS schema: please write a very concise change request on this page, including a motivating use case.

Bugs

osisGenRegex bug

Currently that regex looks like [1], but it should look like [2]:

[1]     ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
[2]     ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_)+)*:)?([^:\s])+)
                        (missing + right here ^)

So our document with the following element isn't valid because the string "Strong" cannot be more than 1 character long in the current schema: <w morph="robinson:N-NSF" lemma="lemma.Strong:βίβλος">βίβλος</w>

--Osk 19:48, 5 November 2010 (UTC)
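
The difference between the two patterns can be checked with a minimal Python sketch. It rests on two assumptions: Python's \w is used as an approximation of the schema's (\p{L}|\p{N}|_), because the standard re module has no \p{...} classes, and re.fullmatch is used because XML Schema patterns must match the whole attribute value. The names CURRENT, PROPOSED and is_valid are illustrative only.

```python
import re

# Approximation: \w stands in for the schema's (\p{L}|\p{N}|_).
# Redundant grouping parentheses from the schema pattern are dropped.
CURRENT = r"(((\w+)(\.\w)*:)?([^:\s])+)"    # pattern [1]: one character per dotted segment
PROPOSED = r"(((\w+)(\.\w+)*:)?([^:\s])+)"  # pattern [2]: one or more characters


def is_valid(pattern: str, value: str) -> bool:
    """Return True if the whole value matches, as XML Schema requires."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ("robinson:N-NSF", "lemma.Strong:βίβλος"):
        print(value, is_valid(CURRENT, value), is_valid(PROPOSED, value))
```

Under these assumptions, "robinson:N-NSF" is accepted by both patterns, while "lemma.Strong:βίβλος" is rejected by [1] and accepted by [2].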

milestoned <lg>

Since the <l> element can only occur within an <lg> element, use of milestoned <lg> prevents use of <l> elements (within that <lg>). Since <lg> is milestonable, one would presume that the following snippet would be valid, but it is not, for the above reason:

     <lg sID="eg1"/>
          <l>Poetry line</l>
          <l>Poetry line</l>
     <lg eID="eg1"/>

--Osk 18:18, 31 December 2011 (MST)

The <lg> element does not allow for mixed content. However the use of the milestoned <lg> wrongly allows for it.

   <lg sID="eg2"/>
      text
   <lg eID="eg2"/>

--Dmsmith 16:29, 14 October 2012 (MDT)
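
As a rough illustration of the first problem, the sketch below lists every <l> whose parent is not an <lg> element. It is not a schema validator: it assumes the snippet is wrapped in a <div> so that it parses, it ignores the OSIS namespace, and the function name orphan_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# The first snippet above, wrapped in a <div> (an assumption) so it is well-formed.
SNIPPET = """
<div>
  <lg sID="eg1"/>
    <l>Poetry line</l>
    <l>Poetry line</l>
  <lg eID="eg1"/>
</div>
"""


def orphan_lines(xml_text: str) -> list[str]:
    """Return the text of every <l> whose parent element is not <lg>."""
    root = ET.fromstring(xml_text)
    orphans = []
    for parent in root.iter():
        if parent.tag == "lg":
            continue  # <l> children of a container <lg> are fine
        for child in parent:
            if child.tag == "l":
                orphans.append(child.text or "")
    return orphans
```

With milestoned <lg>, both <l> elements are siblings of the milestones rather than children of an <lg>, so both are reported. With a container <lg>, none are.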

<closer> in <verse> container?

According to the OSIS manual (cf. 11.1.3 on p. 58), it should be possible to embed a <closer> element within a <verse> container, but the schema does not allow this. One or the other should be corrected. --Osk 05:56, 6 July 2012 (MDT)

<seg> in <cell>

This was already reported to osis-users, but for the sake of completeness: There's a typo that allows "seq" in <cell> instead of "seg". --Osk 04:29, 22 February 2014 (MST)

Feature requests

Allow <divineName> within <w>

Often a Hebrew word is translated into multiple English words. In the case of the Divine name, the tetragrammaton, there are frequent "of the LORD", "to the LORD", "the LORD", .... In OSIS these would properly be represented as: <w lemma="strong:H03068">the <divineName>Lord</divineName></w>. To get around this shortcoming, a hack has to be employed: <divineName> is wrapped in an element that is allowed within <w> and that itself allows <divineName>. <seg> is such an element: <w>the <seg><divineName>Lord</divineName></seg></w>.

Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:20, 23 February 2014 (MST)

Allow <divineName> within <name>

The OSIS generated by usfm2osis.py for beibl.net files provided one example, viz.

<verse sID="Num.21.14" osisID="Num.21.14"/>Mae
<name type="x-workTitle">Llyfr Rhyfeloedd yr
<divineName>ARGLWYDD</divineName>
</name> yn cyfeirio at y lle fel yma:

A similar hack is required using <seg>, viz.

<verse sID="Num.21.14" osisID="Num.21.14"/>Mae
<name type="x-workTitle">Llyfr Rhyfeloedd yr
<seg><divineName>ARGLWYDD</divineName></seg>
</name> yn cyfeirio at y lle fel yma:
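
If the schema were changed to allow <divineName> directly within <w> and <name>, existing documents that use the <seg> hack could be normalized mechanically. The following is only a sketch under stated assumptions: it ignores the OSIS namespace, it treats a <seg> as a hack only when it has no attributes and contains nothing but a single <divineName>, and the function name unwrap_seg_hack is hypothetical.

```python
import xml.etree.ElementTree as ET


def unwrap_seg_hack(parent: ET.Element) -> None:
    """Replace <seg><divineName>..</divineName></seg> children of parent
    with the bare <divineName>, in place."""
    for index, child in enumerate(list(parent)):
        is_hack = (
            child.tag == "seg"
            and not child.attrib
            and len(child) == 1
            and child[0].tag == "divineName"
            and not (child.text or "").strip()
            and not (child[0].tail or "").strip()
        )
        if is_hack:
            inner = child[0]
            inner.tail = child.tail  # keep any text that followed the <seg>
            parent.remove(child)
            parent.insert(index, inner)


if __name__ == "__main__":
    w = ET.fromstring("<w>the <seg><divineName>Lord</divineName></seg></w>")
    unwrap_seg_hack(w)
    print(ET.tostring(w, encoding="unicode"))
```

A <seg> that carries a type attribute or other content is left untouched, since it may be intentional markup rather than the workaround.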

Allow <transChange> within <w>

An encoder ought to be allowed to put <transChange> on elements smaller than an orthographic word. If I'm translating an instance of "λόγος", but for some reason I believe that I should translate it as "words", I ought to be able to encode <w>word<transChange>s</transChange></w>. --Osk 19:48, 5 November 2010 (UTC)

Add an element for morphology within <w>

To encode documents like MORPH (WLC + morphology), we need an element that can be embedded within <w> to carry lexical information. I suggest calling it <m> and giving it all of the attributes found on <w>. --Osk 19:48, 5 November 2010 (UTC)

Allow <transChange> within <hi>

A highlighted sentence or part of a sentence is a unit, including any transChange parts of it. At the moment a highlighted sentence with a transChange has to be encoded like the first line below; the second line shows the requested form:

<hi type="bold"> Texttexttext </hi><transChange><hi type="bold"> moreText</hi></transChange><hi type="bold"> TextText</hi>
<hi type="bold"> Texttexttext <transChange>moreText</transChange> TextText</hi>

This would look cleaner and would also be closer to what is meant. refdoc:talk 16:02, 3 August 2011 (MDT)

Allow <catchWord> within <hi>

A highlighted sentence or part of a sentence is a unit, including any catchWord parts of it. At the moment a highlighted sentence with a catchWord has to be encoded like the first line below; the second line shows the requested form:

<hi type="bold"> Texttexttext </hi><catchWord><hi type="bold"> moreText</hi></catchWord><hi type="bold"> TextText</hi>
<hi type="bold"> Texttexttext <catchWord>moreText</catchWord> TextText</hi>

This is identical in form to the <transChange> issue. The problem with both of these is that <transChange> and <catchWord> may reasonably be styled in the same fashion as what is indicated by <hi>. --Dmsmith 16:58, 14 October 2012 (MDT)

Allow multiple types for <hi>

It would be really convenient to be able to write

<hi type="bold italic small-caps">text</hi>

rather than

<hi type="bold"><hi type="italic"><hi type="small-caps">text</hi></hi></hi>

--Dmsmith 16:57, 14 October 2012 (MDT)
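
Until the schema allows this, a converter could accept the compact form as input and expand it to the nested form before validation. The sketch below shows one way to do that. It assumes types are separated by whitespace, ignores the OSIS namespace, and uses a hypothetical function name, expand_hi.

```python
import xml.etree.ElementTree as ET


def expand_hi(elem: ET.Element) -> ET.Element:
    """Rewrite <hi type="a b c">..</hi> as nested single-type <hi> elements.

    Elements that are not <hi>, or that have fewer than two types,
    are returned unchanged.
    """
    types = elem.get("type", "").split()
    if elem.tag != "hi" or len(types) < 2:
        return elem
    # Any other attributes stay on the outermost <hi>.
    other = {k: v for k, v in elem.attrib.items() if k != "type"}
    outer = ET.Element("hi", {**other, "type": types[0]})
    outer.tail = elem.tail
    current = outer
    for type_name in types[1:]:
        current = ET.SubElement(current, "hi", {"type": type_name})
    # The innermost <hi> receives the original text and child elements.
    current.text = elem.text
    current.extend(list(elem))
    return outer


if __name__ == "__main__":
    compact = ET.fromstring('<hi type="bold italic small-caps">text</hi>')
    print(ET.tostring(expand_hi(compact), encoding="unicode"))
```

The nesting order follows the order of the type values, so the first listed type becomes the outermost element.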

Allow <transChange> within <note>

When translating an alternate Greek version of a passage, added words need to be indicated.

Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:22, 23 February 2014 (MST)

Allow remote header reference

When serving short passages via web services, as valid OSIS documents, a full header is obtrusive. Also, in a collection of related documents, for example separate book files for a Bible, one centralized header would be more maintainable. The simplest approach would probably be to allow @href on the header element, to abstract some or all of the header content. See Troy's related post.

Allow shadow/virtual elements

A second requirement for distributing valid OSIS fragments through web services is a form of virtual, or shadow, element to supply the context of the given fragment. A new global attribute indicating this virtual status is essential to distinguish such elements from the actual markup of the document. The ESV API provides this construct via its `virtual` attribute (see the description of `include-virtual-attributes`). See Troy's related post (same as previous).

Improve Ketiv/Qere markup

A ketiv or qere can consist of one or more words, so the words need to be grouped and the ketiv and qere related to one another. I propose adding <ketiv> with @id, and <qere> with @idref, to contain the content (<w> elements) and allow validation of the connection. A qere with no ketiv could be marked up without the @idref.

This sounds like a good application for <seg>. I would recommend named types for <seg> instead: ketiv & qere. --Osk 00:37, 23 February 2014 (MST)

Improve Selah markup

Selah can be represented at the end of a line. The markup of <l type="selah">...</l> does not allow for the text identified as selah to be at the end of the current line. Maybe allow for a separate markup, rather than a type of line.

Milestonable <p>

For documents where the primary structure is book, chapter, verse, like the Authorized Version or the Hebrew Bible, we should be able to mark up paragraphs as milestones. This would allow for equality, rather than making book, section, paragraph a privileged system.

Quotation types

From the manual (p. 43): "The rendering for quotations marks after an interruption, for example, can be distinguished using the type attribute on this element, with values such as initial, medial, and final." Please make these @type values official: initial, medial, and final.

Add peripheral types from USFM to osisDivs

Add the additional USFM peripheral types to osisDivs to maintain feature parity. I believe OSIS 2.1.1 had this feature parity at the time of its release, but USFM has standardized additional peripheral types since then, which should be added as: halfTitlePage, promotionalPage, foreword, alphabeticalContents, tableofAbbreviations, chronology, weightsandMeasures, mapIndex, ntQuotesfromLXX, spine --Osk 01:04, 23 February 2014 (MST)

Manual bugs & feature requests

Head Elements

The OSIS manual gives the "head" element as a means of providing titles. It is in the manual, but it is not in the schema as a child of div.

DivineName Element

The manual gives the type as x-yhwh in 11.5.1.2, and perhaps elsewhere. The type isn't necessary. The manual also has the content as LORD, but it should be Lord.