OSIS 211 CR
Revision as of 14:05, 19 February 2016
This page is for recording potential change requests to the OSIS XML schema.
Contents
- 1 OSIS 2.1.1 Change Requests
- 2 CrossWire updated schema
- 3 Bugs
- 4 Feature requests
- 4.1 OSIS Validation
- 4.1.1 Allow <divineName> within <w>
- 4.1.2 Allow <divineName> within <name>
- 4.1.3 Allow <transChange> within <w>
- 4.1.4 Add an element for morphology within <w>
- 4.1.5 Allow <transChange> within <hi>
- 4.1.6 Allow <catchWord> within <hi>
- 4.1.7 Allow multiple types for <hi>
- 4.1.8 Allow <hi> within <title>
- 4.1.9 Allow <transChange> within <note>
- 4.1.10 Allow <hi> within <abbr>
- 4.1.11 Allow remote header reference
- 4.1.12 Allow shadow/virtual elements
- 4.2 New Features
- 5 OSIS User Manual (bugs & feature requests)
- 6 See also
- 7 External links
OSIS 2.1.1 Change Requests
Anyone with an outstanding OSIS bug report or feature proposal for consideration in an updated OSIS schema, please write a very concise change request on this page, including a motivating use case.
CrossWire updated schema
As an interim measure, we are maintaining an updated validation schema based on the contents of this page.
- These files are looking for a new home, but are currently at:
http://www.crosswire.org/~dmsmith/osis
In that location there are various iterations of the schema:
- osisCore.2.1.1-orig.xsd (The original schema, with some changes to whitespace).
- osisCore.2.1.1-cw1.xsd
- osisCore.2.1.1-cw2.xsd
- ...
- osisCore.2.1.1-cwN.xsd (Where N is the highest version number.)
- osisCore.2.1.1-cw-latest.xsd (The same as osisCore.2.1.1-cwN.xsd)
In other words, the most recent edition will usually be found in the osis directory, with the filename osisCore.2.1.1-cw-latest.xsd.
This URL may be used in place of the official BibleTechnologies URL for validating XML files submitted for modules.
Bugs
Alpha testing bugs
- List bugs in the schema that cause correct OSIS not to validate.
osisGenRegex bug
Currently that regex looks like [1], but it should look like [2]:
[1] ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
[2] ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_)+)*:)?([^:\s])+)
The difference is the + after the second (\p{L}|\p{N}|_) group, which is missing from [1].
So our document with the following element isn't valid, because in the current schema the string "Strong" (after the dot) cannot be more than one character long: <w morph="robinson:N-NSF" lemma="lemma.Strong:βίβλος">βίβλος</w>
--Osk 19:48, 5 November 2010 (UTC)
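The effect of the missing + can be checked outside the schema. The sketch below is an approximation, not the schema itself: Python's standard `re` module does not support `\p{L}` or `\p{N}`, so the Unicode-aware `\w` (letters, digits and underscore) is assumed as a stand-in for `(\p{L}|\p{N}|_)`, and `re.fullmatch` mimics the implicit anchoring of XSD patterns. `is_valid` is a hypothetical helper name.

```python
import re

# Stand-in for XSD's (\p{L}|\p{N}|_): in Python 3, \w on str patterns matches
# Unicode alphanumerics (str.isalnum()) plus underscore. This is close to,
# but not guaranteed identical with, the XSD character classes.
W = r"\w"

# osisGenRegex as currently written: only ONE character after each '.'.
CURRENT = rf"(((({W})+)(\.({W}))*:)?([^:\s])+)"
# Proposed fix: '+' after the dotted group allows multi-character segments.
FIXED = rf"(((({W})+)(\.({W})+)*:)?([^:\s])+)"


def is_valid(pattern: str, value: str) -> bool:
    """XSD patterns must match the whole value, so use fullmatch."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ["strong:H03068", "lemma.S:βίβλος", "lemma.Strong:βίβλος"]:
        print(f"{value!r}: current={is_valid(CURRENT, value)}, "
              f"fixed={is_valid(FIXED, value)}")
```

With these definitions, lemma.Strong:βίβλος fails the current pattern and passes the fixed one, while a one-character segment such as lemma.S:βίβλος passes both, which is the symptom described above.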
milestoned <lg>
Since the <l> element can only occur within an <lg> element, use of milestoned <lg> prevents use of <l> elements (within that <lg>). Since <lg> is milestonable, one would presume that the following snippet would be valid, but it is not, for the above reason:
<lg sID="eg1"/> <l>Poetry line</l> <l>Poetry line</l> <lg eID="eg1"/>
--Osk 18:18, 31 December 2011 (MST)
The <lg> element does not allow mixed content. However, milestoned <lg> wrongly allows it:
<lg sID="eg2"/> text <lg eID="eg2"/>
--Dmsmith 16:29, 14 October 2012 (MDT)
<closer> in <verse> container?
According to the OSIS manual (cf. 11.1.3 on p. 58), it should be possible to embed a <closer> element within a <verse> container, but the schema does not allow this. One or the other should be corrected. --Osk 05:56, 6 July 2012 (MDT)
<seg> in <cell>
This was already reported to osis-users, but for the sake of completeness: There's a typo that allows "seq" in <cell> instead of "seg". --Osk 04:29, 22 February 2014 (MST)
Beta testing bugs
- List bugs in the schema that allow incorrect OSIS to validate.
rdg
In these lines of the schema:
<xs:simpleType name="rdgType"> <xs:union memberTypes="osisRdg attributeExtension xs:string"/> </xs:simpleType>
- osisRdg is a list (alternate, variant).
- attributeExtension is a regular expression allowing x-….
- xs:string allows any string expression.
Thus rdg elements with any text value in the type attribute will always validate, even though they should fail for anything other than alternate, variant, or an "x-" prefixed user-defined value (e.g. x-userdefined).
David Haslam 07:19, 22 January 2016 (MST)
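Why any value validates can be seen by modelling the union in a few lines. This is a hypothetical Python model, not an XSD processor: each member type becomes a predicate, and a value is accepted if at least one member accepts it. The attributeExtension check is an assumed approximation ("x-" followed by non-whitespace); the exact pattern should be taken from osisCore. The same reasoning applies to lineType and lineGroupType below.

```python
import re
from typing import Callable

Predicate = Callable[[str], bool]


def osis_rdg(value: str) -> bool:
    # osisRdg enumeration, as listed on this page.
    return value in {"alternate", "variant"}


def attribute_extension(value: str) -> bool:
    # ASSUMPTION: approximates attributeExtension as "x-" plus non-whitespace.
    return re.fullmatch(r"x-\S+", value) is not None


def xs_string(value: str) -> bool:
    # xs:string places no constraint on the value.
    return True


def union(*members: Predicate) -> Predicate:
    # An xs:union accepts a value if ANY member type accepts it.
    return lambda value: any(member(value) for member in members)


rdg_type_current = union(osis_rdg, attribute_extension, xs_string)
rdg_type_without_string = union(osis_rdg, attribute_extension)

if __name__ == "__main__":
    for value in ["variant", "x-userdefined", "varient"]:
        print(value, rdg_type_current(value), rdg_type_without_string(value))
```

Removing xs:string from memberTypes, as modelled by `rdg_type_without_string`, would make a misspelling such as "varient" fail, while x- values remain valid.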
lineType
Similar to above:
<xs:simpleType name="lineType"> <xs:union memberTypes="osisLine attributeExtension xs:string"/> </xs:simpleType>
David Haslam 07:24, 22 January 2016 (MST)
lineGroup
Similar to above:
<xs:simpleType name="lineGroupType"> <xs:union memberTypes="osisLineGroup attributeExtension xs:string"/> </xs:simpleType>
David Haslam 07:24, 22 January 2016 (MST)
Feature requests
OSIS Validation
- List OSIS constructs that currently fail to validate, yet which would be better to allow.
Allow <divineName> within <w>
Often a Hebrew word is translated into multiple English words. In the case of the Divine name, the tetragrammaton, there are frequent renderings such as "of the LORD", "to the LORD", "the LORD", .... In OSIS these would properly be represented as: <w lemma="strong:H03068">the <divineName>Lord</divineName></w>. To get around this shortcoming, a hack has to be employed: <divineName> is wrapped in an element that is itself allowed within <w> and that allows <divineName>. <seg> is allowed in <w> and allows <divineName> within it: <w>the <seg><divineName>Lord</divineName></seg></w>.
Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:20, 23 February 2014 (MST)
Allow <divineName> within <name>
The OSIS generated by usfm2osis.py for beibl.net files provided one example [1], viz.
<verse sID="Num.21.14" osisID="Num.21.14"/>Mae <name type="x-workTitle">Llyfr Rhyfeloedd yr <divineName>ARGLWYDD</divineName> </name> yn cyfeirio at y lle fel yma:
A similar hack is required using <seg>, viz.
<verse sID="Num.21.14" osisID="Num.21.14"/>Mae <name type="x-workTitle">Llyfr Rhyfeloedd yr <seg><divineName>ARGLWYDD</divineName></seg> </name> yn cyfeirio at y lle fel yma:
This ought also to apply for any other element that allows seg but not divineName.
Allow <transChange> within <w>
An encoder ought to be allowed to put <transChange> on elements smaller than an orthographic word. If I'm translating an instance of "λόγος", but for some reason I believe that I should translate it as "words", I ought to be able to encode <w>word<transChange>s</transChange></w>. --Osk 19:48, 5 November 2010 (UTC)
Add an element for morphology within <w>
To encode documents like MORPH (WLC + morphology), we need an element to embed within <w> to carry lexical information. I suggest calling it <m> and giving it all of the attributes found on <w>. --Osk 19:48, 5 November 2010 (UTC)
Allow <transChange> within <hi>
A highlighted sentence or part of a sentence is a unit, including any transChange parts of it. At the moment a highlighted sentence with a transChange will look like this:
<hi type="bold"> Texttexttext </hi><transChange><hi type="bold"> moreText</hi></transChange><hi type="bold"> TextText</hi>
If <transChange> were allowed within <hi>, this could instead be written as:
<hi type="bold"> Texttexttext <transChange>moreText</transChange> TextText</hi>
This would look cleaner and would also be closer to what is meant. refdoc:talk 16:02, 3 August 2011 (MDT)
Allow <catchWord> within <hi>
A highlighted sentence or part of a sentence is a unit, including any catchWord parts of it. At the moment a highlighted sentence with a catchWord will look like this:
<hi type="bold"> Texttexttext </hi><catchWord><hi type="bold"> moreText</hi></catchWord><hi type="bold"> TextText</hi>
If <catchWord> were allowed within <hi>, this could instead be written as:
<hi type="bold"> Texttexttext <catchWord>moreText</catchWord> TextText</hi>
This is identical in form to the <transChange> issue. The problem with both of these is that <transChange> and <catchWord> may reasonably be styled in the same fashion as what is indicated by <hi>. --Dmsmith 16:58, 14 October 2012 (MDT)
Allow multiple types for <hi>
It would be really convenient to allow
<hi type="bold italic small-caps">text</hi>
rather than
<hi type="bold"><hi type="italic"><hi type="small-caps">text</hi></hi></hi>
--Dmsmith 16:57, 14 October 2012 (MDT)
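Until such a change is made, a preprocessing tool could accept the proposed space-separated form and expand it into the nested markup that currently validates. The sketch below assumes Python's standard `xml.etree.ElementTree`; `nest_hi_types` is a hypothetical helper that reuses the original tag (so a namespace is kept) but carries over only the type attribute.

```python
import xml.etree.ElementTree as ET


def nest_hi_types(hi: ET.Element) -> ET.Element:
    """Expand <hi type="a b c"> into nested single-type <hi> elements.

    Hypothetical helper: attributes other than type are not copied.
    """
    types = hi.get("type", "").split()
    if len(types) <= 1:
        return hi  # already valid under the current schema
    outer = ET.Element(hi.tag, {"type": types[0]})
    innermost = outer
    for hi_type in types[1:]:
        innermost = ET.SubElement(innermost, hi.tag, {"type": hi_type})
    # Move the original text and child elements into the innermost <hi>.
    innermost.text = hi.text
    innermost.extend(list(hi))
    outer.tail = hi.tail
    return outer


if __name__ == "__main__":
    src = ET.fromstring('<hi type="bold italic small-caps">text</hi>')
    print(ET.tostring(nest_hi_types(src), encoding="unicode"))
```

Run on the example above, this prints the nested form `<hi type="bold"><hi type="italic"><hi type="small-caps">text</hi></hi></hi>`.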
Allow <hi> within <title>
There are some languages for which the earlier orthography used an italicised N (both cases) as a separate letter of the alphabet.
Example: Old Pohnpeian. Allowing <hi type="italic">n</hi> within the text of a title element would obviate the need to use the seg element as a workaround.
David Haslam 13:55, 15 January 2016 (MST)
- The use of italics to mark a single character within a word is likely to interfere with the search functions of SWORD and JSword. It would have been better if the Old Pohnpeian alphabet had used a separate character such as Ñ (ñ). In the modern orthography, the digraph ng is used for this consonant. David Haslam (talk) 11:27, 14 February 2016 (MST)
Allow <transChange> within <note>
When translating an alternate Greek version of a passage, added words need to be indicated.
Note: the most recent release of the KJV assumes that this has been fixed. --Dmsmith 07:22, 23 February 2014 (MST)
Allow <hi> within <abbr>
To restrict the highlighting to letters and exclude punctuation marks, the abbr element should allow the hi element. This avoids having to use a seg hack to achieve the required markup:
<abbr expansion="Psalm"><hi type="spaced-letters">PSAL</hi>.</abbr>
would become possible, and would remove the need for the rendering engine to treat some characters (here the full stop) differently from others when it applies the special highlighting.
Allow remote header reference
When serving short passages via web services, as valid OSIS documents, a full header is obtrusive. Also, in a collection of related documents, for example separate book files for a Bible, one centralized header would be more maintainable. The simplest approach would probably be to allow @href on the header element, to abstract some or all of the header content. See Troy's related post.
Allow shadow/virtual elements
A second requirement for distributing valid OSIS fragments through web services is a form of virtual, or shadow, element to supply the context of the given fragment. A new global attribute indicating this virtual status is essential to distinguish such elements from the actual markup of the document. The ESV API provides this construct via a `virtual` attribute (see the description of `include-virtual-attributes`). See Troy's related post (same as previous).
New Features
- List new features or extensions to existing features here.
Biblical Hebrew
Add further <hi> types to support Biblical Hebrew
The Masoretic Text includes some words whose characters have a different style than the main text. These three styles use "large", "small" and "suspended" letters.[1]
MT scholars would find it beneficial if these special text styles could be properly represented in OSIS XML (and rendered as such in modules).
Provide type attribute values to support small, large and suspended Hebrew glyphs.
This would enable more accurate display of these orthographic peculiarities found in the Tanakh.
Biblical Hebrew is an area where the usual priority of semantic markup over presentational markup cannot be taken for granted.
David Haslam
These new hi types should be implemented in a way that retains compatibility with search features: the whole word should be wrapped, with the letters to be specially rendered specified by means of a further attribute value.
Note:
Improve Ketiv/Qere markup in Biblical Hebrew
See https://en.wikipedia.org/wiki/Qere_and_Ketiv
A ketiv or qere can consist of one or more words, so the words need to be grouped and the two readings related to one another. I propose adding <ketiv> with @id, and <qere> with @idref, to contain the content (<w> elements) and allow the connection to be validated. A qere with no ketiv could be marked up without the @idref.
- This sounds like a good application for <seg>. I would recommend named types for <seg> instead: ketiv & qere. --Osk 00:37, 23 February 2014 (MST)
- <seg type="qere">...</seg> and <seg type="ketiv">...</seg> is the change request.
- <seg type="x-qere">...</seg> and <seg type="x-ketiv">...</seg> could be used interim.
- David Haslam 14:11, 15 January 2016 (MST)
Add peripheral types from USFM to osisDivs
Add the additional USFM peripheral types to osisDivs to maintain feature parity. I believe OSIS 2.1.1 had this feature parity at the time of its release, but USFM has standardized additional peripheral types since then, which should be added as: halfTitlePage, promotionalPage, foreword, alphabeticalContents, tableofAbbreviations, chronology, weightsandMeasures, mapIndex, ntQuotesfromLXX, spine --Osk 01:04, 23 February 2014 (MST)
Calendar types
Add the following calendar system:
- type="Ethiopian"
May be required as and when we support Bibles & Commentaries for the Ethiopian Orthodox Church. David Haslam 07:53, 22 January 2016 (MST)
Quotation types
From the manual (p. 43): "The rendering for quotations marks after an interruption, for example, can be distinguished using the type attribute on this element, with values such as initial, medial, and final." Please make these @type values official: initial, medial, and final.
Milestonable <p>
For documents where the primary structure is book, chapter, verse, like the Authorized Version or the Hebrew Bible, we should be able to mark up paragraphs as milestones. This would put both structures on an equal footing, rather than making book, section, paragraph the privileged system.
Improve Selah markup
Selah can appear at the end of a line. The markup <l type="selah">...</l> does not allow the text identified as selah to be at the end of the current line. Perhaps allow separate markup for it, rather than a type of line.
- But see also http://www.crosswire.org/tracker/browse/MODTOOLS-84 David Haslam 13:33, 2 January 2015 (MST)
title subType
Add the following subType values for use along with type="chapter" in the title element:
subType="chapterDescription" subType="chapterLabel"
The former would allow SWORD to be extended to show chapter descriptions in italics, at normal or smaller font size.
The latter would allow SWORD to be extended to display the module's own chapter labels instead of the default chapter labels programmed into the front-end.
Currently, these are typically marked using an "x-" prefix in the attribute value, without any SWORD support.
David Haslam 08:09, 27 January 2016 (MST)
name type="book"
Allow type="book" as an attribute of the name element, to identify non-canonical books referenced in the Bible. David Haslam (talk)
divineName type normal
There are four places in the KJV where the word JEHOVAH is all uppercase, but not small-caps. The following markup is desirable for these:
<divineName type="normal">JEHOVAH</divineName>
The locations are: Exodus 6:3, Psalms 83:18, Isaiah 12:2, Isaiah 26:4. David Haslam
New hi types
In addition to these defined type values,
- • acrostic • bold • emphasis • illuminated • italic • line-through • normal • small-caps • sub • super • underline
it would be useful to add several further types for the hi element. David Haslam
overline
SWORD already supports type="overline" for the hi element, even though it has not previously been defined in the schema.
dotted-underline
Dotted underline is sometimes used in Chinese ideographic script to highlight certain words. Should we provide for this in OSIS?
dashed-underline
This is similar to dotted underline, but the line is dashed rather than dotted.
spaced-letters
Many of the book titles in the Blayney edition contain words in which the letters are spaced. e.g.
The R E V E L A T I O N of S. J O H N the Divine.
It's desirable to have a new highlight type for these, e.g.
<hi type="spaced-letters">REVELATION</hi>
In this way the highlighted text will semantically still be the same word, even though it is displayed differently. As and when this is implemented by SWORD, the spaces should be of the non-break type.
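The intended split between stored text and display can be sketched as follows, assuming a hypothetical renderer built on Python's standard `xml.etree.ElementTree` (namespaces are ignored for brevity). The markup keeps the plain word, so `itertext()` and search still see REVELATION; only the rendering inserts U+00A0 NO-BREAK SPACE between the letters.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def render_text(elem: ET.Element) -> str:
    """Flatten elem to display text (hypothetical renderer).

    Letters inside <hi type="spaced-letters"> are separated by no-break
    spaces, so the spaced word cannot be split across lines.
    """
    inner = elem.text or ""
    for child in elem:
        inner += render_text(child) + (child.tail or "")
    if elem.tag == "hi" and elem.get("type") == "spaced-letters":
        inner = NBSP.join(inner)
    return inner


if __name__ == "__main__":
    title = ET.fromstring(
        '<title>The <hi type="spaced-letters">REVELATION</hi> of S. '
        '<hi type="spaced-letters">JOHN</hi> the Divine.</title>'
    )
    print("".join(title.itertext()))  # stored text, as search would see it
    print(render_text(title))         # display text with spaced letters
```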
drop-caps
Many printed Bibles use drop-caps for the first letter in a verse[1], usually the first verse in each chapter. To reproduce this in electronic editions, a means to implement this presentational format is required.
Note:
- ↑ To maintain compatibility with search features, the whole word should be marked, not just the first letter. The same goes for type="illuminated". The style sheet or rendering will determine that it applies only to the first letter.
Grain operator @s
The osisRef fine grain string[1] operator @s[text] works only for a whole word without spaces. It will also find only the first occurrence of the specified word.
It would be useful to expand this operator to facilitate:
- text containing spaces, rather than only a single word
- returning the whole string rather than merely a pointer to its first character
- text containing punctuation marks[2]
- a method to find further occurrences of the same word after the first, e.g. for the nth instance,
@s[word]n
- a shorter way of specifying a range of consecutive words within the same osisRef[3], by such as:
@s[first]-[last]
- a way of specifying a comma-separated series of words within the same osisRef, by such as:
@s[third],[fifth],[seventh]
- to allow a method for the user agent (e.g. SWORD) to process a fine grain string ending with "…", the HORIZONTAL ELLIPSIS (U+2026), by returning the match up to just before the next terminating punctuation mark, or to the end of the specified osisRef.
Note:
- ↑ OSIS User Manual (pp.82, 91, 148). It's uncertain whether this OSIS feature is even supported by SWORD.
- ↑ It's uncertain whether it can cope with a string that has an apostrophe, or one that is hyphenated.
- ↑ This would avoid having to repeat the full osisRef for the end of the range.
David Haslam 27 January 2016 (MST)
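To make the proposed syntax concrete, here is a hypothetical resolver in Python for the bracketed part after @s. It is not SWORD or JSword code, and its semantics are assumptions drawn from the list above: [text] matches a whole-word (or whole-phrase) occurrence, a trailing number selects the nth occurrence, [first]-[last] returns the whole span between them, and a comma-separated series returns each word. The trailing-ellipsis form, and text containing commas or square brackets, are not handled.

```python
import re
from typing import List, Tuple

# One grain item in the proposed syntax: "[text]" plus an optional occurrence number.
ITEM = r"\[([^\]]+)\](\d*)"
ITEM_RE = re.compile(ITEM)
RANGE_RE = re.compile(ITEM + "-" + ITEM)  # proposed "[first]-[last]" form


def _span(text: str, word: str, nth: int) -> Tuple[int, int]:
    """Return (start, end) of the nth whole-word occurrence of word in text."""
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    for count, match in enumerate(pattern.finditer(text), start=1):
        if count == nth:
            return match.span()
    raise ValueError(f"occurrence {nth} of {word!r} not found")


def resolve_grain(text: str, grain: str) -> List[str]:
    """Resolve the part after '@s' to the full matched string(s)."""
    rng = RANGE_RE.fullmatch(grain)
    if rng:
        first, first_n, last, last_n = rng.groups()
        start, _ = _span(text, first, int(first_n or 1))
        _, end = _span(text, last, int(last_n or 1))
        if end <= start:
            raise ValueError("range end precedes its start")
        return [text[start:end]]
    results = []
    for item in grain.split(","):
        m = ITEM_RE.fullmatch(item)
        if not m:
            raise ValueError(f"unrecognised grain item {item!r}")
        word, nth = m.groups()
        start, end = _span(text, word, int(nth or 1))
        results.append(text[start:end])
    return results


if __name__ == "__main__":
    verse = "In the beginning God created the heaven and the earth."
    print(resolve_grain(verse, "[the]2-[the]3"))
    print(resolve_grain(verse, "[beginning],[created],[earth]"))
```

For instance, with the KJV text of Gen.1.1, "In the beginning God created the heaven and the earth.", the grain [the]2-[the]3 resolves to "the heaven and the", returning the whole string rather than a pointer to its first character.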
OSIS User Manual (bugs & feature requests)
- List here any errors in the OSIS User Manual and any omissions that need rectifying.
Head Elements
The OSIS manual gives the head element as a means of providing titles. It is in the manual, but the schema does not allow it as a child of div.
DivineName Element
The manual gives type="x-yhwh" in 11.5.1.2, but it's unnecessary. It also has the content as LORD, but it should be Lord.
<seg type="benediction">
This is mentioned as a suggestion in 11.1.4, but benediction is not a defined value for the type attribute of seg.
The defined values are • alluded • keyword • otPassage • verseNumber, so the suggested value should have the "x-" prefix.
See also
External links
Our friend, Michael Paul Johnson maintains his own Modified OSIS schema. This is used in his Haila software.