Talk:Modules in the beta repository

From CrossWire Bible Society
Revision as of 02:49, 27 June 2008 by Osk (talk | contribs) (ABU)

I thought I'd move discussion about the implementation of modules here. It was cluttering the other page and when we'd get a new version or address a problem, we'd reset the row. Here we can keep the info until we are done.--Dmsmith 05:35, 22 June 2008 (MDT)

ABU

In Matt 5 there is a WoC (words of Christ) display problem. The WoC starts in verse 3 and ends at the end of the last chapter. Fortunately, the WoC start and finish in this module fall on chapter boundaries. If it had started in chapter 5 and finished in chapter 7, then the display of chapter 6 would never highlight the WoC.

The SWORD Engine currently terminates WoC at a verse boundary, regardless of how it is encoded. This is because it does not keep any WoC state from one verse to the next. No frontend will display it correctly, because it is not a frontend problem.
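To make the state problem concrete, here is a minimal Python sketch. It is not the SWORD Engine's code: `render_verse`, `wrap`, and the `woc` CSS class are hypothetical names, and the verse text is a simplified version of a milestoned start. It only illustrates that a verse rendered in isolation cannot know that a quote opened in an earlier verse.

```python
import re

# Hypothetical, simplified stand-in for a renderer; not the SWORD Engine's code.
Q_MILESTONE = re.compile(r'<q\b[^>]*/>')

def wrap(chunk, in_woc):
    """Highlight a run of text when it falls inside words of Christ."""
    if in_woc and chunk.strip():
        return '<span class="woc">' + chunk + '</span>'
    return chunk

def render_verse(text, in_woc=False):
    """Render one verse; return (html, WoC state at the end of the verse)."""
    out, pos = [], 0
    for m in Q_MILESTONE.finditer(text):
        out.append(wrap(text[pos:m.start()], in_woc))
        tag = m.group(0)
        # \b keeps osisID="..." from being mistaken for sID="..."
        if re.search(r'\bsID="', tag):
            in_woc = True
        elif re.search(r'\beID="', tag):
            in_woc = False
        pos = m.end()
    out.append(wrap(text[pos:], in_woc))
    return ''.join(out), in_woc

v3 = '<q marker="" who="Jesus" sID="q7"/>Happy the poor in spirit;'
v4 = 'Happy they that mourn;'

html3, state = render_verse(v3)
stateless = render_verse(v4)[0]                # verse rendered in isolation
stateful = render_verse(v4, in_woc=state)[0]   # state carried over from 5.3
print(stateless)
print(stateful)
```

Rendered in isolation, the second verse comes out unhighlighted; it is highlighted only when the state from the previous verse is passed in, which is exactly what a search-result list cannot do.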

I (DM) see several solutions (there may be others):

  • Change osis2mod to accept a wider range of valid OSIS inputs, creating a module that the SWORD Engine can handle, specifically allowing how this is encoded. Then it can work with 1.5.9.
  • Change the SWORD engine to handle this input at a chapter level. This is not a complete solution. Currently, when the KJV is searched, the WoC are highlighted in the search results list. This input will only highlight Matt 5:3 for any hit in Matt 5. Changing the SWORD Engine will require this module to be 1.5.12.
  • Change the module so that it is encoded as recommended in the wiki for OSIS Bibles. Then it can work with 1.5.9.

The easiest is to change the module. If not that, I'd suggest changing osis2mod, which probably is the best solution, resulting in easier module definition. I don't like the SWORD Engine change, because it is incomplete.


  • It looks to me as if it doesn't really matter how I encode the text, it won't render correctly on some set of frontends. ... was the original encoding, and that got complaints, so I switched to milestoned ... encoding (1.2), and that got more complaints, so I switched back to containers (1.3). Play with both. Tell me which is less wrong. Osk 14:03, 21 June 2008 (MDT)

There have been 3 versions:

The version 1.1 had for Matt 5:3-4:
(Note: my comment on this was that the sIDs and the eIDs were not properly encoded.)

5.3:
<q marker="" sID="q1" who="Jesus"/><br/>
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
<q eID="q1++" marker="" who="Jesus"/>
5.4:
<q marker="" sID="q1" who="Jesus"/>
   Happy they that mourn;
   for they shall be comforted.
<q eID="q1++" marker="" who="Jesus"/>

Version 1.2 has:

5.3:
<q marker="" who="Jesus" sID="q7"/>
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
5.4:
   Happy they that mourn;
   for they shall be comforted.

Version 1.3 has:

<q marker="" who="Jesus">
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
5.4:
   Happy they that mourn;
   for they shall be comforted.

Of the above, 1.1 is, in my opinion, the best. It can work in the search result.

The following variant of 1.1 will work for all frontends and, given how simple the ABU is, it should produce well-formed, valid XML. The way to think about this is that the <q> element here is not a quotation marker but a WoC marker, as in <woc>...</woc>, that has to be started and stopped in each verse and must surround each word or phrase that Jesus uttered.

5.3:
<q marker=""  who="Jesus">
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
</q>
5.4:
<q marker=""  who="Jesus">
   Happy they that mourn;
   for they shall be comforted.
</q>
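A rough Python sketch of how a milestoned quote could be rewritten into this per-verse form follows. It is an illustration only, not part of osis2mod: the function name `per_verse_containers` is hypothetical, the closing `eID` milestone in the sample input is assumed, and `who="Jesus"` is hard-coded for brevity.

```python
import re

Q_MILESTONE = re.compile(r'<q\b[^>]*/>')
OPEN = '<q marker="" who="Jesus">'   # hard-coded speaker: a simplification
CLOSE = '</q>'

def per_verse_containers(verses):
    """Rewrite <q sID/>...<q eID/> spans as one <q>...</q> per verse."""
    result = []
    inside = False  # quote state carried across verses during conversion
    for text in verses:
        out = OPEN if inside else ''
        pos = 0
        for m in Q_MILESTONE.finditer(text):
            out += text[pos:m.start()]
            tag = m.group(0)
            if re.search(r'\bsID="', tag):
                out += OPEN
                inside = True
            elif re.search(r'\beID="', tag):
                out += CLOSE
                inside = False
            else:
                out += tag  # leave unrelated empty <q/> elements untouched
            pos = m.end()
        out += text[pos:]
        if inside:
            out += CLOSE  # close at the verse boundary, reopen in the next verse
        result.append(out)
    return result

verses = [
    '<q marker="" who="Jesus" sID="q7"/>Happy the poor in spirit;',
    'Happy they that mourn;<q marker="" who="Jesus" eID="q7"/>',
]
converted = per_verse_containers(verses)
for line in converted:
    print(line)
```

Each output verse is self-contained, so it can be displayed on its own (for example in a search result) and still carry its WoC markup.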

Dmsmith 19:14, 21 June 2008 (MDT)


There's some confusion here.

  • 1.0 was encoded with ....
  • 1.1 was encoded with .... That was just a bug in that I forgot to execute the ++ expression, so it got copied into the text. If I ever did ... in a way that was contained within the verse, it was probably at this stage and was an error stemming from the ++ execution bug.
  • 1.2 just executed the ++ iterator, so it has ... and iterates the number.
  • 1.3 goes back to 1.0 (it's quite possibly bit-for-bit identical) with ....
  • ABU_1_2 is identical to 1.2 because it is literally the same files. I just moved the directory on the server from ABU to ABU_1_2 before uploading the new version.
  • Whenever milestones are used, <verse/> appears as a container, and vice versa.
  • The basic hierarchy of OSIS Bibles is Book-Section-Paragraph rather than Book-Chapter-Verse. We asked SIL/Wycliffe (and maybe some UBS guys); they said use BSP, not BCV. We chose to prefer BSP. That's why <div/> and <p/> (the BSP hierarchy elements) aren't milestonable but <chapter/> and <verse/> are.

Osk 22:35, 21 June 2008 (MDT)


OK, I've got my numbers off. The version that had q++, I thought was 1.2. I guess I never saw 1.2. I've corrected my statement above to match your information.

Regarding your comment about BCV, the <div> element is milestoneable.

If BSP is the proper way to encode an OSIS Bible, then I think:

  1. <verse> should always be milestoned.
  2. osis2mod should preserve the verse element (start and end) in the text and get rid of the pre-verse hack. With BSP, this will occur more and more.

Whether we encode OSIS Bible texts as BCV or BSP, the resulting module needs to work for Bible applications. In the SWORD Engine, the verse is the indexable unit. All of our applications display verses in isolation, at least in the search result, and some elsewhere.

I think the following is the best short term solution (which is a minor variation of 1.1):

  1. If quotation marks are to be displayed in the module, mark the beginning of the quote in chapter 5 with , and the end of the quote in Matt 7 with . Add marker attribute to be UTF-8 curly quotes if desired. Also, if the quote is interrupted, such that quotation marks should appear in the span of the Sermon on the Mount, then put the same there.

    If quotations are not needed then these are entirely unnecessary for our code as it stands today, but they might come in handy if we had each quote in the scripture marked with who as we could analyze the text for who said what.
  2. Within each verse surround the actual words of Christ with <woc>...</woc>. Obviously, if these cross a BSP boundary, then they stop and start on either side of the boundary. Finally, to make valid OSIS replace those with and respectively. The milestoned version (i.e. your 1.1) should have worked for all SWORD apps as of 1.5.9. But it didn't.
    It does not work for JSword because it uses xslt to do the processing, which cannot handle it.
  3. If the module should not show quote marks, use OSISQToTick=false (From memory. So, I may have goofed this.) This makes the empty marker="" unnecessary.

Ultimately, it is the responsibility of osis2mod to placate the SWORD Engine by transforming modules to what it wants to hear. I think the best long term solution is for osis2mod to handle all properly encoded documents, such as 1.2 and 1.3. (Version 1.1 was a placation.) Obviously, if one can tediously encode 1.1, that processing can be put into osis2mod.

Dmsmith 05:35, 22 June 2008 (MDT)

---

One of the longstanding principles of our employment of OSIS has been that we should accept any valid OSIS, but that we need not maintain valid OSIS in our data. So osis2mod should accept anything that is valid, but the contents of the modules themselves need not be valid OSIS. (I'm more concerned with actual markup here. The cases where people actually want to use UTF-16 encoded OSIS or single quotes instead of double in attribute values aren't significant enough for me to care.)

  • <verse> should generally be the first element that gets milestoned any time there is a well-formedness problem. The purpose of allowing both milestoned and container forms was to allow simple Bibles to be encoded simply.
  • I agree that <verse> elements should be preserved. That's why I made the change, committed it, and posted a Bible or two using this format. Troy had objections and rolled back the changes, though they had no negative effects on any existing or future content. If you want to pursue this, talk to Troy. I see not preserving <verse> as universally bad. I don't see any problem with maintaining the pre-verse title system.
  • There should be no problem with using marker="". This is actual OSIS, whereas OSISQToTick=false was added to handle OSIS docs that don't actually conform to the standard. The marker attribute came in one of the last OSIS releases (2.1 or 2.1.1) and solved (in a more official capacity) a problem for which we had developed a hack (OSISQToTick). If we're concerned with accepting any valid OSIS, we should definitely accept marker="" and should probably deprecate OSISQToTick.

We should probably define a non-standard method of encoding that fits within <verse> elements but that can be easily derived (by osis2mod) from either standard, valid encoding. I suspect we should just use (though <milestone/> is another possibility). Given the following input:

<verse>
  cdata
  <q osisID="q1" sID="q1" who="Jesus" marker=""/>
    cdata
</verse>
<verse>
    cdata
  <q osisID="q1" eID="q1" who="Jesus" marker=""/>
  cdata
</verse>

We could generate:

<verse>
  cdata
  <q osisID="q1" sID="q1" who="Jesus" marker=""/>
    cdata
  <q type="x-continuation" eID="" who="Jesus" marker=""/>
</verse>
<verse>
  <q type="x-continuation" sID="" who="Jesus" marker=""/>
    cdata
  <q osisID="q1" eID="q1" who="Jesus" marker=""/>
  cdata
</verse>

And given the following input:

<verse sID="v1"/>
  cdata
  <q osisID="q1" who="Jesus" marker="">
    cdata
<verse eID="v1"/>
<verse sID="v2"/>
    cdata
  </q>
  cdata
<verse eID="v2"/>

We could generate:

<verse sID="v1"/>
  cdata
  <q osisID="q1" who="Jesus" marker="">
    cdata
  <q type="x-continuation" eID="" who="Jesus" marker=""/>
<verse eID="v1"/>
<verse sID="v2"/>
  <q type="x-continuation" sID="" who="Jesus" marker=""/>
    cdata
  </q>
  cdata
<verse eID="v2"/>

I believe these will validate as OSIS and would work in BibleCS without modification. I suspect they will work in HTMLHREF frontends with little (if any) modification, but I haven't looked at the code lately. It seems to me that XSLT ought to be able to convert milestone elements to container element starts/ends for JSword, but I haven't touched XSLT in about 6 years.
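The first transformation above (a milestoned <q> inside container <verse> elements) could be sketched in Python roughly as follows. This is only an illustration of the idea, not osis2mod code: `add_continuations` is a hypothetical name, each list item stands for the content of one <verse>, and the continuation milestones hard-code who="Jesus" rather than copying attributes from the opening milestone.

```python
import re

Q_MILESTONE = re.compile(r'<q\b[^>]*/>')
# Continuation markers as proposed above; who/marker are hard-coded here.
CONT_END = '<q type="x-continuation" eID="" who="Jesus" marker=""/>'
CONT_START = '<q type="x-continuation" sID="" who="Jesus" marker=""/>'

def add_continuations(verses):
    """Close an open quote at the end of each verse and reopen it in the next."""
    result = []
    open_quote = False  # is a quote still open at the verse boundary?
    for text in verses:
        prefix = CONT_START if open_quote else ''
        for m in Q_MILESTONE.finditer(text):
            tag = m.group(0)
            # \b keeps osisID="..." from being mistaken for sID="..."
            if re.search(r'\bsID="', tag):
                open_quote = True
            elif re.search(r'\beID="', tag):
                open_quote = False
        suffix = CONT_END if open_quote else ''
        result.append(prefix + text + suffix)
    return result

verses = [
    'cdata<q osisID="q1" sID="q1" who="Jesus" marker=""/>cdata',
    'cdata<q osisID="q1" eID="q1" who="Jesus" marker=""/>cdata',
]
generated = add_continuations(verses)
for line in generated:
    print(line)
```

The second case (a container <q> crossing milestoned <verse> elements) would need a real XML parser rather than regular expressions, so it is left out of this sketch.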

Osk 20:49, 26 June 2008 (MDT)