Difference between revisions of "Talk:Modules in the beta repository"

From CrossWire Bible Society
(Moved discussion about module encoding to the "Talk" page.)

Revision as of 11:35, 22 June 2008

I thought I'd move discussion about the implementation of modules here. It was cluttering the other page and when we'd get a new version or address a problem, we'd reset the row. Here we can keep the info until we are done.--Dmsmith 05:35, 22 June 2008 (MDT)

ABU

In Matt 5 there is a WoC display problem. The WoC starts in verse 3 and ends at the end of the last chapter. Fortunately, the WoC start and finish in this module are on chapter boundaries. If it had started in chapter 5 and finished in chapter 7, then the display of chapter 6 would never highlight the WoC.

The SWORD Engine currently terminates WoC at a verse boundary, regardless of how it is encoded. This is because it does not keep WoC state from one verse to the next. No frontend will display it correctly, because it is not a frontend problem.
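
To illustrate the point about state, here is a minimal Python sketch of a renderer that handles each verse in isolation, as described above. It is not the SWORD Engine's code: the verse store, the [woc] output markers, the function name render_verse and the sID value are all made up for the example.

```python
import re

# Matches an empty (milestone) <q .../> element.
TAG = re.compile(r'(<q\b[^>]*/>)')

# Hypothetical verse store: a WoC span opens in 5:3 and is still open in 5:4.
VERSES = {
    "Matt.5.3": '<q who="Jesus" sID="q1"/>Happy the poor in spirit;',
    "Matt.5.4": "Happy they that mourn;",
}


def render_verse(raw):
    """Render one verse on its own; WoC state is reset on every call."""
    in_woc = False  # nothing is remembered from the previous verse
    out = []
    for piece in TAG.split(raw):
        if TAG.fullmatch(piece):
            if "sID=" in piece:
                in_woc = True
            elif "eID=" in piece:
                in_woc = False
        elif piece:
            out.append("[woc]" + piece + "[/woc]" if in_woc else piece)
    return "".join(out)


for ref, raw in VERSES.items():
    print(ref, render_verse(raw))
```

In this sketch only Matt.5.3 comes out marked, because the start milestone is the only WoC signal the renderer ever sees; 5:4 is rendered as plain text even though it lies inside the span.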

I (DM) see several solutions (there may be others):

  • Change osis2mod to accept a wider range of valid OSIS inputs, including the way this module is encoded, and have it create a module that the SWORD Engine can handle. Then it can work with 1.5.9.
  • Change the SWORD engine to handle this input at a chapter level. This is not a complete solution. Currently, when the KJV is searched, the WoC are highlighted in the search results list. This input will only highlight Matt 5:3 for any hit in Matt 5. Changing the SWORD Engine will require this module to be 1.5.12.
  • Change the module so that it is encoded as recommended in the wiki for OSIS Bibles. Then it can work with 1.5.9.

The easiest is to change the module. If not that, I'd suggest changing osis2mod, which probably is the best solution, resulting in easier module definition. I don't like the SWORD Engine change, because it is incomplete.


  • It looks to me as if it doesn't really matter how I encode the text, it won't render correctly on some set of frontends. ... was the original encoding, and that got complaints, so I switched to milestoned ... encoding (1.2), and that got more complaints, so I switched back to containers (1.3). Play with both. Tell me which is less wrong. Osk 14:03, 21 June 2008 (MDT)

There have been 3 versions:

The version 1.1 had for Matt 5:3-4:
(Note: my comment on this was that the sIDs and the eIDs were not properly encoded.)

5.3:
<q marker="" sID="q1" who="Jesus"/>
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
<q eID="q1++" marker="" who="Jesus"/>
5.4:
<q marker="" sID="q1" who="Jesus"/>
   Happy they that mourn;
   for they shall be comforted.
<q eID="q1++" marker="" who="Jesus"/>

Version 1.2 has:

5.3:
<q marker="" who="Jesus" sID="q7"/>
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
5.4:
   Happy they that mourn;
   for they shall be comforted.

Version 1.3 has:

<q marker="" who="Jesus">
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
5.4:
   Happy they that mourn;
   for they shall be comforted.

Of the above, 1.1 is, in my opinion, the best. It can work in the search results.

The following variant of 1.1 will work for all frontends and, given how simple the ABU is, it should produce well-formed, valid XML. The way to think about this is that <q> here is not a quotation marker but a WoC marker, as in <woc>...</woc>, that has to be started and stopped in each verse and surround each word or phrase that Jesus uttered.

5.3:
<q marker=""  who="Jesus">
   Happy the poor in spirit;
   for theirs is the kingdom of heaven.
</q>
5.4:
<q marker=""  who="Jesus">
   Happy they that mourn;
   for they shall be comforted.
</q>

Dmsmith 19:14, 21 June 2008 (MDT)


There's some confusion here.

  • 1.0 was encoded with ....
  • 1.1 was encoded with .... That was just a bug in that I forgot to execute the ++ expression, so it got copied into the text. If I ever did ... in a way that was contained within the verse, it was probably at this stage and was an error stemming from the ++ execution bug.
  • 1.2 just executed the ++ iterator, so it has ... and iterates the number.
  • 1.3 goes back to 1.0 (it's quite possibly bit-for-bit identical) with ....
  • ABU_1_2 is identical to 1.2 because it is literally the same files. I just moved the directory on the server from ABU to ABU_1_2 before uploading the new version.
  • Whenever milestones are used, <verse/> appears as a container, and vice versa.
  • The basic hierarchy of OSIS Bibles is Book-Section-Paragraph rather than Book-Chapter-Verse. We asked SIL/Wycliffe (and maybe some UBS guys); they said use BSP, not BCV. We chose to prefer BSP. That's why <div/> and <p/> (the BSP hierarchy elements) aren't milestonable but <chapter/> and <verse/> are.

Osk 22:35, 21 June 2008 (MDT)


OK, I've got my numbers off. The version that had q++, I thought was 1.2. I guess I never saw 1.2. I've corrected my statement above to match your information.

Regarding your comment about BCV, the <div> element is milestoneable.

If BSP is the proper way to encode an OSIS Bible, then I think:

  1. <verse> should always be milestoned.
  2. osis2mod should preserve the verse element (start and end) in the text and get rid of the pre-verse hack. With BSP, this will occur more and more.

Whether we encode OSIS Bible texts as BCV or BSP, the resulting module needs to work for Bible applications. In the SWORD engine, the verse is the indexable unit. All of our applications display verses in isolation, at least in the search results, and some do so elsewhere.

I think the following is the best short term solution (which is a minor variation of 1.1):

  1. If quotation marks are to be displayed in the module, mark the beginning of the quote in chapter 5 with , and the end of the quote in Matt 7 with . Add a marker attribute with UTF-8 curly quotes if desired. Also, if the quote is interrupted, such that quotation marks should appear within the span of the Sermon on the Mount, then put the same there.

    If quotations are not needed then these are entirely unnecessary for our code as it stands today, but they might come in handy if we had each quote in the scripture marked with who, as we could then analyze the text for who said what.
  2. Within each verse surround the actual words of Christ with <woc>...</woc>. Obviously, if these cross a BSP boundary, then they stop and start on either side of the boundary. Finally, to make valid OSIS, replace those with and respectively. The milestoned version (i.e. your 1.1) should have worked for all SWORD apps as of 1.5.9. But it didn't.
    It does not work for JSword because it uses XSLT to do the processing, which cannot handle it.
  3. If the module should not show quote marks, use OSISqToTick=false (from memory, so I may have goofed this). This makes the empty marker="" unnecessary.
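
On the XSLT remark in the list above: a tree-based processor sees a container and a milestone pair very differently, and a small Python sketch can show it. This uses Python's ElementTree, not JSword's actual stylesheet. The two verse strings and the sID/eID values are made up, and text_inside_q is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Container form: the words sit inside the <q> element.
CONTAINER = (
    '<verse osisID="Matt.5.3">'
    '<q who="Jesus" marker="">Happy the poor in spirit;</q>'
    '</verse>'
)

# Milestone form: two empty <q/> elements with the words between them.
MILESTONE = (
    '<verse osisID="Matt.5.3">'
    '<q who="Jesus" marker="" sID="q1"/>'
    'Happy the poor in spirit;'
    '<q who="Jesus" marker="" eID="q1"/>'
    '</verse>'
)


def text_inside_q(xml):
    """Return the text a tree walk finds inside the first <q> element."""
    q = ET.fromstring(xml).find("q")
    return "".join(q.itertext())


print(repr(text_inside_q(CONTAINER)))
print(repr(text_inside_q(MILESTONE)))
```

With the container, a template that matches q has the words as its content. With milestones, the first q is empty and the words are a sibling text node, so a template matching q has nothing to wrap unless the processor tracks the open sID itself.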

Ultimately, it is the responsibility of osis2mod to placate the SWORD Engine by transforming modules to what it wants to hear. I think the best long term solution is for osis2mod to handle all properly encoded documents, such as 1.2 and 1.3. (Version 1.1 was a placation.) Obviously, if one can tediously encode 1.1, that processing can be put into osis2mod.
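
As a rough idea of what "putting that processing into osis2mod" could mean, here is a Python sketch that carries WoC state across verses and rewrites a milestoned span as per-verse containers. It is only an illustration under stated assumptions: osis2mod is not written this way, the function wrap_woc_per_verse and the input format (a list of reference/text pairs) are hypothetical, and the eID milestone in the usage note is invented because the 1.2 excerpt above does not show the end of the span.

```python
import re

# Matches an empty (milestone) <q .../> element.
TAG = re.compile(r'(<q\b[^>]*/>)')
OPEN = '<q marker="" who="Jesus">'
CLOSE = "</q>"


def wrap_woc_per_verse(verses):
    """Rewrite a milestoned WoC span as one <q>...</q> container per verse.

    `verses` is a list of (reference, raw_text) pairs in canonical order.
    """
    in_woc = False  # carried from verse to verse, unlike a per-verse renderer
    result = []
    for ref, raw in verses:
        out = []
        for piece in TAG.split(raw):
            if TAG.fullmatch(piece):
                is_jesus = 'who="Jesus"' in piece
                if is_jesus and "sID=" in piece:
                    in_woc = True
                elif is_jesus and "eID=" in piece:
                    in_woc = False
                else:
                    out.append(piece)  # leave unrelated milestones alone
            elif piece.strip() and in_woc:
                out.append(OPEN + piece + CLOSE)
            else:
                out.append(piece)
        result.append((ref, "".join(out)))
    return result
```

Fed the 1.2-style input for 5:3 and 5:4, plus an invented closing verse containing `<q marker="" who="Jesus" eID="q7"/>`, this yields one `<q marker="" who="Jesus">...</q>` container per verse, which is the shape of the 1.1 variant shown earlier.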

Dmsmith 05:35, 22 June 2008 (MDT)