Osis2mod
Introduction
osis2mod transforms an OSIS encoded Bible or commentary into a SWORD module.
History of Changes
The following outlines, in reverse chronological order, the major changes to osis2mod. Changes made over a span of a few days are grouped under the most recent date. Bug fixes are not mentioned.
Date | Feature |
---|---|
2009-04-28 | |
2009-04-24 | |
2008-09-11 | |
2008-02-29 | |
2007-09-27 | |
2007-05-13 | |
2007-05-01 | |
2007-04-24 | |
2006-07-15 | |
2006-07-04 | |
2005-12-22 | |
2005-04-29 | |
2005-01-23 | |
2004-06-12 | |
2004-05-19 | |
2003-11-20 | |
2003-05-26 | |
Transformations
Osis2mod performs the following transformations:
- Whitespace - Allows for human-readable OSIS files.
- Leading whitespace on books, chapters and verses is removed.
- Whitespace is normalized into blanks.
- Multiple adjacent whitespace characters are reduced to a single space.
- Unicode handling - All modules should be UTF-8, NFC.
- Latin-1 (cp1252 and iso8859-1) are converted into UTF-8
- UTF-8 is normalized into NFC
- Milestone conversion - necessary for frontends to show a verse at a time.
(Note: genX is unique for an sID/eID pair, where X is a number.)
- <q ...>...</q> is converted into <q sID="genX" .../>...<q eID="genX" .../>. Note: Quotes with who="Jesus" are not transformed at this time.
- <p ...>...</p> becomes <div type="paragraph" sID="genX" .../>...<div type="paragraph" eID="genX" .../>
- <verse ...>...</verse> becomes <verse sID="genX" .../>...<verse eID="genX" .../>
- <chapter ...>...</chapter> becomes <chapter sID="genX" .../>...<chapter eID="genX" .../>
- <closer ...>...</closer> becomes <closer sID="genX" .../>...<closer eID="genX" .../>
- <div ...>...</div> becomes <div sID="genX" .../>...<div eID="genX" .../>
- <l ...>...</l> becomes <l sID="genX" .../>...<l eID="genX" .../>
- <lg ...>...</lg> becomes <lg sID="genX" .../>...<lg eID="genX" .../>
- <salute ...>...</salute> becomes <salute sID="genX" .../>...<salute eID="genX" .../>
- <signed ...>...</signed> becomes <signed sID="genX" .../>...<signed eID="genX" .../>
- <speech ...>...</speech> becomes <speech sID="genX" .../>...<speech eID="genX" .../>
- Words of Christ - necessary for front-ends to appropriately highlight the WOC, a verse at a time.
- <q sID="XXX" who="Jesus" .../>...<q eID="XXX" who="Jesus" .../> becomes <q who="Jesus" marker=""><q sID="XXX" .../>...<q eID="XXX" .../></q>
- <q who="Jesus" ...>...</q> becomes <q who="Jesus" marker=""><q sID="genX" .../>...<q eID="genX" .../></q>
- Within the following construct, <q who="Jesus" marker="">...</q> will surround verse text.
- Pre-Verse Titles (obsolete with SVN revision 2358 for the SWORD 1.6.0 release)
- Titles immediately preceding a verse are converted into <title type="section" subType="x-preverse">...</title>
- Interverse tags not in titles are appended to prior verse.
- (In 1.6.0) <div sID="pvX" type="x-milestone" subType="x-preverse"/>...<div eID="pvX" type="x-milestone" subType="x-preverse"/> will replace preverse titles.
Note: Other than Pre-Verse Titles, these transformations can be reversed to produce the original elements.
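The milestone conversion listed above can be illustrated with a small Python sketch. This is not osis2mod's implementation (osis2mod is a C++ tool); it only mimics the container-to-milestone rewrite for the listed elements. The class and function names are hypothetical, and as a simplifying assumption the end milestone carries only the eID rather than repeating other attributes.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

# Container elements that the list above describes as being milestoned.
MILESTONED = {"q", "verse", "chapter", "closer", "div", "l", "lg",
              "salute", "signed", "speech"}


class Milestoner(xml.sax.ContentHandler):
    """Rewrites container elements as sID/eID milestone pairs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.out = []    # output fragments, joined at the end
        self.open = []   # text to emit when each open element closes
        self.count = 0   # source of the X in genX

    def startElement(self, name, attrs):
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        if "sID" in attrs or "eID" in attrs:
            # Already a milestone: copy it through unchanged.
            self.out.append(f"<{name}{attr_text}/>")
            self.open.append("")
        elif name == "p" or (name in MILESTONED and attrs.get("who") != "Jesus"):
            self.count += 1
            head = '<div type="paragraph"' if name == "p" else f"<{name}"
            self.out.append(f'{head} sID="gen{self.count}"{attr_text}/>')
            self.open.append(f'{head} eID="gen{self.count}"/>')
        else:
            # Everything else, including <q who="Jesus">, stays a container here.
            self.out.append(f"<{name}{attr_text}>")
            self.open.append(f"</{name}>")

    def endElement(self, name):
        self.out.append(self.open.pop())

    def characters(self, content):
        self.out.append(escape(content))


def to_milestones(osis_fragment: str) -> str:
    """Return the fragment with listed containers replaced by milestone pairs."""
    handler = Milestoner()
    xml.sax.parseString(osis_fragment.encode("utf-8"), handler)
    return "".join(handler.out)


print(to_milestones('<l level="1">text</l>'))
# <l sID="gen1" level="1"/>text<l eID="gen1"/>
```

Because each start milestone shares a genX value with exactly one end milestone, the original container can be rebuilt by pairing them up again, which is why the transformation is reversible.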
Handling of Introductions, Titles and Inter-Verse Material
SWORD allows for module, testament, book and chapter introductory material. Those introductions can have appropriate titles as well. At this time, osis2mod does not handle module and testament introductions.
In SWORD 1.6.0 the handling of this material has changed.
Please Note: In the following, the effects of the above transformations are not shown.
Book Introductions and Titles
Book introductions and titles are straightforward. A book introduction includes the start of the book and everything following it up to, but not including, the start of the first chapter. See OSIS Bibles for best practices in marking up titles and introductions.
For example:
<div type="book" ...> ... introductory material ... <div type="chapter"...>
will put the following into the book introduction:
<div type="book" ...> ... introductory material ...
Chapter Introductions
Chapter introductions and titles are a bit problematic. Between the start of a chapter and its first verse, we could have a chapter title, a chapter introduction and/or the start of a section of verses or a titled verse. Osis2mod now handles this in a predictable fashion. From the start of the chapter up to, but not including, a section div or a non-chapter title, the content is chapter introduction. After that, it is part of the verse.
Specifically, the following list gives the possible first elements following the chapter introduction:
- <div type="section" ...>
- <title type="yyy" ...> where yyy is not main or chapter.
- <title ...> where no type is given.
For example,
<chapter ...> <title type="chapter">Chapter Title</title> <div type="introduction">... intro ...</div> <p> <lg> <div type="section"> or <title> or <title type="yyy">
will put the following into the chapter introduction:
<chapter ...> <title type="chapter">Chapter Title</title> <div type="introduction">... intro ...</div> <p> <lg>
Note: This example shows that the <p> and <lg> elements are misplaced.
The material starting with:
<div type="section"> or <title> or <title type="yyy">
and including everything up to the <verse ...> will be put into the following construct and prepended to the verse content.
<div type="x-milestone" subType="x-preverse" sID="pvXXX"/> <div type="section"> or <title> or <title type="yyy"> <div type="x-milestone" subType="x-preverse" eID="pvXXX"/> ... verse content ...
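The boundary rule described in this section can be sketched in Python. This is an illustration of the rule as stated here, not osis2mod's code; the function names and the (tag, attributes) tuple representation are assumptions made for the example.

```python
def is_intro_boundary(tag: str, attrs: dict) -> bool:
    """True for the first element that no longer belongs to the chapter introduction."""
    if tag == "div":
        return attrs.get("type") == "section"
    if tag == "title":
        # A title with no type, or any type other than main or chapter, ends the intro.
        return attrs.get("type") not in ("main", "chapter")
    return False


def split_chapter_intro(elements):
    """Split the (tag, attrs) pairs found before the first verse.

    Returns (introduction, preverse): the introduction stays with the chapter,
    the preverse part is prepended to the first verse.
    """
    for index, (tag, attrs) in enumerate(elements):
        if is_intro_boundary(tag, attrs):
            return elements[:index], elements[index:]
    return elements, []


before_first_verse = [
    ("title", {"type": "chapter"}),
    ("div", {"type": "introduction"}),
    ("p", {}),
    ("lg", {}),
    ("div", {"type": "section"}),
]
intro, preverse = split_chapter_intro(before_first_verse)
print(len(intro), len(preverse))
```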
Between Verses
Between verses we may have closing tags to finish off what was started earlier, structural opening tags (e.g. line groups, divisions, paragraphs, ...), titles and/or introductory material.
Upon finding the close of a verse, osis2mod will append all adjacent closing tags to it. Once it finds a start tag, it will attach that to the following verse, marking it up in the same fashion.
For example, the following would be prepended to the verse content:
<div type="x-milestone" subType="x-preverse" sID="pvXXX"/> <div type="section"> <title>Section title</title> <p> <lg> <div type="x-milestone" subType="x-preverse" eID="pvXXX"/> ... verse content ...
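As a rough Python sketch of the between-verse rule described above: closing tags that directly follow a verse stay with it, and everything from the first start tag onward is wrapped in a preverse milestone pair for the next verse. The token list, the function names and the pvN counter are hypothetical; osis2mod itself works on the parsed XML stream rather than on a list of strings.

```python
def split_between_verses(tags):
    """Split the tags found between two verses into (append_to_prior, prepend_to_next)."""
    for index, tag in enumerate(tags):
        if not tag.startswith("</"):
            # First non-closing tag: it and everything after it goes to the next verse.
            return tags[:index], tags[index:]
    return tags, []


def wrap_preverse(tags, number):
    """Wrap material for the next verse in an x-preverse milestone pair."""
    if not tags:
        return ""
    start = f'<div type="x-milestone" subType="x-preverse" sID="pv{number}"/>'
    end = f'<div type="x-milestone" subType="x-preverse" eID="pv{number}"/>'
    return start + "".join(tags) + end


between = ["</l>", "</lg>", "</p>",
           '<div type="section">', "<title>Section title</title>", "<p>", "<lg>"]
to_prior, to_next = split_between_verses(between)
print("".join(to_prior))
print(wrap_preverse(to_next, 1))
```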
Last Verse
The material following the last verse of a chapter is appended to that verse. You might find:
... verse content ... <div type="colophon">... colophon text ...</div> </chapter> </div> </div>
Exclusions
Only content from the first <div> through the last </div> is retained. Everything else is excluded. From a practical perspective, this excludes the OSIS header information.
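A minimal Python sketch of this exclusion rule, treating the document as a plain string, is shown below. The function name is hypothetical and osis2mod does not work this way internally; the sketch only shows which part of a document survives.

```python
import re


def retained_content(osis_text: str) -> str:
    """Return the text from the first <div> start tag through the last </div>."""
    # Require whitespace, '>' or '/' after "div" so that <divineName> is not matched.
    first = re.search(r"<div[\s>/]", osis_text)
    last = osis_text.rfind("</div>")
    if first is None or last < first.start():
        return ""
    return osis_text[first.start():last + len("</div>")]


document = (
    '<osis><osisText><header><work osisWork="X"/></header>'
    '<div type="book" osisID="Gen">text</div>'
    "</osisText></osis>"
)
print(retained_content(document))
# <div type="book" osisID="Gen">text</div>
```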
Usage
It is always best to use the most recent version of osis2mod, preferably one compiled from the current SVN source.
After the SWORD 1.5.9 release, osis2mod was changed to take flags rather than positional arguments.
usage: ./osis2mod <output/path> <osisDoc> [OPTIONS]
  -a                augment module if exists (default is to create new)
  -z                use ZIP compression (default no compression)
  -Z                use LZSS compression (default no compression)
  -b <2|3|4>        compression block size (default 4):
                      2 - verse; 3 - chapter; 4 - book
  -c <cipher_key>   encipher module using supplied key
                      (default no enciphering)
  -N                do not convert UTF-8 or normalize UTF-8 to NFC
                      (default is to convert to UTF-8, if needed,
                      and then normalize to NFC)
                      Note: UTF-8 texts should be normalized to NFC.
  -4                use 4 byte size entries (default is 2).
                      Note: useful for commentaries with very large
                      entries in uncompressed modules
                      (default is 65535 bytes)
  -v <v11n>         use versification scheme other than KJV.
<output/path>
This is a path to any existing directory. It is best for it to be empty.
<osisDoc>
This is a single, well-formed, valid OSIS document.
-a
Osis2mod can create a Bible all at once or incrementally, depending on the presence of the -a flag. This provides two capabilities:
- Assembling a Bible from book files:
mkdir /tmp/mymodule
osis2mod /tmp/mymodule matt.xml
osis2mod /tmp/mymodule mark.xml -a
...
osis2mod /tmp/mymodule rev.xml -a
Note: The book files can be in any order. SWORD will order them correctly in the index.
- Adding corrections to a Bible:
osis2mod /tmp/mymodule fixes.xml -a
Note: When fixes are put into the module they are appended to the data file and do not actually replace the verses. The index file is adjusted to point to the new place in the data file.
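When many book files are involved, the command lines can be generated rather than typed. The Python sketch below only builds argument lists and does not run anything. It places -a after the OSIS document, following the usage synopsis. The function name and file names are hypothetical.

```python
def assembly_commands(module_dir, book_files, tool="osis2mod"):
    """Build one osis2mod command per book file.

    The first command creates the module; every later one adds -a to augment it.
    """
    commands = []
    for index, book in enumerate(book_files):
        command = [tool, module_dir, book]
        if index > 0:
            command.append("-a")
        commands.append(command)
    return commands


for command in assembly_commands("/tmp/mymodule", ["matt.xml", "mark.xml", "rev.xml"]):
    print(" ".join(command))
```

Each list could then be passed to `subprocess.run(command, check=True)` so that a failing book stops the assembly.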
-z|-Z
A SWORD Bible can be compressed with Zip (-z) or LZSS (-Z). All of SWORD's Bible modules are compressed with Zip. This saves significant space over an uncompressed module. Uncompressed modules are useful for debugging.
-b 2|3|4
This setting is only useful for a compressed module. The choice as to whether to use Verse (2), Chapter (3) or Book (4, the default) level compression depends upon the amount of data in the block. A typical Bible is best compressed book by book. A commentary, chapter by chapter. If the commentary is very robust and the amount of text per verse is really huge, then verse compression might make sense.
All of SWORD's compressed Bible modules are compressed by book. Basically, all of the verses in a block are compressed and appended to the data file. For this reason, the data file cannot be uncompressed by anything other than the SWORD and JSword libraries.
When creating the module by appending, it is important to do so by whole compression block. That is, if blockType is Chapter, then the osisDoc needs to contain one or more whole chapters.
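The effect of the block size can be seen in a small Python experiment. This sketch uses zlib directly and does not reproduce the actual SWORD data file layout. The function name and the sample verses are made up for illustration.

```python
import zlib


def compress_by_block(verses, level):
    """Group verse texts into blocks and compress each block separately.

    verses: dict mapping (book, chapter, verse) to text.
    level:  2 for verse, 3 for chapter, 4 for book, mirroring the -b values.
    """
    key_length = {2: 3, 3: 2, 4: 1}[level]
    blocks = {}
    for ref in sorted(verses):
        blocks.setdefault(ref[:key_length], []).append(verses[ref])
    return {key: zlib.compress("".join(texts).encode("utf-8"))
            for key, texts in blocks.items()}


# Made-up sample data: three chapters of twenty short verses each.
verses = {("Gen", c, v): f"This is the text of Genesis chapter {c}, verse {v}. "
          for c in range(1, 4) for v in range(1, 21)}

for level, label in ((2, "verse"), (3, "chapter"), (4, "book")):
    blocks = compress_by_block(verses, level)
    total = sum(len(data) for data in blocks.values())
    print(f"{label}: {len(blocks)} blocks, {total} bytes")
```

Larger blocks give the compressor more repeated text to work with, which is why short Bible verses compress poorly one at a time. The trade-off is that a whole block has to be decompressed to read any verse in it.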
-c cipherKey
This is typically 16 characters in length, having no leading or trailing spaces, consisting of alternating sets of 4 alpha and 4 numeric characters, such as Aduf0274PjNq0328.
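A key in the conventional shape described above could be generated and checked with a few lines of Python. This is only a sketch of that convention, and the function names are hypothetical. Nothing here is part of osis2mod, and the text above describes the format as typical rather than required.

```python
import re
import secrets
import string

# Two groups of 4 letters followed by 4 digits, e.g. Aduf0274PjNq0328.
KEY_PATTERN = re.compile(r"(?:[A-Za-z]{4}[0-9]{4}){2}")


def make_cipher_key() -> str:
    """Generate a random 16-character key in the alternating 4 alpha / 4 numeric shape."""
    parts = []
    for _ in range(2):
        parts.append("".join(secrets.choice(string.ascii_letters) for _ in range(4)))
        parts.append("".join(secrets.choice(string.digits) for _ in range(4)))
    return "".join(parts)


def looks_conventional(key: str) -> bool:
    """Check a key against the conventional shape, with no leading or trailing spaces."""
    return KEY_PATTERN.fullmatch(key) is not None


print(looks_conventional("Aduf0274PjNq0328"))
# True
```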
-N
All OSIS modules should be UTF-8, and all UTF-8 modules should also be normalized to NFC. The default is to automatically detect the presence of Latin-1 (either cp1252 or iso8859-1), convert it to UTF-8, and normalize UTF-8 to NFC. This flag turns off this behavior and is useful for creating Latin-1 modules or for modules that are already UTF-8 and NFC.
Note: this was added late Feb 2008 and requires ICU support when compiling.
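The default behavior can be approximated in Python as follows. osis2mod relies on ICU for this; the sketch below uses the standard library instead, and its detection rule (treat anything that is not valid UTF-8 as cp1252, falling back to iso8859-1) is an assumption, not a description of osis2mod's actual heuristic.

```python
import unicodedata


def to_utf8_nfc(raw: bytes) -> bytes:
    """Return the input as NFC-normalized UTF-8, converting from Latin-1 if needed."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        try:
            text = raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few byte values are undefined in cp1252; iso8859-1 accepts every byte.
            text = raw.decode("iso8859-1")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# Latin-1 input is converted; decomposed UTF-8 input is composed.
print(to_utf8_nfc(b"caf\xe9"))
print(to_utf8_nfc("cafe\u0301".encode("utf-8")))
```

Both calls produce the same bytes, which is the point of normalizing: equivalent text ends up with one byte representation, so searches and comparisons in a frontend behave consistently.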
-4
This flag allows uncompressed OSIS modules to have entries that are larger than 64K bytes. This is needed for Bibles with large introductory material and for commentaries with large entries. All compressed OSIS modules can handle large entries.
Note: this was added late Apr 2009 and will be part of the SWORD 1.6.0 release (formerly known as 1.5.11).
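Whether a module needs this flag can be estimated before building it. The Python sketch below assumes the entries are already available as strings and measures their UTF-8 size against the 65535-byte default limit from the usage text. The function name is hypothetical.

```python
MAX_2_BYTE_ENTRY = 0xFFFF  # 65535 bytes, the default limit for uncompressed modules


def needs_4_byte_sizes(entries) -> bool:
    """True if any entry is too large for a 2-byte size field once encoded as UTF-8."""
    return any(len(text.encode("utf-8")) > MAX_2_BYTE_ENTRY for text in entries)


print(needs_4_byte_sizes(["a short verse", "x" * 70000]))
# True
```

The size is measured in bytes rather than characters, so text in scripts that need several bytes per character reaches the limit sooner.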
-v v11n
By default, osis2mod uses the KJV versification. The practical implication of this is that only books in the KJV canon are allowed and any text in an allowed book is retained. However, if a verse reference in a supported book falls outside of the versification, its text is appended to the prior verse in the canon. This flag allows for an alternate versification.
Note: this was added late Apr 2009 and will be part of the SWORD 1.6.0 release (formerly known as 1.5.11). With that release, the only alternate versification supported will be the Leningrad Codex, selected with -v Leningrad.
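The rule for verses that fall outside the versification can be sketched in Python. The data structures and function name are hypothetical, the table is a toy with a single chapter, and dropping text when there is no prior verse to attach it to is an assumption of this sketch.

```python
def fold_out_of_range(verses, last_verse):
    """Apply a versification to a sequence of verses.

    verses:     list of ((book, chapter, verse), text) in document order.
    last_verse: dict mapping (book, chapter) to the last verse number allowed.
    """
    books = {book for book, _ in last_verse}
    kept = []
    for (book, chapter, verse), text in verses:
        if book not in books:
            continue  # the book is not in the canon, so it is not allowed
        if 1 <= verse <= last_verse.get((book, chapter), 0):
            kept.append(((book, chapter, verse), text))
        elif kept:
            # Out of range: append the text to the prior verse.
            ref, prior_text = kept[-1]
            kept[-1] = (ref, prior_text + " " + text)
    return kept


# Toy versification: 3 John has 14 verses in the KJV; some others number 15.
toy_v11n = {("3John", 1): 14}
source = [
    (("3John", 1, 14), "text of verse 14"),
    (("3John", 1, 15), "text of verse 15"),
]
print(fold_out_of_range(source, toy_v11n))
# [(('3John', 1, 14), 'text of verse 14 text of verse 15')]
```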