DevTools:conf Files

From CrossWire Bible Society
Revision as of 12:17, 8 January 2018 by Refdoc (talk | contribs) (module.conf File Layout)

Creating a .conf File

module.conf File Layout

SWORD uses a plain text configuration file to store information about modules. The file follows a Key=Value format. This file is used by the SWORD engine to process modules, by installers to help users install modules and by front-ends to render the module appropriately.

Different end-of-line styles should not be mixed in the same file. For CrossWire purposes, only Linux (LF) line endings are acceptable.

Overview by Example

The module.conf file starts with an INI section, giving the ModName.

[KJV]

This is then followed by key=value pairs. While a Windows INI file allows : in addition to =, SWORD does not.

# A line that starts with a # is a comment
 ; A line that begins with a ; is also a comment
    # whitespace at the beginning of the line or end of the line is trimmed. This also is a comment.
    DataPath=./modules/texts/ztext/kjv/
# whitespace can be around the = as well.
ModDrv   =     zText
Encoding=UTF-8
BlockType=BOOK
CompressType=ZIP
SourceType=OSIS
Lang=en

Some keys can be repeated with different values:

GlobalOptionFilter=OSISStrongs
GlobalOptionFilter=OSISMorph
GlobalOptionFilter=OSISFootnotes
GlobalOptionFilter=OSISHeadings
GlobalOptionFilter=OSISRedLetterWords
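
The rules shown so far (a section header, # and ; comments, trimmed whitespace, = as the only separator, and repeatable keys) can be sketched as a small reader. This is an illustrative Python sketch, not the SWORD engine's parser. The function name parse_conf_lines is hypothetical, and continuation lines are deliberately not handled here.

```python
from collections import defaultdict

def parse_conf_lines(text):
    """Return (mod_name, entries); entries maps each key to a list of values."""
    mod_name = None
    entries = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()                      # leading/trailing whitespace is trimmed
        if not line or line[0] in "#;":         # blank lines and comments are skipped
            continue
        if line.startswith("[") and line.endswith("]"):
            mod_name = line[1:-1]               # the INI section gives the ModName
            continue
        key, sep, value = line.partition("=")   # only '=' separates key and value
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        entries[key.strip()].append(value.strip())  # repeated keys accumulate
    return mod_name, dict(entries)

sample = """[KJV]
# a comment
   ModDrv   =     zText
GlobalOptionFilter=OSISStrongs
GlobalOptionFilter=OSISMorph
"""
name, conf = parse_conf_lines(sample)
print(name, conf["ModDrv"], conf["GlobalOptionFilter"])
```

Storing every value in a list keeps repeated keys such as GlobalOptionFilter in file order and leaves it to the caller to decide what a repeat of a non-repeatable key should mean.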

Some keys support localization:

Abbreviation=和合本
Abbreviation_en=ChiUn
Description=和合本 (繁體字)
Description_en=Chinese Union Version (Traditional)

Some fields can have RTF in a single line:

About=This is the King James Version of the Holy Bible (also known as the Authorized Version) with embedded Strong's Numbers. The rights to the base text are held by the Crown of England. The Strong's numbers in the OT were obtained from The Bible Foundation: http://www.bf.org. The NT Strong's data was obtained from The KJV2003 Project at CrossWire: http://www.crosswire.org. These mechanisms provide a useful means for looking up the exact original language word in a lexicon that is keyed to Strong's numbers.\par\par Special thanks to the volunteers at Bible Foundation for keying the Hebrew/English data and of Project KJV2003 for working toward the completion of synchronizing the English phrases to the Stephanas Textus Receptus, and to Dr. Maurice Robinson for providing the base Greek text with Strong's and Morphology. We are also appreciative of formatting markup that was provided by Michael Paul Johnson at http://www.ebible.org. Their time and generosity to contribute such for the free use of the Body of Christ is a great blessing and this derivative work could not have been possible without these efforts of so many individuals. It is in this spirit that we in turn offer the KJV2003 Project text freely for any purpose. Any copyright that might be obtained for this effort is held by CrossWire Bible Society (c) 2003 and CrossWire Bible Society hereby grants a general public license to use this text for any purpose.\par Inquiries and comments may be directed to:\par\par CrossWire Bible Society\par kjv2003@crosswire.org\par http://www.crosswire.org

Some fields allow multiple lines using \ to escape the newline:

About=This is the King James Version of the Holy Bible (also known as the Authorized Version) with embedded Strong's Numbers. The rights to the base text are held by the Crown of England. The Strong's numbers in the OT were obtained from The Bible Foundation: http://www.bf.org. The NT Strong's data was obtained from The KJV2003 Project at CrossWire: http://www.crosswire.org. These mechanisms provide a useful means for looking up the exact original language word in a lexicon that is keyed to Strong's numbers. \
\
Special thanks to the volunteers at Bible Foundation for keying the Hebrew/English data and of Project KJV2003 for working toward the completion of synchronizing the English phrases to the Stephanas Textus Receptus, and to Dr. Maurice Robinson for providing the base Greek text with Strong's and Morphology. We are also appreciative of formatting markup that was provided by Michael Paul Johnson at http://www.ebible.org. Their time and generosity to contribute such for the free use of the Body of Christ is a great blessing and this derivative work could not have been possible without these efforts of so many individuals. It is in this spirit that we in turn offer the KJV2003 Project text freely for any purpose. Any copyright that might be obtained for this effort is held by CrossWire Bible Society (c) 2003 and CrossWire Bible Society hereby grants a general public license to use this text for any purpose. \
Inquiries and comments may be directed to: \
\
CrossWire Bible Society \
kjv2003@crosswire.org \
http://www.crosswire.org

Common mistakes

Using Latin-1 or CP1252 text in a file that declares:

Encoding=UTF-8

Having a Byte Order Mark (BOM) at the beginning of the file[1]

U+FEFF
[KJV]

Repeating the same element with the same value:

Lang=en
...
Lang=en

Repeating the same element with different values, when the element doesn't allow repeats:

Lang=en
Lang=de

Not supplying a value:

About=

Only the field CipherKey allows this.

Having a continuation marker on the last line of a value, causing the next key=value to be appended to the prior field:

this is text that continues from the prior line \
Lang=en

Notes:

  1. Windows Notepad and Wordpad may silently add these to the file
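
Several of the mistakes listed above can be caught mechanically before a module is submitted. The following Python sketch is a rough pre-flight check, not an official CrossWire tool. The function name check_conf_bytes and the "looks like a key" heuristic are assumptions made for illustration, and the check for non-repeatable keys with different values is left out because it needs the per-element rules.

```python
import codecs
import re

# Heuristic (an assumption): a continued line that starts like "Key=" is suspicious.
KEY_LIKE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*\s*=")

def check_conf_bytes(data):
    """Return a list of problems found in the raw bytes of a .conf file."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a byte order mark")
        data = data[len(codecs.BOM_UTF8):]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("bytes are not valid UTF-8")
        return problems
    seen = {}
    continued = False
    for raw in text.splitlines():
        line = raw.strip()
        was_continued, continued = continued, line.endswith("\\")
        if was_continued:
            if KEY_LIKE.match(line):
                problems.append(f"possible stray continuation before: {line}")
            continue
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not value and key != "CipherKey":    # only CipherKey may be empty
            problems.append(f"{key} has no value")
        if value in seen.setdefault(key, set()):
            problems.append(f"{key}={value} is repeated")
        seen[key].add(value)
    return problems

sample = codecs.BOM_UTF8 + b"[KJV]\nLang=en\nAbout=\nLang=en\n"
for problem in check_conf_bytes(sample):
    print(problem)
```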

Continuation

A value can span multiple lines by escaping the return with '\'. This is not a mechanism to make long lines more readable in the module.conf file. It is a means of introducing a break in the rendered output of that field when viewed by a front-end or installer. It is akin to an XHTML <br/>. That is, continuation is a formatting feature.

Most elements in a SWORD conf are expected to have short, one-line values. Elements that are expected to have multiple lines are noted.
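
Because continuation is a formatting feature, a reader has to keep the break rather than silently gluing the lines together. The Python sketch below shows one way to do that. The function name join_continuations is hypothetical, and representing the break as a newline character is an assumption of this sketch, not a statement about how the SWORD engine stores it.

```python
def join_continuations(lines):
    """Merge physical lines ending in a backslash into one logical line each."""
    logical, parts = [], []
    for raw in lines:
        line = raw.strip()
        if line.endswith("\\"):
            parts.append(line[:-1].rstrip())    # drop the marker, keep the text
        else:
            parts.append(line)
            logical.append("\n".join(parts))    # each marker becomes one break
            parts = []
    if parts:                                   # file ended on a continuation marker
        logical.append("\n".join(parts))
    return logical

physical = [
    "About=First paragraph. \\",
    "\\",
    "Second paragraph.",
    "Lang=en",
]
for entry in join_continuations(physical):
    print(repr(entry))
```

A line holding only the marker contributes an empty segment, so two consecutive markers yield a blank line between paragraphs, which matches the multi-line About example.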

RTF

A module.conf supports a very small, restricted subset of RTF markup. Only the following are allowed:

  • \qc - for centering
  • \par - for paragraph breaks
  • \pard - for resetting paragraph attributes, i.e. turning off centering.
  • \u{num}? - for unicode characters, where {num} is a signed, 16-bit representation of the code point and where ? is the ASCII character used in case unicode is not supported. If the {num} is less than 0 then add 65536 to it. This should only be used in modules that have an Encoding=UTF-8, but using the actual UTF-8 character is preferred.

The only feature that RTF uniquely provides is centering. If centering is not needed, then use continuation lines instead of RTF.
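
The signed 16-bit rule for \u{num}? is easy to get wrong, so here is a small Python sketch of the decoding step. The function name decode_rtf_unicode is hypothetical. The sketch handles only code points in the Basic Multilingual Plane and leaves \qc, \par and \pard untouched.

```python
import re

# \u, an optionally negative decimal number, then one fallback character.
_RTF_UNICODE = re.compile(r"\\u(-?\d+)(.)")

def decode_rtf_unicode(value):
    """Replace each \\u{num}? escape with the character it denotes."""
    def repl(match):
        num = int(match.group(1))
        if num < 0:
            num += 65536        # negative values wrap around, as described above
        return chr(num)         # the fallback character (group 2) is discarded
    return _RTF_UNICODE.sub(repl, value)

print(decode_rtf_unicode(r"caf\u233?"))
print(decode_rtf_unicode(r"\u-21504?"))
```

In the second call, -21504 + 65536 = 44032, which is U+AC00.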

Localization

Those .conf fields that are essentially text intended for presentation to the end-user may be localized by appending _locale to the field name, where locale is replaced by an appropriate locale code, according to BCP 47. See Lang below for details.

For example, to give a French description, you can have a field:

Description_fr=....

In order to distinguish a regional form from the primary form of a language, e.g. Brazilian Portuguese vs. the Portuguese of Portugal, append the country code as in:

Description_pt-BR=....

Script variants can be given as in:

Description_zh-Hans=.... simplified Chinese ....
Description_zh-Hant=.... traditional Chinese ....

In order for a .conf entry to appear in a localized form, a non-localized form of the same field must also occur within the .conf. For example, in order for a Description_en field to appear, a file must also possess a Description field. The locale of .conf entries without the locale modifier is the default and must reflect the locale/language of the module itself (as specified in Lang=) or English (if there are no localized versions of the field). In general, fields should be provided in the language of the module itself with English translations provided in parallel fields localized with _en. There is no explicit upper bound on the quantity of localized fields, but all localized and localizable fields should be unique.

Notes:

  1. At present, this is only a planned feature. Module .confs can and should make use of this facility, but at the moment there is no front-end support for it.
  2. See also Localized Language Names.
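
Since front-end support is still planned, the lookup below is only a sketch of how an application might choose among localized fields. The function name localized is hypothetical, and the fallback from a regional tag such as pt-BR to the bare language pt is an assumption of this sketch. The requirement that a non-localized field must exist comes from the text above.

```python
def localized(entries, field, locale=None):
    """Pick the value of `field` for `locale`, falling back to the default form."""
    if field not in entries:
        return None                     # localized forms need the plain field too
    if locale:
        parts = locale.split("-")
        # Try the most specific tag first, e.g. "pt-BR", then "pt".
        for n in range(len(parts), 0, -1):
            tag = "-".join(parts[:n])
            key = f"{field}_{tag}"
            if key in entries:
                return entries[key]
    return entries[field]

conf = {
    "Description": "和合本 (繁體字)",
    "Description_en": "Chinese Union Version (Traditional)",
}
print(localized(conf, "Description", "en-GB"))
print(localized(conf, "Description", "fr"))
```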

Key elements of a SWORD module.conf

Below is a listing of the possible directives in that file.

Some keys can be repeated. Don't repeat ones that can't. One or more of them will be ignored.

Some can have values that span more than 1 line with '\' at the end of a line indicating that the text on the next line continues the value. Don't use continuation unless allowed. It will produce different results in different front ends.

RTF is allowed in some values. Don't use it otherwise. It will produce different results in different front ends.

Some allow HTML <a href="xxx">label</a> hypertext links. HTML is not allowed otherwise.

Value specifications are shown as <content spec>. The < and > are not to be included.

Enumerated values are shown in bold. These should be used exactly as given and no other values should be used.

The order of elements specified in a conf file is immaterial, except where specified otherwise.

Configuration elements not defined in this page are assumed to be ignored by most front-end applications.

Required Elements

Element Values (type or enumerated) Default Value Allows
[ModName] Each conf file begins with [ModName], replacing "ModName" with a short well known abbreviation for the module (e.g., [KJV]). This must be first in the file. Valid characters for this abbreviation are limited to PCRE class [A-Za-z0-9_].[1]

The Abbreviation element is meant to allow for localization of this field.
The .conf file should be named the lowercase of this abbreviation followed by .conf. For example, [MyModule] would be mymodule.conf.

   
Abbreviation[2]

<string>
This field allows for the localization of the ModName. It is meant to be short just like the ModName.

Actually, this element is not required, but it makes the most sense to describe it here.

[ModName] Localization
Description

<string>
This is a short (1 line) title of the module.

  Localization
DataPath <relative system path pointing to the data files>

DataPath is the path to the module data files relative to the SWORD module library root directory. This path should start with "./modules". If the DataPath indicates a directory it should end with a '/'. Otherwise the module name is both the directory and the prefix for each file in that directory. Although DataPath can point to any folder or files under the root of the SWORD module library, the following conventions are recommended and must be used for modules wishing to be included in a CrossWire repository:

Paths used for a module named [MyModule], depending on
(a) the type of module (Bible text, commentary, lexicon or dictionary[3], general book) and
(b) the data driver (ModDrv parameter) are:

./modules/texts/rawtext/mymodule/
./modules/texts/rawtext4/mymodule/
./modules/texts/ztext/mymodule/
./modules/texts/ztext4/mymodule/
./modules/comments/zcom/mymodule/
./modules/comments/zcom4/mymodule/
./modules/comments/hrefcom/mymodule/
./modules/comments/rawcom/mymodule/
./modules/comments/rawcom4/mymodule/
./modules/comments/rawfiles/mymodule/
./modules/lexdict/zld/mymodule/mymodule
./modules/lexdict/rawld/mymodule/mymodule
./modules/lexdict/rawld/devotionals/mymodule/mymodule
./modules/lexdict/rawld/glossaries/mymodule/mymodule
./modules/lexdict/rawld4/mymodule/mymodule
./modules/genbook/rawgenbook/mymodule/mymodule
   
ModDrv

RawText (for uncompressed Bibles)
RawText4 (for uncompressed Bibles having entries greater than 64K bytes)[4]
zText (for compressed Bibles)
zText4 (for compressed Bibles having entries greater than 64K bytes)[4][5]
RawCom (for uncompressed Commentaries)
RawCom4 (for uncompressed Commentaries having entries greater than 64K bytes)
zCom (for compressed Commentaries)
zCom4 (for compressed Commentaries having entries greater than 64K bytes)[5]
HREFCom (each module entry must be only a URL to the body for the entry; experimental)
RawFiles (stores each entry in a simple text file in the datapath; recommended for Personal Commentary)
RawLD (for uncompressed Dictionaries)
RawLD4 (for uncompressed Dictionaries having entries greater than 64K bytes)
zLD (for compressed Dictionaries)
RawGenBook (for uncompressed tree keyed modules)

   
  1. That excludes the space and hyphen characters! An invalid ModName can cause some front-ends to crash.
  2. We strongly advise to avoid using an Abbreviation that's identical to the ModName or Abbreviation of any other module. It only leads to confusion, and may have unexpected consequences for some front-ends.
  3. Daily devotionals & glossaries go in subdirectories under lexdict. A glossary is between two languages.
  4. e.g. If the Bible contains large introduction sections
  5. zText4 & zCom4 modules require MinimumVersion=1.8 or later.
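
The ModName rules above (characters limited to [A-Za-z0-9_], and a file name that is the lowercase ModName plus .conf) can be checked with a few lines of Python. This is a minimal sketch; the function name conf_filename is hypothetical.

```python
import re

MODNAME = re.compile(r"[A-Za-z0-9_]+")

def conf_filename(mod_name):
    """Validate a ModName and return the .conf file name that goes with it."""
    if not MODNAME.fullmatch(mod_name):
        raise ValueError(f"invalid ModName: {mod_name!r}")
    return mod_name.lower() + ".conf"

print(conf_filename("MyModule"))
```

A name containing a space or hyphen, such as "My-Module", raises ValueError instead of producing a file name.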

Required Elements with defaults

Element Values (type or enumerated) Default Value Allows
SourceType[1]

Plaintext
GBF (General Bible Format)
ThML (Theological Markup Language)
OSIS (Open Scriptural Information Standard)
TEI (Text Encoding Initiative)
Specifies the markup used in the module. The preferred markup is OSIS. TEI is preferred for dictionaries until OSIS supports dictionaries. While SourceType has a default, it is a best practice to specify it.
In SWORD, for modules encoded with ThML, OSIS or TEI, each verse, dictionary entry, and book division needs to be well-formed XML or it will result in display problems in some front-ends.

Plaintext  
Encoding

Latin-1† (Windows Codepage 1252 (cp1252))
UTF-8
UTF-16
SCSU (Standard Compression Scheme for Unicode)
Indicates how the text in the conf and in the module are encoded.

The preferred encoding of texts is UTF-8. Other than Hebrew, UTF-8 modules must be encoded with Normalization Form C (NFC). Biblical Hebrew requires special handling.[2] A few other languages may require special handling.[3][4]

To date, no modules use UTF-16 or SCSU.

Warning: "Latin-1" is an ambiguously used term. Latin-1 is regularly used as a synonym for ISO-8859-1. Here it means Windows Codepage 1252, a superset of ISO-8859-1. Front-end implementors should use "cp1252" or "windows1252" explicitly, not "Latin-1" provided by some programming language libraries.

Latin-1  
CompressType

ZIP
LZSS (Lempel Ziv Storer Szymanski)
BZIP2
XZ
Used for modules having a ModDrv of zText, zCom or zLD to indicate the compression algorithm. While CompressType has a default, it is best practice to specify it. ZIP is the preferred format.

LZSS  
BlockType

BOOK
CHAPTER
VERSE
Used for modules having a ModDrv of zText (Bibles) and zCom (Commentaries) to indicate how much of the work is compressed into a block. The trade off is size for speed, with BOOK taking the least overall space and the longest time and VERSE taking the greatest overall space and the least time. While BlockType has a default, it is a best practice to specify it. Most Bibles use BOOK and larger Commentaries use CHAPTER. To date, no module uses VERSE.

CHAPTER  
BlockCount

<integer>
Used for modules having a ModDrv of zLD to indicate the number of entries in a compressed block. Higher values will make the module slower, but smaller. It is best practice to take the default and not specify it.

200  
Versification

Catholic
Catholic2
German
KJV
KJVA
LXX
Leningrad
Luther
MT
NRSV
NRSVA
Orthodox
Synodal
SynodalProt
Vulg
Used to specify the versification employed by a Bible module. Refer to Alternate Versification.

KJV  
CipherKey

<string>
Indicates that a module is enciphered and that the module is (un)locked. When the key has no value ("CipherKey=") the module is locked. When it has a value, the module is unlocked.

A good key is something that is hard to guess, typically in a format matching the pattern /[0-9]{4}[A-Za-z]{4}[0-9]{4}[A-Za-z]{4}/. Internally the key can be any byte sequence from 1 to 255 bytes in length, but it needs to be readable plain text without leading or trailing spaces.

   
KeyType

TreeKey
VerseKey
Used for RawGenBook to indicate whether the module contains a book or a Bible. At this time VerseKey is not yet supported and is being developed as a solution for Bibles which do not conform to any supported versification system in SWORD. It is best practice to take the default and not specify it.

TreeKey  
CaseSensitiveKeys

Used for Dictionaries whose keys are case sensitive. This key is used to suppress normalization to UPPER CASE before comparison.
Only allowable value: true

false  
  1. Omitting this for a non-plaintext module has unpredictable effects.
  2. Unicode normalization can easily break Biblical Hebrew text. See page 9 of the SBL Hebrew Font User Manual.
  3. e.g. If they are mentioned in Table 10 in the Corrigendum 5 Sequences.
  4. The improper normalization of exceptional codepoints can be prevented by inserting a Combining Grapheme Joiner.

Elements required for proper rendering

Element Values (type or enumerated) Default Value Allows
GlobalOptionFilter

GBFStrongs (For GBF texts having Strong's Numbers)[1]
GBFFootnotes (For GBF texts having footnotes)
GBFMorph (For GBF texts having morphology information)
GBFHeadings (For GBF texts having headings)
GBFRedLetterWords (For GBF texts marking the Words of Christ)[2]
ThMLStrongs (For ThML texts having Strong's Numbers)[1]
ThMLFootnotes (For ThML texts having footnotes)
ThMLScripref (For ThML texts having cross references)
ThMLMorph (For ThML texts having morphology information)
ThMLHeadings (For ThML texts having headings)
ThMLVariants (For ThML texts having variant readings)
ThMLLemma (For ThML texts having lemmas)
UTF8Cantillation (For Hebrew texts having cantillation marks)[3]
UTF8GreekAccents (For Greek texts having accents)[4][5]
UTF8HebrewPoints (For Hebrew texts having vowel points)[6]
UTF8ArabicPoints (For Arabic texts having vowel points)[7]
OSISLemma (For OSIS texts having lemmas)[8]
OSISMorphSegmentation (For OSIS texts having morphological segmentation elements)[9]
OSISStrongs (For OSIS texts having Strong's Numbers)[1]
OSISFootnotes (For OSIS texts having informational notes)
OSISScripref (For OSIS texts having cross reference type notes)
OSISMorph (For OSIS texts having morphology information)
OSISHeadings (For OSIS texts having non-canonical headings)
OSISVariants (For OSIS texts having variant readings)
OSISRedLetterWords (For OSIS texts marking the Words of Christ)[2]
OSISGlosses (For OSIS texts with glosses)[10]
OSISRuby[11] (For OSIS texts with glosses)[12]
OSISXlit (For OSIS texts that include transliterated forms)[13]
OSISEnum (For OSIS texts with enumerated words)[14]
OSISReferenceLinks (For OSIS texts with glossary links)[15][16]
Each of these filters removes or hides the corresponding feature of the text when activated by the application.[17] These filters are applied in the order that they are listed in the conf. Some filters depend on each other for certain features, e.g. cross references in notes require both the OSISFootnotes and the OSISScripref filters to be enabled.

  Repeats
Direction

LtoR (Left to Right)
RtoL (Right to Left)
BiDi (Bidirectional)
Indicate whether the language's script is a left to right script or a right to left script.[18] Languages such as Hebrew, Arabic, Urdu, and Farsi have a right to left script. If the RtoL script is transliterated into a LtoR script, set the value to LtoR. If a module has both RtoL and LtoR text, then it is BiDi.

LtoR  
DisplayLevel <integer>

Used for General Book module types (these are keyed with a TreeKey table of contents). Indicates the preferred level from a leaf in the tree to display for context. e.g., 1 will only show the requested entry; 2 will show the entry, surrounded by all siblings, etc.

1  
Font <string>

Specify the font to be used for display of the module if it is available.[19] Omit this line to use the default font. Do not make use of font-specific encodings in your documents, but use Unicode instead and the Private Use Area if necessary for codepoints that are not handled by Unicode.

   
OSISqToTick (deprecated)[20] This attribute is deprecated in favor of the marker attribute on the q element. E.g.:
<q who="Jesus" marker="">....</q>

true/false
When set to false indicates that OSIS quote elements without a marker attribute are not to produce a quotation mark. This is useful for languages (e.g. Thai) and texts (e.g. KJV) that do not have quotation marks. It is also useful for modules that mark the "Words of Christ" on a verse by verse basis, when the quote spans more than one verse.

true  
Feature

StrongsNumbers (for modules that include Strong's numbers)
GreekDef (for dictionary modules with Strong's number encoded Greek definitions)
HebrewDef (for dictionary modules with Strong's number encoded Hebrew definitions)
GreekParse (for modules with Greek morphology expansions)
HebrewParse (for modules with Hebrew morphology expansions)
DailyDevotion (for daily devotionals using one of the LD drivers and keyed with MM.DD)
Glossary (for collections of glosses using one of the LD drivers)
Images (for modules that contain images of any type)
NoParagraphs (for modules without any paragraphing information, which are typically typeset with a verse per line[21])

  Repeats
GlossaryFrom <lang identifier>

Glossaries map one language to another. This value indicates the language being translated from. See Lang below for a discussion of valid values.

   
GlossaryTo <lang identifier>

Glossaries map one language to another. This value indicates the language being translated to. See Lang below for a discussion of valid values.

   
PreferredCSSXHTML <filename>

Names a file in the module's DataPath that should be referenced for the renderer as CSS display controls. Generality is advised: Use controls that are not specific to any particular rendering engine, e.g. WebKit.

   
  1. See https://en.wikipedia.org/wiki/Strong%27s_Concordance#Strong.27s_numbers
  2. See https://en.wikipedia.org/wiki/Red_letter_edition
  3. See https://en.wikipedia.org/wiki/Cantillation
  4. For detailed background, see https://en.wikipedia.org/wiki/Greek_diacritics
  5. This filter can have undesirable side-effects when applied to non-Greek text!
  6. See https://en.wikipedia.org/wiki/Niqqud
  7. See https://en.wikipedia.org/wiki/Arabic_diacritics
  8. Must precede OSISStrongs.
  9. Currently, only some JSword based front-ends seem to support this feature. The SWORD engine has the switch available, but no change in output is effected.
  10. Minimum SWORD version of 1.7.0 in the module .conf is required for OSISGlosses.
  11. See Ruby character and Furigana
  12. Deprecated in 1.7.0. Use OSISGlosses instead.
  13. The Samaritan Pentateuch module SP is an example of using xlit.
  14. The Samaritan Pentateuch module SP is an example of using enum.
  15. New in SWORD 1.7.0 - This filter requires six vertical bar-delimited fields, of which the following is an example.
    GlobalOptionFilter=OSISReferenceLinks|Reference Material Links|Hide or show links to study helps in the Biblical text.|x-glossary||On
    

    Here are the different field meanings:

    1. "OSISReferenceLinks" = option filter class name (option class name internal to the engine). Always the same for this kind of filter.
    2. "Reference Material Links" = Visible name of this OSISReferenceLinks filter. This is what the user will see in the Global Options toggle lists.
    3. "Hide or show..." = A readable user tip explaining what the filter does.
    4. "x-glossary" = Tells this OSISReferenceLinks filter to filter all references with type="x-glossary".
    5. (empty) = Tells this OSISReferenceLinks filter to also require that subType="something" in order to filter. Empty means ALL type="x-glossary" references will be filtered regardless of subType.
    6. "On" = Default filter toggle value ("On" or "Off")
  16. It is allowed to have multiple OSISReferenceLinks entries in a single conf file.
  17. It's not implied that every front-end supports all of the listed option filters.
  18. JSword validates the direction property against the Lang of the module.
  19. Specifying a font may not be sufficient for some modules. The required font features may depend on a particular smart font engine, which may not be compiled into the front-end application.
  20. For further details, refer to MOD-188 in CrossWire bugs.
  21. This feature is intended to be informational to front-end developers. Ideally, front-ends will render these modules with a verse per line rather than as a single big chapter-length paragraph block.
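
The six vertical bar-delimited fields of an OSISReferenceLinks entry (note 15) can be unpacked as follows. This Python sketch is only an illustration of the field layout. The class and function names are hypothetical, and the engine's own parsing may differ.

```python
from typing import NamedTuple

class ReferenceLinksOption(NamedTuple):
    filter_class: str   # always "OSISReferenceLinks"
    label: str          # name shown in the Global Options toggle lists
    tip: str            # user tip explaining the filter
    ref_type: str       # reference type to filter, e.g. "x-glossary"
    ref_subtype: str    # empty means any subType
    default_on: bool    # default toggle value

def parse_reference_links(value):
    """Split a GlobalOptionFilter=OSISReferenceLinks|... value into its fields."""
    fields = value.split("|")
    if len(fields) != 6 or fields[0] != "OSISReferenceLinks":
        raise ValueError("expected six |-delimited fields starting with OSISReferenceLinks")
    if fields[5] not in ("On", "Off"):
        raise ValueError("sixth field must be On or Off")
    return ReferenceLinksOption(*fields[:5], fields[5] == "On")

option = parse_reference_links(
    "OSISReferenceLinks|Reference Material Links|"
    "Hide or show links to study helps in the Biblical text.|x-glossary||On"
)
print(option.label, repr(option.ref_subtype), option.default_on)
```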

Optional elements to support particular features

CaseInsensitiveKeys

Intended for use with Lexicon/Dictionary & Glossary modules. This field will make the order of the keys based upon the mixed case keys, but the index is still sorted by byte order of those keys. There are some scripts that don’t have upper/lower case (e.g. Arabic) and some languages where a naïve toUpper() will result in the wrong character (e.g. Turkish/Azeri lowercase dotted i and capital dotted İ).

CaseInsensitiveKeys=true|false

It is fine to use toUpper() for internal normalization, but having keys in all caps when showing to a user is annoying. The problem is that the display order needs to follow something that makes sense to a user when the dictionary is presented as a list.

xulsword has a different solution involving a configuration item not yet used by SWORD master.

LangSortOrder=AaBbCcDdEe... 

This is used by xulsword to sort the keys of a dictionary/glossary in original alphabetical order. Here's an actual example for module TKLDICT which has Lang=tk-Latn (i.e. Türkmençe):

LangSortOrder=AaBbCcÇçDdEeÄ䯿FfGgHhIiJjKkLlMmNnŇňOoÖöPpQqRrSsŞşTtUuÜüVvWwXxYyÝýZzŽž

This method would need to be modified in order to support alphabets (such as Welsh) that include any digraphs.
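
To make the idea behind LangSortOrder concrete, the Python sketch below sorts keys by the position of each character in the order string. It is not xulsword's implementation. The helper name sort_key_for is hypothetical, the order string is shortened for the example, and placing unlisted characters after listed ones is an assumption.

```python
def sort_key_for(order):
    """Build a sort key that ranks characters by their position in `order`."""
    rank = {ch: i for i, ch in enumerate(order)}
    def key(word):
        # Listed characters sort by position; unlisted ones follow, by code point.
        return [(0, rank[ch]) if ch in rank else (1, ord(ch)) for ch in word]
    return key

order = "AaBbCcÇçDdEe"              # shortened, hypothetical order string
words = ["Dag", "Çaga", "Cebe", "Ada"]

print(sorted(words))                            # plain code point order
print(sorted(words, key=sort_key_for(order)))   # alphabet-aware order
```

With plain code point order, Ç sorts after D; with the order string it falls between C and D. Because the ranking works on single characters, digraphs would need a tokenizing step first, as noted above.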

StrongsPadding

At the heart of our lexicon/dictionary drivers, we have some old logic which tries to detect whether a key value is a Strong's number and, if so, pads it with leading zeros accordingly. To control this logic, recognition of an optional new .conf entry for lexicon/dictionary modules has recently been added:

StrongsPadding=true|false

Notes:

  1. So as not to break everything, this currently defaults to true if it is not present in the lexdict module's .conf file
  2. It can be set to false if you are building a lexdict module which has entries which may be misconstrued as Strong's numbers.
  3. In a couple of years, we'll probably switch the default to false, so it would be nice to add this line and set the value to true on modules which really do require the logic.
  4. This is only available in SWORD version 1.7 or later. JSword never had this problem.
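
The following Python sketch illustrates the kind of padding logic described above and why it can misfire on keys that merely look numeric. It is not the engine's code. The function name pad_key, the optional G/H prefix, and the width of five digits are all assumptions made for illustration.

```python
import re

# Assumption: a Strong's-like key is an optional G/H prefix followed by digits.
STRONGS_LIKE = re.compile(r"([GgHh]?)(\d+)")

def pad_key(key, strongs_padding=True, width=5):
    """Zero-pad keys that look like Strong's numbers; leave others unchanged."""
    if not strongs_padding:
        return key
    match = STRONGS_LIKE.fullmatch(key)
    if not match:
        return key
    prefix, digits = match.groups()
    return prefix + digits.zfill(width)

print(pad_key("G25"))
print(pad_key("1611"))                          # a year would be padded too
print(pad_key("1611", strongs_padding=False))   # StrongsPadding=false avoids that
```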

Normalization

Currently, this optional key is a discussion proposal.

Background: "Unicode Normalization can break Biblical Hebrew."

Most modules with Unicode source text are encoded as UTF-8 and normalized to NFC, these being the default settings for both osis2mod & tei2mod.

Now these two module creation tools have a -N command line switch to prevent conversion to UTF-8 and Normalization.

Biblical Hebrew source text with both vowel accents and cantillation may be supplied properly with custom normalization as required by the text provider. It should still be encoded UTF-8.

As there is a need to create modules from source text that has such a custom ordering of the diacritics, it may be useful to provide information in the .conf file for such modules that are intentionally not normalized to NFC during build. The following method is proposed:

Normalization=Custom

It should be assumed that modules where this is specified are made using the -N switch in the module creation tool.

Normalization is useful to ensure (e.g.) that a search index stores all words the same way. That's why for the most part, modules are expected to be in NFC form. Custom normalization is still a normalization. What's different about it is that the combining classes for each character are different from the canonical combining classes defined by the Unicode Consortium.

To create a search index for such a module such that it does not automatically use NFC, give the mkfastmod command with the -N switch.[1]

--David Haslam (talk) 09:33, 7 January 2018 (MST)

  1. Added to SWORD SVN by DM Smith on 2018-01-07.
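
The effect of NFC on a custom mark order can be seen with Python's standard unicodedata module (is_normalized needs Python 3.8 or later). The three-character sample string is made up for illustration: a bet followed by a dagesh and then a patah.

```python
import unicodedata

# BET + DAGESH + PATAH, with the dagesh deliberately placed before the vowel.
custom = "\u05D1\u05BC\u05B7"
nfc = unicodedata.normalize("NFC", custom)

def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(custom))
print(code_points(nfc))
print(unicodedata.is_normalized("NFC", custom))
```

NFC reorders combining marks by canonical combining class, so the patah (class 17) is moved ahead of the dagesh (class 21). A source text that relies on the original order would therefore be altered by default normalization, which is the situation the -N switch is meant to avoid.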

Strip Filters

SWORD has the concept of "filtering" a module's text at different processing points for purposes other than rendering.
One of these filter-points is for searching and we call these filters Strip Filters.

Strip Filters are typically named something like OSISPlain or GBFPlain, etc. These typically take all the markup out of an entry and prepare the text to be searched, but anything can be done to the text to prepare it further for searching. We typically remove accents and vowel points from Greek and Hebrew respectively.

Any Strip Filter can be added to a module by the module author with a line in the .conf file, such as:

LocalStripFilter=GBFPlain

If diacritics need to be removed from Arabic, then we can certainly add a filter for this as well. The conf line would be:

LocalStripFilter=UTF8ArabicPoints

Our current list of filters can be found by browsing the source folder here:

http://crosswire.org/svn/sword/trunk/src/modules/filters/

They're pretty concise and don't involve much knowledge from the rest of the engine, making them easy to write if we need a new one.

This processing can replace or be complementary to any processing done by clucene. Here's an example of what's used with the Duke Databank of Papyri with specialist software that's based on SWORD.

LocalStripFilter=PapyriPlain

Since we need to strip markup and other things clucene will likely never support (see PapyriPlain: annotations like [, ], ?, {, } and underdot), we need this preprocessing mechanism to prepare the text before searching. We also maintain searching functionality apart from "fast indexed searching".[1]

Note:

  1. Currently supplied by clucene, but could be implemented by any other fast search framework that we might want to integrate in future.
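
As a rough illustration of what a Strip Filter does conceptually, the Python sketch below removes markup and combining marks from an entry before searching. It is not a port of OSISPlain or of any filter in the SWORD source tree. The function name strip_for_search is hypothetical, and the tag-stripping regular expression is a simplification.

```python
import re
import unicodedata

TAG = re.compile(r"<[^>]*>")

def strip_for_search(entry):
    """Remove markup, then drop accents and vowel points (combining marks)."""
    text = TAG.sub("", entry)
    decomposed = unicodedata.normalize("NFD", text)   # split base letters from marks
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_for_search('<w lemma="strong:G3056">\u03bb\u03cc\u03b3\u03bf\u03c2</w>'))
```

The same category test removes Hebrew vowel points and cantillation marks, since those are also nonspacing marks.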

General informational and installer elements

Element Values (type or enumerated) Default Value Allows
About

<string>
A lengthier description that may include copyright, source, and similar information, possibly duplicating information in other elements.

  Continuation
RTF
Localization
SwordVersionDate

<yyyy-mm-dd> (ISO 8601 Date)
Indicates the date that the module was changed.

   
Version

<version string>
Gives the module's revision number. Incrementing it when changes are made alerts users of the SWORD Installers to the presence of updated modules. Please start with version 1.0 and increment by 0.1 for minor updates and by larger values for more major updates such as a new text source. Changes to this conf file should also increment the version number. Do not use non-numbers, such as 1.4a.

CrossWire's standard practice is to indicate updates that only require a .conf-file update/download by incrementing the third most significant number (the revision number). For example, if module version 1.2 requires a .conf-file update, a new .conf file with version number 1.2.1 could be released.

1.0  
History_x.x <string>

x.x is taken from the Version value.

Indicates what has changed between different versions. Each time a version is incremented a history line with that version number should explain the change.

It is recommended that each explanation be suffixed by the corresponding SwordVersionDate value.

  Repeats
Localization
MinimumVersion[1]

<version string>
Identifies the minimum version of the SWORD library required for this module.[2]

1.5.1a  
Category

This is used by installers to further categorize the modules beyond what can be figured out by the ModDrv and Feature.
Biblical Texts (for Bibles)
Commentaries
Lexicons / Dictionaries
Glossaries (for modules with Feature=Glossary)
Daily Devotional (for modules with Feature=DailyDevotion)
Generic Books (for anything else....)
Maps (for modules that primarily consist of maps)
Images (for modules that primarily consist of images)
Cults / Unorthodox / Questionable Material
Essays (for essays)[3]

Biblical Texts
  ModDrv=RawText or
  ModDrv=RawText4 or
  ModDrv=zText or
  ModDrv=zText4
Commentaries
  ModDrv=HRefCom or
  ModDrv=RawCom or
  ModDrv=RawCom4 or
  ModDrv=RawFiles or
  ModDrv=zCom or
  ModDrv=zCom4
Lexicons / Dictionaries
  ModDrv=RawLD or
  ModDrv=RawLD4 or
  ModDrv=zLD
Glossaries
  Feature=Glossary and
  ModDrv=RawLD or
  ModDrv=RawLD4 or
  ModDrv=zLD
Daily Devotional
  Feature=DailyDevotion and
  ModDrv=RawLD or
  ModDrv=RawLD4 or
  ModDrv=zLD
Generic Books
  ModDrv=RawGenBook

 
LCSH <tree/string>

Library of Congress Subject Heading. You may search the Library of Congress catalog or use it as a guide for determining an appropriate LCSH for books that are not in the Library of Congress.

   
Lang

<Language[-Script]?[-Region]?>
The language identifier is a combination of sub-tags for Language and optionally Script, and/or Region, according to BCP 47 and RFC 4647. Private use extensions defined by BCP 47 (e.g. x-, qaa, and Qaaa) should be avoided wherever possible.

Language sub-tag (Regex: /[a-z]{2,3}/):
This is the primary language code of the module according to ISO 639 parts 1, 2, 3 and 5. Some languages have several codes. Use the following to determine the best choice:

When available, use a 2-letter ISO 639-1 code (registrar) (e.g. en for English).
If there is none for the given language, use an ISO 639-2/T code (registrar) (e.g. ceb for Cebuano).
Failing that, use an ISO 639-3 code (registrar), which covers over 7000 languages.
Finally, use an ISO 639-5 code (registrar) for language families and groups.

The ISO 639-3 registrar page gives an up-to-date table covering all of the above.

Script sub-tag (Regex: /[A-Z][a-z]{3}/):
If a text is script-specific, such as a Latin vs. Cyrillic Serbian Bible or a Bible transliterated into a script other than its native one, include the ISO 15924 script code (registrar) after the language code (e.g. sr-Latn for Latin script Serbian, sr-Cyrl for Cyrillic script Serbian).

Region sub-tag (Regex: /[A-Z]{2}/):
If a text is region (country)-specific, such as the Anglicized NIV, include the ISO 3166-1 region code (registrar) after the script code (or after the language code if no script code is present) (e.g. en-GB for UK English).

Combinations (Regex: /[a-z]{2,3}(-[A-Z][a-z]{3})?(-[A-Z]{2})?/):
Individual sub-tags (language, script, and region) are always separated by a hyphen. Identifiers should be as basic and succinct as possible. A script should not be specified for a language written in its expected script, unless the language has multiple common scripts (as in the case of Serbian above). A region should not be specified unless a text should be categorized separately from other texts in that language that do not specify a region.

en  
InstallSize

<integer>
Indicates the total byte size of the module on disk, excluding the size of any Lucene index files.

For modules in the CrossWire repositories, this is automatically generated and overwritten if needed.

   
Obsoletes

<ModName>
Each instance of this element gives a former ModName that is made obsolete by this module.

  Repeats
OSISVersion

<version string>
Identifies the OSIS schema version employed in the OSIS source document. The current version is 2.1.1.

It is recommended that this be present for every OSIS module.

 
Companion[4]

<ModName[, ModName]*>
Specifies companion module(s) that should be opened together with this one,
e.g. when Bible and Commentary and/or Glossary modules are distributed together.

 

Note:

  1. See http://tracker.crosswire.org/browse/API-201
  2. Required to support a Bible/Commentary module that has an Alternate Versification.
  3. Essays is handled as a subset of Generic Books.
  4. Many (xulsword compatible) modules in the IBT Repository make use of this field. See also https://github.com/johnaustindev/osis-converters
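
As an illustration, the general and installer elements above might be combined as in the following fragment. This is only a sketch: the module name ExampleBible, the dates, the history texts, the size, the LCSH value and the obsoleted and companion module names are all hypothetical, and the elements described elsewhere on this page (such as DataPath and ModDrv) are left out for brevity.

```ini
[ExampleBible]
# Hypothetical values throughout; only the key names come from the table above.
About=An example Bible module used to illustrate the conf layout.\par\par Source and copyright details may be repeated here.
SwordVersionDate=2018-01-08
# 1.2 was a content release; 1.2.1 changed only this .conf file.
Version=1.2.1
History_1.2.1=Corrected the About text, conf-only update (2018-01-08)
History_1.2=Moved to a new text source (2017-11-30)
History_1.1=Fixed typos (2017-06-15)
History_1.0=Initial release (2017-01-10)
# 1.5.1a is the documented default, shown here only for illustration.
MinimumVersion=1.5.1a
# Category is omitted: with ModDrv=zText the default is Biblical Texts.
LCSH=Bible. English.
# Language only; forms such as sr-Latn or en-GB apply when script or region matters.
Lang=en
# Byte size on disk without Lucene index files (regenerated for CrossWire repositories).
InstallSize=4194304
Obsoletes=ExampleBibleOld
Obsoletes=ExampleBible2003
OSISVersion=2.1.1
Companion=ExampleBibleNotes, ExampleBibleGlossary
```

Because History_x.x and Obsoletes allow repeats, each past version and each former ModName gets its own line, and the newest History key matches the Version value.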

Copyright & Licensing related elements

Element Values (type or enumerated) Default Value Allows
Copyright

<string>
Contains the copyright notice for the work, including the year of copyright and the owner of the copyright.

  Continuation
Localization
CopyrightHolder

<string>
Contains the name of the copyright holder.

  Localization
CopyrightDate

<yyyy> (ISO 8601 Year)

  Localization
CopyrightNotes

<string>

  Continuation
Localization
CopyrightContactName

<string>
Contains the name of the contact person for the copyright holder.

  Continuation
Localization
CopyrightContactNotes

<string>

  Continuation
Localization
CopyrightContactAddress

<string>
Contains the mailing address of the copyright holder.

  Continuation
Localization
CopyrightContactEmail

<string>
Contains the email address of the copyright holder, preferably in the form:
name at xyz dot com

  Localization
ShortPromo

<string>
A link to the home page for the module, perhaps with an encouragement to visit the site.

  HTML Link
Localization
ShortCopyright

<string>

  Localization
DistributionLicense

Public Domain
Copyrighted
Copyrighted; Permission to distribute granted to CrossWire[1]
Copyrighted; Free non-commercial distribution
Copyrighted; Freely distributable
Copyrighted; Permission granted to distribute non-commercially in SWORD format
GFDL
GPL
Creative Commons: by-nc-nd
Creative Commons: by-nc-sa
Creative Commons: by-nc
Creative Commons: by-nd
Creative Commons: by-sa
Creative Commons: by
Creative Commons: CC0

Use one of these strings verbatim. The actual copyright and/or license information is held in other elements. The last seven[2] licenses are Creative Commons licenses.

   
DistributionNotes

<string>
Indicates any additional notes about distribution of the module.

  Continuation
Localization
TextSource

<string>
Indicates the source of the text, either in prose (such as "CCEL") or as a URL.

  Continuation
UnlockURL

<string>
Contains the URL (a bare URL, not an HTML <a> link) of a web page with unlocking instructions or payment details.

  URL

Note:

  1. Modules in other repositories may have a different organization name instead of CrossWire.
  2. Each link goes to a page that no longer exists!
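
A possible set of copyright and licensing elements is sketched below. All names, addresses and URLs are hypothetical (example.org is a placeholder domain); only the key names and the DistributionLicense string are taken from the table above. Elements that the table lists without a description (CopyrightNotes, CopyrightContactNotes, ShortCopyright) are not shown.

```ini
# Hypothetical values throughout; only the key names and the
# DistributionLicense string come from the table above.
Copyright=Copyright (c) 2010 Example Bible Trust
CopyrightHolder=Example Bible Trust
CopyrightDate=2010
CopyrightContactName=Jane Example
CopyrightContactAddress=1 Example Street, Exampletown
# Email written in the recommended "name at xyz dot com" form.
CopyrightContactEmail=permissions at example dot org
# ShortPromo allows an HTML link.
ShortPromo=Visit <a href="https://www.example.org/">the Example Bible Trust</a> for printed editions.
# Must match one of the listed strings verbatim.
DistributionLicense=Copyrighted; Permission to distribute granted to CrossWire
DistributionNotes=Permission received by email from the copyright holder.
TextSource=https://www.example.org/texts/
# UnlockURL takes a bare URL; it is commented out here on the assumption
# that this example module is not locked.
# UnlockURL=https://www.example.org/unlock
```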

Uniqueness

To compare two versions of a module during module development, the module names and locations must be unique. For JSword-based front-ends such as Bible Desktop, there is a further requirement: the Description items must also be different.
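
For instance, a released module and a development copy could be kept side by side with two conf files along the following lines. This is a sketch that assumes "location" refers to the DataPath value; the file names, module names and paths are hypothetical.

```ini
# File: examplebible.conf (hypothetical released module)
[ExampleBible]
DataPath=./modules/texts/ztext/examplebible/
Description=Example Bible

# File: examplebibledev.conf (hypothetical development copy)
[ExampleBibleDev]
DataPath=./modules/texts/ztext/examplebibledev/
Description=Example Bible (development copy)
```

The two sections differ in ModName, DataPath and Description, which covers the extra requirement mentioned for JSword-based front-ends.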

Analysis Tools

  • DMSmith has created a script to analyse conf files and report anomalies.
  • David Haslam has created a User Defined Language called CONF as a Syntax Highlighter for Notepad++ (Windows). Download from [1].

Automated generation

  • For new module submissions to CrossWire, Refdoc now maintains a script called confmaker that automates the generation of module conf files, given the minimum non-automatable information supplied by the module submitter.