Difference between revisions of "DevTools:Modules"

From CrossWire Bible Society

Revision as of 22:12, 27 November 2008

Module Development Overview

If you want to learn how to create a SWORD module, this is the place to start. Here is a brief overview of the process:

  1. Collect and install the necessary software tools.
  2. Obtain the source text and, if you wish to distribute a copyrighted module, permission from the copyright holder.
  3. Prepare the source text for import.
  4. Use an XML validator to check that your source file is properly constructed.
  5. Import the source text using the appropriate tool.
  6. Create a .conf file.
  7. Install the module and check that it displays correctly in several of the SWORD front-end applications.
  8. Submit your module to CrossWire for distribution.

Creating a module

Collect and Install Software Tools

Obtain Source Text and Permission to Distribute

Prepare the Source Text for Import

Note that the SWORD Project requires all submitted texts to be Unicode (UTF-8) encoded documents. We recommend that texts be marked up in OSIS or TEI, but will still accept texts based on CCEL documents that are marked up in ThML.

Encoding

As mentioned above in the conf's Encoding directive, SWORD modules can be encoded either in Windows Codepage 1252 (cp1252) (a superset of ISO 8859-1) or in UTF-8. See Encoding for a complete explanation and definition.

For English language texts that only make use of ASCII characters, no change to the source encoding will be required. For other European languages and most other languages, simple encoding converters from ISO and national standards to UTF-8 probably exist. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:

uconv (part of ICU), available compiled for Win32 at http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip or in source format from ICU at http://www.icu-project.org/.

font2uni from CCEL, available at http://www.ccel.org/info/gkheb/.

uconv is best suited for standard encodings and font2uni is best suited for font-specific encodings. When creating XML texts, the only entities that should be used are &amp; for '&' and &lt; for '<'. All other entities should be encoded as their UTF-8 equivalents.
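As a rough illustration of the entity rule above, the following Python sketch replaces named and numeric character references with literal characters while leaving &amp; and &lt; alone. The function name expand_entities is hypothetical and not part of SWORD. The entity table comes from Python's html.entities module, so it only covers HTML named entities; a source text that uses some other entity set would need its own table.

```python
import re
from html.entities import name2codepoint

# Hypothetical helper, not part of SWORD.
# The two entities that the guidelines say should remain in the text.
KEEP = {"amp", "lt"}
ENTITY_RE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def expand_entities(text):
    """Replace character references with literal characters, except &amp; and &lt;."""

    def replace(match):
        name = match.group(1)
        if name in KEEP:
            return match.group(0)
        if name[:2].lower() == "#x":
            codepoint = int(name[2:], 16)   # hexadecimal reference, e.g. &#xE9;
        elif name.startswith("#"):
            codepoint = int(name[1:])       # decimal reference, e.g. &#233;
        else:
            codepoint = name2codepoint.get(name)
            if codepoint is None:
                return match.group(0)       # unknown entity: leave for manual review
        char = chr(codepoint)
        # A numeric reference to '&' or '<' must stay escaped to keep the XML valid.
        if char == "&":
            return "&amp;"
        if char == "<":
            return "&lt;"
        return char

    return ENTITY_RE.sub(replace, text)
```

Note that expanding &quot; inside a double-quoted attribute value would break the markup, so attribute values are worth reviewing by hand after a pass like this. The result should be written out as UTF-8.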

Markup

(see also Various Tools)

Internally, SWORD can process text in one of four formats: OSIS, TEI, ThML, and GBF. From these formats, it can convert to other formats, including RTF and HTML, for display. OSIS 2.1 is now the preferred format for Bibles and commentaries. At the moment OSIS does not have thorough support for complex dictionaries. For that reason we support TEI for dictionaries.

You may find documentation for each of these standards at their respective websites:

In SWORD, for modules encoded with ThML and OSIS, each verse, dictionary entry, and book division needs to be well-formed XML or it will result in display problems in some frontends. SWORD only handles the subset of the ThML tags that we have found necessary, but we are willing to support additional tags as the need arises.

Use of ThML for SWORD is deprecated. Supported ThML tags include: <sync> (with type parameters of Strongs, morph, & lemma), <scripRef>, and <note> (plus closing tags where appropriate). HTML tags that ThML inherits and that may be used in SWORD modules include <div> (with types of sechead for section headings and title for titles), <i>, <br>, and <b>. Additional HTML tags may be interpreted by those SWORD frontends that render HTML, but will not be translated to RTF for the Win32 frontend. Do not submit untidy HTML and label it ThML--it's rude and lazy.

GBF is deprecated and no GBF modules will be accepted by the SWORD Project. Supported GBF tags include: <WG>, <WH>, <WTG>, <WTH>, <RX>, <RF>, <FI>, <FB>, <FN>, <FR>, <FS>, <FU>, <FO>, <FV>, <CA>, <CL>, <CG>, <CM>, <CT>, <JR>, <JC>, <JL>, <TT>, and <TS> (plus closing tags where appropriate). In addition, SWORD allows full use of UTF-8 rather than merely ASCII as the GBF standard specifies.

Import formats

ThML and OSIS Formatted General Books

With ThML and OSIS formatted general books, provided your document is well-formed and valid XML according to the ThML DTD or the OSIS 2.1 Schema, you should not need to do any further processing. You can use your XML file with xml2gbs. For OSIS encoded Bibles use osis2mod.
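Before running an importer, it can help to confirm that the file at least parses as XML. The sketch below uses Python's standard xml.etree.ElementTree for this; the function name check_well_formed is hypothetical. It only tests well-formedness. It does not validate against the ThML DTD or the OSIS 2.1 Schema, which requires a validating XML tool.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message.

    This checks well-formedness only, not validity against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # includes the line and column of the first problem
    return None
```

For a file on disk, ET.parse(path) can be used in the same way, catching the same ET.ParseError.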

vpl Format

vpl or verse-per-line format may only be used in creating Bibles. This format requires that each line start with a verse reference that SWORD can understand, such as "Genesis 1:1" or "Jn 3:16". Most English abbreviations are acceptable. Following the verse reference, the verse itself should be written, in any kind of markup. For example:

Genesis 1:1 In the beginning God created the heaven and the earth.
Genesis 1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

This format is used with the utility vpl2mod, discussed below. To import Bibles that have combined verses, you will need to use imp format instead of vpl.
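A quick way to catch malformed lines before import is to split each vpl line into its reference and its text. The Python sketch below assumes the reference has the shape "book chapter:verse", where the book name may itself contain spaces or a leading number; the regular expression and the function name parse_vpl_line are illustrative assumptions. It does not check whether SWORD actually recognizes the book name or abbreviation.

```python
import re

# Assumed reference shape: a book name or abbreviation (possibly with a
# leading number, as in "1 John"), followed by chapter:verse.
VPL_RE = re.compile(r"^(?P<ref>.+?\s\d+:\d+)\s+(?P<text>.*)$")


def parse_vpl_line(line):
    """Split one vpl line into (reference, verse text) or raise ValueError."""
    match = VPL_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not start with a verse reference: %r" % line)
    return match.group("ref"), match.group("text")
```

Running this over every line of a file and reporting the line numbers that raise ValueError gives a simple pre-import check.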

imp Format

The imp or import format is the most versatile of the import formats and may be used in creating all types of modules (Bibles, commentaries, dictionaries, daily devotionals, glossaries, general books, etc.) in any supported format (GBF, ThML, OSIS or TEI). Each entry in an imp file may take as many lines as are needed. The first line of the entry will have a format such as "$$$<key>" and will be followed by all lines of text that should be included with that entry. So our above example in imp format would be written as:

$$$Genesis 1:1
In the beginning God created the heaven and the earth.
$$$Genesis 1:2
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

Commentaries would follow the same format, but would probably include a greater number of lines of text. If your Bible or commentary uses a single entry to handle multiple verses, simply give a list or range of verses as the key (e.g. "$$$Genesis 1:1-5", "$$$Exodus 1", "$$$Leviticus 1:1,5"). Lexicons, dictionaries, glossaries and daily devotionals would take a form such as:

$$$Adam
Adam was the first man created by God.
$$$Eve
Eve was the first woman created by God.

For daily devotionals, you must encode the key as "$$$mm.dd", such as "$$$01.01" for January 1st and "$$$12.31" for December 31st.
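The entry layout described above is simple enough to generate from a script. The following Python sketch writes a single imp entry and formats a date as a devotional key; the function names imp_entry and devotional_key are hypothetical.

```python
import datetime


def imp_entry(key, lines):
    """Format one imp entry: a '$$$<key>' line followed by the entry's text lines."""
    return "$$$" + key + "\n" + "\n".join(lines) + "\n"


def devotional_key(date):
    """Return the mm.dd key used for daily devotionals, zero-padded."""
    return "{:02d}.{:02d}".format(date.month, date.day)
```

For example, imp_entry(devotional_key(datetime.date(2008, 1, 1)), ["Reading for the day."]) produces an entry whose first line is "$$$01.01". The reading text here is a placeholder.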

General books are encoded with each book division as a separate entry. The entries are then listed as a tree hierarchy with keys similar to a file system directory structure. For example, if you were encoding Josephus' Works, you might have a structure like this:

$$$/War
The War of the Jews
$$$/War/Book 1
Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1
Chapter 1 of Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1/Section 1
Section 1 of Chapter 1 of Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1/Section 2
Section 2 of Chapter 1 of Book 1 of the War of the Jews
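If the book is already held as a nested structure, the path-like keys can be produced by walking the tree. The Python sketch below assumes each division is a (title, text, children) tuple, which is a hypothetical in-memory representation chosen for the example, and yields the key and text for each entry in document order.

```python
def genbook_entries(node, parent_key=""):
    """Yield (key, text) pairs for a division and all of its sub-divisions.

    node is a (title, text, children) tuple; children is a list of such tuples.
    """
    title, text, children = node
    key = parent_key + "/" + title
    yield key, text
    for child in children:
        yield from genbook_entries(child, key)


def genbook_to_imp(node):
    """Render the whole tree as imp text, one '$$$<key>' entry per division."""
    return "".join("$$$%s\n%s\n" % (key, text) for key, text in genbook_entries(node))
```

Feeding it a tree whose root is ("War", "The War of the Jews", [...]) reproduces keys of the form /War, /War/Book 1, /War/Book 1/Chapter 1, as in the listing above.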

Validate the Source Text

Import the Source Text

Now that your text is ready to be imported, you will need to use one of the command line utilities for converting documents to SWORD format. Depending on the format of your document at this point, you will need to use the appropriate importer.

  • If your text is a valid ThML document, use xml2gbs.
  • If your text is a valid OSIS Bible, use osis2mod.
  • If your text is a valid OSIS Commentary, use osis2mod.
  • If your text is a valid OSIS general book, use xml2gbs.
  • If your text is a vpl format Bible, use vpl2mod.
  • If your text is an imp format Bible or commentary, use imp2vs.
  • If your text is an imp format dictionary, lexicon, glossary, or daily devotional, use imp2ld.
  • If your text is an imp format general book, use imp2gbs.

You may find these files in the SWORD Project source distribution or compiled for Win32 at http://crosswire.org/ftpmirror/pub/sword/utils/win32/. Each utility has brief usage information that can be viewed by running it once without any arguments.
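The choice of importer listed above can be summarized as a small lookup. The Python sketch below simply restates that list; the function name choose_importer and the labels used for the format and module type ("osis", "bible", "general book", and so on) are assumptions made for the example, not names used by SWORD.

```python
def choose_importer(source_format, module_type):
    """Return the name of the import utility for a source format and module type."""
    source_format = source_format.lower()
    module_type = module_type.lower()
    if source_format == "thml":
        return "xml2gbs"
    if source_format == "osis":
        # OSIS Bibles and commentaries use osis2mod; other OSIS texts use xml2gbs.
        return "osis2mod" if module_type in ("bible", "commentary") else "xml2gbs"
    if source_format == "vpl" and module_type == "bible":
        return "vpl2mod"
    if source_format == "imp":
        if module_type in ("bible", "commentary"):
            return "imp2vs"
        if module_type in ("dictionary", "lexicon", "glossary", "devotional"):
            return "imp2ld"
        if module_type == "general book":
            return "imp2gbs"
    raise ValueError("no importer listed for %s / %s" % (source_format, module_type))
```

A combination that the list does not cover, such as a vpl commentary, raises ValueError rather than guessing.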

Compressing Modules

To compress a Bible, commentary, or LD module, use the mod2zmod utility. First you will need to install the module so that it can be accessed using the SWORD engine. Next, run:

mod2zmod <modname> <datapath> [blockType [compressType]]

blockType can be 4 = book (default), 3 = chapter, or 2 = verse and indicates the granularity of the compression blocks. The larger the block, the longer it takes to access a piece of the text, but the smaller the resulting module will be. compressType can be either 1 = LZSS (default) or 2 = Zip.

You may wish to try different compression settings to find out which is best for your module. Typically, we use chapter compression for large commentaries, book compression for Bibles, and the Zip compression algorithm.
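When compressing several modules from a script, the typical settings mentioned above can be captured in a helper that builds the argument list. This Python sketch only assembles the command; it uses the numeric codes given in the text (4 for book blocks, 3 for chapter blocks, 2 for Zip). The function name mod2zmod_command and the large_commentary flag are hypothetical.

```python
BLOCK_BOOK = "4"      # book-sized compression blocks
BLOCK_CHAPTER = "3"   # chapter-sized compression blocks
COMPRESS_ZIP = "2"    # Zip compression


def mod2zmod_command(modname, datapath, large_commentary=False):
    """Build the mod2zmod argument list using the typical settings.

    Chapter blocks for large commentaries, book blocks otherwise, Zip in both cases.
    """
    block_type = BLOCK_CHAPTER if large_commentary else BLOCK_BOOK
    return ["mod2zmod", modname, datapath, block_type, COMPRESS_ZIP]
```

The returned list can be passed to subprocess.run once the module is installed and mod2zmod is on the PATH.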

Locking Modules

To lock a rawText Bible or rawCom commentary module, use the cipherraw utility. Just run:

cipherraw </path/to/module> '<key>'

Miscellaneous tools

Further miscellaneous tools that are 'not ready for public consumption' but may be useful to module authors are found in DevTools:Misc. These include scripts and programs that are used for the preparation and conversion of various specific modules.

Create the .conf File

Before you can test or submit a new module, you need to create a .conf file, which tells SWORD how to recognize your module and what to do with it. Instructions for creating a .conf file are on the DevTools:confFiles page.

Install and Test the Module

Checking for Missing Verses

You can use the utility emptyvss to find verses in a module that contain no text, since this may indicate errors in the module. Just run:

emptyvss <module name>

on an installed module to generate a list.

Submit the Module to the SWORD Project for Distribution

After you have tested your module, you may wish to submit it to the SWORD Project for public release so that other people can benefit from your work. All modules submitted to the SWORD Project for distribution either on the internet or on CDs should include both the module as a single document and the .conf file.

The module itself should be an uncompiled, plain text document in either vpl (verse-per-line), imp (import), ThML, OSIS or TEI format, ready to be run through one of the import tools.

Before any module will be considered for posting, we expect that the following minimum set of tags be included in its .conf file: DataPath, ModDrv, Lang, Description, About, DistributionLicense, and TextSource. We also strongly prefer that an LCSH line be included with the .conf file, but will look the LCSH up ourselves if you have trouble deciding on a value. (You can look at other .conf files for examples.)
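A simple script can confirm that the minimum set of tags is present before you submit. The Python sketch below assumes the .conf file consists of a [ModuleName] header followed by Key=Value lines; that layout and the function name missing_conf_tags are assumptions for the example, and the DevTools:confFiles page remains the reference for the actual format. It checks only that each tag appears, not that its value is sensible.

```python
REQUIRED_TAGS = (
    "DataPath", "ModDrv", "Lang", "Description",
    "About", "DistributionLicense", "TextSource",
)


def missing_conf_tags(conf_text):
    """Return the required tags that do not appear as a key in conf_text."""
    present = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, the [ModuleName] header, and lines without a key.
        if not line or line.startswith("[") or "=" not in line:
            continue
        present.add(line.split("=", 1)[0].strip())
    return [tag for tag in REQUIRED_TAGS if tag not in present]
```

An empty list means all seven tags were found; anything else names the tags still to be added.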

When you feel your module is ready to be submitted, you may email it to modules@crosswire.org. If you are unable to email it or would prefer to send the files by some other means, you may contact us at the same email address, and we can discuss other arrangements.

Related Pages