Projects:Go Bible/Go Bible Creator

From CrossWire Bible Society
Revision as of 10:58, 1 July 2013 by David Haslam (talk | contribs) (USFM Source Text Parsing: added details)

Introduction

Go Bible Creator is a software tool for converting Bible texts into Go Bible Java ME applications that can be loaded onto suitable mobile phones.

Go Bible Creator is a Java application and requires Java SE (Runtime Environment). It can be used on all platforms that support Java SE.

Download Go Bible Creator

Go Bible Creator version 2.4.4 was released as a maintenance fix on 2012-12-14.

The developer kit can be downloaded from either of the following locations:

The release notes (change log) are contained within the ZIP file. Earlier releases are also available in David's Box account.

Before running Go Bible Creator

Before Go Bible Creator can be used you will need to obtain or create suitable source text containing the Bible translation or Bible version.

The developer must ensure that permission has been obtained in writing for using any copyrighted material as source text.

Source text file formats

Go Bible Creator can now handle source text in each of the following file formats:

  • OSIS – a single XML file using OSIS format
  • ThML – a single XML file using ThML format
  • USFM – multiple text files using USFM format
  • XHTML_TE – a single XHTML file using an extended HTML format for the SIL FieldWorks Translation Editor

Notes:

  1. All the input files must be Unicode text encoded as UTF-8 (without BOM).
  2. For use with Go Bible Creator, an OSIS file must be of the container type, not the milestone type.
  3. For further details of OSIS requirements, see Jolon's OSIS page.
  4. For further details of ThML requirements, see Jolon's ThML page.
  5. USFM denotes Unified Standard Format Markers. It is the native file format for UBS Paratext.
  6. Outside SIL, support for XHTML_TE is currently available only in the Symmetrical Scrolling branch.
  7. A sample ThML file for the book of Hebrews was provided in earlier releases, available from Jolon's GoBibleCreator page.
    Further ThML files may be downloaded from CCEL.
  8. Red letter markup is not yet supported for OSIS format.

Preprocessing

Some preprocessing is often required to make the source text suitable for use with Go Bible Creator. Do not assume that source text files obtained directly from third parties will meet the exacting requirements of Go Bible Creator. Software tools exist to help with preprocessing.

Running Go Bible Creator

GoBibleCreator is run with the following command line and by default would be run from the directory containing the GoBibleCreator.jar file:

java -Xmx128m -jar GoBibleCreator.jar [PathToCollectionsFile | PathToXMLfile]+

Note: The parameter options are depicted using Extended Backus–Naur Form.

Enhanced techniques

It is possible to specify the location for the GoBibleCreator.jar file as follows, in this example by means of a suitably defined Windows environment variable:

java -Xmx128m -jar %GBC_PATH%\GoBibleCreator.jar ...

Windows users can redirect the generated messages by sending standard output and error output to suitably named log files:

java -Xmx128m -jar %GBC_PATH%\GoBibleCreator.jar %1 1>.\java.log 2>.\error.log

In the above example, it is assumed that %1 is a passed parameter. This technique can be readily adapted by Unix users.

Users can also test for success or failure by examining the errorlevel value. Here is what I have in my Windows command file called MakeGoBible.cmd:

REM Using a defined environment variable and command line parameter
java -Xmx128m -jar %GBC_PATH%\GoBibleCreator.jar %1 1>.\java.log 2>.\error.log
if not errorlevel 1 goto end 
start .\error.log
start .\java.log
:end
rem pause

To use this technique with multiple files, simply replace the command line parameter %1 by %1 %2 %3 %4 %5 %6 %7 %8 %9

Alternatively, write a more complex script (e.g. using shift) to capture the outputs after each file is processed.

Tip: For Windows users, it is best to avoid using paths containing a space, because the space is interpreted as a parameter delimiter.
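
For users who prefer a cross-platform alternative to a command file, the same technique can be sketched in Python. This is only an illustrative sketch: the function names build_command and run_creator are hypothetical and not part of Go Bible Creator, and the log file names simply mirror the example above. It assumes, as the command file does, that a non-zero exit code indicates failure.

```python
import subprocess
from pathlib import Path


def build_command(jar_path, inputs, max_heap="128m"):
    """Return the java command line as a list of arguments."""
    return ["java", f"-Xmx{max_heap}", "-jar", str(jar_path), *map(str, inputs)]


def run_creator(jar_path, inputs, log_dir="."):
    """Run GoBibleCreator, sending stdout and stderr to separate log files.

    Returns the exit code of the java process.
    """
    log_dir = Path(log_dir)
    # Binary mode: the child process writes straight to these files.
    with open(log_dir / "java.log", "wb") as out, \
         open(log_dir / "error.log", "wb") as err:
        result = subprocess.run(build_command(jar_path, inputs), stdout=out, stderr=err)
    return result.returncode
```

Because the arguments are passed as a list rather than as one command string, a path containing a space is handed to Java as a single argument.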

Required parameters

For Java
-Xmx128m
Go Bible Creator loads the entire source text file[s] into memory.
Earlier versions of Java required the -Xmx128m parameter (the option is case-sensitive) to raise the maximum heap size available to Java to 128MB.
If you are using the most recent version of Java SE, this option is no longer required.
-jar
This tells Java to run the application packaged in the GoBibleCreator.jar file.
For GoBibleCreator
PathToCollectionsFile
The path to a collections text file. Go Bible Creator can process several collections text files in one run if several such paths are specified on the command line.
PathToXMLfile
This is an alternative use of GoBibleCreator described further below. This secondary use applies only to XML source files in either OSIS or ThML format.
Invoking GoBibleCreator with an XML file as the specified parameter does not generate any JAD/JAR files, only a new collections.txt file prefilled with extracted book names. To produce the JAD/JAR files you would need to run GoBibleCreator again, this time with the collections.txt file as the only parameter. However, you would normally need to manually edit the collections.txt file before doing that.

Optional parameters

There are two optional parameters for GoBibleCreator:

-d PathToMainSourceTextDirectory
Specifies a main source text directory; all Source-Text paths are then taken relative to it. See below for more details on the Source-Text property in the "collections text" file.
-u
Update option: With this option, Go Bible Creator updates existing JAR files, if they exist, rather than creating new ones. The source text is not parsed and no new Bible data is generated; everything else is updated, including the GoBibleCore, book names, UI properties, JAD file, manifest, etc. This option is useful if the source text hasn't changed and there hasn't been any change to the Go Bible Data Format, as each collection is then much quicker to process. This option first makes a backup copy of the original JAD and JAR files. Caution: Running the update option a second time can overwrite these backups!

Collections text files

To produce the JAD and JAR files run GoBibleCreator with a suitable "collections text" file:

java -Xmx128m -jar GoBibleCreator.jar hebrews/collections.txt

This example uses the file collections.txt (for KJV Hebrews) included with earlier versions of GoBibleCreator (up to version 2.2.6).

Typically, the developer will need to carefully construct a "collections text" file, in order to obtain all the required properties and features of the created Go Bible application[s].

GoBibleCreator will generate JAD and JAR files and place them in the same directory as the specified collections text file.

Notes

  1. Use a suitable Unicode text editor to edit the collections text file.
  2. Any suitable filename may be chosen for a collections text file; it does not have to be called collections.txt
  3. Collections text files may optionally contain blank lines and remark lines. Remark lines must begin with // or with REM
  4. Collections text files must be Unicode text files encoded as UTF-8 without BOM.
  5. If the BOM is included inadvertently, the first line will not be correctly processed (unless it happens to be a blank line).
  6. Collections text files may specify more than one collection of books. This is useful when the developer wishes to provide split collections in addition to one for a complete Bible.
  7. Take particular care when the collections text file needs to include text in a language that reads right to left.
  8. The format and contents of a collections text file are described further below.
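
If a text editor has added a BOM inadvertently, it can be detected and removed before running Go Bible Creator. The following Python sketch shows one way to do this; the function names strip_utf8_bom and remove_bom_from_file are hypothetical helpers, not part of Go Bible Creator.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path) -> bool:
    """Rewrite the file without a BOM. Returns True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

The same check applies equally to source text files, which must also be UTF-8 without BOM.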

Contents

The following sections describe the required and optional properties that can be specified within a collections text file.

The Info property

The Info property line contains the text that is displayed in the Go Bible menu About option.

This is where to record all the information needed to describe the Bible version or Bible translation, copyright information, distribution license and conditions, end user terms and conditions, and any other relevant information about the particular Go Bible applications that will be created.

Example:

Info: The Holy Bible: King James Version (including the Apocrypha). This Java ME application was made by David Haslam using Go Bible Creator

I have kept this example reasonably short, only so that horizontal scrolling of this wiki page is avoided. There is no size restriction on the length of text.

Notes:

  1. Go Bible software now automatically appends the Ver. number (of Go Bible Creator), followed by the Viewer Version number.
  2. The display is not formatted by the About option – it appears as a (scrollable or scrolling) single paragraph on the mobile phone.
  3. For Bible translations, it is sensible to include in the info line a translation of the English text in the same language as the Bible.
  4. Tip: State clearly the distribution terms and conditions, where applicable.
  5. Tip: Include a URL where your Go Bible application is available for download.

Source text and its properties

Source text for use with Go Bible Creator should not have any missing chapters, duplicate verses, split verses, missing verses, or paragraphs of text tagged using a verse range. When this is not the case, some preprocessing of the source text would normally be necessary. If this requirement is ignored, the resulting Go Bible applications may contain misnumbered verses (or chapters).

Source-Text

This property specifies the source text file name (or folder name in the case of USFM) with a path relative to the directory where the collections text file is stored.

Example 1:

Source-Text: KJV.thml

This assumes the ThML file is in the same directory as the collections text file.

Example 2:

Source-Text: .\USFM

USFM is the name of a folder containing the set of USFM files, each book being a separate file.

Source-Format

The format must now be specified using the following property:

Source-Format: [ OSIS | ThML | USFM | XHTML_TE ]

Example:

Source-Format: ThML 

Omitting this line will result in a message being sent to standard output, but no JAR or JAD files are created. The argument for this property is not case-sensitive.

Source-FileExtension

This property is required when the source format is USFM. Specify the USFM source file extension - do not include the '.' prefix.

Example:

Source-FileExtension: ptx

USFM-TitleTag

This property is optional when the source format is USFM. Specify the tag within the source files whose text supplies the book names listed in each collection.

USFM-TitleTag: \h

Notes:

  1. The default is the header tag \h
  2. For this purpose other useful tags are \id and \mt (if used)
  3. The book names must match the text after each header tag (\h) or the text after the specified USFM-TitleTag property.
  4. Dirk's preprocessing tool can be used to extract book names from USFM files.
  5. Make sure that no punctuation marks or accents are used (e.g. !?%,'`^, etc.).
  6. Such characters may not display properly on some cell phones and could cause other problems.
  7. If necessary, lightly edit (a separate copy of) the source text files to delete these characters, or to replace them with a space or dash.

USFM-ParseConfig
New in version 2.4.5

This property is optional when the source format is USFM. Specify the file that controls how USFM files are to be parsed.

USFM-ParseConfig: USFMSettings.txt

Notes:

  1. In the absence of this property in the collections file, the default file will be used if
    (a) it's called USFMSettings.txt,
    (b) it's in the same folder as GoBibleCreator.jar, and
    (c) it contains one or more USFM tag lists that are active (not commented out).
  2. Within the new release, the Reference folder contains a file USFMSettings.txt in which all these tag lists have been commented out.
  3. This file is self-documenting, such that an experienced user would have sufficient guidance as to how to make use of this new feature.
  4. The path to this file may be specified if it's not in the same folder as GoBibleCreator.jar

RedLettering

This property is optional when the source format is USFM. Specify whether red lettering for words of Jesus is to be displayed.

RedLettering: false

Notes:

  1. Argument syntax: (true|false). The default is 'true'.
  2. Red lettering is also supported for ThML format, but this property has not yet been extended to apply to ThML.
  3. Red lettering is not yet supported by Go Bible Creator for source formats OSIS or XHTML_TE.

Collections of books

A collection is a list of books under the Collection property. Each book in a collection is listed by the Book property. Go Bible Creator makes a JAR and JAD file for each collection.

Example:

Collection: KJV

Book: Genesis
Book: Exodus
Book: Leviticus
Book: Numbers
Book: Deuteronomy
...
...
Book: Jude
Book: Revelation

Notes:

  1. A collection could be as small as a single book (or even part of a book) or as large as a complete Bible.
  2. The book names must match those within the source text file or files.
  3. A collections text file may contain more than one collection of books to be made from the same source text.
  4. Go Bible Creator is "canon-neutral". A minor exception in Go Bible relates to how to select the scope for the search feature.
  5. The collection name is converted to the filename for the generated JAR and JAD files.
  6. GoBibleCreator first removes any spaces from the resulting filename.
  7. It is unnecessary to include the application name (Go Bible) as part of the collection name.
  8. It is advisable not to use any symbol characters in collection names. They may cause problems for some phone models.

Mapping book names

Many source text files use English book names even though the text is in the native language. Go Bible Creator supports changing the book names using the Book-Name-Map property. Specify this property for every book whose (English) name must be mapped to another name, for example:

Book-Name-Map: Hebrews, The Book of Hebrews
Collection: KJV Hebrews
Book: Hebrews

Notes:

  1. In the first line above, the first string "Hebrews" is the name of the book as it appears in the input file, while the second string "The Book of Hebrews" is what will be displayed on the mobile phone.
  2. The Book-Name-Map property does not change the usage of book names in the rest of the collections text file; Book lines must still specify the book names used in the source text file. The position of the book name mapping is not important. The following would work just as well:

Collection: KJV Hebrews
Book: Hebrews

Book-Name-Map: Hebrews, The Book of Hebrews

To map all the book names for the whole Bible, it is usually more legible for all these lines to be grouped together in one section of the collections text file.

This feature may be used to cover various situations:

  • Book names for (non-English) Bible translations (e.g. Book-Name-Map: Genesis, Genèse )
  • Shorter book names than used in the source text (e.g. Book-Name-Map: Acts of the Apostles, Acts )
  • Longer book names than used in the source text (e.g. Book-Name-Map: GEN, Genesis )
  • Change Roman numerals in book names to ordinary numerals (e.g. Book-Name-Map: III John, 3 John )
  • Case changes (e.g. Book-Name-Map: LUKE, Luke )

In other words, the book names in the source text may simply need mapping to book names that are more suitable for use in the Go Bible application.
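
When every book of a Bible needs mapping, typing the lines by hand is tedious. The Python sketch below generates them from a dictionary. The function name book_name_map_lines is hypothetical. Rejecting commas in source names is only a precaution based on the assumption that the comma acts as the delimiter; how Go Bible Creator actually treats such names has not been verified here.

```python
def book_name_map_lines(mapping):
    """Return one Book-Name-Map line per (source name, display name) pair."""
    lines = []
    for source, display in mapping.items():
        # Precaution (assumption): the comma separates the two names,
        # so a source name containing a comma would be ambiguous.
        if "," in source:
            raise ValueError(f"source book name contains a comma: {source!r}")
        lines.append(f"Book-Name-Map: {source}, {display}")
    return lines


if __name__ == "__main__":
    for line in book_name_map_lines({"GEN": "Genesis", "III John": "3 John"}):
        print(line)
```

The generated lines can then be pasted as one group into the collections text file.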

Language codes

Since Go Bible supports multiple translations it is possible that a user may wish to have both the KJV and Chinese translations on their phone at the same time. If the collection names happen to be the same for both translations then the phone won't allow both to be stored simultaneously. A simple solution to this problem is to use the optional Language-Code property, which causes the language code to be joined to the name of each following collection.

Example: (using the code for English)

Language-Code: en

By default a language code is appended to the collection name. It is permissible to explicitly specify where the language code is joined, by using an optional property argument. Syntax:

Language-Code: en,[prefix|suffix]

Example:

Language-Code: en,prefix

If there are multiple collections specified, one could even specify a different language code and position before each collection, even though this would normally make little sense.

Empty-Verse-Text

This optional property permits users to specify what should be inserted when an empty verse is detected in the source text. The default is nothing at all.

Example:

Empty-Verse-Text: [...]

This applies only to empty verses, not to missing verses.

Translating the User Interface

This is described in Projects:Go Bible/UI Translation. See also the lower half of Jolon's Go Bible Creator page.

Right Aligned Text

Some languages such as Arabic, Farsi and Hebrew are read from right to left. This also requires text to be aligned to the right rather than the left. Go Bible (by default) aligns text to the left. The alignment can be changed using the Align property. Example:

Align: Right

Splitting books
This option is rarely used now.

Some older phones had JAR filesize limits that prevented some of the larger books from fitting on the phone. Example: Nokia Series 40 phones (MIDP 1.0) had a 64KB JAR limit. This prevented some larger books (such as Psalms) from loading into such phones. Go Bible Creator allows books to be split up by indicating which chapters will appear in each collection. The following example creates two collections, the first contains the first six chapters of Hebrews and the second contains the remaining seven chapters:

Source-Text: ../Hebrews.thml
Source-Format: ThML
Info: This collection contains the book of Hebrews split in two.
Collection: KJV Hebrews 1-6
Book: Hebrews, 1, 6
Collection: KJV Hebrews 7-13
Book: Hebrews, 7, 13

When splitting books in this way, the commas (,) are required as delimiters. The first number indicates the first chapter to include and the second number indicates the last chapter to include. Specifying chapter numbers that are out of bounds or in the wrong order causes an error message. Here we changed the collection names simply to indicate to the user which chapters are in each collection. Even so, any collection name may be used.
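
For a large book that must be split into many parts, the Collection and Book lines can be generated rather than typed. The Python sketch below reproduces the lines of the Hebrews example from a list of chapter ranges. The function name split_collection_lines is hypothetical, and the check covers only the ordering of each range, since the sketch does not know how many chapters the book contains.

```python
def split_collection_lines(collection_prefix, book, ranges):
    """Return Collection/Book lines, one pair per (first, last) chapter range."""
    lines = []
    for first, last in ranges:
        # Reject ranges that start below chapter 1 or run backwards.
        if first < 1 or last < first:
            raise ValueError(f"invalid chapter range: {first}-{last}")
        lines.append(f"Collection: {collection_prefix} {first}-{last}")
        lines.append(f"Book: {book}, {first}, {last}")
    return lines


if __name__ == "__main__":
    for line in split_collection_lines("KJV Hebrews", "Hebrews", [(1, 6), (7, 13)]):
        print(line)
```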

Support for MIDP 1.0
This option is no longer supported.

By default, Go Bible Creator produces MIDP 2.0 compatible JAD and JAR files. These files will not run on MIDP 1.0 devices. Earlier versions of Go Bible Creator supported an optional MIDP property. The generated files for MIDP 1.0 had less functionality than the MIDP 2.0 applications, such as no support for full-screen viewing or "Send verse by SMS". Most modern phones now support MIDP 2.0 or later.

As from version 2.4.0, we have withdrawn support for making MIDP 1.0 compatible applications. The syntax as used for earlier software versions is described in Jolon's old page.

Currently, Go Bible applications do not make use of any of the extra features defined in the MIDP 2.1 and MIDP 3.0 specifications.

Rebranding

This is a set of optional properties that facilitate the rebranding or customization of the Go Bible application:

Phone-Icon-Filepath

The path to a new 20x20 PNG icon, if you wish to replace the default cross icon.

Phone-Icon-Filepath: new_logo.png

Application-Name

Customize the application name that will show up in the phone's menu - the default is 'Go Bible'

Application-Name: My GB Reader

MIDlet-Description
This standard property for JAD files is not yet implemented.
MIDlet-Description: A brief description of the Go Bible application

MIDlet-Vendor

Customize the name of the vendor that appears in the JAD file. The default is still 'Jolon Faichney'.

MIDlet-Vendor: My GB Vendor

MIDlet-Info-URL

Customize the URL that appears in the JAD file to indicate where to obtain further information about the MIDlet.

MIDlet-Info-URL: http://www.mygbdomain.org/gbreader/

The default is http://wap.jolon.org which is an index to the welcome pages for Jolon's earlier Go Bible builds.

Creating files for WAP download

Go Bible is most commonly downloaded to a PC and then copied from PC to phone. However, Java applications can also be downloaded directly to cell phones from a WAP server. This method is less commonly used, as it is often more expensive and sometimes difficult to access. To ease the process of creating files for WAP download the optional Wap-site property can be used in the collections text file.

Example: (Syntax only – this WAP server is currently not operational.)

Wap-site: http://wap.crosswire.org/gobible/en/kjv/

If the Wap-site property is declared then a separate wap subdirectory is generated containing JAD and JAR files ready for uploading to a WAP server. In addition, a Welcome.html file is generated containing links to all of the JAD files. The Wap-site property should contain the full URL to the directory that will contain the JAD files. This URL is used within the Welcome.html file as well as for setting the MIDlet-Jar-URL property in each of the JAD files.

Notes:

  1. The integrity and completeness of JAD files can be checked online at Handylearn Jad Checker.
  2. JAD files generated by Go Bible Creator are slightly deficient according to this checker. The errors & omissions can be rectified by manual editing.
  3. There is a bug in Go Bible Creator 2.4.1 such that the file Welcome.html is not generated. See [1].
  4. See also this ten-minute guide to setting up a WAP site.
  5. The Firefox add-on wmlbrowser can be useful for debugging. It simulates WAP browsing by viewing WML (Wireless Markup Language) pages.

Codepage

Go Bible Creator 2.3.x and later can process USFM files even if they are encoded using a Windows code page. In order to use this feature, the optional property Codepage should be specified.

Example: (Greek)

Codepage: Cp1253

Notes:

  1. This property applies only for USFM as the source text.
  2. It is much preferred to preprocess your source text files to UTF-8, otherwise you may get "Couldn't find book:" errors and other problems.
  3. If this property is omitted, the source files are assumed to be encoded as UTF-8. Despite the default, it is also permitted to specify:

Codepage: UTF-8

Versification

Go Bible is essentially canon-neutral. This means that Go Bible collections are not tied to a fixed versification scheme. Go Bible Creator can readily make Bibles that have an Alternate Versification. The only feature that is tied to the book order of the Protestant Canon is the search feature: predefined search scopes only apply when there are exactly 66 books specified in the collection.

At the same time, it should be clearly understood that special provision must be made for source texts in which there are missing chapters or verses, verse ranges, split verses, or verses out of numerical order. These situations typically require some preprocessing methods before using Go Bible Creator.

USFM Source Text Parsing

Up to version 2.4.5, Go Bible Creator parses USFM source text with the following provisos:

  1. It assumes that there are
    (a) no missing verses,
    (b) no verse ranges,
    (c) no verses out of order, and
    (d) no two verses in the same chapter with the same verse number.
  2. It also assumes that
    (a) there are no missing chapters,
    (b) every book starts at chapter 1, and
    (c) every chapter starts at verse 1.
  3. Generally, it parses the tags as defined in USFM Reference version 2.35
  4. It does not handle the new 'nested tag' syntax (using \+) as defined in USFM Reference 2.4
  5. It assumes that end tags for tag pairs will always be present.
  6. Outside of footnotes and cross-references, it requires that end tags for character level tag pairs will always be present.
  7. As from version 2.4.5, tags are parsed as specified in the ancillary file USFMSettings.txt (or the differently named equivalent).
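
Since Go Bible Creator relies on the numbering assumptions in items 1 and 2, it can be worth checking a USFM file for them beforehand. The Python sketch below is one possible pre-check, not the parser that Go Bible Creator itself uses. The function name check_usfm_numbering is hypothetical, and the sketch looks only at \c and \v markers in a single book.

```python
import re

# Matches chapter markers (\c 3) and verse markers (\v 16, \v 16-17, \v 16a).
# Markers such as \cl or \va are skipped because whitespace must follow c or v.
MARKER = re.compile(r"\\(c|v)\s+(\S+)")


def check_usfm_numbering(text):
    """Return a list of human-readable numbering problems found in one USFM book."""
    problems = []
    chapter = 0
    verse = 0
    for kind, value in MARKER.findall(text):
        if kind == "c":
            if not value.isdecimal():
                problems.append(f"chapter marker not numeric: {value}")
                continue
            number = int(value)
            if number != chapter + 1:
                problems.append(f"chapter {number} follows chapter {chapter}")
            # A new chapter resets the expected verse to 1.
            chapter, verse = number, 0
        else:
            if not value.isdecimal():
                problems.append(f"{chapter}:{value} is a verse range or split verse")
                continue
            number = int(value)
            if number != verse + 1:
                problems.append(f"{chapter}:{number} follows verse {verse}")
            verse = number
    return problems
```

An empty list means that none of these particular problems were found; it says nothing about the other provisos, such as missing end tags.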

After running Go Bible Creator

Do not assume that the absence of error messages means that the generated Go Bible applications are free of errors caused by text or markup issues.

Testing Go Bible applications

Go Bible applications should be thoroughly tested before being distributed. There are two main ways to do this:

Using a Java ME emulator

To test full functionality (including touch screen emulation), use the emulator that comes as part of Oracle's Java Platform Micro Edition Software Development Kit 3.0 for Windows.

I generally test Go Bible applications using the emulator from mpowerplayer[1]. This is available as both a standalone application and as a webstart application. The standalone edition comes free with the mplayit developer kit, but it is a self-contained JAR application, independent of the rest of the SDK.

  1. This now redirects to apptap.com, which may indicate that there has been an acquisition. The direct link to the SDK still works as of 2012-01-31.

Using suitable mobile phones

The Unicode font coverage available in an emulator is generally broader than that in many mobile phones. It is therefore always sensible to test Go Bible applications that use other alphabets on several of the most common mobile phones marketed in the region where the Bible translation language is mostly spoken.

Limitations

Go Bible applications made using Go Bible Creator have some technical limitations. These include:

  1. Go Bible MIDlets are not digitally signed.
  2. Some optional MIDP 2.0 Reserved Attribute Names are not used, namely:
MIDlet-Description
MIDlet-Install-Notify
MIDlet-Delete-Notify
MIDlet-Delete-Confirm
MIDlet-Push-<n>
MIDlet-Data-Size
MIDlet-Permissions
MIDlet-Permissions-Opt

Help

The main channel for user help with Go Bible Creator is still the Go Bible Forum.
Please register and sign in to make best use of the forum.