File Formats

From CrossWire Bible Society
Latest revision as of 12:11, 20 February 2021

This page lists some of the more common file formats relevant to The SWORD Project, associated utilities, and other CrossWire projects.

CrossWire Bible Society respects copyright. As such, conversion of material that is under copyright without permission from the copyright holders is not supported by The SWORD Project.

SWORD modules

Other than the source code for the SWORD API, there is no documentation for the file format of a SWORD module. The intention is that the SWORD API (or the JSword implementation) is used directly or via other language bindings.

Our module file format is proprietary in the sense that we see no need to document it and certainly no need to stick to it. We change it when we need to. We therefore do not encourage direct interaction with it, but firmly recommend use of the API (either C++ or Java). This is the place where we seek stability and consistency.

The SWORD Project currently and actively supports the following markup for module creation: OSIS, TEI, ThML and plain text.

The SWORD Project Utilities

Precompiled versions of many of these programs are available in most Linux distributions, using the distribution's package installer.
For Windows, they can be found at https://github.com/devroles/mingw_sword_package.[1][2]

Module Creation Tools

It is recommended that Unicode text files used for module creation be encoded as UTF-8.[3]

  • imp2gbs – imports free-form General books in IMP format to SWORD format
  • imp2ld – imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format
  • imp2vs – imports Bibles and commentaries in IMP format to SWORD format
  • vpl2mod – imports Bibles and commentaries in Verse-Per-Line format to SWORD format
  • osis2mod – imports Bibles and commentaries in OSIS format to SWORD format
  • tei2mod – imports lexicons and dictionaries in TEI format to SWORD format
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format
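
As an illustration of the kind of input the imp2* importers consume, the following minimal sketch builds a small IMP text in Python. It assumes the usual IMP convention that each entry starts with a line of the form `$$$key` followed by the entry text; the helper name `to_imp` and the sample verses are hypothetical and are not part of The SWORD Project.

```python
def to_imp(entries):
    """Render (key, text) pairs as IMP text.

    Assumption: each entry is a '$$$key' line followed by its body text.
    """
    lines = []
    for key, text in entries:
        lines.append(f"$$${key}")
        lines.append(text.strip())
    # End with a single LF; Unix-style EOLs are one of the recommended styles.
    return "\n".join(lines) + "\n"


# Hypothetical sample data, keyed by verse reference as a Bible import would be.
sample = [
    ("Gen.1.1", "In the beginning God created the heaven and the earth."),
    ("Gen.1.2", "And the earth was without form, and void."),
]
imp_text = to_imp(sample)
print(imp_text)
```

The resulting text would be saved as UTF-8 and passed to the appropriate importer; consult each tool's own usage message for its command-line arguments.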

Diagnostic Tools

  • mod2imp – creates an IMP file[4] from an installed module
  • emptyvss – exports a list of verses missing from the module (useful for testing modules during development)

Legacy format conversion Tools

  • gbf2osis.pl – a Perl utility for converting GBF to OSIS
  • step2vpl – exports a STEP book in Verse-Per-Line (VPL) format
  • thml2osis – converts ThML to OSIS format

OSIS Utilities

  • vs2osisref – returns the osisRef of a given (text form) verse reference
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format

Miscellaneous

  • cipherraw – used to encipher SWORD modules
  • diatheke – a basic CLI SWORD front-end
  • mkfastmod – creates a search index for a module[5]
  • mod2zmod – creates a compressed module from an installed module

Notes on SWORD Tools

  1. If you have Xiphos installed in Windows, the Sword utilities are available in the Xiphos\bin folder.
  2. The latest binaries may be found at https://github.com/devroles/mingw_sword_package/releases/tag/1.9.0a, though currently without cipherraw.exe
  3. EOLs should be either Unix style (LF) or Windows style (CRLF). Text files with Mac style EOLs (CR) may give rise to errors or other unexpected behaviour.
  4. The IMP file may contain a residue of XML markup
  5. Aside: To create a list of installed modules with descriptions, enter the following command, optionally redirecting stderr to a log file.
    mkfastmod /? 2>mkfastmod.log
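
Note 3 above can be checked before running an import tool. The sketch below is one possible pre-flight check in Python, not a SWORD utility: it reports whether the raw bytes of a source file decode as UTF-8 and whether they contain Mac-style (bare CR) line endings. The function name `check_source_text` is hypothetical.

```python
def check_source_text(data: bytes) -> list:
    """Return a list of problems found in raw module source bytes."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    # A CR that is not immediately followed by LF is a Mac-style line ending.
    bare_cr = sum(
        1
        for i, b in enumerate(data)
        if b == 0x0D and data[i + 1:i + 2] != b"\n"
    )
    if bare_cr:
        problems.append(f"{bare_cr} bare CR line ending(s)")
    return problems


print(check_source_text(b"line one\nline two\r\n"))  # LF and CRLF are both accepted
print(check_source_text(b"line one\rline two"))      # a bare CR is reported
```

In practice the bytes would come from reading the file in binary mode, e.g. `open(path, "rb").read()`.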

Recommended Non-SWORD Utilities

  • uconv – a utility from ICU for converting between various character encodings, performing normalization, transliterating texts, etc. (It is similar to iconv, but much more powerful.) uconv.exe is part of the SWORD utilities
  • xmllint – a utility (part of the libxml2 distribution) for validating XML documents

Formats for which CrossWire maintains converters

The SWORD Project uses primary source e-texts. These texts come in numerous formats. CrossWire maintains converters for a number of formats, described below. The converters may target other markup formats, e.g. TEI or OSIS, or may simply export binary data to text, as is the case with our STEP exporter. Specific discussion of each of the available converters is found elsewhere on this page.

USFM

Unified Standard Format Markers

This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of ParaTExt, which is used by more than 60% of all Bible translators worldwide. The current release is ParaTExt 8.0.

Though USFM 2.4 suffices for most Bibles, USFM 3.0 is now available and has several new features. The standard is open source and is maintained at ubsicap/usfm.

CrossWire now has a Python script called usfm2osis.py[1] which converts USFM to OSIS for subsequent import to SWORD's native format. See Converting SFM Bibles to OSIS.

USFM uses a separate file for each Bible book. USFM is also supported by the open-source program called Bibledit. There are examples of Bibles in USFM format available for download at [1]. These include the KJV, ASV, and WEB Bibles.
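
To give a feel for the format, here is a minimal Python sketch that scans one USFM book for its `\id`, `\c` and `\v` markers. It is an illustration only, not how usfm2osis.py works: it assumes one marker per line and plain single verse numbers, and it ignores every other USFM marker. The function name `scan_usfm` and the sample text are hypothetical.

```python
import re

# Matches a verse line such as "\v 1 In the beginning ..."
VERSE = re.compile(r"\\v\s+(\d+)\s+(.*)")


def scan_usfm(text):
    """Return (book, chapter, verse, text) tuples found in one USFM book."""
    book, chapter, verses = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\id "):
            book = line.split()[1]          # book code, e.g. GEN
        elif line.startswith("\\c "):
            chapter = int(line.split()[1])  # chapter number
        else:
            match = VERSE.match(line)
            if match:
                verses.append((book, chapter, int(match.group(1)), match.group(2)))
    return verses


# Hypothetical sample: the start of a single-book USFM file.
SAMPLE = r"""\id GEN
\c 1
\v 1 In the beginning God created the heaven and the earth.
\v 2 And the earth was without form, and void.
"""

for entry in scan_usfm(SAMPLE):
    print(entry)
```

Because each book lives in its own file, a real workflow would run such a scan once per file in the USFM folder.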

USFM is one of the formats that can be used by Go Bible Creator.

Note:

  1. This replaces our earlier Perl script usfm2osis.pl.

Other Utilities

These are not part of The SWORD Project, but may be useful. A link is given for each.

Go Bible utilities

  • Go Bible Creator – a Java SE program for converting ThML, OSIS, or USFM to Go Bible. It is being enhanced by SIL to be capable of converting source text in XHTML-TE format.
  • Go Bible Creator USFM Preprocessor – a tool that parses USFM files, identifies and corrects problems, and outputs them in a format that can easily be loaded into the Go Bible mobile phone program.

ThML Utilities

  • CCEL Desktop - a program for viewing and developing CCEL books [2]

See also