Difference between revisions of "File Formats"

From CrossWire Bible Society
m (USFM: removed "Register.html" from Paratext link)
m (USFM: https://paratext.org/download/ Paratext 9.4)
 
(281 intermediate revisions by 8 users not shown)
Line 1: Line 1:
The SWORD Project respects [[copyright]].  As such, conversion of material that is under copyright is not supported by The SWORD Project.
+
This page lists some of the more common file formats ''relevant'' to The SWORD Project, associated utilities, and other CrossWire projects.
  
This page merely lists some of the more common file formats relevant to The SWORD Project and associated utilities.
+
CrossWire Bible Society respects [[copyright]].  As such, conversion of material that is under copyright without permission from the copyright holders is not supported by The SWORD Project.
  
==File Formats==
+
== SWORD modules ==
Bible study programs use a plethora of markup formats. Even more have been suggested for use in creating Bibles and other religious material. This subsection describes some of the most common of those formats.
+
Other than the source code for the SWORD API, there is no documentation for the file format of a '''SWORD module'''. The intention is that the [[DevTools:SWORD|SWORD API]] (or the [[DevTools:JSword|JSword]] implementation) is used directly or via other language bindings.
  
===GBF===
+
Our module file format is proprietary in the sense that we see no need to document it and certainly no need to stick to it. We change it when we need to. We therefore do not encourage direct interaction with it, but firmly recommend use of the API (either C++ or Java). This is the place where we seek stability and consistency.
''General Bible Format''
 
  
This markup format is intended as an aid to preparing Bible texts (specifically the WEB and WEB:ME) for use with various Bible search programs. The complete specification is at http://www.ebible.org/bible/gbf.htm.  
+
The SWORD Project currently and actively supports the following markup formats for module creation: OSIS, [https://tei-c.org/ TEI], ThML and plain text.
  
This markup format was previously used for some SWORD modules but is now [http://en.wikipedia.org/wiki/Deprecation deprecated] in favor of OSIS. The rudimentary [http://crosswire.org/ftpmirror/pub/sword/utils/perl/gbf2osis.pl gbf2osis.pl] utility may be used to convert GBF to OSIS for import to SWORD's native format.
+
==The SWORD Project Utilities==
 +
Precompiled versions of many of these programs are available in most '''Linux''' distributions, using the distribution's package installer.<BR>For '''Windows''', they can be found [https://github.com/devroles/mingw_sword_package here].<ref>If you have '''Xiphos''' installed in Windows, the Sword utilities are available in the Xiphos\bin folder.</ref><ref>The latest binaries may be found [https://github.com/devroles/mingw_sword_package/releases/tag/1.9.0a here], though currently without cipherraw.exe</ref>
  
===HTML===
+
===Module Creation Tools===
''Hyper Text Markup Language''
+
It is recommended that Unicode text files used for module creation be [[Encoding|encoded]] as UTF-8.<ref>[http://en.wikipedia.org/wiki/Newline EOLs] should be either Unix style (LF) or Windows style (CRLF). Text files with Mac style EOLs (CR) may give rise to errors or other unexpected behaviour.</ref>
 +
* imp2gbs &ndash; imports free-form General books in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* imp2ld &ndash; imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* imp2vs &ndash; imports Bibles and commentaries in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* vpl2mod &ndash; imports Bibles and commentaries in Verse-Per-Line format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[osis2mod]] &ndash; imports Bibles and commentaries in OSIS format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* tei2mod &ndash; imports lexicons and dictionaries in TEI format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* xml2gbs &ndash; imports free-form General books in OSIS or ThML format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
  
This is the basic markup language of the World Wide Web. Some SWORD front-ends, such as [http://www.bibletime.info/ BibleTime], [http://gnomesword.sourceforge.net/ GnomeSword], and [http://www.crosswire.org/bibledesktop/ Bible Desktop], use HTML for presentation.
+
===Diagnostic Tools===
 +
* mod2imp &ndash; creates an IMP file<ref>The IMP file may contain a residue of XML markup</ref> from an installed module [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* emptyvss &ndash; exports a list of verses missing from the module (useful for testing modules during development) [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
  
===IMP===
+
===Legacy format conversion Tools===
''Import Format''
+
* gbf2osis.pl &ndash; a PERL utility for converting GBF to OSIS [http://crosswire.org/ftpmirror/pub/sword/utils/perl/gbf2osis.pl &dagger;]
 +
* step2vpl &ndash; export a STEP book in Verse-Per-Line (VPL) format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[DevTools:Misc#thml2osis|thml2osis]] - converts ThML to OSIS format.
  
This proprietary file format is used by SWORD for import of all types of modules. The three utilities [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2vs.exe imp2vs] (for Bibles and verse-indexed commentaries), [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2ld.exe imp2ld] (for lexicons, dictionaries, and daily-devotionals), and [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2gbs.exe imp2gbs] (for all other types of books) can be used to import IMP files to SWORD's native formats.
+
===OSIS Utilities===
 
+
* vs2osisref &ndash; returns the osisRef of a given (text form) verse reference [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
An IMP file consists of any number of entries. Each entry consists of a key line and any number of content lines. The key line consists of a line beginning with "$$$". For example, "$$$Gen 1:1" would be the key line for the Genesis 1:1 entry of a Bible or commentary module.
+
* xml2gbs &ndash; imports free-form General books in OSIS or ThML format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 
 
The content lines of an entry may consist of any text (provided that the first three characters of the line are not "$$$"). The internal markup of the content may be in any format supported by SWORD, namely OSIS for any module type or ThML for freeform books from CCEL.
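
The entry structure described above (a key line beginning with "$$$", followed by any number of content lines) can be illustrated with a minimal Python sketch. This is not the code used by imp2vs, imp2ld or imp2gbs; the function name <code>parse_imp</code> is hypothetical, and the handling of lines before the first key line (ignored) and of repeated keys (the later entry wins) are assumptions made for the example.

```python
from typing import Dict, List, Optional


def parse_imp(text: str) -> Dict[str, str]:
    """Split IMP text into {key: content}, treating "$$$" lines as key lines."""
    entries: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("$$$"):
            # A key line starts a new entry; the key is the rest of the line.
            current = line[3:].strip()
            entries[current] = []  # assumption: a repeated key replaces the earlier entry
        elif current is not None:
            # Any other line is content belonging to the most recent key.
            entries[current].append(line)
        # Assumption: lines that appear before the first key line are ignored.
    return {key: "\n".join(lines) for key, lines in entries.items()}


# Placeholder content, for illustration only.
sample = "$$$Gen 1:1\nfirst entry text\n$$$Gen 1:2\nsecond entry, line one\nsecond entry, line two\n"
entries = parse_imp(sample)
```

The content of each entry is returned untouched, so any internal OSIS or ThML markup is preserved as plain text.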
 
 
 
===LitML===
 
''Liturgical Markup Language''
 
 
 
This markup format is a descendant of, and complement to ThML, described [http://hildormen.org/docs/LitML/Guidelines-LitML10-1.0.html here].
 
 
 
The markup reflects its orientation towards liturgy and hymns.
 
 
 
===OSIS===
 
''Open Scripture Information Standard''
 
 
 
The Open Scripture Information Standard (OSIS) is "a common format for many visions." It is an XML format for marking up scripture and related text, part of an initiative composed of translators, publishers, scholars, software manufacturers, and technical experts, coordinated by the Bible Technologies Group. It is co-sponsored by the American Bible Society and the Society of Biblical Literature.
 
 
 
The most recent XML schema is [http://www.bibletechnologies.net/osisCore.2.1.1.xsd OSIS 2.1.1], and a manual is also [http://www.bibletechnologies.net/20Manual.dsp available].
 
 
 
This markup format is recommended by the CrossWire Bible Society and can be used for creating all types of resources for The SWORD Project. Support for OSIS is actively maintained and support for any unsupported elements or features needed for a module you may be working on may be requested.
 
 
 
===PDF===
 
''Portable Document Format''
 
 
 
This is an ISO track file format for platform independent rendering of documents. It is derived from Postscript and is maintained by Adobe. Documents may be text, images, or scanned images of text. Even textual documents cannot reasonably be expected to allow plain-text export. As such, it is designed to be a "read only" format.
 
 
 
===RTF===
 
''Rich Text Format''
 
 
 
This is a markup format designed by Microsoft. It is used as the markup language for presentation in The SWORD Project for Windows. It is also the internal markup format used within STEP books (see below). The format is of limited use as an archival format and there are no plans for SWORD to support it beyond its current use for presentation. On Windows systems, RTF files can be saved as Unicode files using the WordPad program, the resulting text file being encoded as UTF-16 with BOM.
 
 
 
===LaTeX===
 
 
 
[http://en.wikipedia.org/wiki/LaTeX LaTeX] is a document markup language and document preparation system for the TeX typesetting program. Some third party source texts for Bible related content made available in PDF format may have been typeset using LaTeX. Sometimes it may be worthwhile asking the owner if the source text might be made available in LaTeX format, especially if there is no other alternative suitable as a starting point for conversion towards making a SWORD module. There are currently no plans for SWORD to support it.
 
 
 
The [http://www.myanmarbible.com/bible/ Myanmar Bible Society] has a utility called bibleTec2osis.pl for converting from TeX into OSIS.
 
 
 
===STEP===
 
''Standard Template for Electronic Publishing''
 
 
 
This file format was formerly used by [http://www.quickverse.com/ QuickVerse] and [http://www.wordsearchbible.com/ WORDsearch], and is currently used for some [http://www.e-sword.net/ e-Sword] books.
 
 
 
While not an open standard, the publicly released documentation and specifications for this format can be found partially mirrored at
 
http://www.crosswire.org/bsisg/. Some utilities for working with this format are listed below. It is unlikely that the SWORD Project will support this format in the future as it is largely dead.
 
 
 
===ThML===
 
''Theological Markup Language''
 
 
 
This format is a variant of XML based on TEI and HTML, developed by and for the [http://www.ccel.org/ Christian Classics Ethereal Library]. The specifications for this markup format are available at http://www.ccel.org/ThML/.
 
 
 
This markup format is used in some SWORD resources, but only the creation of free-form "General book" modules based on existing CCEL resources is currently supported. Other works and new works should be created using the OSIS format.
 
  
===Unbound Bible Format===
+
===Miscellaneous===
''Unbound Bible Format''
+
* cipherraw &ndash; used to encipher SWORD modules [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[Frontends:Diatheke|diatheke]] &ndash; a basic CLI SWORD front-end [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[mkfastmod]] &ndash; creates a search index for a module<ref>Aside: To create a list of installed modules with descriptions, enter the following command, optionally redirecting stderr to a log file.<pre>mkfastmod /? 2>mkfastmod.log</pre></ref> [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[mod2zmod]] &ndash; creates a compressed module from an installed module [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
==== Notes on SWORD Tools ====
  
The [http://unbound.biola.edu/ BIOLA's Unbound Bible] offers many of their resources for download in a proprietary, but relatively simple [http://en.wikipedia.org/wiki/Tab_delimited tab-delimited] plain-text format (TDT). There are usually two variants, one with versification mapping to the [http://en.wikipedia.org/wiki/American_Standard_Version ASV], and the other without verse mapping.
+
<references />
  
There is no widespread use of this format, but the rudimentary [http://crosswire.org/ftpmirror/pub/sword/utils/perl/unb2osis.pl unb2osis.pl] utility may be used to convert Unbound Bible format to OSIS for import to SWORD's native format.
+
===Recommended Non-SWORD Utilities===
 +
* uconv &ndash; a utility from [http://icu-project.org/ ICU] for converting between various character encodings, perform normalization, transliterate texts, etc. (It's similar to iconv, but much, much more powerful.) uconv.exe is part of the [http://crosswire.org/ftpmirror/pub/sword/utils/win32 sword utilities]
 +
* xmllint &ndash; a utility (part of the [http://xmlsoft.org/ libxml2] distribution) for validating XML documents [http://crosswire.org/ftpmirror/pub/sword/utils/win32 *]
  
It is a relatively simple task to create a script or filter to convert TDT format to [http://en.wikipedia.org/wiki/Comma-separated_values CSV] format and/or ''vice versa''.
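
As a rough illustration of such a filter, the following Python sketch converts tab-delimited text to CSV with the standard <code>csv</code> module. The function name <code>tdt_to_csv</code> and the column layout of the sample line are hypothetical; the actual Unbound Bible column layout is not described here, and the sketch simply copies every field through unchanged.

```python
import csv
import io


def tdt_to_csv(tdt_text: str) -> str:
    """Convert tab-delimited text to CSV, quoting fields only where CSV requires it."""
    # QUOTE_NONE makes the reader treat quotation marks in the verse text as ordinary characters.
    reader = csv.reader(io.StringIO(tdt_text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample line: book, chapter, verse, text.
converted = tdt_to_csv("GEN\t1\t1\ttext, with a comma\n")
```

Going the other way (CSV to TDT) is the same loop with the delimiters swapped, provided the verse text itself contains no tab characters.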
+
==Formats for which CrossWire maintains converters==
 +
The SWORD Project uses primary source e-texts. These texts come in numerous formats. CrossWire maintains converters for a number of formats, described below. The converters may target other markup formats, e.g. TEI or OSIS, or may simply export binary data to text, as is the case with our STEP exporter. Specific discussion of each of the available converters is found elsewhere on this page.
  
 
===USFM===
 
===USFM===
[http://confluence.ubs-icap.org/display/USFM/Home ''Unified Standard Format Markers'']
+
[http://paratext.org/usfm ''Unified Standard Format Markers'']
  
This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of [http://paratext.ubs-translations.org/ Paratext]. The rudimentary [http://crosswire.org/ftpmirror/pub/sword/utils/perl/usfm2osis.pl usfm2osis.pl] utility may be used to convert USFM to OSIS for import to SWORD's native format. USFM uses a separate file for each Bible book.
+
This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of [http://paratext.org/ Paratext]. Paratext is used by more than 60% of all Bible translators world-wide. The current release is [https://paratext.org/download/ Paratext 9.4].
  
See also: [[Converting SFM Bibles to OSIS]]
+
Though '''USFM 2.4''' suffices for most Bibles, [https://ubsicap.github.io/usfm/ USFM 3.0] is now available and has several new features. The standard is open source and is maintained at [https://github.com/ubsicap/usfm ubsicap/usfm].
  
===USFX===
+
CrossWire now has a Python script called usfm2osis.py<ref>This replaces our earlier Perl script [http://crosswire.org/ftpmirror/pub/sword/utils/perl/usfm2osis.pl usfm2osis.pl].</ref> which converts USFM to OSIS for subsequent import to SWORD's native format. See [[Converting SFM Bibles to OSIS]].
''Unified Scripture Format XML''
 
  
This XML file format is designed to provide clean conversions from Scripture to USFM compliant file formats. A more comprehensive description can be found at http://ebt.cx/usfx/. There is no widespread use of this format and there are no plans for SWORD to support it in any way.
+
USFM uses a separate file for each Bible book. USFM is also supported by the open-source program called [http://bibledit.org/ Bibledit]. There are examples of Bibles in USFM format available for download at [http://ebible.org/]. These include the [http://ebible.org/bible/kjv/kjvsf.zip KJV], [http://ebible.org/bible/asv/asvsf.zip ASV], and [http://ebible.org/bible/web/websf.zip WEB] Bibles.
  
===VPL===
+
USFM is one of the formats that can be used by [[Projects:Go Bible/Go Bible Creator|Go Bible Creator]].
''Verse-Per-Line''
 
  
This plain-text format is used by SWORD for import of Bibles. It consists of one verse per line, with an optional verse reference at the beginning. The [http://crosswire.org/ftpmirror/pub/sword/utils/win32/vpl2mod.exe vpl2mod] utility may be used for import. VPL is deprecated in favor of the IMP format, which is more widely useful.
+
'''Note:'''
 +
<references />
  
===XSEM===
+
==Other Utilities==
''XML Scripture Encoding Model''
+
These are not part of The SWORD Project, but may be useful. A link is given for each.
  
This XML format was proposed by SIL. A comprehensive description of the markup language can be found
+
===Go Bible utilities===
[http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=XSEM&_sc=1 here].
+
* [[Projects:Go Bible/Go Bible Creator|Go Bible Creator]] - a Java SE program for converting either ThML or OSIS or USFM to [[Projects:Go Bible|Go Bible]]. It is being enhanced by SIL to be capable of converting source text in [[File Formats#XHTML|XHTML-TE]] format.
 
 
The formal specifications can be downloaded as a ZIP file
 
[http://scripts.sil.org/cms/scripts/render_download.php?site_id=nrsi&format=file&media_id=XSEM_Source&filename=XSEM_Source.zip here].
 
 
 
The designers of this markup language were instrumental in the writing of the OSIS Specification and it has largely been [http://en.wikipedia.org/wiki/Deprecation deprecated] in favor of using OSIS. There is no widespread use of this format and there are no plans for SWORD to support it in any way.
 
 
 
===XML===
 
''eXtensible Markup Language''
 
 
 
This is a generic family of markup formats.  Links to a number of XML specifications can be found [http://xml.coverpages.org/xmlApplications.html here].  Each flavor has its own specifications. SWORD supports markup in the XML formats OSIS and ThML internally.
 
 
 
===Zefania XML===
 
[http://www.zefania.de/ Zefania] is an XML format for Bible markup with only the most simple structural tags for book/chapter/verse, notes, etc. The [http://crosswire.org/ftpmirror/pub/sword/utils/perl/zef2osis.pl zef2osis.pl] utility may be used to convert Zefania XML to OSIS for import to SWORD's native format.
 
 
 
===Go Bible===
 
To achieve the navigation speed and general ease of use on even the simplest of Java mobile phones, Go Bible data is fully indexed, as well as being compressed (as are all JAR files).  The format is described in [http://code.google.com/p/gobible/wiki/GoBibleDataFormat Go Bible data format]. Go Bible data is structured as Book | Chapter | Verse text and does not support notes, headings and cross-references, etc. The developer kit [http://gobible.jolon.org/developer/welcome.html Go Bible Creator] can take either ThML or OSIS as the source text format, but they usually have to be made specially suitable. For example, OSIS files produced by Snowfall Software USFM2OSIS script are not structured the same. Work has begun to make an [http://en.wikipedia.org/wiki/XSL_Transformations XSLT] script to convert such OSIS XML files to the format suitable for Go Bible. Go Bible Creator version 2.3.2 and onwards can also take a folder of USFM files as the source text format.
 
 
 
Following an agreement made in July 2008 with the program's author Jolon Faichney, Go Bible is being adopted by CrossWire as its Java ME software project. See [[User:David Haslam|here]] for preliminary information. ''Volunteers wanted''.
 
 
 
==Utility Programs==
 
Unless otherwise specified, the utility programs listed in this section do not work with file formats used by The SWORD Project. 
 
 
 
===The SWORD Project===
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/cipherraw.exe cipherraw] - used to encipher SWORD modules
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/diatheke.exe diatheke] - a basic CLI SWORD frontend
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/mkfastmod.exe mkfastmod] - creates a search index for a module
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/mod2zmod.exe mod2zmod] - creates a compressed module from an installed module
 
 
 
====IMP Tools====
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2gbs.exe imp2gbs] - imports free-form General books in IMP format to SWORD format
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2ld.exe imp2ld] - imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/imp2vs.exe imp2vs] - imports Bibles and commentaries in IMP format to SWORD format
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/mod2imp.exe mod2imp] - creates an IMP file from an installed module
 
 
 
====VPL Tools====
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/vpl2mod.exe vpl2mod] - imports Bibles and commentaries in Verse-Per-Line format to SWORD format
 
 
 
===GBF Tools===
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/perl/gbf2osis.pl gbf2osis.pl] - a PERL utility for converting GBF to OSIS
 
 
 
* [http://ebible.org/translation/gbf.html gbfconvertor, including gbf2osis, gbf2xsem, & gbf2sf] - utilities for converting GBF to OSIS, XSEM, and SFM
 
* [http://ebible.org/translation/gbf.html gbfsrc] - utilities for converting GBF to "HTML, RTF, TeX, plain ASCII text, a format readable by BibleWorks 5 or later, and a couple of less useful formats"
 
  
===OSIS Utilities===
+
* [http://gbcpreprocessor.codeplex.com/ Go Bible Creator USFM Preprocessor] &ndash; This is a tool to parse through and identify, correct and publish USFM file formats into a file format that can easily be put into the Go Bible mobile phone program.
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/mod2osis.exe mod2osis] - creates an OSIS file from an installed module
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/osis2mod.exe osis2mod] - imports Bibles and commentaries in OSIS format to SWORD format
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/vs2osisref.exe vs2osisref] - returns the osisRef of a given (text form) verse reference
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/xml2gbs.exe xml2gbs] - imports free-form General books in OSIS or ThML format to SWORD format
 
 
 
===STEP Utilities===
 
 
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/step2vpl.exe step2vpl] - export a STEP book in Verse-Per-Line (VPL) format
 
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/stepdump.exe stepdump] - dumps the contents of a STEP book
 
 
 
* [http://www.customconsulting.us/step2rtf.zip step2rtf] - extracts the internal RTF text from STEP books
 
* [http://www.customconsulting.us/stepr-0.3.1.tgz stepr] - a rudimentary STEP reader
 
  
 
===ThML Utilities===
 
===ThML Utilities===
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/xml2gbs.exe xml2gbs] - imports free-form General books in OSIS or ThML format to SWORD format
+
* CCEL Desktop - a program for viewing and developing CCEL books [http://ccel-desktop.sourceforge.net/]
* [http://ccel-desktop.sourceforge.net/ CCEL Desktop] - a program for viewing and developing CCEL books
 
* [http://www.crosswire.org/wiki/index.php/DevTools:Misc#thml2osis thml2osis] - converts ThML to OSIS format.
 
  
===Zefania Utilities===
+
== See also ==
 
+
* [[DevTools:IMP Format|IMP Format]] &ndash; general import format used for various module types
* [http://crosswire.org/ftpmirror/pub/sword/utils/perl/zef2osis.pl zef2osis.pl] &ndash; a PERL utility for converting Zefania XML to OSIS
+
* [[DevTools:GBF|General Bible Format (GBF)]] &ndash; legacy format now deprecated
* [http://sourceforge.net/project/showfiles.php?group_id=202842&package_id=243464&release_id=534590 Zefania TextKonvertor]
+
* [[DevTools:ThML|Theological Markup Language (ThML)]] &ndash; legacy format now deprecated
* [http://wp1066500.wp101.webpack.hosteurope.de/zef/content/view/61/61/ Zefania XML Bible Book Names Changer]
+
* [[Frontends:Bookmarks Standard]]
* [http://www.grabner-online.de/download/zefania_2_sword_win32.zip Zefania_2_sword_win32] &ndash; ''sed based scripts maintained by JensG''
+
* [[File Formats Cruft]]
 
 
===Go Bible utilities===
 
* [http://gobible.jolon.org/developer/welcome.html Go Bible Creator] &ndash; a Java SE program for converting either ThML or OSIS or USFM to Go Bible. Go Bible Creator version 2.3.2 may be downloaded [http://go-bible.googlegroups.com/web/GoBibleCreator_Version_2.3.2.zip here], while still unavailable on the main Go Bible website.
 
  
===Other Utilities===
+
[[Category:Development tools]]
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip uconv] - a utility from ICU for converting between various character encodings, perform normalization, transliterate texts, etc. (It's similar to iconv, but much, much more powerful.)
+
[[Category:File formats]]
* [http://crosswire.org/ftpmirror/pub/sword/utils/win32/libxml2-2.6.32+.win32.zip xmllint] - a utility (part of the libxml2 distribution) for validating XML documents
+
[[Category:OSIS]]
 +
[[Category:ThML]]
 +
[[Category:Utilities]]
 +
[[Category:USFM]]
 +
[[Category:Unicode]]
 +
[[Category:Bibledit]]
 +
[[Category:Paratext]]

Latest revision as of 11:13, 23 November 2024

This page lists some of the more common file formats relevant to The SWORD Project, associated utilities, and other CrossWire projects.

CrossWire Bible Society respects copyright. As such, conversion of material that is under copyright without permission from the copyright holders is not supported by The SWORD Project.

SWORD modules

Other than the source code for the SWORD API, there is no documentation for the file format of a SWORD module. The intention is that the SWORD API (or the JSword implementation) is used directly or via other language bindings.

Our module file format is proprietary in the sense that we see no need to document it and certainly no need to stick to it. We change it when we need to. We therefore do not encourage direct interaction with it, but firmly recommend use of the API (either C++ or Java). This is the place where we seek stability and consistency.

The SWORD Project currently and actively supports the following markup formats for module creation: OSIS, TEI, ThML and plain text.

The SWORD Project Utilities

Precompiled versions of many of these programs are available in most Linux distributions, using the distribution's package installer.
For Windows, they can be found here.[1][2]

Module Creation Tools

It is recommended that Unicode text files used for module creation be encoded as UTF-8.[3]

  • imp2gbs – imports free-form General books in IMP format to SWORD format
  • imp2ld – imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format
  • imp2vs – imports Bibles and commentaries in IMP format to SWORD format
  • vpl2mod – imports Bibles and commentaries in Verse-Per-Line format to SWORD format
  • osis2mod – imports Bibles and commentaries in OSIS format to SWORD format
  • tei2mod – imports lexicons and dictionaries in TEI format to SWORD format
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format

Diagnostic Tools

  • mod2imp – creates an IMP file[4] from an installed module
  • emptyvss – exports a list of verses missing from the module (useful for testing modules during development)

Legacy format conversion Tools

  • gbf2osis.pl – a Perl utility for converting GBF to OSIS
  • step2vpl – exports a STEP book in Verse-Per-Line (VPL) format
  • thml2osis – converts ThML to OSIS format

OSIS Utilities

  • vs2osisref – returns the osisRef of a given (text form) verse reference
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format

Miscellaneous

  • cipherraw – used to encipher SWORD modules
  • diatheke – a basic CLI SWORD front-end
  • mkfastmod – creates a search index for a module[5]
  • mod2zmod – creates a compressed module from an installed module

Notes on SWORD Tools

  1. If you have Xiphos installed in Windows, the Sword utilities are available in the Xiphos\bin folder.
  2. The latest binaries may be found here, though currently without cipherraw.exe
  3. EOLs should be either Unix style (LF) or Windows style (CRLF). Text files with Mac style EOLs (CR) may give rise to errors or other unexpected behaviour.
  4. The IMP file may contain a residue of XML markup
  5. Aside: To create a list of installed modules with descriptions, enter the following command, optionally redirecting stderr to a log file.
    mkfastmod /? 2>mkfastmod.log
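
The recommendations on UTF-8 encoding and line endings (note 3 above) can be checked before running any of the import tools. The following Python sketch is one possible pre-flight check, not part of the SWORD utilities; the function name <code>check_source_text</code> and the wording of its messages are hypothetical.

```python
from typing import List


def check_source_text(data: bytes) -> List[str]:
    """Return a list of problems found in a module source file (empty if none)."""
    problems: List[str] = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    # Remove every CRLF pair; any CR that remains is a bare (Mac style) line ending.
    if b"\r" in data.replace(b"\r\n", b""):
        problems.append("bare CR line endings found")
    return problems


unix_and_windows = check_source_text(b"line one\nline two\r\n")
mac_style = check_source_text(b"line one\rline two\r")
```

In practice the bytes would come from a file opened in binary mode, for example <code>open("mymodule.imp", "rb").read()</code>, where the file name is only a placeholder.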

Recommended Non-SWORD Utilities

  • uconv – a utility from ICU for converting between various character encodings, performing normalization, transliterating texts, etc. (It is similar to iconv, but much more powerful.) uconv.exe is part of the SWORD utilities
  • xmllint – a utility (part of the libxml2 distribution) for validating XML documents

Formats for which CrossWire maintains converters

The SWORD Project uses primary source e-texts. These texts come in numerous formats. CrossWire maintains converters for a number of formats, described below. The converters may target other markup formats, e.g. TEI or OSIS, or may simply export binary data to text, as is the case with our STEP exporter. Specific discussion of each of the available converters is found elsewhere on this page.

USFM

Unified Standard Format Markers

This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of Paratext. Paratext is used by more than 60% of all Bible translators world-wide. The current release is Paratext 9.4.

Though USFM 2.4 suffices for most Bibles, USFM 3.0 is now available and has several new features. The standard is open source and is maintained at ubsicap/usfm.

CrossWire now has a Python script called usfm2osis.py[1] which converts USFM to OSIS for subsequent import to SWORD's native format. See Converting SFM Bibles to OSIS.

USFM uses a separate file for each Bible book. USFM is also supported by the open-source program called Bibledit. There are examples of Bibles in USFM format available for download at http://ebible.org/. These include the KJV, ASV, and WEB Bibles.

USFM is one of the formats that can be used by Go Bible Creator.

Note:

  1. This replaces our earlier Perl script usfm2osis.pl.

Other Utilities

These are not part of The SWORD Project, but may be useful. A link is given for each.

Go Bible utilities

  • Go Bible Creator – a Java SE program for converting ThML, OSIS or USFM to Go Bible. It is being enhanced by SIL to be capable of converting source text in XHTML-TE format.
  • Go Bible Creator USFM Preprocessor – a tool that parses USFM files, identifies and corrects issues, and publishes the result in a format that can easily be loaded into the Go Bible mobile phone program.

ThML Utilities

  • CCEL Desktop – a program for viewing and developing CCEL books (http://ccel-desktop.sourceforge.net/)

See also