Difference between revisions of "File Formats"

From CrossWire Bible Society
(Zefania Utilities: link to zefania modulsplitter)
m (USFM: https://paratext.org/download/ Paratext 9.4)
 
(378 intermediate revisions by 10 users not shown)
The SWORD Project respects [[copyright]], and other Intellectual property Rights.  As such, conversion of material that is under copyright is not supported by The SWORD Project.
+
This page lists some of the more common file formats ''relevant'' to The SWORD Project, associated utilities, and other CrossWire projects.
  
This page merely lists some of the more common file formats, and Bible Study Programs, and why conversion of their resources (if applicable) is discouraged.
+
CrossWire Bible Society respects [[copyright]].  As such, conversion of material that is under copyright without permission from the copyright holders is not supported by The SWORD Project.
  
EULA is the abbreviation for ''End User License Agreement''. This is the agreement that governs the terms and conditions of using a product. As such, it is enforceable, in the United States, under contract law.
+
== SWORD modules ==
 +
Other than the source code for the SWORD API, there is no documentation for the file format of a '''SWORD module'''. The intention is that the [[DevTools:SWORD|SWORD API]] (or the [[DevTools:JSword|JSword]] implementation) is used directly or via other language bindings.
  
==Bible Study Programs==
+
Our module file format is proprietary in the sense that we see no need to document it and certainly no need to stick to it. We change it when we need to. We therefore do not encourage direct interaction with it, but firmly recommend use of the API (either C++ or Java). This is the place where we seek stability and consistency.
There is a plethora of Bible Study Programs that people use. Some are FLOSS. Some are commercially distributed. Some are gratis, but have restrictions on them. This subsection merely lists the programs, and why their material may not (if applicable) be converted into a format that The SWORD Project utilizes.
 
  
===Bible Works===
+
The SWORD Project currently and actively supports the following markup formats for module creation: OSIS, [https://tei-c.org/ TEI], ThML and plain text.
This is a commercial product. The software contains a EULA that prohibits reverse engineering its file format. All material that uses this file format is protected by copyright. As such, it is both illegal and immoral to convert it to a format that The SWORD Project uses.
 
  
=== e-Sword===
+
==The SWORD Project Utilities==
This program is distributed gratis. Most of the resources for this program are protected by copyright. As such, conversion into a format that The SWORD Project uses is discouraged by CrossWire. There are no tools that directly convert the file format used by this program to one used by The SWORD Project.
+
Precompiled versions of many of these programs are available in most '''Linux''' distributions, using the distribution's package installer.<BR>For '''Windows''', they can be found [https://github.com/devroles/mingw_sword_package here].<ref>If you have '''Xiphos''' installed in Windows, the Sword utilities are available in the Xiphos\bin folder.</ref><ref>The latest binaries may be found [https://github.com/devroles/mingw_sword_package/releases/tag/1.9.0a here], though currently without cipherraw.exe</ref>
  
===Libronix Digital Library System===
+
===Module Creation Tools===
This is a commercial product. The program contains a EULA that prohibits reverse engineering its file format. All material that uses this file format is protected by copyright. As such, it is both illegal and immoral to convert it to a format that The SWORD Project uses.
+
It is recommended that Unicode text files used for module creation be [[Encoding|encoded]] as UTF-8.<ref>[http://en.wikipedia.org/wiki/Newline EOLs] should be either Unix style (LF) or Windows style (CRLF). Text files with Mac style EOLs (CR) may give rise to errors or other unexpected behaviour.</ref>
 +
* imp2gbs &ndash; imports free-form General books in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* imp2ld &ndash; imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* imp2vs &ndash; imports Bibles and commentaries in IMP format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* vpl2mod &ndash; imports Bibles and commentaries in Verse-Per-Line format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[osis2mod]] &ndash; imports Bibles and commentaries in OSIS format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* tei2mod &ndash; imports lexicons, dictionaries in TEI format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* xml2gbs &ndash; imports free-form General books in OSIS or ThML format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
  
===PC Study Bible===
+
===Diagnostic Tools===
This is a commercial product. The program contains a EULA that prohibits reverse engineering its file format. All material that uses this file format is protected by copyright. As such, it is both illegal and immoral to convert it to a format that The SWORD Project uses.
+
* mod2imp &ndash; creates an IMP file<ref>The IMP file may contain a residue of XML markup</ref> from an installed module [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* emptyvss &ndash; exports a list of verses missing from the module (useful for testing modules during development) [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
  
===The Watch Tower Library===
+
===Legacy format conversion Tools===
This program is distributed gratis. The program contains a EULA that prohibits reverse engineering its file format. The EULA also prohibits redistribution of the material to individuals who are not current Jehovah's Witnesses in good standing. Additionally, the material in this collection is protected by copyright.
+
* gbf2osis.pl &ndash; a Perl utility for converting GBF to OSIS [http://crosswire.org/ftpmirror/pub/sword/utils/perl/gbf2osis.pl &dagger;]
 +
* step2vpl &ndash; export a STEP book in Verse-Per-Line (VPL) format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[DevTools:Misc#thml2osis|thml2osis]] - converts ThML to OSIS format.
  
===Zefania===
+
===OSIS Utilities===
This is a FLOSS Bible Study Program. Some of the resources available for this program are under copyright. The conversion of those resources to The SWORD Project is discouraged.  
+
* vs2osisref &ndash; returns the osisRef of a given (text form) verse reference [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* xml2gbs &ndash; imports free-form General books in OSIS or ThML format to SWORD format [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
  
==File Formats==
+
===Miscellaneous===
Bible Study programs use a plethora of file formats. Even more have been suggested for use in creating Bibles, and other religious material. This subsection merely lists some of the most common of those formats.
+
* cipherraw &ndash; used to encipher SWORD modules [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[Frontends:Diatheke|diatheke]] &ndash; a basic CLI SWORD front-end [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[mkfastmod]] &ndash; creates a search index for a module<ref>Aside: To create a list of installed modules with descriptions, enter the following command, optionally redirecting stderr to a log file.<pre>mkfastmod /? 2>mkfastmod.log</pre></ref> [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
* [[mod2zmod]] &ndash; creates a compressed module from an installed module [http://crosswire.org/ftpmirror/pub/sword/utils/win32 &dagger;]
 +
==== Notes on SWORD Tools ====
  
===GBF===
+
<references />
General Bible Format
 
  
This file format is intended as an aid to preparing Bible Texts for use with various Bible search programs. The complete specification is at http://www.ebible.org/bible/gbf.htm.  
+
===Recommended Non-SWORD Utilities===
 +
* uconv &ndash; a utility from [http://icu-project.org/ ICU] for converting between various character encodings, performing normalization, transliterating texts, etc. (It's similar to iconv, but much, much more powerful.) uconv.exe is part of the [http://crosswire.org/ftpmirror/pub/sword/utils/win32 sword utilities]
 +
* xmllint &ndash; a utility (part of the [http://xmlsoft.org/ libxml2] distribution) for validating XML documents [http://crosswire.org/ftpmirror/pub/sword/utils/win32 *]
  
This file format is used for creating some resources for The SWORD Project.
+
==Formats for which CrossWire maintains converters==
 
+
The SWORD Project uses primary source e-texts. These texts come in numerous formats. CrossWire maintains converters for a number of formats, described below. The converters may target other markup formats, e.g. TEI or OSIS, or may simply export binary data to text, as is the case with our STEP exporter. Specific discussion of each of the available converters is found elsewhere on this page.
===HTML===
 
Hyper Text Markup Language
 
 
 
This is the basic language of the World Wide Web.  Some Bible programs use it for their resources.
 
 
 
===LitML===
 
Liturgical Markup Language
 
 
 
The home page for this markup language is http://www.oremus.org/LitML/.
 
This is described at http://hildormen.org/blogs/index.php/2004/09/22/p28 and http://hildormen.org/docs/LitML/Guidelines-LitML10-1.0.html.
 
 
 
This is a descendant of, and complement to ThML. An additional influence is HTML 4.0.
 
 
 
The markup reflects its orientation towards liturgy and hymns.
 
 
 
===OSIS===
 
Open Scripture Information Standard.
 
 
 
The Open Scripture Information Standard (OSIS) is an XML schema for marking up scripture and related text, part of an "open scripture" initiative composed of translators, publishers, scholars, software manufacturers, and technical experts who are coordinated by the Bible Technologies Group. It is co-sponsored by the American Bible Society and the Society of Biblical Literature.
 
 
 
The specifications for this file format can be found at http://www.bibletechnologies.net/20Manual.dsp.
 
 
 
This file format is used for creating some resources for The SWORD Project.
 
 
 
===PDF===
 
Portable Document Format
 
 
 
This is an ISO track file format for platform independent rendering of documents.  As such, it is designed to be a "read only" format.
 
===RTF===
 
Rich Text Format
 
 
 
This is a file format that is "owned" by Microsoft, Inc. It is used as the presentation markup language by several Bible Study Programs and their related file formats.
 
 
 
===STML===
 
Sacred Text Markup Language.
 
 
 
This is a proprietary markup language used by sacred-text.com. 
 
 
 
===STEP===
 
Standard Template Electronic Publishing.
 
 
 
This file format was used by Quickverse between roughly 1996 and 2002.  All material that was distributed in this file format is either under copyright, or has a EULA which prohibits format conversion.
 
 
 
Most of the documentation and specifications for this format can be found at
 
http://web.archive.org/web/20040204143502/http://www.crosswire.org/bsisg/ ;
 
http://web.archive.org/web/20021019135604/www.crosswire.org/bsisg/ ;
 
 
 
===ThML===
 
Theological Markup Language
 
 
 
The specifications for this file format are available at http://www.ccel.org/ThML/.
 
 
 
This file format is used for creating some resources for The SWORD Project.
 
  
 
===USFM===
 
Unified Standard Format Markers
+
[http://paratext.org/usfm ''Unified Standard Format Markers'']
 
 
===USFX===
 
Unified Scripture Format XML
 
 
 
This XML file format is designed to provide clean conversions from Scripture to USFM compliant file formats. A more comprehensive description can be found at http://ebt.cx/usfx/.
 
 
 
===XSEM===
 
XML Scripture Encoding Model
 
 
 
This XML format was proposed by SIL. A comprehensive description of the markup language can be found at http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=XSEM&_sc=1.
 
  
The formal specifications can be found at
+
This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of [http://paratext.org/ Paratext]. Paratext is used by more than 60% of all Bible translators world-wide. The current release is [https://paratext.org/download/ Paratext 9.4].
http://scripts.sil.org/cms/scripts/render_download.php?site_id=nrsi&format=file&media_id=XSEM_Source&filename=XSEM_Source.zip
 
  
===XML===
+
Though '''USFM 2.4''' suffices for most Bibles, [https://ubsicap.github.io/usfm/ USFM 3.0] is now available and has several new features. The standard is open source and is maintained at [https://github.com/ubsicap/usfm ubsicap/usfm].
eXtensible Markup Language
 
  
This is a generic family of markup formats. Links to a number of XML specifications can be found at http://xml.coverpages.org/xmlApplications.html. Each flavor has its own specifications.
+
CrossWire now has a Python script called usfm2osis.py<ref>This replaces our earlier Perl script [http://crosswire.org/ftpmirror/pub/sword/utils/perl/usfm2osis.pl usfm2osis.pl].</ref> which converts USFM to OSIS for subsequent import to SWORD's native format. See [[Converting SFM Bibles to OSIS]].
  
===Zefania XML===
+
USFM uses a separate file for each Bible book. USFM is also supported by the open-source program called [http://bibledit.org/ Bibledit]. There are examples of Bibles in USFM format available for download at [http://ebible.org/]. These include the [http://ebible.org/bible/kjv/kjvsf.zip KJV], [http://ebible.org/bible/asv/asvsf.zip ASV], and [http://ebible.org/bible/web/websf.zip WEB] Bibles.
This is the native file format for Zefania. There are several tools that will convert material into this file format.
 
  
==Utility Programs==
+
USFM is one of the formats that can be used by [[Projects:Go Bible/Go Bible Creator|Go Bible Creator]].
Unless otherwise specified, the utility programs listed in this section do not work with file formats used by The SWORD Project.
 
  
===GBF Tools===
+
'''Note:'''
* [http://ebible.org/translation/gbf.html gbfconvertor, including gbf2osis, gbf2xsem, & gbf2sf]
+
<references />
* [http://ebible.org/translation/gbf.html gbfsrc]
 
* [http://gbf2bc.sourceforge.net/ GBF-to-BibleConverter]
 
  
===The SWORD Project===
+
==Other Utilities==
* cipherraw
+
These are not part of The SWORD Project, but may be useful. A link is given for each.
* Diaspora
 
* Diatheke
 
* imp2gbs
 
* imp2ld
 
* imp2vs
 
* mkfastmod
 
* mod2imp
 
* mod2osis
 
* mod2vpl
 
* mod2zmod
 
* osis2mod
 
* step2vpl
 
* stepdump
 
* vpl2mod
 
* vs2osisref
 
* xml2gbs
 
  
===STEP Utilities===
+
===Go Bible utilities===
The specifications for STEP were publicly released. Some people created tools for this file format.
+
* [[Projects:Go Bible/Go Bible Creator|Go Bible Creator]] - a Java SE program for converting either ThML or OSIS or USFM to [[Projects:Go Bible|Go Bible]]. It is being enhanced by SIL to be capable of converting source text in [[File Formats#XHTML|XHTML-TE]] format.
  
* Step2RTF;
+
* [http://gbcpreprocessor.codeplex.com/ Go Bible Creator USFM Preprocessor] &ndash; a tool that parses USFM files, identifies and corrects problems, and publishes them in a format that can easily be loaded into the Go Bible mobile phone program.
* Step2VPL;
 
* Stepdump;
 
* STEPr;
 
* The STEP Publisher's ToolKit;
 
  
 
===ThML Utilities===
 
* cceldesktop.  
+
* CCEL Desktop - a program for viewing and developing CCEL books [http://ccel-desktop.sourceforge.net/]
  
===Zefania Utilities===
+
== See also ==
 +
* [[DevTools:IMP Format|IMP Format]] &ndash; general import format used for various module types
 +
* [[DevTools:GBF|General Bible Format (GBF)]] &ndash; legacy format now deprecated
 +
* [[DevTools:ThML|Theological Markup Language (ThML)]] &ndash; legacy format now deprecated
 +
* [[Frontends:Bookmarks Standard]]
 +
* [[File Formats Cruft]]
  
* KonvSetup;
+
[[Category:Development tools]]
* Zefania BpeST;
+
[[Category:File formats]]
* Zefania Diatheke;
+
[[Category:OSIS]]
* [https://sourceforge.net/project/showfiles.php?group_id=202842&package_id=243464&release_id=536663 Zefania Module Splitter;]
+
[[Category:ThML]]
* Zefania TextKonvertor;
+
[[Category:Utilities]]
* ZXML-BCV;
+
[[Category:USFM]]
* ZXML2BCV.xsl;
+
[[Category:Unicode]]
+
[[Category:Bibledit]]
 +
[[Category:Paratext]]

Latest revision as of 11:13, 23 November 2024

This page lists some of the more common file formats relevant to The SWORD Project, associated utilities, and other CrossWire projects.

CrossWire Bible Society respects copyright. As such, conversion of material that is under copyright without permission from the copyright holders is not supported by The SWORD Project.

SWORD modules

Other than the source code for the SWORD API, there is no documentation for the file format of a SWORD module. The intention is that the SWORD API (or the JSword implementation) is used directly or via other language bindings.

Our module file format is proprietary in the sense that we see no need to document it and certainly no need to stick to it. We change it when we need to. We therefore do not encourage direct interaction with it, but firmly recommend use of the API (either C++ or Java). This is the place where we seek stability and consistency.

The SWORD Project currently and actively supports the following markup formats for module creation: OSIS, TEI, ThML and plain text.

The SWORD Project Utilities

Precompiled versions of many of these programs are available in most Linux distributions, using the distribution's package installer.
For Windows, they can be found at https://github.com/devroles/mingw_sword_package.[1][2]

Module Creation Tools

It is recommended that Unicode text files used for module creation be encoded as UTF-8.[3]

  • imp2gbs – imports free-form General books in IMP format to SWORD format
  • imp2ld – imports lexicons, dictionaries, and daily devotionals in IMP format to SWORD format
  • imp2vs – imports Bibles and commentaries in IMP format to SWORD format
  • vpl2mod – imports Bibles and commentaries in Verse-Per-Line format to SWORD format
  • osis2mod – imports Bibles and commentaries in OSIS format to SWORD format
  • tei2mod – imports lexicons, dictionaries in TEI format to SWORD format
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format
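Before running any of the import tools above, the UTF-8 and end-of-line recommendations can be checked on the source text. The following is a minimal Python sketch of such a check, not part of the SWORD utilities; the function names check_source_text and check_source_file are hypothetical.

```python
from pathlib import Path


def check_source_text(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a source file."""
    problems = []
    # The page recommends UTF-8 for module source text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    # After removing every CRLF pair, any CR left over is a Mac-style EOL.
    without_crlf = data.replace(b"\r\n", b"")
    if b"\r" in without_crlf:
        problems.append("contains Mac-style (CR) line endings")
    return problems


def check_source_file(path: str) -> list[str]:
    """Convenience wrapper (hypothetical): read the file as bytes, then check it."""
    return check_source_text(Path(path).read_bytes())


if __name__ == "__main__":
    print(check_source_text(b"first line\nsecond line\r\n"))
    print(check_source_text(b"first line\rsecond line\r"))
```

Working on bytes rather than decoded text keeps the two checks independent, so a file can be reported as both badly encoded and using CR line endings.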

Diagnostic Tools

  • mod2imp – creates an IMP file[4] from an installed module
  • emptyvss – exports a list of verses missing from the module (useful for testing modules during development)
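The idea behind a missing-verse report can be sketched in Python as follows. This is only an illustration of the concept, not how emptyvss is implemented; the EXPECTED table is a hypothetical two-book excerpt of a versification, and the Book.chapter.verse identifiers are an assumed way of naming the verses present in a module.

```python
# Hypothetical verse counts for two one-chapter books, keyed by (book ID, chapter).
# A real versification has many more entries; these two are only for illustration.
EXPECTED = {("Obad", 1): 21, ("Jude", 1): 25}


def missing_verses(present: set[str]) -> list[str]:
    """List every expected verse ID (Book.chapter.verse) that is not in `present`."""
    missing = []
    for (book, chapter), last_verse in EXPECTED.items():
        for verse in range(1, last_verse + 1):
            verse_id = f"{book}.{chapter}.{verse}"
            if verse_id not in present:
                missing.append(verse_id)
    return missing


if __name__ == "__main__":
    # Pretend the module holds all of Obadiah and all of Jude except verse 3.
    present = {f"Obad.1.{v}" for v in range(1, 22)}
    present |= {f"Jude.1.{v}" for v in range(1, 26) if v != 3}
    print(missing_verses(present))
```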

Legacy format conversion Tools

  • gbf2osis.pl – a Perl utility for converting GBF to OSIS
  • step2vpl – exports a STEP book in Verse-Per-Line (VPL) format
  • thml2osis – converts ThML to OSIS format

OSIS Utilities

  • vs2osisref – returns the osisRef of a given (text form) verse reference
  • xml2gbs – imports free-form General books in OSIS or ThML format to SWORD format
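To show what turning a text-form verse reference into an osisRef looks like, here is a small Python sketch. It is not the vs2osisref implementation: the BOOK_IDS table is a hypothetical, deliberately tiny subset, and the parser only handles references of the form "Book chapter:verse" (books with a numeric prefix, ranges and lists are out of scope).

```python
import re

# Hypothetical, deliberately tiny table mapping book names to OSIS book IDs.
BOOK_IDS = {"genesis": "Gen", "psalms": "Ps", "john": "John", "jude": "Jude"}

# Matches "Book chapter:verse", e.g. "John 3:16".
REF_PATTERN = re.compile(
    r"^\s*(?P<book>[A-Za-z ]+?)\s+(?P<chapter>\d+):(?P<verse>\d+)\s*$"
)


def to_osis_ref(reference: str) -> str:
    """Convert a simple text reference to the Book.chapter.verse osisRef form."""
    match = REF_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"unsupported reference format: {reference!r}")
    book = match.group("book").lower()
    if book not in BOOK_IDS:
        raise ValueError(f"unknown book name: {match.group('book')!r}")
    chapter = int(match.group("chapter"))
    verse = int(match.group("verse"))
    return f"{BOOK_IDS[book]}.{chapter}.{verse}"


if __name__ == "__main__":
    print(to_osis_ref("John 3:16"))
    print(to_osis_ref("Genesis 1:1"))
```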

Miscellaneous

  • cipherraw – used to encipher SWORD modules
  • diatheke – a basic CLI SWORD front-end
  • mkfastmod – creates a search index for a module[5]
  • mod2zmod – creates a compressed module from an installed module

Notes on SWORD Tools

  1. If you have Xiphos installed in Windows, the Sword utilities are available in the Xiphos\bin folder.
  2. The latest binaries may be found at https://github.com/devroles/mingw_sword_package/releases/tag/1.9.0a, though currently without cipherraw.exe
  3. EOLs should be either Unix style (LF) or Windows style (CRLF). Text files with Mac style EOLs (CR) may give rise to errors or other unexpected behaviour.
  4. The IMP file may contain a residue of XML markup
  5. Aside: To create a list of installed modules with descriptions, enter the following command, optionally redirecting stderr to a log file.
    mkfastmod /? 2>mkfastmod.log

Recommended Non-SWORD Utilities

  • uconv – a utility from ICU for converting between various character encodings, performing normalization, transliterating texts, etc. (It's similar to iconv, but much, much more powerful.) uconv.exe is part of the sword utilities
  • xmllint – a utility (part of the libxml2 distribution) for validating XML documents *
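Where neither tool is at hand, a rough equivalent of two of the tasks listed above can be sketched with the Python standard library. This is an assumption-laden stand-in, not a replacement: normalize_nfc applies only one normalization form (NFC, chosen here for illustration), and is_well_formed checks well-formedness only, which is weaker than the schema validation xmllint can perform. Both function names are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET


def normalize_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C (composed characters)."""
    return unicodedata.normalize("NFC", text)


def is_well_formed(xml_text: str) -> bool:
    """Report whether the XML parses; this does NOT validate against a schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    decomposed = "e\u0301"  # letter e followed by a combining acute accent
    print(len(decomposed), len(normalize_nfc(decomposed)))
    print(is_well_formed("<osis><div>text</div></osis>"))
    print(is_well_formed("<osis><div>text</osis>"))
```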

Formats for which CrossWire maintains converters

The SWORD Project uses primary source e-texts. These texts come in numerous formats. CrossWire maintains converters for a number of formats, described below. The converters may target other markup formats, e.g. TEI or OSIS, or may simply export binary data to text, as is the case with our STEP exporter. Specific discussion of each of the available converters is found elsewhere on this page.

USFM

Unified Standard Format Markers

This plain-text format is a common internal-use format within Bible translation agencies and Bible societies. It is the native format of Paratext. Paratext is used by more than 60% of all Bible translators world-wide. The current release is Paratext 9.4.

Though USFM 2.4 suffices for most Bibles, USFM 3.0 is now available and has several new features. The standard is open source and is maintained at ubsicap/usfm.

CrossWire now has a Python script called usfm2osis.py[1] which converts USFM to OSIS for subsequent import to SWORD's native format. See Converting SFM Bibles to OSIS.

USFM uses a separate file for each Bible book. USFM is also supported by the open-source program called Bibledit. There are examples of Bibles in USFM format available for download at http://ebible.org/. These include the KJV, ASV, and WEB Bibles.
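Because each USFM file holds one book, a quick inventory of a downloaded Bible can be made by reading the identification, chapter and verse markers of every file. The Python sketch below assumes the standard \id, \c and \v markers; the function name summarize_usfm and the SAMPLE text are hypothetical, and a verse range such as "\v 1-2" is counted as a single verse marker.

```python
import re

# \id is followed by the book code; \c and \v are followed by a number.
ID_MARKER = re.compile(r"^\\id\s+(\S+)", re.MULTILINE)
CHAPTER_MARKER = re.compile(r"^\\c\s+(\d+)", re.MULTILINE)
VERSE_MARKER = re.compile(r"\\v\s+(\d+)")


def summarize_usfm(text: str) -> dict:
    """Return the book code and the number of chapter and verse markers."""
    id_match = ID_MARKER.search(text)
    return {
        "book": id_match.group(1) if id_match else None,
        "chapters": len(CHAPTER_MARKER.findall(text)),
        "verses": len(VERSE_MARKER.findall(text)),
    }


# Hypothetical two-verse sample, not taken from any published translation.
SAMPLE = "\\id JUD Sample text\n\\c 1\n\\p\n\\v 1 First verse.\n\\v 2 Second verse.\n"

if __name__ == "__main__":
    print(summarize_usfm(SAMPLE))
```

Running the same function over every file in a directory gives a per-book summary that can be compared against the expected book list before conversion to OSIS.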

USFM is one of the formats that can be used by Go Bible Creator.

Note:

  1. This replaces our earlier Perl script usfm2osis.pl.

Other Utilities

These are not part of The SWORD Project, but may be useful. A link is given for each.

Go Bible utilities

  • Go Bible Creator - a Java SE program for converting either ThML or OSIS or USFM to Go Bible. It is being enhanced by SIL to be capable of converting source text in XHTML-TE format.
  • Go Bible Creator USFM Preprocessor – a tool that parses USFM files, identifies and corrects problems, and publishes them in a format that can easily be loaded into the Go Bible mobile phone program.

ThML Utilities

  • CCEL Desktop – a program for viewing and developing CCEL books (http://ccel-desktop.sourceforge.net/)

See also