Difference between revisions of "Converting SFM Bibles to OSIS"

From CrossWire Bible Society
(Preparing (U)SFM files for conversion)
(Converting USFM files to OSIS: switch recommendation to usfm2osis.pl, removed note about an unreleased version of SFMToOSIS that may never come, added/rearranged various other details)
Revision as of 08:08, 24 August 2009

Introduction

Standard Format Markers (SFM) and its more standardized derivative USFM have been used for decades to store Bibles for printing and display in programs like UBS' Paratext. The format is popular among Bible translation agencies and Bible societies. The basic format of SFM is simply plain text with backslash (\) codes. For example, a new Bible verse is signaled by \v followed by the number of the verse.
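
To make the \v convention concrete, here is a minimal Python sketch that pulls verse numbers and text out of a small SFM string. The sample text and the function name verses are hypothetical, and the pattern is deliberately simple: it assumes plain verse numbers and stops at the next backslash marker, so verse ranges and inline markers such as footnotes are not handled.

```python
import re

# Hypothetical SFM fragment; real files usually hold a whole book.
SAMPLE = r"""\c 1
\v 1 Text of the first verse.
\v 2 Text of the second verse,
continued on a second line.
"""

# A verse starts at "\v <number>" and, in this sketch, runs until the
# next backslash marker (or the end of the string).
VERSE = re.compile(r"\\v\s+(\d+)\s+([^\\]*)")

def verses(sfm_text):
    """Return (verse_number, text) pairs found in an SFM string."""
    return [
        (int(number), " ".join(body.split()))  # collapse line breaks and extra spaces
        for number, body in VERSE.findall(sfm_text)
    ]

for number, text in verses(SAMPLE):
    print(number, text)
```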

Older texts in various dialects of SFM can still be found. It was common for various agencies, Bible societies, and even regional offices of such groups to have their own SFM standards. Unified Standard Format Markers (USFM) was developed to standardize SFM and encourage interoperability so that Bibles from one agency could be reasonably expected to operate with the software and stylesheets employed by another. The current version of USFM is 2.2, defined at http://confluence.ubs-icap.org/display/USFM/Home.

Preparing (U)SFM files for conversion

The simplicity of writing SFM also makes it easy to write poor SFM that fails to correspond to any kind of standard. The first task in preparing to convert SFM files to OSIS is to clean the text. The more regular your source files are, the more likely the conversion process will operate correctly.

One method of cleaning up your files is to import them into an SFM editor such as Bibledit (which runs on Linux, Mac OS X, and Windows) or SIL FieldWorks Translation Editor (which runs on Windows). The editors will frequently perform some basic corrections to the SFM syntax, and Bibledit in particular can perform a number of checks to correct specific errors common to SFM.

Another useful tool is the GoBibleCreator USFM Preprocessor, a utility to examine and normalize USFM files, including identification of non-standard SF markers. Despite its name, its use is not limited to conversion of documents for use in GoBible.
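
If neither tool is at hand, a rough first pass at spotting non-standard markers can be sketched in a few lines of Python. This is not how Bibledit or the GoBibleCreator preprocessor work. It is only an illustration, and KNOWN_MARKERS is a small hypothetical subset that you would replace with the markers your project actually allows.

```python
import re
from collections import Counter

# Small illustrative subset of markers; NOT the full USFM marker list.
KNOWN_MARKERS = {"id", "h", "mt", "c", "v", "p", "q", "s"}

# A marker is a backslash followed by letters; a trailing level digit
# (as in \q1) and a closing asterisk (as in \nd*) are ignored here.
MARKER = re.compile(r"\\([A-Za-z]+)\d*\*?")

def marker_counts(sfm_text):
    """Count how often each marker name occurs in an SFM string."""
    return Counter(MARKER.findall(sfm_text))

def unknown_markers(sfm_text, known=KNOWN_MARKERS):
    """Return the sorted marker names that are not in the known set."""
    return sorted(name for name in marker_counts(sfm_text) if name not in known)

sample = "\\id GEN\n\\c 1\n\\p\n\\v 1 Text \\vrs 2 more text\n\\q1 A poetic line\n"
print(unknown_markers(sample))
```

Running the same check over every source file gives a quick list of markers to review before attempting a conversion.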

Converting USFM files to OSIS

There exist two publicly distributed programs capable of converting USFM files to OSIS: usfm2osis.pl and SFMToOSIS. If you have the time, ability, and inclination, you may want to try both programs and see which one works best for you.

Using usfm2osis.pl

usfm2osis.pl is a simple Perl script intended only for converting USFM files to OSIS. usfm2osis.pl was written by CrossWire, so its OSIS output is geared towards the use of OSIS documents in modules for The SWORD Project. usfm2osis.pl is actively maintained by CrossWire, and bug reports and feature requests are welcomed at sword-support.

usfm2osis.pl requires a Perl interpreter in the system path. Then you can run:

perl usfm2osis.pl <osisWork> [-o OSIS-file] [-e USFM encoding] <USFM filenames|wildcard>

osisWork should be a value such as Bible.en.WEB.2007.

If you include an OSIS-file value, the output will be written there. Otherwise, it will be written to a file name based on your osisWork.

The USFM encoding argument should indicate the character encoding found in the source files. If none is given, utf8 is the default. The list of available encodings depends on your system. Executing the script with no arguments will print the list (as will executing it with an invalid encoding value).

The final argument is a list of filenames or a wildcard value such as *.sfm containing the SFM data.
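
As an illustration of how the arguments fit together, the following Python sketch assembles a command line in the order given by the usage line above. It assumes usfm2osis.pl sits in the current directory. The output name web.osis.xml and the helper build_command are hypothetical. The wildcard is expanded in Python so the result does not depend on the shell.

```python
import glob

def build_command(osis_work, sfm_files, output=None, encoding=None):
    """Build the usfm2osis.pl argument list: osisWork, options, then files."""
    command = ["perl", "usfm2osis.pl", osis_work]
    if output:
        command += ["-o", output]      # optional OSIS output file
    if encoding:
        command += ["-e", encoding]    # optional source encoding
    return command + list(sfm_files)

# Expand the wildcard ourselves; sorted() keeps the file order predictable.
files = sorted(glob.glob("*.sfm"))
command = build_command("Bible.en.WEB.2007", files,
                        output="web.osis.xml", encoding="utf8")
print(" ".join(command))

# To actually run the conversion (requires Perl and the script):
#   import subprocess
#   subprocess.run(command, check=True)
```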

Using SFMToOSIS

SFMToOSIS is a program produced by Snowfall Software. It is somewhat difficult to set up and use, but it is more robust than usfm2osis.pl and more likely to produce valid OSIS files automatically. The program requires that you install a Python interpreter and that the interpreter be within the system path. The program can also convert OSIS files back to SFM.

The program also requires a Paratext .ssf file. If you do not have such a file, you can use the following sample and adjust it to the specifics of your own text:

<ScriptureText>
<BooksPresent>111111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000</BooksPresent>
<Encoding>UTF-8</Encoding>
<ChapterMarker>c</ChapterMarker>
<Copyright></Copyright>
<FileNameForm>41MAT</FileNameForm>
<FileNamePostPart>.SFM</FileNamePostPart>
<FileNamePrePart></FileNamePrePart>
<FullName>The Bible in English</FullName>
<Language>ENGLISH</Language>
<LeftToRight>T</LeftToRight>
<Name>EN</Name>
<VerseMarker>v</VerseMarker>
<Versification>4</Versification>
</ScriptureText>
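
Before running the conversion, it can help to confirm that an edited .ssf file is still well-formed XML and still contains the fields shown in the sample. The Python sketch below does only that. The tag checklist is copied from the sample above, and whether SFMToOSIS actually needs every one of them is an assumption. The names read_ssf and missing_tags are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tags taken from the sample above; treating all of them as required
# is an assumption made for this sketch.
EXPECTED_TAGS = [
    "BooksPresent", "Encoding", "ChapterMarker", "FileNameForm",
    "FileNamePostPart", "FileNamePrePart", "FullName", "Language",
    "LeftToRight", "Name", "VerseMarker", "Versification",
]

def read_ssf(ssf_text):
    """Parse SSF text and return a dict of tag name -> text content."""
    root = ET.fromstring(ssf_text)  # raises ParseError if not well-formed
    if root.tag != "ScriptureText":
        raise ValueError("root element is not <ScriptureText>")
    return {child.tag: (child.text or "") for child in root}

def missing_tags(settings):
    """List the expected tags that are absent from the parsed settings."""
    return [tag for tag in EXPECTED_TAGS if tag not in settings]

# Deliberately incomplete example, to show what the check reports.
example = """<ScriptureText>
<Encoding>UTF-8</Encoding>
<ChapterMarker>c</ChapterMarker>
<VerseMarker>v</VerseMarker>
</ScriptureText>"""

settings = read_ssf(example)
print(settings["Encoding"], missing_tags(settings))
```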

Next, update the run.bat file that came with SFMToOSIS to point to your SFM files and SSF file. Then run run.bat.

Importing OSIS files into SWORD

Before you import your OSIS files into SWORD format, you should check that they are valid OSIS. (And before you submit any files to modules@crosswire.org, you must ensure that they are valid OSIS. Invalid OSIS files will not be accepted.) See OSIS Bible validation for further instructions on OSIS validation.

Instructions for running osis2mod are available at osis2mod usage.