DevTools:Locale Files

From CrossWire Bible Society

Latest revision as of 20:59, 19 August 2019

A locale file is stored in the locales.d folder under the Sword path.
The file name is generally the language code with extension .conf
As Unicode text files, locale files should be encoded UTF-8 (without BOM) and the file name should include "-utf8" after the language code.
Other encodings are deprecated.

Example

Locales require a few things. Let's step through the German locale:

excerpts from /sword/locales.d/de.conf:

Meta

[Meta]
Name=de
Description=German
Encoding=UTF-8

The entries above define the locale. Name should be, in order of preference, an ISO 639-1 two-letter language code, an ISO 639-2 (or later) three-letter language code, or an Ethnologue three-letter language code. SWORD includes a fairly comprehensive list of these in sword/locales.d/locales.conf. This entry, like all entries, is case sensitive.
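The rules so far (UTF-8 without BOM, case-sensitive entries, key=value lines under a section header) can be sketched with Python's standard configparser. This is only an illustration and not how the SWORD engine reads locale files; the function name parse_locale and the sample data are assumptions made for this example.

```python
import codecs
import configparser


def parse_locale(data: bytes) -> configparser.ConfigParser:
    """Parse the raw bytes of a locale file into its sections (illustrative only)."""
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("locale files should be UTF-8 without a BOM")
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    # Only '=' separates key and value; configparser's default ':' delimiter is disabled.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys case sensitive instead of lowercasing them
    parser.read_string(text)
    return parser


# Hypothetical in-memory sample based on the German excerpt.
sample = (
    "[Meta]\n"
    "Name=de\n"
    "Description=German\n"
    "Encoding=UTF-8\n"
    "\n"
    "[Text]\n"
    "Genesis=1. Mose\n"
).encode("utf-8")

locale = parse_locale(sample)
print(locale["Meta"]["Name"])     # de
print(locale["Text"]["Genesis"])  # 1. Mose
```

Setting optionxform to str matters here: by default configparser lowercases keys, which would hide the case sensitivity that locale files depend on.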

Text

This section requires a "one-to-one" mapping for each string to be translated.

The following entries are translation strings for anything the engine may display. The book names of the Bible are REQUIRED, including deuterocanonical books if used. Other candidates are option names, values, and tips, or any other text returned from the engine.

If you find a constant string in the engine that is not translated, or not translated properly, please post a message reporting it.

[Text]
Genesis=1. Mose
Exodus=2. Mose
Leviticus=3. Mose

# <snipped rest of book names>

Observe that a full-stop is a permitted character in a book name.

Book Abbrevs

This section lists every alternative book name and abbreviation for each Bible book. These entries are required for the locale to work correctly within the SWORD engine: they teach the verse reference parser all the ways a book name may be written.

The parser accepts partial matches. It sorts the entries in this section by their codepoint values and prefers the partial match that sorts first. For example, "1 C" is a partial match for both 1 CORINTHIANS=1Cor and 1 CHRONICLES=1Chr, but 1 CHRONICLES is preferred because it sorts first. To make "1 C" prefer 1Cor instead, add a third entry: 1 C=1Cor. This tuning matters because ambiguous abbreviations are common throughout Biblical literature, e.g., Jud=Jude (instead of Judges) or Jo=John (instead of Job, Jonah, or Joshua).

The format is:

uppercase alternative book name or abbreviation=osisID

SWORD will sort this list itself, but it is standard practice to keep it in alphabetical order, which helps authors define and tune these preferences, e.g.,

[Book Abbrevs]
1 C=1Cor
1 CHRONICLES=1Chr
1 CORINTHIANS=1Cor
1 JN=1Jn
1C=1Cor
1CHRONICLES=1Chr
1CORINTHIANS=1Cor
I C=1Cor
I CHRONICLES=1Chr
I CORINTHIANS=1Cor
IC=1Cor
ICHRONICLES=1Chr
ICORINTHIANS=1Cor


As noted earlier, 1 Chronicles sorts alphabetically before 1 Corinthians. The entries above say that 1Cor (the OSIS book ID for 1 Corinthians) has precedence up through "1 C". Any further character disambiguates the input anyway, so the default 1 CHRONICLES and 1 CORINTHIANS entries correctly resolve longer partial matches that start with "1 C".
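The precedence rule can be sketched in a few lines of Python. This is a simplified model of the behaviour described in this section, not the SWORD parser itself; the function name resolve is an assumption made for this example.

```python
from typing import Dict, Optional


def resolve(abbrevs: Dict[str, str], text: str) -> Optional[str]:
    """Return the osisID of the first entry, in codepoint order, that starts with text."""
    query = text.upper()
    for key in sorted(abbrevs):  # Python sorts str values by codepoint
        if key.startswith(query):
            return abbrevs[key]
    return None


default = {"1 CHRONICLES": "1Chr", "1 CORINTHIANS": "1Cor"}
tuned = dict(default)
tuned["1 C"] = "1Cor"  # the extra entry that gives 1 Corinthians precedence

print(resolve(default, "1 C"))  # 1Chr
print(resolve(tuned, "1 C"))    # 1Cor
print(resolve(tuned, "1 Ch"))   # 1Chr
print(resolve(tuned, "1 Co"))   # 1Cor
```

The exact key "1 C" sorts before every longer key that begins with it, which is why a single extra entry is enough to override the default.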


IMPORTANT:

Every verse reference that SWORD outputs must be a legal, parsable reference when given back to SWORD as input. This means there MUST be at least one abbreviation entry for each book name that consists of the toupper (uppercase function) of the entire string EXACTLY as you translated it in the [Text] section.

For example, the following are the REQUIRED entries for the book names in the [Text] excerpt above:

1. MOSE=Gen
2. MOSE=Exod
3. MOSE=Lev
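A quick way to catch a missing REQUIRED entry is to uppercase each translated book name and look it up among the [Book Abbrevs] keys. The sketch below assumes both sections are already loaded into dictionaries; the function name missing_required is hypothetical, and Python's str.upper() may not match the engine's toupper for every character, so treat the result as a hint only.

```python
from typing import Dict, Iterable, List


def missing_required(book_names: Iterable[str], abbrevs: Dict[str, str]) -> List[str]:
    """List the uppercased book names that have no exact [Book Abbrevs] key."""
    return [name.upper() for name in book_names if name.upper() not in abbrevs]


# Translated book names as they appear on the right-hand side in [Text].
translated = ["1. Mose", "2. Mose", "3. Mose"]

# A deliberately incomplete [Book Abbrevs] section.
abbrevs = {"1. MOSE": "Gen", "3. MOSE": "Lev"}

print(missing_required(translated, abbrevs))  # ['2. MOSE']
```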

That's it for requirements. Tuning your locale can be important for the user experience. You can add further [Book Abbrevs] entries to assign precedence if, for example, text like "Ma 1:1" takes you to the wrong book (by default it resolves to Malachi because of alphabetical precedence, but you might want Matthew or Mark). In that case, add an entry MA=Matt or MA=Mark.

Pref Abbrevs

This section designates the preferred abbreviation for each book. These are typically used when SWORD is asked to display a very short verse reference or a short Bible book name. The format for these entries is: osisID=Preferred Abbreviation, e.g.,

[Pref Abbrevs]
Gen=1Mo
Exod=2Mo
Lev=3Mo
Num=4Mo
Deut=5Mo
Josh=Jos
Judg=Rich

Each preferred abbreviation must itself be parsable using the entries in the [Book Abbrevs] section.
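This constraint can be checked mechanically. The sketch below uppercases each preferred abbreviation, finds the first [Book Abbrevs] key in codepoint order that starts with it, and reports any abbreviation that finds no key or lands on a different book. It is a simplified model rather than the engine's own logic, and the function name unparsable_prefs is an assumption made for this example.

```python
from typing import Dict, List, Tuple


def unparsable_prefs(prefs: Dict[str, str], abbrevs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (osisID, abbreviation) pairs that do not resolve back to their own book."""
    keys = sorted(abbrevs)  # codepoint order, first prefix match wins
    problems = []
    for osis_id, short in prefs.items():
        query = short.upper()
        match = next((key for key in keys if key.startswith(query)), None)
        if match is None or abbrevs[match] != osis_id:
            problems.append((osis_id, short))
    return problems


prefs = {"Gen": "1Mo", "Lev": "3Mo"}

# "1MO" is not a prefix of "1. MOSE", so it needs its own entry; "3MO" has none here.
abbrevs = {"1. MOSE": "Gen", "1MO": "Gen", "3. MOSE": "Lev"}

print(unparsable_prefs(prefs, abbrevs))  # [('Lev', '3Mo')]
```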


You can test your locale with the sword/tests/parsekey test program (this program is in the SWORD source along with several other programs that are used to validate the configuration files) and try different strings to see how they parse.

A full-stop is a permitted character in a localized book name or abbreviation. Other punctuation characters commonly used in verse references are not allowed in localized book names. These include the hyphen '-' (used for verse ranges), the colon ':' (used to separate chapter and verse numbers), and the comma ',' (used for verse lists). Additionally, numerals in non-initial position are not permitted in book names (e.g. '3John' is valid but 'Psalm151' is not).
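These character rules are easy to check before testing a locale. The sketch below is one reading of them: it assumes "initial position" means a leading run of digits, and the function name is_valid_book_name is hypothetical.

```python
FORBIDDEN = set("-:,")  # range, chapter/verse, and list separators


def is_valid_book_name(name: str) -> bool:
    """Check a localized book name against the punctuation and numeral rules."""
    if any(ch in FORBIDDEN for ch in name):
        return False
    # Skip the leading digits (assumed to be the permitted "initial position").
    i = 0
    while i < len(name) and name[i].isdigit():
        i += 1
    # No digit may appear anywhere after that leading run.
    return not any(ch.isdigit() for ch in name[i:])


for candidate in ["3John", "Psalm151", "1. Mose", "1-Mose"]:
    print(candidate, is_valid_book_name(candidate))
# 3John True
# Psalm151 False
# 1. Mose True
# 1-Mose False
```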

Submissions

If you create a new locale file while making a module, please submit it to CrossWire.

Submissions should be sent to sword-support@crosswire.org

Maintained locale files

On the CrossWire server, the locale files are stored in /space/home/ftp/pub/sword/raw/locales.d
Users with FTP or SCP access are able to download them from that folder.

Corrections to errors in locale files should be sent to sword-support@crosswire.org