Talk:Localized Language Names

From CrossWire Bible Society
Revision as of 11:04, 13 November 2009 by Dmsmith (talk | contribs) (comment on native forms)


Specifying Scripts

Some languages have multiple scripts. What is the proper way to show that? E.g. Traditional vs Simplified Chinese? And I think Azeri has multiple scripts.

For BibleDesktop, we have localized zh (traditional) and zh_CN (simplified). This is not quite right, but it fits pragmatically with Java's locale and resource bundle mechanism.

For Java, a locale is lang, lang_country, lang_country_dialect, or lang__dialect (where the country is unstated). Java calls this last field the variant; there is no standard for its values, so it could be used for script.

I think there is a standard for scripts and that CLDR and ICU are starting to do something with it. Any thoughts? --Dmsmith 00:37, 13 November 2009 (UTC)
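
A minimal sketch of the four locale shapes listed above, using only java.util.Locale. Putting a script code such as Hant in the variant field is the workaround floated in this comment, not a standard, and the class and method names here are made up for illustration. (The Locale constructors are deprecated in recent Java releases but still work.)

```java
import java.util.Locale;

class LocaleForms {
    // Hypothetical helper: carries a script code in Java's "variant" field.
    // This is a workaround for illustration, not a standard convention.
    static Locale scriptAsVariant(String language, String country, String script) {
        return new Locale(language, country, script);
    }

    public static void main(String[] args) {
        Locale lang = new Locale("zh");                              // lang
        Locale langCountry = new Locale("zh", "CN");                 // lang_country
        Locale langCountryVariant = scriptAsVariant("zh", "TW", "Hant"); // lang_country_variant
        Locale langVariant = scriptAsVariant("zh", "", "Hant");      // lang__variant

        System.out.println(lang);               // zh
        System.out.println(langCountry);        // zh_CN
        System.out.println(langCountryVariant); // zh_TW_Hant
        System.out.println(langVariant);        // zh__Hant
    }
}
```

The empty country is what produces the double underscore in the last form.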

Script codes come from ISO 15924, and according to BCP 47, the way to include them in a locale is between the language and region, so en-Latn-US would be English written in Latin script in/for the US. (This is a bad example because BCP 47 says not to overspecify by naming a script when it should be obvious.) BCP 47 does specify using a hyphen to separate subtags, but I would guess that Java would be happy with the same style of tags if you just map the hyphens to underscores.
Traditional Chinese is zh-Hant, simplified is zh-Hans. Mongolian in Mongolian script, Cyrillic, and Latin would be mn-Mong, mn-Cyrl, and mn-Latn respectively.
This is all incorporated into CLDR and ICU, but more importantly is officially recognized/maintained by ISO, Unicode, and IANA. --Osk 03:11, 13 November 2009 (UTC)
Many thanks. For Chinese you gave the simplified form. When we add support to SWORD for these language codes, how do you see us supporting/specifying both Hans and Hant? --Dmsmith 10:32, 13 November 2009 (UTC)
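
For reference, a hedged sketch of how the tags above can be read in Java. It assumes Java 7 or later, where Locale.forLanguageTag and Locale.getScript exist; those postdate this discussion. One caveat on the underscore idea: in the older lang_country_variant form the second slot is the country, so a plain hyphen-to-underscore mapping of en-Latn-US would put Latn where Java expects a country. The class name and the describe helper are illustrative only, and this says nothing about how SWORD itself should store the codes.

```java
import java.util.Locale;

class ScriptTags {
    // Splits a BCP 47 tag into language|script|region using the JDK parser.
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return locale.getLanguage() + "|" + locale.getScript() + "|" + locale.getCountry();
    }

    public static void main(String[] args) {
        String[] tags = {"zh-Hant", "zh-Hans", "mn-Mong", "mn-Cyrl", "mn-Latn", "en-Latn-US"};
        for (String tag : tags) {
            System.out.println(tag + " -> " + describe(tag));
        }
    }
}
```

With this, zh-Hant and zh-Hans stay distinct because the script is its own field rather than being squeezed into the country or variant.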

Native Forms

As you may have gathered from my response on sword-devel, I'm not really keen on showing the native name. I think there are too many difficulties:

  • Many of these show up as boxes when browsing the page. What font should be used? (In CSS, it is possible to suggest fonts. That way, if the user has them installed, they can view the page.)
  • Is grc really Koine? I didn't think they spoke Koine in 1400. And when Koine was spoken, I think they used all caps and didn't use accents. And spaces between words did not exist.
  • In Middle English or Old English, would they really have spelled Middle, Old and English that way? I think these are modern forms, not native forms. And would they have described their language as Old or Middle? After all, it is really difficult to read Middle, let alone Old, English.
  • In awc, Western Acipa, do they really say Western?
  • In cak, Central Cakchiquel, do they really say Central?
  • If a module is in the ug language, where there are several script choices, which is it: Uyghurche‎ or ئۇيغۇرچە? That is, does a person who speaks ug know both scripts? Or is one of them "squiggly worms"?

For these reasons, I think it is a good idea to have native forms available, but to leave frontends free to decide how to use them.

JSword uses language names as nodes in a tree. If they are localized to the end user, then it is clear what they are getting. If they can't read a name, perhaps because of the script or because of a wrong font, then all they know is that it doesn't make sense to them and that they should avoid it. (To me it makes sense when viewing a conf to show the native representation, because that gives it context.)
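
One way a frontend could keep that choice flexible, sketched under assumptions: the LanguageLabels class, the Style enum, and the two sample tables below are hypothetical and are not JSword's actual API. The localized name is the default, the native form is optional, and a code with no known name falls back to the code itself.

```java
import java.util.HashMap;
import java.util.Map;

class LanguageLabels {
    // How a frontend chooses to present a language name (hypothetical).
    enum Style { LOCALIZED, NATIVE, LOCALIZED_WITH_NATIVE }

    // Illustrative data only; a real frontend would load its own tables.
    static Map<String, String> sampleLocalized() {
        Map<String, String> names = new HashMap<>();
        names.put("de", "German");
        names.put("ug", "Uyghur");
        return names;
    }

    static Map<String, String> sampleNative() {
        Map<String, String> names = new HashMap<>();
        names.put("de", "Deutsch");
        names.put("ug", "Uyghurche");
        return names;
    }

    static String label(String code, Style style,
                        Map<String, String> localized, Map<String, String> nativeNames) {
        // Fall back to the bare code when no localized name is known.
        String local = localized.getOrDefault(code, code);
        String nativeName = nativeNames.get(code);
        if (nativeName == null || nativeName.equals(local)) {
            return local;
        }
        switch (style) {
            case NATIVE:
                return nativeName;
            case LOCALIZED_WITH_NATIVE:
                return local + " (" + nativeName + ")";
            default:
                return local;
        }
    }

    public static void main(String[] args) {
        Map<String, String> localized = sampleLocalized();
        Map<String, String> nativeNames = sampleNative();
        for (Style style : Style.values()) {
            System.out.println(style + ": " + label("ug", style, localized, nativeNames));
        }
    }
}
```

A tree of modules could use LOCALIZED for its nodes, while a view of a single conf could use LOCALIZED_WITH_NATIVE to give the native form as context.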