Talk:Transliteration

From CrossWire Bible Society
Latest revision as of 03:06, 19 October 2010

Transliteration options

Xiphos offers transliteration as a module option that gives an alternative view of the module. Front-end developers might explore other approaches, such as:

  • Interlinear display option, with the original text and transliterated text on alternate lines
  • Index the transliteration to provide search feature support
  • Keyboard entry of transliterated words in the search dialogue, to search the original text


This isn't actually a feature of Xiphos itself; rather, it's how the engine currently exposes transliteration to front ends. The engine applies transliteration like any other option filter: it passes a whole chunk of text (a verse or dictionary entry) to ICU, instructs ICU to perform [script]-to-Latin transliteration, and outputs the result. The filter is pretty dumb and has no notion of markup. That means we can't instruct it to convert text to a non-Latin script, since it would also convert all the markup to that script and we would end up with garbage. Likewise, with the current filters any data held in attributes gets transliterated to Latin as well.
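To illustrate the problem, here is a minimal Python sketch. It uses a hypothetical toy Latin-to-Greek table in place of ICU (the names LATIN_TO_GREEK, naive_filter and markup_aware_filter are invented for illustration) and compares a filter that transliterates the whole chunk with one that only touches the text between tags. The tag splitting assumes well-formed markup with no ">" inside attribute values.

```python
import re

# Hypothetical toy Latin -> Greek table; a real filter would call ICU instead.
# Letters missing from the table pass through unchanged.
LATIN_TO_GREEK = str.maketrans({
    "a": "α", "b": "β", "d": "δ", "e": "ε", "g": "γ", "l": "λ",
    "m": "μ", "o": "ο", "s": "σ", "w": "ω",
})

# Matches one markup tag, e.g. <w lemma="x"> or </w>.
TAG = re.compile(r"(<[^>]*>)")


def naive_filter(chunk: str) -> str:
    """Transliterate the whole chunk, markup included (the current behaviour)."""
    return chunk.translate(LATIN_TO_GREEK)


def markup_aware_filter(chunk: str) -> str:
    """Transliterate only the text between tags, leaving tags and attributes intact."""
    parts = TAG.split(chunk)
    # With a capturing group, re.split puts text at even indices and tags at odd ones.
    return "".join(
        part if i % 2 else part.translate(LATIN_TO_GREEK)
        for i, part in enumerate(parts)
    )
```

With the input '<w lemma="logos">logos</w>', naive_filter returns '<ω λεμμα="λογοσ">λογοσ</ω>' (tag and attribute are garbled), while markup_aware_filter returns '<w lemma="logos">λογοσ</w>'. The toy table ignores final-sigma rules.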
Something we could do for OSIS (and potentially plaintext) content is to split the text into individual words (defined as text with whitespace on both sides), transliterate each word, and wrap it in a <w> element whose xlit attribute holds the transliterated form.
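A minimal sketch of that word-chunking idea for plain text, in Python. GREEK_TO_LATIN is a hypothetical toy table standing in for ICU's Greek-to-Latin transliteration, and wrap_words is an invented name; for OSIS input the same step would have to run only on the text between tags. Punctuation attached to a word stays part of it, matching the whitespace-based definition of a word.

```python
import html
import re

# Hypothetical toy Greek -> Latin table standing in for ICU; it only covers
# unaccented lowercase letters.
GREEK_TO_LATIN = str.maketrans({
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "η": "ē",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t",
    "υ": "y", "χ": "ch", "ω": "ō",
})


def wrap_words(plain_text: str) -> str:
    """Wrap each whitespace-delimited word in <w xlit="..."> markup."""
    out = []
    # Splitting on a capturing group keeps the whitespace runs, so spacing survives.
    for token in re.split(r"(\s+)", plain_text):
        if not token or token.isspace():
            out.append(token)
            continue
        xlit = token.translate(GREEK_TO_LATIN)
        # Escape the attribute value and the word so the output stays well-formed.
        out.append(f'<w xlit="{html.escape(xlit)}">{html.escape(token, quote=False)}</w>')
    return "".join(out)
```

For example, wrap_words("εν αρχη ην") returns '<w xlit="en">εν</w> <w xlit="archē">αρχη</w> <w xlit="ēn">ην</w>'.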
Indexing transliterated text is a little different: it would simply require indexing both the original and the transliterated forms. This would increase index sizes, but it wouldn't require transliterated display to be turned on at all. --Osk 02:58, 19 October 2010 (UTC)
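A rough Python sketch of indexing both forms, which would also cover keyboard entry of transliterated words in the search dialogue. It is only an in-memory illustration under stated assumptions: build_index, search, the verse keys and the toy GREEK_TO_LATIN table are hypothetical, and a real front end would feed both forms to its search indexer rather than to a dict.

```python
from collections import defaultdict

# Hypothetical toy Greek -> Latin table (unaccented lowercase only); a real
# front end would use ICU instead.
GREEK_TO_LATIN = str.maketrans({
    "α": "a", "γ": "g", "ε": "e", "η": "ē", "θ": "th", "λ": "l",
    "ν": "n", "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s",
    "τ": "t", "υ": "y", "χ": "ch",
})


def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Map each word, in both original and transliterated form, to the keys that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for key, text in entries.items():
        for word in text.split():
            index[word].add(key)                            # original script
            index[word.translate(GREEK_TO_LATIN)].add(key)  # transliterated form
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Look up a term typed in either script."""
    return index.get(term, set())


index = build_index({
    "John.1.1": "εν αρχη ην ο λογος",
    "John.1.2": "ουτος ην εν αρχη προς τον θεον",
})
print(sorted(search(index, "logos")))  # ['John.1.1']
print(sorted(search(index, "ēn")))     # ['John.1.1', 'John.1.2']
```

Searching for "logos" or "λογος" finds the same entry, and the transliterated display never has to be switched on; the cost is the extra terms stored in the index.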

Ruby markup

A module could be created with the transliterated words coded in OSIS using Ruby markup (http://www.w3.org/TR/ruby/).[1] This may be especially useful for transliterating languages that ICU does not yet support.


I think there's some confusion about terminology here. Ruby is really just a typesetting term, or an HTML element meant to reproduce the appearance of that typeset style.
OSIS would simply use <w xlit="..."> markup to embed transliteration. This may then be translated by a display filter to ruby markup or some other interlinear display (or something else entirely).
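A minimal sketch of such a display filter in Python, rendering <w xlit="..."> words as an HTML <ruby> element with the transliteration in <rt>. The function name w_to_ruby is hypothetical, and the regex approach assumes double-quoted attributes and no nested <w> elements; a more robust filter would use a real markup parser.

```python
import re

# Matches a <w ...>word</w> element; assumes no nested <w> elements.
W_ELEMENT = re.compile(r"<w\b([^>]*)>(.*?)</w>", re.DOTALL)
# Extracts the value of a double-quoted xlit attribute.
XLIT_ATTR = re.compile(r'\bxlit="([^"]*)"')


def w_to_ruby(osis: str) -> str:
    """Render <w xlit="..."> words as HTML ruby; <w> words without xlit become plain text."""
    def replace(match: re.Match) -> str:
        attrs, word = match.group(1), match.group(2)
        xlit = XLIT_ATTR.search(attrs)
        if xlit is None:
            return word
        return f"<ruby>{word}<rt>{xlit.group(1)}</rt></ruby>"

    return W_ELEMENT.sub(replace, osis)
```

For example, w_to_ruby('<w lemma="strong:G3056" xlit="logos">λογος</w>') returns '<ruby>λογος<rt>logos</rt></ruby>'. The same hook could just as well emit an interlinear layout instead of ruby.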
I don't particularly think this is useful for languages/scripts that ICU doesn't support: it would be much faster to write a new ICU transliterator than to transliterate by hand. Hard-coded transliteration can be useful, however, when the transliteration is not necessarily predictable. --Osk 03:06, 19 October 2010 (UTC)