Complete Lexicon Functionality

From CrossWire Bible Society
Revision as of 12:33, 20 March 2009 by Dmsmith (talk | contribs) (Solutions related to module creation)

Issues with the Current Lexdict Module Driver

  • It handles basic glossaries in non-accented Latin scripts, as well as standard Strong's modules, quite well. However, the module is created with an index ordered by byte value, which reorders dictionaries in some languages in an undesirable way.
  • It does not allow for front-matter, for searching entry text, or for browsing in a tree structure.
  • Only one type of key is supported for each module. For TEI modules, using n="<key1>|<key2>" simply results in the two keys being merged together.
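
The byte-ordering problem in the first point can be illustrated with a small sketch. It is not the module driver's code; the word list and the helper name `byte_order` are made up for illustration, and UTF-8 is assumed as the encoding.

```python
# Illustration only: sort headwords by their raw UTF-8 bytes, as a
# stand-in for an index that is ordered by bytes.

def byte_order(entries):
    """Return the entries sorted by their UTF-8 byte sequence."""
    return sorted(entries, key=lambda word: word.encode("utf-8"))

# Hypothetical headwords from a small French glossary.
words = ["zebra", "école", "eagle", "apple"]

# "é" encodes as the bytes 0xC3 0xA9, which compare greater than any
# ASCII letter, so "école" is pushed past "zebra".
print(byte_order(words))  # ['apple', 'eagle', 'zebra', 'école']
```

A reader of the printed dictionary would expect "école" among the other words beginning with "e", which is why a byte-ordered index does not match print order for such languages.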

What Complete Lexicon Functionality Looks Like

  • A complete lexicon should be able to have front-matter (preface, introduction, bibliographic information, tables of abbreviations, etc.).
  • Quick lookup from a Bible module should be easily accessible by hovering over, right-clicking, or double-clicking a word.
  • Users should be able to browse a lexicon by letter using a tree structure.
  • The print order of entries should be preserved to ensure that quick lookup is accurate.
  • The user should be able to search the text of dictionary entries.

Solutions related to module creation

The fundamental design issue is that, for a search to succeed, the search request has to be normalized with the same rules that normalize the key for lookup. The second design issue is speed: a search on a million-entry dictionary requires that the key be indexed.
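
A minimal sketch of both points, assuming a made-up normalization rule (trim and case-fold) and hypothetical helper names `normalize`, `build_index`, and `lookup`. The same function normalizes keys at index time and requests at search time, and the sorted index lets the lookup use a binary search rather than a scan.

```python
import bisect

def normalize(key: str) -> str:
    """Hypothetical normalization rule, shared by indexing and lookup."""
    return key.strip().casefold()

def build_index(headwords):
    """Return a sorted list of (normalized key, position in the input)."""
    return sorted((normalize(head), pos) for pos, head in enumerate(headwords))

def lookup(index, request):
    """Binary-search the index; return the input position, or None."""
    probe = normalize(request)  # same rules as build_index
    pos = bisect.bisect_left(index, (probe, -1))
    if pos < len(index) and index[pos][0] == probe:
        return index[pos][1]
    return None

headwords = ["Logos", "agape", "Charis"]
index = build_index(headwords)
print(lookup(index, "  LOGOS "))  # 0, the position of "Logos" in the input
print(lookup(index, "pistis"))    # None, no such headword
```

If the request were compared against the index without passing through `normalize`, "LOGOS" would not match the stored key "logos", which is the failure the first design issue describes.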

Some current behaviors need to be replaced:

  • The keys are normalized to UPPER CASE.
  • Normalized keys are shown to the user.
  • Keys are shown in the order that they are indexed.
  • Keys have to be unique.

A possible solution:

  • Make index 0 always hold front-matter, or move all front-matter into a GenBook set of files.
  • Replace the normalization process with the creation of a CollationKey (see ICU for details of their implementation). A CollationKey is an internal representation of the entry's headword that can be sorted.
  • Add an original order index, where the headwords are ordered as in the input. This index points to the data file.
  • Modify the search index to hold the normalized key and the position in the original order index. This index can also hold other normalized representations of the headword, such as forms stripped of accents or transliterated.
  • The search result will be the best position in the original order index.
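
The pieces of this proposal can be sketched together as follows. This is an illustration under stated assumptions, not a design for the module driver: the class `Lexicon` and its methods are hypothetical, and `collation_key` is a crude standard-library stand-in (accents stripped, case folded) for a real ICU CollationKey, which applies locale-specific rules.

```python
import bisect
import unicodedata

def collation_key(headword: str) -> str:
    """Stand-in for an ICU CollationKey: strip accents, fold case."""
    decomposed = unicodedata.normalize("NFD", headword)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

class Lexicon:
    """Hypothetical in-memory model of the two proposed indexes."""

    def __init__(self, front_matter, entries):
        # Original-order index: slot 0 always holds the front-matter,
        # followed by (headword, text) pairs exactly as in the input.
        self.original = [("", front_matter)] + list(entries)
        # Search index: (sortable key, position in original-order index).
        # Headwords need not be unique; ties sort by original position.
        self.search = sorted(
            (collation_key(head), pos)
            for pos, (head, _) in enumerate(self.original)
            if pos > 0
        )

    def find(self, request):
        """Return the best position in the original-order index."""
        if not self.search:
            return 0  # nothing but front-matter
        probe = collation_key(request)
        pos = bisect.bisect_left(self.search, (probe, 0))
        if pos == len(self.search):
            pos -= 1  # past the end: fall back to the last key
        return self.search[pos][1]

    def headwords(self):
        """Headwords as entered, in input order, not normalized."""
        return [head for head, _ in self.original[1:]]

lex = Lexicon("Preface", [("école", "school"), ("eagle", "bird"),
                          ("École", "the School"), ("zebra", "animal")])
print(lex.find("ECOLE"))  # 1, the first "école" in input order
print(lex.headwords())    # ['école', 'eagle', 'École', 'zebra']
```

In this sketch the user sees the headwords as written and in input order, duplicate headwords are allowed, and an unmatched request resolves to the nearest following key rather than failing. Additional normalized forms, such as transliterations, could be added to the search index as further (key, position) pairs.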

Solutions related to the engine

Solutions related to front-ends