Complete Lexicon Functionality
From CrossWire Bible Society
Revision as of 14:00, 20 March 2009 by Refdoc
Issues with the Current Lexdict Module Driver
- The driver handles basic glossaries in unaccented Latin scripts, as well as standard Strong's modules, quite well. However, the module's index is ordered by byte value, which reorders dictionaries in some languages in an undesirable way.
- It does not support front-matter, searching entry text, or browsing in a tree structure.
- Only one type of key is supported for each module. For TEI modules, using n="<key1>|<key2>" simply results in the two keys being merged together.
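The byte-ordering problem in the first point can be illustrated with a small sketch. It is not the driver's code: the word list is invented for illustration, and the sort simply compares UTF-8 bytes on the assumption that this approximates what a byte-ordered index does.

```python
# Hypothetical headwords; any accented entries show the same effect.
words = ["zebra", "Apfel", "Äpfel", "apple", "école", "eagle"]

# Order the headwords by their raw UTF-8 bytes, as a byte-ordered
# index would (assumption: keys are stored as UTF-8).
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))

for w in byte_order:
    print(w)
```

Because every accented letter encodes to bytes above 0x7F, "Äpfel" and "école" end up after "zebra" rather than next to "Apfel" and "eagle", which is not where a reader of a printed dictionary would expect them.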
What Complete Lexicon Functionality Looks Like
- A complete lexicon should be able to have front-matter (preface, introduction, bibliographic information, tables of abbreviations, etc.).
- Quick lookup from a Bible module should be easily accessible by hovering over, right-clicking, or double-clicking a word.
- Users should be able to browse a lexicon by letter using a tree structure.
- The print order of entries should be preserved to ensure that quick lookup is accurate.
- The user should be able to search the text of dictionary entries.
- The user should be able to search the tag fields of dictionary entries.
The fundamental design issue is that, for a search to succeed, the search request has to be normalized with the same rules that normalize the key for lookup. The second design issue is speed: a search on a million-entry dictionary requires that the keys be indexed.
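Both issues can be sketched in a few lines. The `normalize` function below is a hypothetical rule set (accent stripping plus case folding) chosen only for illustration. What matters is that the same function is applied when the index is built and when a query arrives, and that the sorted key list allows a binary search instead of a scan.

```python
import bisect
import unicodedata


def normalize(text):
    """Hypothetical normalization: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Illustrative headwords, not taken from any real module.
headwords = ["Äpfel", "Zebra", "école", "apple"]

# Build the index once: (normalized key, original headword), sorted by key.
index = sorted((normalize(h), h) for h in headwords)
keys = [k for k, _ in index]


def lookup(query):
    """Normalize the query with the same rules, then binary-search the keys."""
    key = normalize(query)
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]
    return None
```

With this setup, `lookup("APFEL")` finds "Äpfel" because both sides pass through `normalize`. If the query were normalized differently from the keys, the same lookup would fail.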
Some current behaviors need to be replaced:
- The keys are normalized to UPPER CASE.
- Normalized keys are shown to the user.
- Keys are shown in the order that they are indexed.
- Keys have to be unique.
A possible solution:
- Make index 0 always hold front-matter, or move all front-matter into a GenBook set of files.
- Replace the normalization process with the creation of a CollationKey (see ICU for details of their implementation). A CollationKey is an internal representation of the entry's headword that can be sorted.
- Add an original order index, where the headwords are ordered as in the input. This index points to the data file.
- Modify the search index to hold the normalized key and the position in the original order index. This index can also hold other normalized representations of the headword, such as accent-stripped or transliterated forms.
- The search result will be the best position in the original order index.
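The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Lexicon` class and `collation_key` function are hypothetical names, `collation_key` is a simple stand-in for an ICU CollationKey (it strips accents and casefolds rather than applying real collation rules), and "best position" is taken to mean an exact match or, failing that, the first key that sorts after the query.

```python
import bisect
import unicodedata


def collation_key(text):
    """Stand-in for an ICU CollationKey: strip accents, then casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


class Lexicon:
    """Hypothetical in-memory model of the proposed indexes."""

    def __init__(self, front_matter, entries):
        # Original order index: position 0 is reserved for front-matter,
        # positions 1..n hold (headword, body) in input (print) order.
        self.entries = [("", front_matter)] + list(entries)
        # Search index: (normalized key, position in original order index).
        # Duplicate headwords are allowed; ties sort by position.
        self.search_index = sorted(
            (collation_key(headword), pos)
            for pos, (headword, _) in enumerate(self.entries)
            if pos > 0
        )
        self._keys = [key for key, _ in self.search_index]

    def search(self, query):
        """Return the best position in the original order index."""
        if not self._keys:
            return 0  # only front-matter is available
        i = bisect.bisect_left(self._keys, collation_key(query))
        i = min(i, len(self._keys) - 1)  # clamp queries past the last key
        return self.search_index[i][1]


lex = Lexicon(
    "Preface",
    [
        ("Äpfel", "plural of Apfel"),
        ("apple", "a fruit"),
        ("école", "school"),
        ("lead", "a metal"),
        ("lead", "to guide"),
        ("zebra", "an animal"),
    ],
)
```

Here `lex.search("ECOLE")` returns position 3, and `lex.entries[3]` still carries the headword in its original form, so the user never sees the normalized key. Further normalized forms, such as a transliteration, could be added to `search_index` as extra `(key, position)` pairs pointing at the same position.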