Difference between revisions of "Whiteboard/TEI Dictionary Proposal"

From CrossWire Bible Society

Revision as of 12:08, 23 April 2012

Summary

The current SWORD lexicon model allows for only flat dictionaries, which is ideally suited to Strongs but not to more recent lexica. A new model needs to take into account page numbers, non-alphabetic sorting, and hierarchical entries. Ideally it would also allow for browsing a dictionary more like a book.

Problems with the Current Model

  • Dictionaries are flat, so any hierarchical dictionary must be flattened. BDB (forthcoming) is hierarchical: roots form super-entries, and the lexicon as a whole is not strictly alphabetical. See the example document below, which is abstracted from BDB.
  • Entries are sorted according to Unicode code points. This leads to a number of problems.
    • In many languages and scripts (including some Latin scripts), sorting by Unicode code points does not preserve the proper alphabetic order.
    • If BDB were sorted in this way, the sort would destroy the information about connections between words that share a root.
  • There is no way to identify page numbers found in print editions of a given module. This information is particularly important for those doing academic work.
  • In practice, front-ends usually display lexicon entries as isolated containers. This model works well for Strongs because Strongs-tagged texts take you directly to the correct entry. This does not work for all dictionaries, though. When looking up a word, you might want to scan up and down the "page" to find the entry you are looking for. This is especially important if natural language keys are used so that the text might get you to roughly the right place in the dictionary but not necessarily the exact entry you need. Dictionaries need to allow for fuzzy lookup.
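
The sorting problem described above can be illustrated with a short Python sketch. It is only an illustration of plain code-point ordering, not of how SWORD itself sorts keys internally; the variable names are made up for this example, and the Hebrew headwords are the ones used in the example document below.

```python
# Latin-script case: "é" (U+00E9) has a higher code point than "z" (U+007A),
# so a plain code-point sort places "éclair" after "zebra".
latin = ["zebra", "\u00e9clair", "apple"]
latin_sorted = sorted(latin)
print(latin_sorted)  # ['apple', 'zebra', 'éclair']

# Headwords in print order, grouped by root as in the BDB-style example:
# root אבב -> אב, אביב; root אבד -> אבד, אבדה, אבדון
print_order = ["אב", "אביב", "אבד", "אבדה", "אבדון"]

# Python compares strings by code point, which mimics a code-point sort.
code_point_order = sorted(print_order)

# אביב sits next to its root-mate אב in print (index 1), but a code-point
# sort pushes it behind every word of the following root (index 4).
print(print_order.index("אביב"), code_point_order.index("אביב"))  # 1 4
```

In the sorted list, the two words under the first root are no longer adjacent, which is the loss of root information described above.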

Proposed Features of a New Model

  • The order of the lexicon should follow the order of the entries in the XML file used to compile the module.
  • Perhaps an arbitrary (numeric?) key could be created for each entry and hidden from the user, to make life easier for developers. Topic maps could then connect corresponding entries in numbered dictionaries (Strongs) and dictionaries keyed to natural language (BDB, etc.). This could be extensible over the long term.
  • Continuous scrolling would facilitate displaying page numbers and browsing entries.
  • Markup should look similar to a Genbook, with the option of navigating using a tree structure.
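
As a rough sketch of how these features might fit together, the Python below assigns a hidden numeric key to each entry in source order, builds simple lookup tables standing in for a topic map, and returns a small window of neighbouring entries for scrolling. Everything here is an assumption made for illustration: `Entry`, `SOURCE_ORDER`, `by_strong`, `by_headword`, and `window` are hypothetical names, not part of SWORD or of any proposed API, and a plain dictionary is a much simpler structure than a real topic map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: int        # hidden numeric key: position in the source file
    headword: str   # natural-language key shown to the user
    strong: str     # corresponding Strongs number


# (headword, Strongs number) pairs in the order they appear in the source.
SOURCE_ORDER = [
    ("אב", "H3"),
    ("אביב", "H24"),
    ("אבד", "H8"),
    ("אבדה", "H9"),
    ("אבדון", "H10"),
]

entries = [Entry(i, hw, s) for i, (hw, s) in enumerate(SOURCE_ORDER)]

# Stand-in for a topic map: both kinds of key resolve to the hidden key.
by_strong = {e.strong: e.key for e in entries}
by_headword = {e.headword: e.key for e in entries}


def window(key, radius=1):
    """Return the entry at `key` plus up to `radius` neighbours on each side."""
    lo = max(0, key - radius)
    hi = min(len(entries), key + radius + 1)
    return entries[lo:hi]


# A Strongs-tagged text lands on H8; the reader can still see its neighbours.
print([e.strong for e in window(by_strong["H8"])])  # ['H24', 'H8', 'H9']
```

Because the hidden key is just the position in the source file, the display order never depends on how the visible keys happen to sort.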

Example TEI File

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.crosswire.org/2008/TEIOSIS/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crosswire.org/2008/TEIOSIS/namespace http://www.crosswire.org/OSIS/teiP5osis.1.4.xsd">
<text>
<body>
  <pb n="1"/>
  <div1><head>א</head>
    <superEntry id="אבב" trans="abb">Entry text
      <entry id="אב" trans="ab" strong="H3">Entry text</entry>
      <entry id="אביב" trans="abib" strong="H24">Entry text</entry>
    </superEntry>
    <superEntry id="אבד" trans="abd" strong="H6">
      <pb n="2"/>
      <entry id="אבד" trans="abd" strong="H8">Entry text</entry>
      <entry id="אבדה" trans="abdh" strong="H9">Entry text</entry>
      <entry id="אבדון" trans="abdwn" strong="H10">Entry text</entry>
    </superEntry>
  </div1>
</body>
</text>
</TEI>
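
A file shaped like the example above could be indexed in document order with Python's standard `xml.etree.ElementTree` module. The sketch below is an assumption about how a compiler might walk such a file, not a description of any existing SWORD tool. `SAMPLE` is an abbreviated copy of the example (XML declaration and schema location omitted), and `index_entries` is a hypothetical helper that records, for each `entry`, its position, its transliteration, the `id` of the enclosing `superEntry`, and the most recent `pb` page number.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.crosswire.org/2008/TEIOSIS/namespace}"

SAMPLE = """<TEI xmlns="http://www.crosswire.org/2008/TEIOSIS/namespace">
<text><body>
  <pb n="1"/>
  <div1><head>א</head>
    <superEntry id="אבב" trans="abb">Entry text
      <entry id="אב" trans="ab" strong="H3">Entry text</entry>
      <entry id="אביב" trans="abib" strong="H24">Entry text</entry>
    </superEntry>
    <superEntry id="אבד" trans="abd" strong="H6">
      <pb n="2"/>
      <entry id="אבד" trans="abd" strong="H8">Entry text</entry>
      <entry id="אבדה" trans="abdh" strong="H9">Entry text</entry>
      <entry id="אבדון" trans="abdwn" strong="H10">Entry text</entry>
    </superEntry>
  </div1>
</body></text>
</TEI>"""


def index_entries(xml_text):
    """Return (position, trans, root id, page) for each entry, in file order."""
    rows = []
    page = None  # value of the most recent <pb n="..."/> seen so far

    def walk(elem, root_id):
        nonlocal page
        if elem.tag == NS + "pb":
            page = elem.get("n")
        elif elem.tag == NS + "superEntry":
            root_id = elem.get("id")  # applies to everything nested inside
        elif elem.tag == NS + "entry":
            rows.append((len(rows), elem.get("trans"), root_id, page))
        for child in elem:  # children are visited in document order
            walk(child, root_id)

    walk(ET.fromstring(xml_text), None)
    return rows


for row in index_entries(SAMPLE):
    print(row)
```

The walk never sorts anything, so the resulting positions reproduce the order of the source file, and the page number changes as soon as a `pb` element is passed, even when it falls inside a `superEntry`.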