Difference between revisions of "Whiteboard/TEI Dictionary Proposal"

From CrossWire Bible Society

Revision as of 02:03, 23 April 2012

Summary

The current SWORD lexicon model allows only flat dictionaries. That design is ideally suited to Strong's but not to more recent lexica. A new model needs to take into account page numbers, non-alphabetic sorting, and hierarchical entries. Ideally it would also allow for browsing a dictionary more like a book.

Problems with the Current Model

  • Dictionaries are flat, so any hierarchical dictionary must be flattened. BDB (forthcoming) is hierarchical: roots form super-entries, and the lexicon as a whole is not strictly alphabetical. See the example document below, which is abstracted from BDB.
  • Entries are sorted according to Unicode code points, which causes several problems. In many languages and scripts (including some that use the Latin script), sorting by Unicode code points does not preserve the proper alphabetic order. If BDB were sorted in this way, the connections between words that its root-based arrangement conveys would be lost.
  • There is no way to identify page numbers found in print editions of a given module. This information is particularly...
  • In practice, front-ends usually display lexicon entries as isolated containers. This model works well for Strong's because Strong's-tagged texts take you directly to the correct entry. This is not how all dictionaries work, though. When looking up a word, you might want to scan up and down the "page" to find the entry you are looking for. This is especially important if natural language keys are used, because the text might then get you to roughly the right place in the dictionary but not necessarily to the exact entry you need.
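
The sorting problem can be illustrated with a small Python sketch. It is not part of SWORD. The key list is copied from the example document, with the key that appears both as a root and as an entry listed only once, and the transliterations are used only to keep the output readable. The sketch assumes that a plain string sort, which compares code point by code point, is a fair stand-in for how keys are ordered today.

```python
# Keys in the order the example document lists them:
# each root (superEntry) is followed by the entries grouped under it.
document_order = [
    ("אבב", "abb"),      # root
    ("אב", "ab"),
    ("אביב", "abib"),
    ("אבד", "abd"),      # root
    ("אבדה", "abdh"),
    ("אבדון", "abdwn"),
]

# Python compares str values code point by code point, so sorting on the
# Hebrew key approximates a plain code-point sort of the module keys.
codepoint_order = sorted(document_order, key=lambda pair: pair[0])

doc_trans = [trans for _, trans in document_order]
cp_trans = [trans for _, trans in codepoint_order]

print("document order:  ", doc_trans)
print("code-point order:", cp_trans)
```

In the code-point ordering, "abib" moves to the end of the list, after every key beginning with "abd", so it no longer sits next to the root "abb" under which the example groups it.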

Proposed Features of a New Model

Example TEI File

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.crosswire.org/2008/TEIOSIS/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crosswire.org/2008/TEIOSIS/namespace http://www.crosswire.org/OSIS/teiP5osis.1.4.xsd">
<text>
<body>
  <pb n="1"/>
  <div1><head>א</head>
    <superEntry id="אבב" trans="abb">Entry text
      <entry id="אב" trans="ab" strong="H3">Entry text</entry>
      <entry id="אביב" trans="abib" strong="H24">Entry text</entry>
    </superEntry>
    <superEntry id="אבד" trans="abd" strong="H6">
      <pb n="2"/>
      <entry id="אבד" trans="abd" strong="H8">Entry text</entry>
      <entry id="אבדה" trans="abdh" strong="H9">Entry text</entry>
      <entry id="אבדון" trans="abdwn" strong="H10">Entry text</entry>
    </superEntry>
  </div1>
</body>
</text>
</TEI>
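
As a rough illustration of what a reader of this format would have to do, the following Python sketch walks a copy of the sample above in document order and records, for each entry, the root it belongs to and the print page it falls on. It uses only the standard library's xml.etree.ElementTree. The function name index_entries and the tuple layout are hypothetical and are not part of any SWORD API. The XML declaration and schema location are omitted from the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sample file, in ElementTree's {uri}tag notation.
NS = "{http://www.crosswire.org/2008/TEIOSIS/namespace}"

SAMPLE = """<TEI xmlns="http://www.crosswire.org/2008/TEIOSIS/namespace">
<text>
<body>
  <pb n="1"/>
  <div1><head>א</head>
    <superEntry id="אבב" trans="abb">Entry text
      <entry id="אב" trans="ab" strong="H3">Entry text</entry>
      <entry id="אביב" trans="abib" strong="H24">Entry text</entry>
    </superEntry>
    <superEntry id="אבד" trans="abd" strong="H6">
      <pb n="2"/>
      <entry id="אבד" trans="abd" strong="H8">Entry text</entry>
      <entry id="אבדה" trans="abdh" strong="H9">Entry text</entry>
      <entry id="אבדון" trans="abdwn" strong="H10">Entry text</entry>
    </superEntry>
  </div1>
</body>
</text>
</TEI>"""


def index_entries(xml_text):
    """Return (key, root_key, page) tuples in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    page = None
    rows = []
    # iter() visits elements in document order, so a <pb/> applies to
    # every element that starts after it.
    for el in root.iter():
        if el.tag == NS + "pb":
            page = el.get("n")
        elif el.tag == NS + "superEntry":
            rows.append((el.get("trans"), None, page))
        elif el.tag == NS + "entry":
            parent = parent_of.get(el)
            is_nested = parent is not None and parent.tag == NS + "superEntry"
            root_key = parent.get("trans") if is_nested else None
            rows.append((el.get("trans"), root_key, page))
    return rows


rows = index_entries(SAMPLE)
for key, root_key, page in rows:
    print(key, root_key, page)
```

Because the page break for page 2 sits inside the second superEntry, that root is recorded as starting on page 1 while its entries are recorded on page 2. A flat key-to-text model has no place to keep either the root link or the page number.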