Difference between revisions of "Frontends:Diatheke"

From CrossWire Bible Society

Revision as of 16:29, 7 March 2017

What is diatheke?

Diatheke is a very simple command line interface (CLI) front-end to the SWORD Project's Bible software library. Essentially, "diatheke" is the code contained in the file "corediatheke.cpp" in the apps/console/diatheke directory of the SWORD source tree. That file contains only one function that is intended to be called from any program using diatheke, and that function performs exactly one lookup in the SWORD library per call. Examples of calls would be a query for a verse (or verse list/range), a search, a request for a list of modules, etc.

Where's the name 'diatheke' come from?

Diatheke means 'testament' or 'commandment', and diatheke (the program) was originally a command line application. Commandment... command line app... It's a pun.

How is diatheke useful to me?

Probably it isn't, but there are a number of front-ends to diatheke (yes, front-ends to a front-end) that are of use. These include:

  • diatheke/TCL: a BibleBot for eggdrop that interfaces with diatheke/CLI
  • diatheke/CGI: a Perl/CGI interface to diatheke/CLI
  • HANDiatheke: a Palm PQA interface to diatheke/CGI
  • ActiveDiatheke: an ActiveX control (.OCX) interface to SWORD

The above four are no longer under active development and may no longer be available.

Release history

This section needs expanding.
  • Version 4.6 was released during 2013.
  • Version 4.7 was released on Aug 30 2015.

How do I get diatheke?

This section needs updating.

To get the very latest version, grab the SWORD source tree from our SVN repository using the URL: https://crosswire.org/svn/sword/trunk

e.g.

   $ svn checkout https://crosswire.org/svn/sword/trunk sword

If you don't want to use SVN, you can try grabbing a recent release from ftp://ftp.crosswire.org/pub/sword/source/

The Sword utilities for Windows are also installed when Xiphos is installed. These include a copy of diatheke.exe.

For diatheke/CLI and diatheke/CGI you can download version 4.0 from:

For diatheke/TCL and HANDiatheke you can download version 2.0 from ftp://ftp.crosswire.org/pub/sword/frontend/diatheke/diatheke-2.0.tar.gz.

For ActiveDiatheke you can download a preliminary version from ftp://ftp.crosswire.org/pub/sword/frontend/diatheke/ActiveDiatheke.zip.

Diatheke option filters

Module option filters are off by default in diatheke. They must be specified to include the featured property in the output.

Valid (output) option_filters values are:

a (Greek Accents) for modules with GlobalOptionFilter=UTF8GreekAccents[1]
b (Bi-Directional Reordering) for modules with Direction=BiDi or Direction=RtoL
c (Hebrew Cantillation) for modules with GlobalOptionFilter=UTF8Cantillation
e (Word Enumerations) for modules with GlobalOptionFilter=OSISEnum
f (Footnotes) for modules with footnotes (GBF/ThML/OSIS)
g (Glosses/Ruby) for modules with GlobalOptionFilter=OSISGlosses or GlobalOptionFilter=OSISRuby
h (Section Headings) for modules with headings (GBF/ThML/OSIS)
l (Lemmas) for modules with GlobalOptionFilter=ThMLLemma or GlobalOptionFilter=OSISLemma
m (Morphology) for modules with GlobalOptionFilter=ThMLMorph or GlobalOptionFilter=OSISMorph
n (Strong's numbers) for modules with Strong's Numbers (GBF/ThML/OSIS)
p (Arabic Vowels) for modules with GlobalOptionFilter=UTF8ArabicPoints
r (Arabic Shaping) for modules with Arabic/Persian script (required for proper rendering in Linux)
s (Scripture Crossrefs) for modules with GlobalOptionFilter=ThMLScripref or GlobalOptionFilter=OSISScripref
t (Algorithmic Transliterations via ICU)
v (Hebrew Vowels) for modules with GlobalOptionFilter=UTF8HebrewPoints
w (Red Words of Christ) for modules with RedLetterWords (GBF/ThML/OSIS)
x (Encoded Transliterations) for modules with GlobalOptionFilter=OSISXlit

Notes:

  1. Refer to Module configuration files for this and similar items listed.
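
Several option filter letters can be given together after a single -o, as the -o mn example elsewhere on this page does. The following is a minimal sketch of checking such a string before passing it on. The function name valid_option_filters is hypothetical (it is not part of diatheke), and the check only covers the letters listed above; it does not test whether a given module actually supports the filter.

```shell
# Hypothetical helper, not part of diatheke: succeed only if $1 is non-empty
# and every character in it is one of the option filter letters listed above
# (a b c e f g h l m n p r s t v w x).
valid_option_filters() {
    case $1 in
        '' | *[!abcefghlmnprstvwx]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use (assumes the WHNU module is installed):
#   valid_option_filters mn && diatheke -b WHNU -t Latin -o mn -k "Mt 24"
```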

Diatheke search types

This section needs expanding.

Valid search_type values are: phrase (default), regex, multiword, attribute, lucene, multilemma.

Search type lucene only works when the module already has a Lucene search index. Such an index can be created either with an installed front-end app such as Xiphos or with the command line Sword utility mkfastmod.

Search type regex has some limitations. It doesn't yet fully support UTF-8 encoded text, so the results you get may not be what you expected. For example:

diatheke -b KJV -s regex -k Abed...nego

gives:

Verses containing "Abed...nego"-- Daniel 1:7 ; Daniel 2:49 ; Daniel 3:12 ; Daniel 3:13 ; Daniel 3:14 ; Daniel 3:16 ; Daniel 3:19 ; Daniel 3:20 ; Daniel 3:22 ; Daniel 3:23 ; Daniel 3:26 ; Daniel 3:28 ; Daniel 3:29 ; Daniel 3:30 -- 14 matches total (KJV)

Each dot in the search query can match "any single byte" rather than a whole character. The name Abed–nego is 'hyphenated' with U+2013 EN DASH, which UTF-8 encodes as three bytes (E2 80 93), so it takes three dots to match that one character. Whether this byte-wise matching applies depends on whether SWORD/diatheke was compiled with or without cxx11regex.
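
The byte arithmetic can be checked in any POSIX shell without involving diatheke at all. This sketch only counts bytes; it does not exercise SWORD's regex engine, so it illustrates the encoding and says nothing about how a particular build behaves.

```shell
# "Abed", then U+2013 EN DASH written as its three UTF-8 bytes in octal
# (hex E2 80 93 = octal 342 200 223), then "nego".
name=$(printf 'Abed\342\200\223nego')

# wc -c counts bytes, not characters: 4 + 3 + 4.
bytes=$(printf '%s' "$name" | wc -c)
echo $((bytes))    # prints 11, the same length as the pattern Abed...nego

# Hex dump of the en dash alone: the bytes e2 80 93.
printf '\342\200\223' | od -An -tx1
```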

How do I use diatheke/CLI?

Calling diatheke without any parameters will result in the command line syntax help being output to stderr.

The query_key (-k) must be the last argument because all further arguments are added to the key.
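
One way to avoid putting an option after the key by mistake is a small shell wrapper that always appends -k last. This is only a sketch: the function name dk is hypothetical and not part of diatheke, and it uses no options beyond those shown on this page.

```shell
# Hypothetical wrapper, not part of diatheke: takes the key first, then any
# other diatheke options, and re-orders them so that -k always comes last.
# Usage: dk KEY [OPTIONS...]
dk() {
    key=$1
    shift
    diatheke "$@" -k "$key"
}

# For example,   dk "Acts 10" -b KJV -m 5
# runs           diatheke -b KJV -m 5 -k "Acts 10"
```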

Examples

The following are a few examples of calling diatheke from the command line. Book names can be abbreviated, provided the abbreviation is unambiguous.

Retrieve Acts ch 10
   $ diatheke -b KJV -k "Acts 10"

First five verses of above
   $ diatheke -b KJV -m 5 -k "Acts 10"

Acts chapters 1 and 2
   $ diatheke -b KJV -k "Acts 1-2"

Genesis 1:1
   $ diatheke -b KJV -k G 1:1

Galatians 1:1 w/ Strong's (if available)
   $ diatheke -b KJV -o n -k "Ga 1:1"

I Corinthians 1:1 (also "ic 1:1")
   $ diatheke -b KJV -o n -k "1c 1:1"

Revelation 1:1-1:7
   $ diatheke -b KJV -k "Rev 1:1-7"

Revelation 1:1
   $ diatheke -b KJV -m 1 -k "R 1:1-7"

Revelation 1:1,1:3,1:7 as HTML (w/ <p>, <i>, etc. tags)
   $ diatheke -b KJV -f HTML -k R 1:1,3,7

Verses with "my people", quotations optional
   $ diatheke -b KJV -s phrase -k "my people"

Verses with "skin" and "bones"
   $ diatheke -b KJV -s multiword -k skin bones

Verses with "church" or "assembly"
   $ diatheke -b KJV -s regex -k "church | assembly"

Strong's Greek 3056
   $ diatheke -b StrongsGreek -k 3056

Definition of "horn" in Two Babylons
   $ diatheke -b 2BabDict -k horn

Entry for John 1:1 in Family Bible Notes
   $ diatheke -b Family -k Jn 1:1

Entry for "Lion" in Scripture Alphabet Of Animals
   $ diatheke -b SAOA -k "Lion"

Entry for "olive-tree" in Easton's Bible Dictionary
   $ diatheke -b Easton -k olive-tree

Matthew 24 from Westcott Hort Greek NT transliterated into Latin script
   $ diatheke -b WHNU -t Latin -o mn -k "Mt 24"

Diatheke output

The plain text output of diatheke marks any OSIS highlight elements (e.g. <hi type="italic">n</hi>) by wrapping the highlighted text between asterisks (e.g. *n*). It does this whatever the value of the type attribute.

Output formats

Valid output_format values are: CGI, GBF, HTML, HTMLHREF, LaTeX[1], OSIS, RTF[2], ThML, XHTML, and plain[3] text (default).

Output encodings

Valid output_encoding values are: Latin1, UTF8 (default), UTF16, HTML, and RTF.[4][5]

Redirecting the output

Because diatheke is a command line utility, its output can readily be redirected to a file using the normal features of the command shell. This may be especially useful for a search that returns a large number of results.[6][7]

Notes:

  1. Only after diatheke version 4.7
  2. The RTF output does not include any header lines, such as a font colour table, that a Rich Text File needs. If the output is redirected to such a file and the option filter -o w (Red Words of Christ) is used, the user must add a suitable header so that the red letter text is displayed in red.
  3. The word plain here is merely a handle to distinguish the default from the other output formats.
    It does not imply that the output encoding is restricted in any way.
  4. Output encodings determine how any printable non-ASCII characters in the output are encoded.
  5. The values UTF8 and UTF16 are written without a hyphen, even though a hyphen might be expected.
  6. When using diatheke to search, the results are all output as a single line without any breaks.
  7. Diatheke search results contain only the verse references where the pattern is matched (if any).
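
Since search results arrive as one long line, it can help to split them into one reference per line before saving them. The following is a rough sketch using sed and tr. The function name split_search_results is hypothetical, and the parsing assumes the output layout shown in the regex search example on this page (introductory text ending in "-- ", references separated by " ; ", then " -- N matches total"); other versions or output formats may differ.

```shell
# Hypothetical helper, not part of diatheke: read diatheke search output on
# stdin and print one verse reference per line.
split_search_results() {
    # 1. Drop the leading 'Verses containing "..."-- ' text and the
    #    trailing ' -- N matches total (MODULE)' summary.
    # 2. Turn each ';' separator into a newline.
    # 3. Trim spaces, drop empty lines and the 'none (MODULE)' no-match line.
    sed -e 's/^.*"-- //' -e 's/ -- [0-9]* matches total.*$//' |
        tr ';' '\n' |
        sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d' -e '/^none (.*)$/d'
}

# Possible use (assumes the KJV module is installed):
#   diatheke -b KJV -s phrase -k "my people" | split_search_results > results.txt
```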

Known weaknesses

Both editions (Linux & Windows)

Diatheke does not support OSISReferenceLinks.

Currently, diatheke does not output:

  • Any canonical Psalm titles even with the option filter -o h for section headings.
  • Any pilcrow symbols in the KJV Bible or similar modules where these are encoded as a marker attribute in the OSIS milestone element.
  • Any quotation marks in modules where these are encoded as a marker attribute in the OSIS q element.
  • Any footnote text with the option filter -o f for footnotes; only a pair of brackets [] is output at the location of each note tag.

Currently, diatheke does not distinguish different highlight types in the OSIS hi element. It treats all such styles as if they were bold by wrapping the text between two asterisks.

Search type regex does not yet properly support UTF-8 encoded text. (See above).

For some output filters and/or formats, the XML snippets may include the undefined attribute name savlm in the w elements. e.g.

Genesis 1:1: <w savlm="strong:H07225">In the beginning</w>

This seems to be a bug in the source code. Evidently, savlm denotes "save lemma", as the following search of the filter sources suggests:

grep savlm src/modules/filters/*.cpp
src/modules/filters/osishtmlhref.cpp:	SWBuf savelemma = tag.getAttribute("savlm");
src/modules/filters/osislatex.cpp:	SWBuf savelemma = tag.getAttribute("savlm");
src/modules/filters/osisosis.cpp:	tag.setAttribute("savlm", 0);
src/modules/filters/osisrtf.cpp:	SWBuf savelemma = tag.getAttribute("savlm");
src/modules/filters/osisstrongs.cpp:	SWBuf savlm = l;
src/modules/filters/osisstrongs.cpp:	wtag.setAttribute("savlm", savlm);
src/modules/filters/osiswebif.cpp:	SWBuf savelemma = tag.getAttribute("savlm");
src/modules/filters/osisxhtml.cpp:	SWBuf savelemma = tag.getAttribute("savlm");
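
Until this is resolved, a stray savlm attribute can be removed from captured output with a one-line sed filter. This is a workaround sketch only, not a fix for the underlying filters. The function name strip_savlm is hypothetical, and the pattern assumes the attribute value is double-quoted and contains no embedded double quotes, as in the Genesis 1:1 example above.

```shell
# Hypothetical workaround, not part of diatheke: delete every savlm="..."
# attribute from the text read on stdin, leaving everything else untouched.
strip_savlm() {
    sed 's/ savlm="[^"]*"//g'
}

# Possible use:
#   diatheke ... | strip_savlm
```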

Windows edition

The utility diatheke.exe is among the Sword utilities compiled for Win32.

Under the Windows command shell (cmd.exe), diatheke does not correctly handle non-ASCII characters in the query key. Thus, for example, the following command that works OK in Linux will fail in Windows:

diatheke -b KJV -s phrase -k Æneas

In Windows, the non-ASCII character "Æ" gets changed to U+00E3 LATIN SMALL LETTER A WITH TILDE.

The response in Windows is then:

Verses containing "ãneas"-- none (KJV)

The root cause is that the Windows shell assumes text is encoded as UTF-16 LE, whereas SWORD requires all text to be encoded as UTF-8. This problem mainly affects the search options in diatheke, where a query key is more likely to contain non-ANSI characters. Even so, for any locale in which some Bible book names contain non-ANSI characters, the problem also affects diatheke when the query key is a reference that contains such a character.

Tools that use Diatheke

AutoKey script for The SWORD Project

Ryan (Adyeth) has developed a script for the AutoKey 0.6x utility that pastes Bible text given a reference. It works with OpenOffice, plain text editors, or any other Linux program where you might need to paste scripture passages. It requires Diatheke in order to function. You can download it from his website.