Difference between revisions of "User:Dmsmith/KJV 2.6"

From CrossWire Bible Society

Revision as of 09:35, 23 January 2016

This page is for recommended changes to the KJV module version 2.6 (or later).


1 Cor 15:27 The comma at "him, it is" should not be italicized. There should be a comma between "excepted which". --Dmsmith 11:19, 16 February 2014 (MST)

Done --Dmsmith 17:54, 18 February 2014 (MST)

Words of Christ

The Old Scofield only highlights Words of Christ (WoC) as they come directly from his mouth. Not what others say he said. Not translation of what he said, such as translation from Aramaic. In 2.6, there are 3 errors in markup (maybe more, but these are known). Red is what should be WoC, black is currently red, but shouldn't be:

Mat 8:26 And he saith unto them, Why are ye fearful, O ye of little faith? Then he arose, and rebuked the winds and the sea; and there was a great calm.

Mat 19:18 He saith unto him, Which? Jesus said, Thou shalt do no murder, Thou shalt not commit adultery, Thou shalt not steal, Thou shalt not bear false witness,

Act 1:4 And, being assembled together with them, commanded them that they should not depart from Jerusalem, but wait for the promise of the Father, which, saith he, ye have heard of me.

Done--Dmsmith 06:56, 19 February 2014 (MST)

Added words

Added words & punctuation

Split all transChange elements that contain punctuation marks, so that the punctuation and following space is normal text. 1 Cor 15:27 is one example of many such occurrences.

  • Found and fixed:
    • 67 ,
    • 10 ;
    • 6 :
    • 1 ?

--Dmsmith 18:38, 18 February 2014 (MST)
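
The split described above can be sketched in Python. This is only an illustration, not the script that was actually used: the function name, the regular expressions and the assumption that the affected transChange elements contain plain text only (no nested tags) are all mine.

```python
import re

# An added-words element whose content is plain text (no nested tags).
TRANS_CHANGE = re.compile(r'<transChange type="added">([^<]*)</transChange>')
# One of the four punctuation marks listed above, plus a following space if present.
# The capture group makes re.split() keep the punctuation in its result.
PUNCT = re.compile(r'([,;:?] ?)')


def split_trans_change(xml):
    """Move punctuation (and its following space) out of transChange elements."""
    def repl(match):
        pieces = PUNCT.split(match.group(1))
        out = []
        for index, piece in enumerate(pieces):
            if index % 2 == 1:
                out.append(piece)  # punctuation becomes normal text
            elif piece:
                out.append('<transChange type="added">' + piece + '</transChange>')
        return ''.join(out)
    return TRANS_CHANGE.sub(repl, xml)
```

Elements without any of these punctuation marks pass through unchanged, so the function could be run over a whole file.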

Added words & Strong's

Review and correct each instance in which a w element for Strong's & Morph is found within a transChange element. The markup probably belongs to the preceding word. The following are the instances, each shown with any preceding word(s) that are not contained by a <w> element.

Gen 14.10	<transChange type="added"><w lemma="strong:H0875">was full of</w></transChange>
Exod 15.12	<transChange type="added"><w lemma="strong:H02098">which</w></transChange>
Exod 15.16	<transChange type="added"><w lemma="strong:H02098">which</w></transChange>
Exod 34.19	<transChange type="added"><w lemma="strong:H02142" morph="strongMorph:TH8735">that is male</w></transChange>
Num 1.16	These <transChange type="added"><w lemma="strong:H07148">were</w></transChange>
Num 3.19	These <transChange type="added"><w lemma="strong:H01992">are</w></transChange>
Num 10.28	Thus <transChange type="added"><w lemma="strong:H0428">were</w></transChange>
Num 13.3	<transChange type="added"><w lemma="strong:H01992">were</w></transChange>
Num 14.28	unto them, <transChange type="added"><w lemma="strong:H03808">As truly as</w></transChange>
Num 20.13	This <transChange type="added"><w lemma="strong:H01992">is</w></transChange>
1Sam 30.27	To <transChange type="added"><w lemma="strong:H0834">them</w></transChange>
2Kgs 19.31	<transChange type="added"><w lemma="strong:H06635" morph="strongMorph:TH8675">of hosts</w></transChange>
2Chr.10.16	<transChange type="added"><w lemma="strong:H07200" morph="strongMorph:TH8804">saw</w></transChange>
Ezra 2.65	and <transChange type="added"><w lemma="strong:H0428">there were</w></transChange>
Ps.17.6		unto me <transChange type="added"><w lemma="strong:H08085" morph="strongMorph:TH8798">and hear</w></transChange>
Ps 39.3		<transChange type="added"><w lemma="strong:H0227">then</w></transChange>
Jer 6.14	<transChange type="added"><w lemma="strong:H01323" morph="strongMorph:TH8676">of the daughter</w></transChange>
Jer 28.9	<transChange type="added"><w lemma="strong:H0227">then</w></transChange>
Jer.51.53	<transChange type="added"><w lemma="strong:H0227">yet</w></transChange>
I'll need some help determining how these should be changed, if at all. It may be that the KJV uses italics for a purpose other than "added" words. --Dmsmith 19:33, 18 February 2014 (MST)
Italics is presentational formatting. The transChange element is semantic. David Haslam 05:19, 19 February 2014 (MST)
The 1611 and 1769 editions of the KJV didn't have an eText with semantic markup. Any semantic markup we have deduces the intention of the authors/printers from the orthographic representation. --Dmsmith 07:00, 19 February 2014 (MST)
We submitted this list to David Instone-Brewer and received a detailed reply. David Haslam 06:37, 17 September 2015 (MDT)
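
For anyone wishing to reproduce the list above, a rough Python sketch follows. It assumes the w element is the only child of the transChange element, as in every instance listed; the function name and pattern are illustrative only.

```python
import re

# An added-words element whose only content is a single w element.
W_IN_TRANS_CHANGE = re.compile(
    r'<transChange type="added">\s*(<w\b[^>]*>)(.*?)</w>\s*</transChange>'
)


def find_tagged_added_words(xml):
    """Return (w start tag, tagged text) for each w element wrapped by transChange."""
    return [(m.group(1), m.group(2)) for m in W_IN_TRANS_CHANGE.finditer(xml)]
```

This only locates candidates. Deciding whether the Strong's tag belongs to the preceding word still needs a human reviewer.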

Added words & the Divine Name

We found one instance of the Divine Name element within a transChange element. This is probably inappropriate.

The one example is in 2 Chronicles 17:4. It is rendered in italics and small caps. So accordingly it is an added word representing the tetragrammaton. It is represented properly in OSIS. --Dmsmith 17:49, 18 February 2014 (MST)

Tagging the Divine Name

This is much more complicated than we thought. Observations:

  1. The divine name is also tagged within some study notes (even twice within the same note a few times).
  2. More than one Strong's number is involved.
  3. Five instances are in the NT where the Greek word κυριος is tagged.
  4. There is one instance where two Strong's numbers are joined to the divine name.
  5. In many places, there are some English words between the Strong's tag and the divine name tag.
  6. There are places where the divine name is tagged, even though it is within a transChange element (see previous subsection).
  7. The three hyphenated forms of the divine name (Jehovah–jireh, Jehovah–nissi, Jehovah–shalom) are not tagged in the main text, only in the study notes.
  8. The other two hyphenated forms of the divine name (Jehovah–shammah, Jehovah–tsidkenu) occur only in the study notes, where the English word (Lord) is tagged.

Divine Name tagging in the KJV follows the "small caps" orthographic representation of Lord, God, Yah.
As such, it is also found in added words and in notes, where it is not associated with Strong's Numbers.
The Strong's Numbers tagged are H3068, H3069, H3072 and H3050. The first is the tetragrammaton. The second and third are variations of it. The last is Yah.
In the NT, the orthographic representation of Lord as the divine name is backed by Greek, not Hebrew.
The instance of two Strong's Numbers being associated with divine name needs to be reviewed. The leading word is "face", often translated "before" or "presence".
Jer.26.19 <w morph="strongMorph:TH8762" lemma="strong:H02470">and besought</w>
<w lemma="strong:H06440 strong:H03068">the <divineName>Lord</divineName></w>
Here the leading word is left untranslated.

Reference text policy

It may be sensible to review whether we chose the most suitable published text as our reference standard. The most widely accepted one is the Cambridge University Press - Concord Reference Bible.

Need two things: an e-text and permission for the text. (I think the "crown" claims copyright.) --Dmsmith 19:36, 18 February 2014 (MST)
Crown Copyright applies to the Authorised Version per se, not just to those printed by CUP, who are merely one of the licensed printers for all the works that come under Royal Letters Patent. Refer to our Copyright page David Haslam 05:08, 19 February 2014 (MST)

Hebrew words

Following the addition of Greek words in the NT from the TR in version 2.6, is it planned to do likewise for Hebrew (& Aramaic) words in the OT from the MT ?

It would be wonderful. However, the tagging of Strong's numbers provided a map to the TR in the src="x y" attribute, where that gave the position of the word in the TR. So, the addition of the Greek was trivial. We have nothing like that for the MT. It won't be trivial. Also, the Strong's tagging in the OT is not comprehensive. In any given verse only some of the words from the MT are tagged. In the NT all the TR words were present in the tagging, even if empty (i.e. untranslated.)
We are more likely to update the morphology of the OT first for those that have some kind of morphology today.
--Dmsmith 09:31, 25 February 2014 (MST)

Psalm 119 Acrostic Stanza Titles

The 22 Hebrew letter acrostic titles in Psalm 119 should be displayed before the first verse of each eight-verse stanza. Currently, the next verse tag is displayed before each stanza title. This is incorrect when compared to the KJV printed edition. The mod2imp output for the first such title is:

$$$Psalms 119:1
<title canonical="true" type="acrostic"><foreign n="א">ALEPH.</foreign></title>
<w lemma="strong:H0835">Blessed</w> <transChange type="added">are</transChange> 
<w lemma="strong:H08549">the undefiled</w> 
<w lemma="strong:H01870">in the way</w>, 
<w lemma="strong:H01980" morph="strongMorph:TH8802">who walk</w> 
<w lemma="strong:H08451">in the law</w> 
<w lemma="strong:H03068">of the <divineName>Lord</divineName></w>.
<note type="study">undefiled: or, perfect, or, sincere</note>

Do we need to wrap each stanza title between suitably constructed milestone preverse div elements?

The preverse div should never be constructed in xml. It is created by osis2mod.
Done in version 2.8----Dmsmith 05:33, 20 December 2015 (MST)
Did you just move the titles to before the stanza? David Haslam 05:43, 20 December 2015 (MST)
So far, yes. I'm testing it. I may have to put it in a section div or change osis2mod. If a div, that would be the first instance of a section div, which may add vertical whitespace that is not present elsewhere in the KJV.--Dmsmith 05:46, 20 December 2015 (MST)

Multiple whitespace

Within the text source for 2.7 (kjvfull.xml) there are 39 instances of double spaces (outside the header):

  • 36 are immediately after the "w" in a w element
  • 2 are after a closing " but within a w element
  • 1 is between two w elements; in Phil.4.2, and is displayed by SWORD as
    that they  be of the same mind

The latter should be corrected.
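
A small Python sketch of how such double spaces might be classified is given here. It assumes well-formed markup with no literal '<' or '>' in text or attribute values; the function name is hypothetical.

```python
import re


def classify_double_spaces(xml):
    """Return two lists of offsets: runs of 2+ spaces inside a tag, and in text."""
    in_tag, in_text = [], []
    for match in re.finditer(r' {2,}', xml):
        pos = match.start()
        # Inside a tag if the nearest angle bracket to the left is an opening one.
        inside = xml.rfind('<', 0, pos) > xml.rfind('>', 0, pos)
        (in_tag if inside else in_text).append(pos)
    return in_tag, in_text
```

Only the offsets in the second list affect what a front-end displays, which is why the Phil.4.2 instance is the one that matters.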

Done in version 2.8----Dmsmith 05:31, 20 December 2015 (MST)

Missing punctuation in notes

  • There are 3 study notes that contain the abbreviation "Heb" with no full-stop after the abbreviation. The locations are 2Chr.2.16, Isa.9.20, Jer.13.21
Done in version 2.8----Dmsmith 05:36, 20 December 2015 (MST)

Add language identifier for foreign element

Suggest adding the following attribute to the foreign element in each acrostic title:


Refer to OSIS Reference Manual.

Done in version 2.8--Dmsmith 06:18, 20 December 2015 (MST)

Should also identify and add other foreign elements.--Dmsmith 06:18, 20 December 2015 (MST)

MENE, MENE, TEKEL, UPHARSIN (what language code?) David Haslam 10:00, 20 December 2015 (MST)


There are 75 instances of the whole word "Selah" in the KJV.[1] The first is in II Kings 14:7. The rest are found in Psalms (71) and Habakkuk (3).
Of those in Psalms, these 13 locations have the peculiarity that the Strong's markup includes other words besides Selah.

    <w lemma="strong:H05542">thereof. Selah</w>.
    <w lemma="strong:H05542">me. Selah</w>.
    <w lemma="strong:H05542">himself. Selah</w>.
    <w lemma="strong:H05542">before them. Selah</w>.
    <w lemma="strong:H05542">for us. Selah</w>.
    <w lemma="strong:H05542">themselves. Selah</w>.
    <w lemma="strong:H05542">upon us; Selah</w>.
    <w lemma="strong:H05542">of it. Selah</w>.
    <w lemma="strong:H05542">thee. Selah</w>.
    <w lemma="strong:H05542">there. Selah</w>.
    <w lemma="strong:H05542">thee? Selah</w>.
    <w lemma="strong:H05542">for me. Selah</w>.
    <w lemma="strong:H05542">themselves. Selah</w>.

These words & punctuation do not belong properly to the "Selah" but are part of the preceding sentence or phrase. It may therefore be sensible to convert them like this:

    thereof. <w lemma="strong:H05542">Selah</w>.
    me. <w lemma="strong:H05542">Selah</w>.
    himself. <w lemma="strong:H05542">Selah</w>.
    before them. <w lemma="strong:H05542">Selah</w>.
    for us. <w lemma="strong:H05542">Selah</w>.
    themselves. <w lemma="strong:H05542">Selah</w>.
    upon us; <w lemma="strong:H05542">Selah</w>.
    of it. <w lemma="strong:H05542">Selah</w>.
    thee. <w lemma="strong:H05542">Selah</w>.
    there. <w lemma="strong:H05542">Selah</w>.
    thee? <w lemma="strong:H05542">Selah</w>.
    for me. <w lemma="strong:H05542">Selah</w>.
    themselves. <w lemma="strong:H05542">Selah</w>.

This issue was already communicated by email on 2015-09-09. David Haslam

It can be readily fixed by a simple Perl replacement, thus: Perl pattern [(<w lemma="strong:H05542">)(.+)(Selah</w>)] with [$2$$1$$3]

  [X] Match case
  Maximum text buffer size 4096
  [ ] Maximum match (greedy)
  [ ] Allow comments
  [ ] '.' matches newline
  [X] UTF-8 Support
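
For those without that search-and-replace tool, roughly the same replacement can be sketched in Python. The match is non-greedy, as in the settings above. One deliberate difference, which is my own tightening: the middle group uses [^<]+? rather than .+ so that it cannot run across other tags on the same line.

```python
import re

# Group 1: the w start tag; group 2: the stray text before "Selah"; group 3: the rest.
SELAH = re.compile(r'(<w lemma="strong:H05542">)([^<]+?)(Selah</w>)')


def narrow_selah_tag(xml):
    """Move any text that precedes 'Selah' to just before the w start tag."""
    return SELAH.sub(r'\2\1\3', xml)
```

Elements that already contain only the word Selah do not match, so the function can be applied to the whole text.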

David Haslam Note:

  1. Although OSIS defines an attribute value type="selah", this only applies to the poetry line element l, none of which are used in the KJV.

Punctuation and Strongs

A much more general issue was also reported. Namely, tagged w elements that span beyond the end of a sentence or phrase. Many of these can be identified by the fact that the spanned text includes at least one terminating punctuation mark [.,;:!?)]. Some of these even contain two or more such punctuation marks, so devising a regexp is a bit fraught. Moreover, for some of those that have a comma, it may be perfectly valid to include the preceding word[s]. Less likely for the other punctuation marks.

Searching for different regexps such as [>.+\?.+</w>] I counted the following:

Count    Punctuation mark
 219     Full-stop       
7646     Comma (of which 444 have two or more commas)
1215     Colon
1064     Semicolon
  22     Exclamation mark
 254     Question mark
  11     Right parenthesis
  13     Left parenthesis (all these also contain another pm)

It's often the case that the English word that matches the Strong's tag is the last word before the </w>. Even so, I have not proven that this applies to 100% of the above patterns.

This issue was also reported by email on 2015-09-10. David Haslam
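
A count of this kind could be approximated with the Python sketch below. It is not the method used for the table above. It only inspects w elements whose content is plain text, so elements with a nested divineName are skipped, and its totals should not be expected to match the table exactly. All names are illustrative.

```python
import re
from collections import Counter

# A w element whose content is plain text (no nested elements).
W_TEXT = re.compile(r'<w\b[^>]*>([^<]*)</w>')
MARKS = '.,;:!?()'


def count_punctuated_w(xml):
    """Count, per punctuation mark, the w elements whose text contains it."""
    counts = Counter()
    for match in W_TEXT.finditer(xml):
        text = match.group(1)
        for mark in MARKS:
            if mark in text:
                counts[mark] += 1
    return counts
```

Each w element is counted at most once per mark, so an element with two commas adds one to the comma total.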

Parsing the study notes

These subsections document my analyses of the KJV study notes, as well as proposing ways in which we might improve them using standard OSIS markup.
In Xiphos, the text within a catchWord or a rdg element is displayed in italics.

The proposals detailed in the following subsections have been implemented in KJV module version 2.9 that was released on 2016-01-21.

OSIS catchWord element

Most of the study notes in the KJV source text have recognisable key words or key phrases. These should be marked up using the OSIS catchWord element. e.g.

<note type="study">the light from…: Heb. between the light and between the darkness</note>

should become

<note type="study"><catchWord>the light from…</catchWord>: Heb. between the light and between the darkness</note>

The catchWord element would be added by pattern matching to the first colon in the note text.
Out of 6959 study notes, there are 147 notes with more than a single colon, 140 of which have ": or, ".
David Haslam
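
The pattern matching to the first colon can be sketched in Python as follows. The function name and regular expression are illustrative, and the sketch assumes the catch word itself contains no markup.

```python
import re

# Text from the start of a study note up to (not including) its first colon.
NOTE_START = re.compile(r'(<note type="study">)([^<:]+)(:)')


def add_catch_word(xml):
    """Wrap the text before the first colon of each study note in catchWord."""
    return NOTE_START.sub(r'\1<catchWord>\2</catchWord>\3', xml)
```

Notes that already begin with a catchWord element are left alone, because the pattern requires plain text directly after the note start tag.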

OSIS rdg element

Many of the study notes record alternative or more literal renderings of the Hebrew, Chaldee[1] or Greek[2] text.
We might wish to wrap all such readings within the OSIS rdg element. e.g.

<note type="study">And the evening…: Heb. And the evening was, and the morning was etc.</note>

would become

<note type="study"><catchWord>And the evening…</catchWord>: Heb. <rdg>And the evening was, and the morning was</rdg> etc.</note>

Proposed type attributes for the rdg element:

  • type="alternate"[3] – used when the subsequent text was introduced by ", or, "[4]
  • type="x-equivalent" – used when the note gives the Gr. equivalent (LXX ?) to a Hebrew name.
  • type="x-identity" – used when the note has " also called, " (sometimes without a comma).
  • type="x-literal" – used when the note gives the Heb. translation more literally than the main text.
  • type="x-meaning" – used when the note explains the meaning of a Heb. or Chaldee name or word. Sometimes introduced by " that is, "

The earlier example would then become:

<note type="study"><catchWord>And the evening…</catchWord>: Heb. <rdg type="x-literal">And the evening was, and the morning was</rdg> etc.</note>

A few notes will end up with a rdg element inside a rdg element.
We use a customised OSIS schema, so the inner one is not wrapped within a seg element in order to be valid OSIS.
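
As a first pass, a type might be proposed from whatever introduces the reading. The Python sketch below is a guess at such a heuristic, not the procedure actually followed: the introducer strings come from the list above, but mapping "Heb. " to x-literal and "Gr. " to x-equivalent by default is my assumption, and some "Heb. " notes would really need x-meaning.

```python
# (introducer, proposed rdg type); checked against the text after the first colon.
INTRODUCERS = [
    ('or, ', 'alternate'),
    ('also called', 'x-identity'),
    ('that is, ', 'x-meaning'),
    ('Gr. ', 'x-equivalent'),   # assumed default
    ('Heb. ', 'x-literal'),     # assumed default; may need review as x-meaning
]


def propose_rdg_type(note_text):
    """Suggest a rdg type for plain note text of the form 'catch word: body'."""
    _, colon, body = note_text.partition(':')
    if not colon:
        return None
    body = body.lstrip()
    for introducer, rdg_type in INTRODUCERS:
        if body.startswith(introducer):
            return rdg_type
    return None
```

Anything returning None, or any note with more than one reading, would still need to be handled by hand.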


  1. i.e. Biblical Aramaic.
  2. Among the OT notes there are 35 instances of "Gr. " with equivalent names to the Hebrew.
  3. As documented in the OSIS 2.1.1 Reference Manual.
  4. Of the 2672 notes containing this string, there are 125 notes that contain it twice, and 1 note that has it thrice (see below).
<note type="study">a beacon: or, a tree bereft of branches, or, boughs: or, a mast</note>

David Haslam

transChange type="amplified"

The following six study notes were candidates for this. They didn't match the concept of a reading.

<note type="study"><catchWord>selvedge</catchWord>: <transChange type="amplified">an edge of cloth so woven that it cannot unravel</transChange></note>
<note type="study"><catchWord>the caul</catchWord>: <transChange type="amplified">it seemeth by anatomy, and the Hebrew doctors, to be the midriff</transChange></note>
<note type="study"><catchWord>selvedge</catchWord>: <transChange type="amplified">an edge of cloth so woven that it cannot unravel</transChange></note>
<note type="study"><catchWord>his reign</catchWord>: <transChange type="amplified">Nebuchadnezzar’s eighth year</transChange></note>
<note type="study"><catchWord>behemoth</catchWord>: <transChange type="amplified">probably an extinct animal of some kind</transChange></note>
<note type="study"><catchWord>leviathan</catchWord>: <transChange type="amplified">probably an extinct animal of some kind</transChange></note>

David Haslam

Punctuation & typos

I doubt if anyone has ever thoroughly audited the punctuation in the KJV study notes against the standard reference printed edition, e.g.

  • There were 12 instances of " also called " without a comma, compared to 143 instances of " also called, " with the comma.
  • There were 2 instances of ", of" that have now been corrected to ", or".
Audit to be continued even after version 2.9 was released.

David Haslam

Everything below here is yet to be done.

catchWord osisRef

We might further enhance the study notes by adding a suitable osisRef to each catchWord element. Fine grain references could be used to locate the catchWord text.

  • For a single keyword, we could use the @s[word] operator to search for the keyword.[1]
  • For a keyphrase (multiple words), we could use the @cp operators to specify the range of codepoints in the verse text.

The former would be easier to implement than the latter.[2]


  1. This will only find the first occurrence of the word.
  2. Care would be required when the catchWord text ends with an ellipsis.
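
A Python sketch of both ideas is given below, with hypothetical function names. The single-word form follows the @s[word] pattern mentioned above. For key phrases the sketch only computes 0-based codepoint offsets (end exclusive) and does not format them into an osisRef, since the exact @cp syntax should be checked against the OSIS manual first. Stripping a trailing ellipsis is one possible way of handling the caveat in the second footnote.

```python
def word_osis_ref(osis_id, word):
    """Build a fine-grain reference to the first occurrence of a word in a verse."""
    return osis_id + '@s[' + word + ']'


def phrase_codepoint_range(verse_text, catch_word):
    """Return (start, end) codepoint offsets of the catch word, or None if absent.

    Offsets are 0-based and end is exclusive (Python str indices are codepoints).
    A trailing ellipsis on the catch word is ignored when searching.
    """
    phrase = catch_word.rstrip('…').rstrip()
    if not phrase:
        return None
    start = verse_text.find(phrase)
    if start < 0:
        return None
    return start, start + len(phrase)
```

Like @s[word], str.find() only locates the first occurrence, so repeated words within a verse would need extra care.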

David Haslam

Note tag placement

Notes appertaining to Psalm titles

Study notes appertaining to text within a Psalm title are currently placed at the end of verse 1, just like any other note. To prevent these notes being orphaned when headings are hidden, it is proposed to move these notes to within the title element. As some Psalms also have one or more notes appertaining to text within verse 1, this change will require careful manual editing, rather than automating by a script.

Before or after the keyword?

My late mother's 1936 Collins edition of the KJV has centre margins with notes and cross-references. Of particular interest is that the cross-reference tags are italicised lowercase superscript letters and the note tags are superscripted integers. These are positioned at the start of each word being referenced. This practice differs from how many of our modules are marked up, where the tag is often placed at the end of the word being referenced. In the KJV module, however, all the study note tags are at the end of the verse. David Haslam

Cross references?

Sadly lacking from our KJV module are any scripture cross-references. Many printed editions of the AV contain such references. We should explore how the module might be enhanced by obtaining the data from a suitable electronic source.

[1] is of interest in this context, but see the foot of [2] which describes the sources of the data.

The 1769 edition of the KJV included Benjamin Blayney's cross-references. Many of the OT references therein were to the Deuterocanonical books. See also [3].

One cross-reference already

The note for II Samuel 23:8 contains a cross-reference! This is how I would parse this note.
The text in italics is a reading, the rest being note text.

<note type="study">1ch 11:11 he lift…: from whom he…: Heb. slain</note>

II Samuel 23:8 reads: (emphasis mine)

These be the names of the mighty men whom David had: The Tachmonite that sat in the seat, 
chief among the captains; the same was Adino the Eznite:
he lift up his spear against eight hundred, whom he slew at one time.

The text "he lift up his spear" is in italics as per <transChange type="added">.

I Chronicles 11:11 reads: (emphasis mine)

And this is the number of the mighty men whom David had; Jashobeam, an Hachmonite, 
the chief of the captains: he lifted up his spear against three hundred slain by him at one time.

But could "1ch 11:11" have been a cross-reference with its own separate tag?

This note is the only one that was not marked up in KJV version 2.9.

Aside: There's also a numerical discrepancy between these two verses in the Hebrew text. David Haslam

NT margin notes

The KJV module has study notes only in the OT. We should find a source text that also contains all the margin notes found in the NT. e.g.

Matt.1.11@s[Josias] Some read, Josias begat Jakim, and Jakim begat Jechonias. 1 Chr. 3.15.

This example just happens to also contain a cross-reference, but many others do not. David Haslam


As discussed before under Hyphenation, only five words in the NT use a hyphen/minus, eleven occurrences in total for the whole Bible. Seeing as the text of the KJV module already requires a font that includes the en dash (U+2013), and thus is not restricted to ASCII, I see no reason why we shouldn't replace each of these hyphen/minus characters with the proper Unicode character for hyphen, U+2010. The five words are:

3	God-ward
1	joint-heirs
1	thee-ward
3	us-ward
3	you-ward
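
If this were agreed, the replacement could be limited to exactly these five words, as in the Python sketch below. The function name is hypothetical, and the sketch assumes none of the five words is split across markup.

```python
import re

WORDS = ('God-ward', 'joint-heirs', 'thee-ward', 'us-ward', 'you-ward')
# Match any of the five words as a whole word.
HYPHENATED = re.compile(r'\b(' + '|'.join(re.escape(w) for w in WORDS) + r')\b')


def use_unicode_hyphen(text):
    """Replace hyphen/minus (U+002D) with HYPHEN (U+2010) in the five listed words."""
    return HYPHENATED.sub(lambda m: m.group(1).replace('-', '\u2010'), text)
```

Restricting the change to a word list leaves any other hyphen/minus in the source, such as those in XML comments or attribute values, untouched.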

Chapter descriptions

The reference edition of the KJV included chapter descriptions (equivalent to USFM tag \cd_text) with a verse number giving the placement of each section thus described. We might consider adding these. In OSIS, the verse numbers should be made into active links.

David Haslam

Pilcrow signs

This section was moved from User:Dmsmith/KJV2011 – it having never been actioned. David Haslam
  • In printed editions of the KJV, there is normally a space immediately after the ¶. When viewed with Xiphos, there is no such space.
    This is not an artifact of how the SWORD engine handles the OSIS markup. Example:
<verse osisID="Gen.1.6" sID="Gen.1.6"/><milestone type="x-p" marker="¶"/><w lemma="strong:H0430">And God</w> <w morph="strongMorph:TH8799" lemma="strong:H0559">said</w>, ...

Would a simple solution be to change it to marker="¶ ", i.e. with a space after the Pilcrow sign?

Yes, this is a fine solution. --Dmsmith 21:07, 12 November 2011 (MST)
Slight complication! – when the verse starts as red letters, the space is already displayed after the Pilcrow. Compare Matthew 22:11 with Matthew 22:15. David Haslam 08:28, 14 November 2011 (MST)
This is no longer the case! David Haslam 09:49, 11 September 2015 (MDT)
When red letters are on (the verse starts as red letters), there is a space between the verse tag and the text, but when red letters are off, there is no space between the verse tag and the text. This is a different issue. David Haslam 01:39, 23 January 2016 (MST)
I think this is in the SWORD engine, as I see this also on PocketSword. David Haslam 01:40, 23 January 2016 (MST)
On second look, it would be a fine solution, but Xiphos already has special code to add the space. The change should not be made if it is disruptive to Xiphos. --Dmsmith 07:21, 17 November 2011 (MST)
Surely two spaces (where there are red letters) is preferable to not having any space after the pilcrow elsewhere? David Haslam 12:22, 28 May 2012 (MDT)
Xiphos 4.0.4 does not add any space after the ¶ so maybe that special code was never implemented? David Haslam 09:46, 11 September 2015 (MDT)

See also

External links