Difference between revisions of "Talk:DevTools:conf Files"

From CrossWire Bible Society
(Proposal for new GlobalOptionFilter=OSISNamesBold to emphasise proper names marked with the <name> element: new section)
(Missing sections for RTF, Continuation, Localisation: :Links that (in effect) pointed to https://wiki.crosswire.org/DevTools:conf_Files#Localization are all now broken because this section was removed! ~~~~)
 
(7 intermediate revisions by 2 users not shown)
== Column is too wide in table ==
+
I have preserved the following paragraph here until a decision is made.
  
After Osk autowikified the table, the second column is too wide. Having to use a horizontal scrollbar on web pages is a PITB. [[User:David Haslam|David Haslam]] 10:19, 13 November 2008 (UTC)
+
==== Normalization ====
:Sorted.  [[User:David Haslam|David Haslam]] 07:33, 18 November 2008 (UTC)
+
:''Currently, this optional key is a discussion proposal.''
  
== HTML and RTF in columns that do not allow for it ==
+
Background: "Unicode Normalization can break [[Fonts#Hebrew|Biblical Hebrew]]."
The Allowed column is for showing what columns allow for HTML and RTF. It is never OK to use it in any other field. While it may work in a front-end, it is guaranteed to not work in others.--[[User:Dmsmith|Dmsmith]] 18:28, 23 February 2013 (MST)
 
:Why the almost top posting? Tip: Use '''Add Topic''' in future. [[User:David Haslam|David Haslam]] 13:08, 24 February 2013 (MST)
 
:Never noticed the "Add topic" tab. Top posting makes sense in that the most recent conversations are at the top, and old ones, having been addressed, are lower down. But it doesn't matter to me. Either is fine. It is more critical to sign and date.--[[User:Dmsmith|Dmsmith]] 09:14, 22 April 2013 (MDT)
 
  
== Suggestion for a new element in conf files ==
+
Most modules with Unicode source text are encoded as '''UTF-8''' and normalized to '''NFC''', these being the ''default settings'' for both [[Osis2mod|<tt>osis2mod</tt>]] & <tt>tei2mod</tt>.
  
It would be useful to permit a non-specific element called simply '''Notes'''.  This element could contain anything not covered by About, Description, CopyrightNotes or DistributionNotes, etc. It could be used to record further information about various aspects of the module or the module creation process such as a description of the preprocessing steps. Such information could be very useful to developers. [[User:David Haslam|David Haslam]] 13:13, 13 October 2011 (MDT)
+
Now these two module creation tools have a '''-N''' command line switch to prevent conversion to UTF-8 and Normalization.  
  
== Suggestion for new OSIS filter to toggle the appearance and activeness of dictionary lookup links ==
+
Biblical Hebrew source text with both vowel accents and cantillation may be supplied properly with custom [[Encoding#Normalization|normalization]] as required by the text provider. It should still be encoded UTF-8.
  
The [http://code.google.com/p/xulsword/ xulsword] application includes support for toggling the display of dictionary lookup references. This is implemented by means of the following additional properties in the configuration file for any OSIS text module.
+
As there is a need to create modules from source text that has such a custom ordering of the diacritics, it may be useful to provide information in the .conf file for such modules that are intentionally not normalized to NFC during build. The following method is proposed:
  
  DictionaryModule=glossary_module_name
+
  Normalization=Custom
GlobalOptionFilter=OSISDictionary
 
  
The xulsword developer suggests that this idea might be useful to incorporate into the SWORD and JSword API, and for other front-ends to start to make use of it.
+
It should be assumed that modules where this is specified are made using the '''-N''' switch in the module creation tool.
  
[[User:David Haslam|David Haslam]] 12:12, 5 January 2012 (MST)
+
Normalization is useful to ensure (e.g.) that a search index stores all words the same way. That's why for the most part, modules are expected to be in NFC form. Custom normalization is still a normalization. What's different about it is that the combining classes for each character are different from the canonical combining classes defined by the Unicode Consortium.
  
:See also http://code.google.com/p/osis-converters/wiki/Compatibility [[User:David Haslam|David Haslam]] 06:57, 20 January 2012 (MST)
+
To create a search index for such a module such that it does not automatically use NFC, give the <tt>mkfastmod</tt> command with the '''-N''' switch.<ref>Added to SWORD SVN by DM Smith on 2018-01-07.</ref><ref>The UI in front-ends for creating a search index should ideally also be enhanced to support this option.</ref>
  
::But now see https://github.com/johnaustindev/osis-converters &ndash; especially the section of '''ReadMe.md''' headed '''Deprecated (no longer output by osis-converters)''' [[User:David Haslam|David Haslam]] 14:27, 11 January 2016 (MST)
+
--[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 09:33, 7 January 2018 (MST)
  
== Localization ==
 
 
The policy advice in the localization section is somewhat ambiguous. Clarification is needed. [[User:David Haslam|David Haslam]] 03:52, 9 January 2012 (MST)
 
 
: It's clarified a bit more now, I hope. --[[User:Osk|Osk]] 06:12, 9 January 2012 (MST)
 
 
== OSISVersion ==
 
 
Is there a default for OSISVersion (like there is for MinimumVersion) ? [[User:David Haslam|David Haslam]] 06:49, 20 January 2012 (MST)
 
 
: No. With MinimumVersion, there really is a default, in the sense that if you fail to set one in the .conf, the library will set one for you so that any front end can query the value and reliably get some answer. OSISVersion is essentially just informational, and I don't believe anyone has put it to any use. --[[User:Osk|Osk]] 20:29, 20 January 2012 (MST)
 
 
== Broken links for Creative Commons license summaries ==
 
 
As Osk had just added a link for CC0, I tried some of the other Creative Commons links, and found that they go to pages that no longer exist. Needs follow-up. [[User:David Haslam|David Haslam]] 01:19, 11 March 2012 (MST)
 
:Reminder &ndash; still needs doing. [[User:David Haslam|David Haslam]] 02:39, 11 February 2013 (MST)
 
 
== OSISXlit & OSISEnum  ==
 
 
These new global filters ('''OSISXlit''' & '''OSISEnum''') need describing in more detail, with suitable examples in the page for [[OSIS Bibles]] or wherever most suitable. [[User:David Haslam|David Haslam]] 13:52, 28 June 2012 (MDT)
 
 
== Strip Filters ==
 
 
A recent post to the sword-devel mailing list describes the use of local '''Strip Filters'''. I hope to insert this description into the main page presently. [[User:David Haslam|David Haslam]] 09:04, 10 December 2012 (MST)
 
:Done. [[User:David Haslam|David Haslam]] 04:51, 13 December 2012 (MST)
 
 
== Promo & ShortPromo? ==
 
 
SomKQA module version 1.0 had a config property named Promo. We document the property named ShortPromo. Was the original name changed at some time, or is this merely a minor error in the somkqa.conf file? [[User:David Haslam|David Haslam]] 09:47, 12 December 2012 (MST)
 
 
== JSword and language codes ==
 
 
Currently, JSword does not support language code subtags, but there is now a plan to implement this. (Email conversation with DM today). [[User:David Haslam|David Haslam]] 04:52, 25 January 2013 (MST)
 
 
== Proposed Scope Attribute ==
 
 
Scope
 
 
&lt;osisRef&gt;<ref>Use OSIS Book names. Book, chapter and verse are separated by '.', dot. Ranges are with respect to the order of books in the v11n. Both ends of a range must be fully specified. Ranges are separated by spaces. While order in an osisRef is undefined, they should be ordered here.</ref><br/>Indicates that the versification is limited to a subset of its books, chapters and/or verses.
 
 
When there are only a few verses different within a chapter, it may be permissible to include them. But where there are many, do so.
 
 
Examples:<br>To leave out deuterocanonical material from Synodal use the following:<br/>Gen-Josh.4.33 Judg-2Chr Ezra Neh Esth-Ps.150 Prov.0<ref>When used in a verse range, the start of a book is chapter 0, the introduction. If the book has no introduction then Prov.1 is fine.</ref>-Prov.4.27 Prov.5-Prov.13.25 Prov.14-Prov.18.24 Prov.19-Song Isa-Lam Ezek-Dan.3.33 Dan.4-Dan.12 Hos-Mal Matt-Rev
 
 
To note that a module only contains the NT, the following would be appropriate for many v11n:<br>Matt-Rev<ref>At the moment, there is no way to specify the inclusion of module or testament introductions. JSword (in development) is using Intro.Bible, Intro.OT and Intro.NT.</ref>
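The proposed Scope syntax (space-separated ranges, each end written as Book.chapter.verse) can be illustrated with a minimal Python sketch. The function names <tt>parse_ref</tt> and <tt>parse_scope</tt> are hypothetical and are not part of SWORD or JSword; the sketch only splits the string and does not check book names or ordering against any v11n.

```python
def parse_ref(ref):
    """Split 'Book.chapter.verse' into (book, chapter, verse).

    Chapter and verse are None when they are not given.
    """
    parts = ref.split(".")
    book = parts[0]
    chapter = int(parts[1]) if len(parts) > 1 else None
    verse = int(parts[2]) if len(parts) > 2 else None
    return book, chapter, verse


def parse_scope(scope):
    """Turn a Scope value into a list of (start, end) reference tuples.

    A token without '-' (for example 'Ezra') is treated as a range
    whose start and end are the same reference.
    """
    ranges = []
    for token in scope.split():
        start, sep, end = token.partition("-")
        ranges.append((parse_ref(start), parse_ref(end if sep else start)))
    return ranges


if __name__ == "__main__":
    for item in parse_scope("Gen-Josh.4.33 Ezra Prov.0-Prov.4.27 Matt-Rev"):
        print(item)
```

A real implementation would also need the book order of the module's v11n to expand each range, which this sketch leaves out.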
 
 
'''Notes:'''
 
 
<references />
 
  
== Feature=NoParagraphs ==
 
 
It should be noted that using Feature=NoParagraphs for our KJV module would be inappropriate.  Even though the KJV is normally printed as Verse Per Line without any paragraphing, we have taken it upon ourselves to implement the pilcrows ¶ as paragraph markers (which no doubt they are), and those chapters containing pilcrows are divided into OSIS paragraphs using the '''p''' element. At the same time, the pilcrows were converted to marker attribute values. [[User:David Haslam|David Haslam]] 02:00, 15 June 2013 (MDT)
 
 
:It should also be noted that the OSIS XML file used as the source for our KJV module was carefully hand-crafted. In approaching other Biblical texts containing pilcrows ¶, the Python converter usfm2osis.py treats these as part of ordinary text, and not as something special. Were we to desire similar functionality to the KJV, then extra measures would be required to make further changes to the OSIS file output from the Python script. [[User:David Haslam|David Haslam]] 02:05, 15 June 2013 (MDT)
 
 
== OSISMorphSegmentation ==
 
 
GlobalOptionFilter='''OSISMorphSegmentation''' is not yet documented in the main page. This module property is specified in wlc.conf for the Westminster Leningrad Codex. [[User:David Haslam|David Haslam]] 14:55, 13 March 2014 (MDT)
 
 
:Still awaiting some explanation of what this property actually toggles. [[User:David Haslam|David Haslam]] 09:04, 11 April 2014 (MDT)
 
 
:Also worth noting, this option is not listed in the syntax help for diatheke. [[User:David Haslam|David Haslam]] 14:56, 13 March 2014 (MDT)
 
 
::Added to main page. Currently we are not aware of any front-end that makes use of this filter to show/hide anything.<br>The developers of the '''STEP Bible''' front-end may be planning to do something with it. [[User:David Haslam|David Haslam]] 07:28, 28 April 2014 (MDT)
 
 
== Companion modules ==
 
 
The new table row for Companion was added after an exchange of emails with Karl. [[User:David Haslam|David Haslam]] 11:33, 22 December 2014 (MST)
 
 
== Exceptions in the Normalization of Malayalam ==
 
 
[https://en.wikipedia.org/wiki/Malayalam Malayalam] is one of the scripts for which there are known rare exceptions to the rules governing Unicode Normalization. This issue was observed during our work on the 1910 Malayalam Bible module development.
 
 
I made a simple bespoke TextPipe filter to insert a [https://en.wikipedia.org/wiki/Combining_Grapheme_Joiner Combining Grapheme Joiner] in between two pairs of Malayalam codepoints.
 
 
The replacement filter is described:
 
 
    (\x{0D46})(\x{0D3E})    $1\xCD\x8F$2
 
    (\x{0D46})(\x{0D57})    $1\xCD\x8F$2
 
 
The search patterns are [https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions PCRE]. The CGJ is inserted using its UTF-8 byte codes. There were 17 of the former pattern and 2 of the latter pattern. This does prevent the improper normalization of these exceptional code pairs. [[User:David Haslam|David Haslam]] 06:26, 10 December 2015 (MST)
 
:It later emerged that this wasn't necessary, as it was found that these 19 locations had been mistyped. Normalization to NFC actually corrected the mistyping. [[User:David Haslam|David Haslam]] 08:11, 10 December 2015 (MST)
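Although the follow-up above notes that the CGJ turned out to be unnecessary for this particular text, the technique itself can be sketched in Python. This is an illustrative equivalent of the two PCRE replacements, not the TextPipe filter that was used; the names <tt>protect_pairs</tt> and <tt>sample</tt> are hypothetical.

```python
import re
import unicodedata

# COMBINING GRAPHEME JOINER, whose UTF-8 encoding is the bytes CD 8F.
CGJ = "\u034F"

# U+0D46 followed by U+0D3E or U+0D57: the two pairs named in the filter.
PAIR = re.compile("(\u0D46)([\u0D3E\u0D57])")


def protect_pairs(text):
    """Insert a CGJ between the two code points of each matching pair."""
    return PAIR.sub(lambda m: m.group(1) + CGJ + m.group(2), text)


def codepoints(text):
    """Render a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    sample = "\u0D15\u0D46\u0D3E"  # KA + VOWEL SIGN E + VOWEL SIGN AA
    protected = protect_pairs(sample)
    # Without the CGJ, NFC composes the pair into a single vowel sign.
    print(codepoints(unicodedata.normalize("NFC", sample)))
    # With the CGJ in place, NFC leaves the sequence unchanged.
    print(codepoints(unicodedata.normalize("NFC", protected)))
```

The CGJ has canonical combining class 0, so it blocks canonical composition of the two code points on either side of it.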
 
 
== Undocumented Option Filters ==
 
 
From a post in the sword-devel mailing list:<BR>
 
:Jaak recently found two undocumented option filters, '''GreekLexAttribs''' and '''PapyriPlain''', which inherit from SWOptionFilter and use SWOptionFilter::SWOptionFilter(). Does anyone have detailed information about these? [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 08:53, 1 November 2016 (MDT)
 
 
== Creative Commons changes ==
 
 
The links and verbatim strings are no longer correct for the various '''Creative Commons''' licences due to changes in the Creative Commons website.
 
 
In particular, the release of
 
 
[https://creativecommons.org/licenses/by-sa/4.0/ Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)]
 
 
and similar changes to version 4.0 etc.
 
 
This [[DevTools:conf Files#Copyright_.26_Licensing_related_elements|section]] of the page requires updating to reflect the changes made by Creative Commons.
 
 
It should be edited by someone that thoroughly understands the CC licenses.
 
 
Existing modules that refer to earlier versions of the CC license may need to be updated as well.
 
 
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 04:24, 10 January 2017 (MST)
 
 
== zText4 and zCom4 support ==
 
 
2017-01-31 JSword now supports zText4 and zCom4. That's the background to my edit to the page today.
 
:SWORD does not yet support these drivers, but support is planned for SWORD 1.8.
 
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 11:10, 31 January 2017 (MST)
 
  
== Proposal for new GlobalOptionFilter=OSISNamesBold to emphasise proper names marked with the &lt;name> element ==
+
[[Talk:DevTools:conf Files/Archive]]
  
For the many languages whose script does not have any division between uppercase and lowercase characters, it is less obvious which words in Scripture are proper names. I therefore propose that we add a new filter:
+
== Missing sections for RTF, Continuation, Localisation ==
  
GlobalOptionFilter=OSISNamesBold
+
Some edits during the last 3 years seem to have removed the 3 sections linked by words in the '''Allowed''' column. The links don't go anywhere now. This error needs fixing. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 13:40, 14 March 2020 (UTC)
 +
:Links that (in effect) pointed to https://wiki.crosswire.org/DevTools:conf_Files#Localization are all now broken because this section was removed! [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 09:47, 16 May 2020 (UTC)
  
When this is enabled, all words marked using the OSIS '''name''' element could then be rendered using the bold font style. Once implemented and rolled out into front-ends, the feature could be tested by a future release of the KJV module in which the '''name''' element will be used accordingly, as per our roadmap.
+
== Weird consequences of specifying an Abbreviation that matches another module name ==
  
[[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 01:29, 15 February 2017 (MST)
+
While working on '''KJV''' module development my e'''X'''perimental build is called '''KJVX'''. Last week I tried including
 +
Abbreviation=KJV
 +
in <tt>kjvx.conf</tt> and discovered today that this has weird consequences in Xiphos. When I selected the released module '''KJV''', any minor change to this module's display settings made Xiphos show module '''KJVX''' instead because of the Abbreviation clash. [[User:David Haslam|David Haslam]] ([[User talk:David Haslam|talk]]) 13:06, 11 May 2020 (UTC)

Latest revision as of 09:47, 16 May 2020

I have preserved the following paragraph here until a decision is made.

Normalization

Currently, this optional key is a discussion proposal.

Background: "Unicode Normalization can break Biblical Hebrew."

Most modules with Unicode source text are encoded as UTF-8 and normalized to NFC, these being the default settings for both osis2mod & tei2mod.

Now these two module creation tools have a -N command line switch to prevent conversion to UTF-8 and Normalization.

Biblical Hebrew source text with both vowel accents and cantillation may be supplied properly with custom normalization as required by the text provider. It should still be encoded UTF-8.

As there is a need to create modules from source text that has such a custom ordering of the diacritics, it may be useful to provide information in the .conf file for such modules that are intentionally not normalized to NFC during build. The following method is proposed:

Normalization=Custom

It should be assumed that modules where this is specified are made using the -N switch in the module creation tool.
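As a rough illustration of how a tool might honour the proposed key, the following Python sketch reads the text of a .conf file and reports whether it declares Normalization=Custom. The parser is deliberately simplified (one section header, Key=Value lines, no continuation lines), and the function names and the example module name HebCustom are hypothetical rather than part of SWORD or JSword.

```python
def parse_conf(text):
    """Parse a simplified .conf: a [Name] header followed by Key=Value lines.

    Values are collected in lists because some keys
    (for example GlobalOptionFilter) may appear more than once.
    """
    name = None
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries.setdefault(key.strip(), []).append(value.strip())
    return name, entries


def uses_custom_normalization(text):
    """Return True when the conf text declares Normalization=Custom."""
    _, entries = parse_conf(text)
    return "Custom" in entries.get("Normalization", [])


# Hypothetical conf fragment using the proposed key.
EXAMPLE_CONF = """[HebCustom]
Encoding=UTF-8
Normalization=Custom
"""

if __name__ == "__main__":
    print(uses_custom_normalization(EXAMPLE_CONF))
```

A build or indexing script could use such a check to decide whether to pass the '''-N''' switch, though how any given tool would wire that in is an assumption here.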

Normalization is useful to ensure (e.g.) that a search index stores all words the same way. That's why for the most part, modules are expected to be in NFC form. Custom normalization is still a normalization. What's different about it is that the combining classes for each character are different from the canonical combining classes defined by the Unicode Consortium.
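The reordering that NFC applies can be seen with Python's standard <tt>unicodedata</tt> module. This is only an illustration of canonical ordering, assuming a made-up three-character sequence; it is not taken from any module source text, and the helper name <tt>describe</tt> is hypothetical.

```python
import unicodedata

# BET + DAGESH + PATAH, in an order a text provider might supply.
custom_order = "\u05D1\u05BC\u05B7"

# NFC sorts adjacent combining marks by canonical combining class.
nfc_order = unicodedata.normalize("NFC", custom_order)


def describe(text):
    """List each code point with its canonical combining class."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


if __name__ == "__main__":
    print(describe(custom_order))
    print(describe(nfc_order))
```

Because the patah has a lower canonical combining class than the dagesh, NFC moves it in front of the dagesh, so a provider's intended mark order is lost unless normalization is skipped.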

To create a search index for such a module such that it does not automatically use NFC, give the mkfastmod command with the -N switch.[1][2]

--David Haslam (talk) 09:33, 7 January 2018 (MST)

  1. Added to SWORD SVN by DM Smith on 2018-01-07.
  2. The UI in front-ends for creating a search index should ideally also be enhanced to support this option.


Talk:DevTools:conf Files/Archive

Missing sections for RTF, Continuation, Localisation

Some edits during the last 3 years seem to have removed the 3 sections linked by words in the Allowed column. The links don't go anywhere now. This error needs fixing. David Haslam (talk) 13:40, 14 March 2020 (UTC)

Links that (in effect) pointed to https://wiki.crosswire.org/DevTools:conf_Files#Localization are all now broken because this section was removed! David Haslam (talk) 09:47, 16 May 2020 (UTC)

Weird consequences of specifying an Abbreviation that matches another module name

While working on KJV module development my eXperimental build is called KJVX. Last week I tried including

Abbreviation=KJV

in kjvx.conf and discovered today that this has weird consequences in Xiphos. When I selected the released module KJV, any minor change to this module's display settings made Xiphos show module KJVX instead because of the Abbreviation clash. David Haslam (talk) 13:06, 11 May 2020 (UTC)
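A check for this kind of clash could be sketched in Python as below. The in-memory representation (a dict mapping module name to its conf entries) and the function name are hypothetical, and comparing names case-insensitively is an assumption, not documented front-end behaviour.

```python
def find_abbreviation_clashes(modules):
    """Find modules whose Abbreviation equals another module's name.

    modules maps each module name to a dict of its conf entries.
    Returns a list of (module, abbreviation, clashing_module) tuples.
    """
    # Assumption: names are compared without regard to case.
    names = {name.lower(): name for name in modules}
    clashes = []
    for name, conf in modules.items():
        abbreviation = conf.get("Abbreviation")
        if abbreviation is None:
            continue
        other = names.get(abbreviation.lower())
        if other is not None and other != name:
            clashes.append((name, abbreviation, other))
    return clashes


if __name__ == "__main__":
    installed = {
        "KJV": {},
        "KJVX": {"Abbreviation": "KJV"},
    }
    print(find_abbreviation_clashes(installed))
```

Running such a check before installing an experimental build would flag the KJVX case described above.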