Difference between revisions of "Talk:Wycliffe Bible"
David Haslam (talk | contribs) (→Collaboration: new section)
David Haslam (talk | contribs) (→TextPipe filter details: new section)
Revision as of 14:34, 18 March 2011
What prompted me to do this?
This exercise was prompted by something one of my friends posted on Facebook. He had pasted a verse from our existing Wycliffe module, which contains only the Pentateuch and the Gospels, and which was derived from Sergej Fedosov's Slavic Bible. IMHO, it should be fairly straightforward to create a SWORD module for the entire Wycliffe Bible, although we would need to use av11n to include the 10 Deuterocanonical books. David Haslam 15:58, 17 March 2011 (UTC)
Collaboration
I have uploaded the USFM files to a private folder in my box.net account. If anyone in CrossWire would like to collaborate on this further work, please contact me. I can easily make the folder suitable for collaboration. David Haslam 16:16, 17 March 2011 (UTC)
TextPipe filter details
Here is an exported view of the bespoke TextPipe filter I used to convert the data to USFM:
TextPipe Single User Edition
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\TextPipe Filters\Special filter to convert Wycliffe Bible from the Wesley Center Online to rudimentary USFM.fll

Filter List
-----------
Filter options
|  [ ] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|
|--Input from file(s)
|  [ ] Confirm before processing each file
|  [ ] Confirm before processing read/only files
|  [ ] Delete input files after processing
|  Process binary files
|
|--Comment...
|  |  Special filter to convert Wycliffe Bible from the Wesley Center Online to rudimentary USFM
|  |
|  |--Comment...
|  |  |  Convert ANSI to UTF-8
|  |  |
|  |  +--Convert from ANSI to UTF-8
|  |
|  |--Comment...
|  |  |  Convert chapter numbers to tags
|  |  |
|  |  |--Comment...
|  |  |  |  Deal first with various exceptions
|  |  |  |
|  |  |  |--Perl pattern [^\QCAP. I.\E] with [CAP 1]
|  |  |  |  [ ] Match case
|  |  |  |  [ ] Whole words only
|  |  |  |  [ ] Case sensitive replace
|  |  |  |  [ ] Prompt on replace
|  |  |  |  [ ] Skip prompt if identical
|  |  |  |  [ ] First only
|  |  |  |  [ ] Extract matches
|  |  |  |  Maximum text buffer size 4096
|  |  |  |  [ ] Maximum match (greedy)
|  |  |  |  [ ] Allow comments
|  |  |  |  [X] '.' matches newline
|  |  |  |  [ ] UTF-8 Support
|  |  |  |
|  |  |  |--Perl pattern [^\QPALM\E] with [CAP]
|  |  |  |  [ ] Match case
|  |  |  |  [ ] Whole words only
|  |  |  |  [ ] Case sensitive replace
|  |  |  |  [ ] Prompt on replace
|  |  |  |  [ ] Skip prompt if identical
|  |  |  |  [ ] First only
|  |  |  |  [ ] Extract matches
|  |  |  |  Maximum text buffer size 4096
|  |  |  |  [ ] Maximum match (greedy)
|  |  |  |  [ ] Allow comments
|  |  |  |  [X] '.' matches newline
|  |  |  |  [ ] UTF-8 Support
|  |  |  |
|  |  |  +--Perl pattern [^\QPSALM\E] with [CAP]
|  |  |     [ ] Match case
|  |  |     [ ] Whole words only
|  |  |     [ ] Case sensitive replace
|  |  |     [ ] Prompt on replace
|  |  |     [ ] Skip prompt if identical
|  |  |     [ ] First only
|  |  |     [ ] Extract matches
|  |  |     Maximum text buffer size 4096
|  |  |     [ ] Maximum match (greedy)
|  |  |     [ ] Allow comments
|  |  |     [X] '.' matches newline
|  |  |     [ ] UTF-8 Support
|  |  |
|  |  +--Perl pattern [^CAP (\d+)$] with []
|  |     |  [ ] Match case
|  |     |  [ ] Whole words only
|  |     |  [ ] Case sensitive replace
|  |     |  [ ] Prompt on replace
|  |     |  [ ] Skip prompt if identical
|  |     |  [ ] First only
|  |     |  [ ] Extract matches
|  |     |  Maximum text buffer size 4096
|  |     |  [ ] Maximum match (greedy)
|  |     |  [ ] Allow comments
|  |     |  [ ] '.' matches newline
|  |     |  [X] UTF-8 Support
|  |     |
|  |     +--Replace [CAP] with [\\c]
|  |        [ ] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |
|  |--Comment...
|  |  |  Convert verse numbers to tags
|  |  |
|  |  +--Perl pattern [^(\d+) ] with []
|  |     |  [ ] Match case
|  |     |  [ ] Whole words only
|  |     |  [ ] Case sensitive replace
|  |     |  [ ] Prompt on replace
|  |     |  [ ] Skip prompt if identical
|  |     |  [ ] First only
|  |     |  [ ] Extract matches
|  |     |  Maximum text buffer size 4096
|  |     |  [X] Maximum match (greedy)
|  |     |  [ ] Allow comments
|  |     |  [ ] '.' matches newline
|  |     |  [X] UTF-8 Support
|  |     |
|  |     +--Perl pattern [^(\d+)] with [\\v $1]
|  |        [ ] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |        Maximum text buffer size 4096
|  |        [X] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support
|  |
|  +--Comment...
|     |  Add remarks, filenames and convert to ID and Header tags
|     |
|     |--Add file header [\\rem John Wycliffe's Translation of the Bible]
|     |
|     |--Add file header [@inputFilename]
|     |
|     |--Restrict lines:Line 1 .. line 1
|     |  |
|     |  +--Perl pattern [^(\d+)_(...)\Q.TXT\E] with [\\id $2\r\n\\h $2]
|     |     [ ] Match case
|     |     [ ] Whole words only
|     |     [ ] Case sensitive replace
|     |     [ ] Prompt on replace
|     |     [ ] Skip prompt if identical
|     |     [ ] First only
|     |     [ ] Extract matches
|     |     Maximum text buffer size 4096
|     |     [X] Maximum match (greedy)
|     |     [ ] Allow comments
|     |     [ ] '.' matches newline
|     |     [X] UTF-8 Support
|     |
|     +--Restrict lines:Line 2 .. line 2
|        |
|        +--Replace list: D:\Download\Java\GoBibleCreator\Download Other\Wesley Center Online\ID to Book.csv Perl pattern
|           [X] Match case
|           [X] Whole words only
|           [ ] Case sensitive replace
|           [ ] Prompt on replace
|           [ ] Skip prompt if identical
|           [ ] First only
|           [ ] Extract matches
|           Maximum text buffer size 4096
|           [ ] Maximum match (greedy)
|           [ ] Allow comments
|           [X] '.' matches newline
|           [ ] UTF-8 Support
|
+--Output to file(s)
   [ ] Only update date on changed files
   [ ] Append mode
   [X] Change extension to: .usfm
   [ ] Open output file
   Only output modified files Output folder: D:\Download\Java\GoBibleCreator\Download Other\Wesley Center Online\USFM
   [ ] Maintain folder structure
   [ ] Remove empty output files

Files List
----------
D:\Download\Java\GoBibleCreator\Download Other\Wesley Center Online\wycbible\AP\*.txt
D:\Download\Java\GoBibleCreator\Download Other\Wesley Center Online\wycbible\NT\*.txt
D:\Download\Java\GoBibleCreator\Download Other\Wesley Center Online\wycbible\OT\*.txt
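For anyone without TextPipe, the filter above can be approximated in a few lines of Python. This is only a rough sketch of my reading of the export, not a tested replacement. It assumes that "ANSI" means Windows-1252, that an unticked "Match case" means case-insensitive matching, and that each pattern is applied line by line. The `BOOK_NAMES` table is a hypothetical stand-in for `ID to Book.csv` (whose contents are not shown here), and the function names and the example filename `19_PSA.TXT` are likewise illustrative.

```python
import re

# Hypothetical stand-in for "ID to Book.csv"; the real file's contents are not shown.
BOOK_NAMES = {"GEN": "Genesis", "PSA": "Psalms"}

# Exceptions normalised to "CAP <n>" before the general chapter rule runs.
# Assumption: an unticked "Match case" means case-insensitive matching.
EXCEPTIONS = [
    (re.compile(r"^CAP\. I\.", re.IGNORECASE), "CAP 1"),
    (re.compile(r"^PALM", re.IGNORECASE), "CAP"),
    (re.compile(r"^PSALM", re.IGNORECASE), "CAP"),
]
CHAPTER = re.compile(r"^CAP (\d+)$", re.IGNORECASE)
VERSE = re.compile(r"^(\d+) ")
FILENAME = re.compile(r"^(\d+)_(...)\.TXT", re.IGNORECASE)


def convert_line(line: str) -> str:
    """Turn one source line into a USFM chapter or verse line where it applies."""
    for pattern, replacement in EXCEPTIONS:
        line = pattern.sub(replacement, line)
    chapter = CHAPTER.match(line)
    if chapter:
        # Equivalent to the sub-filter that replaces CAP with \c inside the match.
        return "\\c " + chapter.group(1)
    # A leading number followed by a space becomes "\v <n> ".
    return VERSE.sub(lambda m: "\\v " + m.group(1) + " ", line)


def to_usfm(raw: bytes, filename: str, book_names=BOOK_NAMES) -> str:
    """Build \\id, \\h and \\rem header lines from the filename, then convert the body."""
    name = FILENAME.match(filename)
    if name is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    book_id = name.group(2)
    header = [
        "\\id " + book_id,
        "\\h " + book_names.get(book_id, book_id),
        "\\rem John Wycliffe's Translation of the Bible",
    ]
    text = raw.decode("cp1252")  # assumption: "ANSI" means Windows-1252
    body = [convert_line(line) for line in text.splitlines()]
    return "\n".join(header + body) + "\n"
```

Calling `to_usfm(data, "19_PSA.TXT")` on the bytes of one source file returns a string that can be written out as UTF-8 with a `.usfm` extension, which mirrors the "Output to file(s)" step of the filter.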