Difference between revisions of "Talk:File Formats"
Revision as of 23:12, 1 October 2011
- 1 non-SWORD software
- 2 Go Bible & collaboration with CrossWire
- 3 Tesseract OCR software
- 4 Zefania XML
- 5 PMD files?
- 6 Please discuss before reverting - the page has now become inaccurate
- 7 usfm2osis.pl
- 8 Creating and copying from PDF files
- 9 modwrite / treeidxutil / xmlcatalog
- 10 SGML section
- 11 XHTML-TE and Go Bible Creator
- 12 imp2osis.pl
- 13 Why I have added the section about ODF
- 14 Request for info: SWORD internal file formats
non-SWORD software
I couldn't see the value of listing other Bible software in our Wiki, but tolerated it since it was supposed to be further developed into something more worthwhile. However, now that it's become largely an ad for non-SWORD software, I'm going to remove the whole section.
Only material related, relevant, or useful to CrossWire and/or the SWORD Project need go in our Wiki. Material not fitting those categories can go in someone else's Wiki and will be deleted from this one.
Go Bible & collaboration with CrossWire
I have just added some information about Go Bible. Please visit my user page to see why, rather than removing this. David Haslam 15:47, 7 August 2008 (MDT)
Tesseract OCR software
Having just added a stub section about Tesseract, I'd like to suggest that Troy adds a few words about it. David Haslam 20:03, 9 January 2009 (UTC)
Zefania XML
A couple of days ago, there was a message from Wolfgang Schultz in the Sword dev mailing list which announced that he has removed all websites relating to Zefania XML. Certainly the link now goes to a "page under construction" message. David Haslam 19:25, 27 April 2009 (UTC)
- The sourceforge repository for various modules is still available at . The most recently uploaded module is the Zürcher Bibel, dated April 12, 2009. David Haslam 10:00, 15 May 2009 (UTC)
- The admins for this sourceforge project are Tom Baccei and Mathieu Delarue. David Haslam 10:03, 15 May 2009 (UTC)
- This repository is still active. Many more Zefania XML files have been uploaded since I last reported. David Haslam 16:14, 25 July 2009 (UTC)
PMD files?
Does anyone know a way (short of obtaining and installing an old version of PageMaker) to extract the text from a PMD file? David Haslam 09:13, 14 June 2009 (UTC)
- Anyone tried Create Adobe® PDF Online? This online service is only available to subscribers in the US & Canada. I live in the UK. David Haslam 09:33, 14 June 2009 (UTC)
Please discuss before reverting - the page has now become inaccurate
- I reverted because one of your last edits erased text following "$$$". This is the second time you've done this kind of careless edit. Last time, I (and I believe another editor) corrected all of the deletions you performed. --Osk 15:47, 24 July 2009 (UTC)
- I have no idea why that occurred, when all I was doing was changing a redlink in the previous paragraph, and that only during my last edit before your reversion. I'm sure this wasn't through carelessness. I normally check my edits using preview before saving. This is quite weird! Could this be due to a subtle bug in the wiki software? David Haslam 19:44, 24 July 2009 (UTC)
- I've been thinking about this further. The two items of text that were unintentionally deleted were both Bible references. A couple of weeks ago I installed a Firefox add-on called Bible Refalizer. I wonder if that was the cause? I'll do some tests in the wiki sandbox, and report later. David Haslam 21:09, 24 July 2009 (UTC)
- With Bible Refalizer enabled, when a wiki page or section is edited, Bible references in the edit box get removed. With Refalizer disabled, everything is OK. This is a serious bug in the Refalizer add-on for Firefox. I have reported it to the programmer, James Anderson. David Haslam 16:31, 29 July 2009 (UTC)
Creating and copying from PDF files
Provided the document's security properties permit copying, the entire content of a PDF file can be copied using Adobe Reader 9.x (I have successfully used this to paste a complete Bible text into WordPad). Couple this with the fact that several printer drivers permit printing from any Windows application to a PDF file, and let your imagination run riot. Two such printer drivers are the commercial program pdfFactory from FinePrint and the free program PDFCreator. David Haslam 11:21, 19 November 2009 (UTC)
modwrite / treeidxutil / xmlcatalog
Please would someone provide a suitable description of these SWORD tools. I have added a line for each under Miscellaneous. David Haslam 13:28, 7 December 2009 (UTC)
- Still waiting. David Haslam 15:57, 19 December 2009 (UTC)
SGML section
I added the SGML section after learning that one of my contacts in SIL has the task of converting some Folio Views files to Logos format, and he's doing it via XML. He's probably not using SP, yet I thought it helpful to record what I have found. David Haslam 17:50, 22 April 2010 (UTC)
XHTML-TE and Go Bible Creator
I am in contact with the SIL employee who is adapting Go Bible Creator to add the option to specify XHTML-TE as a source text format. Email me if you'd like further details. David Haslam 12:21, 25 May 2010 (UTC)
imp2osis.pl
Here is a copy of the help output for imp2osis.pl:
 imp2osis.pl -- IMP (Sword Import) format to OSIS 2.1.1 converter version 2.0.1
 Revision 227 (2009-10-30)
 Syntax: imp2osis.pl <osisWork> <input filename> [-o OSIS-file] [-m]
 The -m option will produce milestoned <verse/> elements, which are more
 likely to produce valid OSIS from Bibles with OSIS markup internally.
 No attempt is made to convert markup present in the verse entries
 themselves, so this tool is appropriate for converting Bibles that
 already contain OSIS markup or plaintext markup.
 This tool is ONLY intended for VersKey-type Sword texts, namely Bibles
 and commentaries.
Lightly edited to avoid having to use horizontal scrolling. David Haslam 16:37, 4 December 2010 (UTC)
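For anyone scripting this, the syntax line in the help output above could be wrapped roughly as follows. This is only a sketch: the function names `imp2osis_command` and `run_imp2osis` are made up for illustration, and it assumes that `perl` is on the PATH and that `imp2osis.pl` sits in the working directory. Only the arguments shown in the help text (`<osisWork>`, `<input filename>`, `-o`, `-m`) are used.

```python
import subprocess


def imp2osis_command(osis_work, input_path, output_path=None,
                     milestoned=False, script="imp2osis.pl"):
    """Build the argument list for imp2osis.pl from its documented syntax.

    Hypothetical helper: the script location and the use of `perl` as the
    interpreter are assumptions, not something the help text states.
    """
    cmd = ["perl", script, osis_work, input_path]
    if output_path is not None:
        cmd += ["-o", output_path]   # -o OSIS-file
    if milestoned:
        cmd.append("-m")             # request milestoned <verse/> elements
    return cmd


def run_imp2osis(*args, **kwargs):
    """Run the converter; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(imp2osis_command(*args, **kwargs), check=True)
```

For example, `imp2osis_command("KJV", "kjv.imp", "kjv.osis.xml", milestoned=True)` only builds the list; nothing is executed until `run_imp2osis` is called. The work name and file names here are placeholders.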
Why I have added the section about ODF
Works that are supplied as word processing files are sometimes difficult to convert to OSIS XML. The XML content of an ODF file may prove to be a useful intermediate step for format shifting. In theory, it should be feasible to develop an XSLT script to perform such a transformation. David Haslam 13:41, 23 March 2011 (UTC)
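As a rough illustration of the intermediate step described above (not an XSLT script, and not an OSIS converter), here is a minimal Python sketch that pulls paragraph and heading text out of the `content.xml` part of an ODF file. It relies on an ODF text document being a ZIP archive whose `content.xml` uses the OpenDocument text namespace. The function names are hypothetical, and nested structures such as footnotes are not handled specially.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "text" namespace, used by <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(source):
    """Return the plain text of each <text:h> and <text:p> in content.xml.

    `source` may be a file name or a file-like object. Sketch only:
    a paragraph nested inside another (e.g. in a footnote) appears twice.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]


def demo_archive():
    """Build a tiny in-memory archive with an ODF-style content.xml."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        f'xmlns:text="{TEXT_NS}">'
        '<office:body><office:text>'
        '<text:h>Genesis</text:h>'
        '<text:p>In the <text:span>beginning</text:span></text:p>'
        '</office:text></office:body></office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("content.xml", content)
    buf.seek(0)
    return buf
```

Calling `odf_paragraphs(demo_archive())` returns the heading and paragraph text in document order. Mapping that text onto OSIS elements is the hard part and is left open here.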
Request for info: SWORD internal file formats
I arrived at this page looking for info on SWORD's internal file formats.
There's some detail on "other" formats and references to existing tools that convert between some of them and "SWORD format" but it would be useful to tell more about what "SWORD format" actually IS.
Is there documentation available anywhere about what "SWORD format" looks like externally (file names and/or directory structure) and internally (file layout)?
What if someone wanted to develop a document for use in a sword application?
This page seems to suggest that they'd be best advised to format that information in one of these other formats -- because those are documented -- and trust one of the listed conversion routines to somehow make the data usable in sword apps.
Is it really the case that a direct formatting into Sword format is not worth documenting/considering?
Even the indirect path of importing from one of these other formats leaves questions about the resulting "external" structure like:
How do I identify the result, for example to tell whether a conversion succeeded or to distribute it to another machine?
Should I expect to see a file? multiple files? a directory? a directory tree?
How, as in by what file name(s), file extension(s), directory name(s) etc. does an application identify such resources to the sword library?
Pmartel60 15:12, 1 October 2011 (MDT)
- Sword's format is not and will not be documented (except in the sense that it is documented in formal languages, namely C++ & Java). Our source code is open source, so you are welcome to read it, with the understanding that any reimplementation of our code that results from reading our code (such as a Sword format reader) would necessarily be bound by the GPL. Sword's format is not an open format. It is a proprietary format, prone to change without notice, according to our current and future needs. --Osk 17:12, 1 October 2011 (MDT)