DevTools:Conversion to Unicode
From CrossWire Bible Society
David Haslam (talk | contribs)
Latest revision as of 12:24, 8 January 2018
For English-language texts that use only plain 7-bit ANSI (ASCII) characters, no change to the source encoding is required, because such text is already valid UTF-8. For other European languages, and for most other languages, simple converters from ISO and national standard encodings to UTF-8 probably already exist. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:
- uconv (part of ICU), available compiled for Win32 in the utilities ZIP at http://crosswire.org/ftpmirror/pub/sword/utils/win32 or in source format from ICU at http://www.icu-project.org/.
- font2uni from CCEL, available at http://www.ccel.org/info/gkheb/.
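Where neither tool is at hand, a text in a standard encoding can also be transcoded with a short script. The following is only a minimal sketch in Python, not a replacement for the tools listed above; the function name `transcode_to_utf8` and the sample string are hypothetical, and the source encoding is assumed to be known in advance.

```python
import codecs


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known standard encoding and re-encode them as UTF-8."""
    # Fail early with LookupError if Python does not know the encoding name.
    codecs.lookup(source_encoding)
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid
    # in the source encoding, instead of silently corrupting the text.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical sample: a German phrase stored as ISO-8859-1 bytes.
sample = "Grüß Gott".encode("iso-8859-1")
converted = transcode_to_utf8(sample, "iso-8859-1")
```

Decoding strictly, rather than ignoring errors, makes a wrongly guessed source encoding more likely to surface as an error than as damaged text.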
uconv is best suited for standard encodings and font2uni is best suited for font-specific encodings.

When creating XML texts, the only entities that should be used are &amp;amp; for '&' and &amp;lt; for '<'. All other entities should be replaced by the characters they represent, encoded as UTF-8.
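The entity rule can be sketched in Python as follows. This is an illustration under stated assumptions, not a CrossWire tool: the function name `normalize_entities` is hypothetical, only the HTML 4 named entities known to Python's `html.entities.name2codepoint` are resolved, and the input is assumed to be element text (inside attribute values, replacing an entity such as the one for a double quote could break the markup).

```python
import re
from html.entities import name2codepoint

# Entities that must remain escaped in the XML output.
KEEP = {"amp", "lt"}
# Matches decimal, hexadecimal, and named entity references.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def normalize_entities(xml_text: str) -> str:
    """Replace every entity except the two kept ones with its character."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body.startswith("#"):
            is_hex = body[1] in "xX"
            code = int(body[2:], 16) if is_hex else int(body[1:])
        elif body in KEEP or body not in name2codepoint:
            # Keep the two allowed entities, and leave unknown names untouched.
            return match.group(0)
        else:
            code = name2codepoint[body]
        # Numeric references to '&' or '<' must stay escaped.
        if code == ord("&"):
            return "&amp;"
        if code == ord("<"):
            return "&lt;"
        if code > 0x10FFFF:
            # Not a valid Unicode code point; leave the reference as it is.
            return match.group(0)
        return chr(code)

    return ENTITY.sub(replace, xml_text)


cleaned = normalize_entities("caf&eacute; &amp; th&#233; &lt; &#x3b1;")
```

The result is a Python string; writing it to a file opened with `encoding="utf-8"` produces the UTF-8 bytes.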