Difference between revisions of "DevTools:Conversion to Unicode"

From CrossWire Bible Society
(ANSI (instead of ASCII))
 
  
For English-language texts that use only ASCII characters, no change to the source encoding is required. For other European languages and most other languages, simple converters from ISO and national standard encodings to UTF-8 probably exist. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:
+
For English-language texts that use only ANSI characters, no change to the source encoding is required. For other European languages and most other languages, simple converters from ISO and national standard encodings to UTF-8 probably exist. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:
  
 
*uconv (part of ICU), available compiled for Win32 in the utilities ZIP at http://crosswire.org/ftpmirror/pub/sword/utils/win32 or in source format from ICU at http://www.icu-project.org/.
 

Latest revision as of 12:24, 8 January 2018

For English-language texts that use only the 7-bit ASCII subset of ANSI characters, no change to the source encoding is required, because those bytes are identical in UTF-8. For other European languages and most other languages, simple converters from ISO and national standard encodings to UTF-8 probably exist. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:

uconv is best suited for standard encodings and font2uni is best suited for font-specific encodings. When creating XML texts, the only entities that should be used are &amp; for '&' and &lt; for '<'. All other entities should be replaced by the characters they represent, encoded as UTF-8.
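
The entity rule above can be illustrated with a minimal Python sketch. It is not part of the CrossWire tools: the function names normalise_entities and to_utf8 are hypothetical, and the entity table is the HTML5 one from Python's standard library, which is an assumption about which named entities a source text might contain. The sketch decodes a text from a standard source encoding, replaces every character entity except &amp; and &lt; with the character itself, and writes the result as UTF-8.

```python
import re
from html.entities import html5  # maps names such as "eacute;" to characters

# Matches decimal (&#233;), hexadecimal (&#xE9;) and named (&eacute;) entities.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _replace(match):
    """Return the literal character for one entity, keeping '&' and '<' escaped."""
    body = match.group(1)
    if body.startswith("#"):
        try:
            if body[1] in "xX":
                char = chr(int(body[2:], 16))
            else:
                char = chr(int(body[1:]))
        except ValueError:
            # Code point outside the Unicode range: leave the entity untouched.
            return match.group(0)
    else:
        char = html5.get(body + ";")
        if char is None:
            # Unknown name: leave it for a human to inspect.
            return match.group(0)
    if char == "&":
        return "&amp;"
    if char == "<":
        return "&lt;"
    return char


def normalise_entities(text):
    """Replace all entities except &amp; and &lt; with literal characters."""
    return _ENTITY.sub(_replace, text)


def to_utf8(raw, source_encoding):
    """Decode bytes from a standard encoding, normalise entities, encode as UTF-8."""
    return normalise_entities(raw.decode(source_encoding)).encode("utf-8")


if __name__ == "__main__":
    print(normalise_entities("caf&eacute; &amp; th&#233;"))  # café &amp; thé
```

The sketch works on plain character data and does not parse the XML. In particular, it would turn &quot; inside a double-quoted attribute value into a bare quotation mark, so attribute values need separate handling. It also covers only encodings that Python's codecs know about; font-specific encodings still need a tool such as font2uni.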