I would like to point out one of the new features of ICU 2.8, which is currently available as an alpha release: http://oss.software.ibm.com/icu/download/2.8/

ICU 2.8 can handle m:n character conversion mappings, defined by simple lines in its Unicode conversion table (.ucm) text files.

I sincerely hope that the availability of this feature will help argue against further assignments of precomposed Unicode characters.

For example, the ibm-1390_P110-2003.ucm conversion table file (for EBCDIC Japanese with the JIS X 0213 repertoire) contains lines like

<U304B><U309A> \xEC\xB5 |0

which expresses the mapping between two Unicode code points (Hiragana Ka + semi-voiced mark) and one DBCS sequence.
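To make the line format concrete, here is a minimal Python sketch that splits such a mapping line into its code points, its byte sequence, and the trailing precision indicator (in the ICU .ucm format, |0 marks a round-trip mapping). The function name parse_ucm_mapping is hypothetical, and the regular expression only covers the simple form shown above; it is not ICU's own parser.

```python
import re

# Matches one mapping line of the simple form shown above:
# one or more <Uhhhh> code points, whitespace, one or more \xhh bytes,
# then an optional |n precision indicator.
MAPPING_RE = re.compile(
    r"^((?:<U[0-9A-Fa-f]{4,6}>)+)\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)

def parse_ucm_mapping(line):
    """Return (code_points, byte_sequence, precision) for one mapping line.

    Hypothetical helper for illustration; precision is None if absent.
    """
    m = MAPPING_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a mapping line: {line!r}")
    # Each <Uhhhh> becomes one integer code point.
    code_points = [int(h, 16) for h in re.findall(r"<U([0-9A-Fa-f]+)>", m.group(1))]
    # Each literal \xhh becomes one byte of the codepage sequence.
    data = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    precision = int(m.group(3)) if m.group(3) is not None else None
    return code_points, data, precision

# Two code points on the Unicode side, one DBCS pair on the codepage side.
print(parse_ucm_mapping(r"<U304B><U309A> \xEC\xB5 |0"))
```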

Either side of the mapping can contain multiple "characters" - Unicode code points on one side, complete codepage byte sequences on the other.
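One way to see why multi-code-point entries make precomposed characters unnecessary for conversion: a converter that tries the longest matching sequence of code points first will emit the single DBCS code for Ka + semi-voiced mark, while still converting a lone Ka normally. The Python sketch below is a toy greedy longest-match converter, not ICU's actual algorithm. Only the <U304B><U309A> entry comes from the ibm-1390 line quoted above; the single-character byte values are hypothetical placeholders, and the sketch ignores the shift-out/shift-in state handling that a real EBCDIC mixed SBCS/DBCS converter needs.

```python
# Toy table: keys are tuples of code points, values are codepage bytes.
TOY_TABLE = {
    (0x304B, 0x309A): b"\xEC\xB5",  # from the ibm-1390 line quoted above
    (0x304B,): b"\xAA\x01",         # hypothetical placeholder bytes
    (0x309A,): b"\xAA\x02",         # hypothetical placeholder bytes
}

def convert_longest_match(text, table):
    """Greedily convert text to bytes, preferring the longest table key.

    Hypothetical helper for illustration; raises KeyError on unmapped input.
    """
    max_len = max(len(key) for key in table)
    code_points = [ord(c) for c in text]
    out = bytearray()
    i = 0
    while i < len(code_points):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(code_points) - i), 0, -1):
            key = tuple(code_points[i:i + n])
            if key in table:
                out += table[key]
                i += n
                break
        else:
            raise KeyError(f"no mapping for U+{code_points[i]:04X}")
    return bytes(out)

# A lone Ka, then Ka + semi-voiced mark matched as one m:n entry.
print(convert_longest_match("\u304B\u304B\u309A", TOY_TABLE).hex())
```

Trying longer keys first is what lets the two-code-point entry win over the two single-code-point entries.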

Best regards,
markus