ICU conversion of codepage data (Was: japanese xml)

Carl W. Brown Sat, 01 Sep 2001 17:17:44 -0700

Misha,

> [...] case of Japanese) may cover all the characters you require, in which [...]
> Additionally, if you are thinking of XML (or
> HTML) then you can encode *all* Unicode characters in an EUC-encoded
> document, by employing numeric character references for characters
> outside the EUC character repertoire.  Using the same technique, you can
> encode all Unicode characters in an ASCII-encoded document.
>
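Here is a minimal sketch of the technique described in the quote. It uses Python 3 and its standard codecs rather than ICU. The built-in `xmlcharrefreplace` error handler writes every character that the target encoding cannot represent as a decimal `&#nnnnn;` reference. The helper names `to_legacy_markup` and `from_legacy_markup` are hypothetical and chosen for illustration.

```python
import html

def to_legacy_markup(text: str, encoding: str) -> bytes:
    """Encode text as XML/HTML character data in a legacy encoding.

    Markup-significant characters (&, <, >) are escaped first; then the
    standard "xmlcharrefreplace" handler turns every character that the
    target encoding cannot represent into a decimal &#nnnnn; reference.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode(encoding, errors="xmlcharrefreplace")

def from_legacy_markup(data: bytes, encoding: str) -> str:
    """Decode the bytes and resolve the character references again."""
    return html.unescape(data.decode(encoding))

if __name__ == "__main__":
    # Hangul is outside the EUC-JP repertoire, so it becomes &#54620;
    print(to_legacy_markup("日本語 한", "euc_jp"))
    # In plain ASCII, every non-ASCII character becomes a reference.
    print(to_legacy_markup("日本語", "ascii"))  # b'&#26085;&#26412;&#35486;'
```

Escaping "&" first matters. Without that step, a literal "&#26085;" that was already in the source text could not be told apart from a generated reference after decoding.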

Your comment that you can encode all Unicode characters in code page text
(as &#nnnnn; or &#xhhhh; numeric character references) reminded me that I
should make a change to xIUA to take advantage of the power of ICU.  ICU lets
you set up a converter so that it produces HTML/XML-compatible escape
sequences for any characters that it cannot convert to the specified code
page.
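ICU4C exposes this behaviour through `ucnv_setFromUCallBack`. You pass the `UCNV_FROM_U_CALLBACK_ESCAPE` callback together with an XML context such as `UCNV_ESCAPE_XML_DEC` or `UCNV_ESCAPE_XML_HEX`. The exact calls xIUA makes are not shown here. As a rough analogue outside ICU, the Python sketch below registers a custom codec error handler that emits the hexadecimal `&#xhhhh;` form. The handler name `xmlhexcharrefreplace` and the function `xml_hex_charref` are hypothetical.

```python
import codecs

def xml_hex_charref(err):
    """Codec error handler: write unencodable characters as &#xHHHH;."""
    if not isinstance(err, UnicodeEncodeError):
        raise err  # only meaningful when encoding from Unicode
    bad = err.object[err.start:err.end]
    refs = "".join(f"&#x{ord(ch):X};" for ch in bad)
    # Return the replacement text and the position to resume encoding at.
    return refs, err.end

# Hypothetical handler name; any otherwise unused name would do.
codecs.register_error("xmlhexcharrefreplace", xml_hex_charref)

if __name__ == "__main__":
    print("日本語 😀".encode("ascii", errors="xmlhexcharrefreplace"))
    # b'&#x65E5;&#x672C;&#x8A9E; &#x1F600;'
```

To an XML parser, decimal and hexadecimal references are equivalent. Which one to emit is a matter of taste or of what downstream tools expect.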

If you are sending code page data to a browser, it makes more sense to use
escape sequences, in case the reader has a Unicode-capable browser and the
fonts to display those characters.  In xIUA this can now be selected as an
option.

Carl

