RE: Support for Japanese characters

Marco Cimarosti Mon, 11 Mar 2002 02:55:44 -0800

Eric Ray wrote:
> 1.  The library does not really evaluate the Japanese characters
> to make logical decisions.  We believe base64 encode the
> character array to avoid any "bad things happening in the code"
> (such as hitting a null value or other values that could
> potential cause problems).


Hint: consider revising your project in the light of the fact that both
Unicode (ISO 10646) and the Japanese character set (JIS X 0208) have
ASCII-compatible "multibyte" formats.

Unicode's ASCII-compatible format is called UTF-8. The most popular JIS
ASCII-compatible format is called EUC.

ASCII-compatible means that all bytes in the ASCII range (0-127) are only
used for ASCII characters. So, among other things, no "bad things" happen
with null terminators or control characters.
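
To see this property in practice, here is a minimal Python sketch. The
sample string and the helper name ascii_bytes are my own illustrative
choices, not part of your library; "utf-8" and "euc_jp" are Python's
standard codec names for the two formats.

```python
def ascii_bytes(data: bytes) -> bytes:
    """Return only the bytes of data that fall in the ASCII range (0-127)."""
    return bytes(b for b in data if b < 0x80)


# Hypothetical sample: three JIS X 0208 kanji followed by plain ASCII.
sample = "日本語 ABC"

utf8 = sample.encode("utf-8")
eucjp = sample.encode("euc_jp")

# In both encodings, the only ASCII-range bytes left are the real ASCII
# characters (the space and "ABC"); every byte of a kanji is 0x80 or above.
print(ascii_bytes(utf8))
print(ascii_bytes(eucjp))

# Neither encoding introduces a NUL byte that would cut a C string short.
print(0 in utf8, 0 in eucjp)
```

If the property holds, both filtered results contain just the space and
"ABC", and the last line reports no NUL byte in either encoding, so the
encoded text can go through NUL-terminated string code without base64.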

For UTF-8, see Unicode's FAQ
<http://www.unicode.org/unicode/faq/utf_bom.html> or read the historical RFC
which proposed it <http://www.faqs.org/rfcs/rfc2279.html>.

BTW, base64 was also the basis of an obsolete Unicode format called UTF-7.
Searching for UTF-7 on the web, you'll find some information and lots of
bitter comments about why this approach is obsolete.

_ Marco
