Re: Japanese Language Support

Kiyokazu SUTO Mon, 07 May 2001 21:13:23 -0700
Citation (with a leading "> " on each line) from article:
  <[EMAIL PROTECTED]>
    by Sam Varshavchik <[EMAIL PROTECTED]> :
> No, not really.  SqWebMail's only assumption is that a character set can be 
> mapped to or from unicode.  Non US-ASCII charsets can generally use 
> 0x21..0x7E, except for the HTML defanging issue, which I'll mention shortly. 

There is another exception.  When sending e-mail, SqWebMail Q-encodes
only the octets in the range 0x80..0xFF that appear in message
headers, and it passes escape sequences through unchanged, even when
they appear in structured fields.  This may confuse some mailers,
because the standard does not require a mailer to interpret any CES
(character encoding scheme) other than US-ASCII in message headers.
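A minimal Python sketch of the difference, for illustration only.  `naive_q_encode` is a hypothetical model of the behaviour described above (Q-encode only 0x80..0xFF, pass everything else through), not SqWebMail's actual code.  Because ISO-2022-JP is a 7-bit encoding, such an encoder leaves the ESC octets in the header untouched.  An RFC 2047 encoded-word, here produced with Python's standard `email.header.Header`, wraps the whole run instead, so no raw ESC reaches the header.

```python
from email.header import Header


def naive_q_encode(raw: bytes) -> str:
    """Hypothetical model of the behaviour described above: only octets in
    0x80..0xFF are Q-encoded; every other octet, including ESC (0x1B),
    is passed through unchanged."""
    out = []
    for b in raw:
        if b >= 0x80:
            out.append("=%02X" % b)
        else:
            out.append(chr(b))
    return "".join(out)


subject = "日本語"                     # "Japanese"
raw = subject.encode("iso2022_jp")    # 7-bit: ESC $ B <JIS X 0208 pairs> ESC ( B

# Every octet is below 0x80, so the naive model leaves the ESC bytes in place.
naive = naive_q_encode(raw)

# RFC 2047 encoded-word: the whole ISO-2022-JP run is base64-wrapped,
# so the header itself contains only printable US-ASCII.
proper = Header(subject, "iso-2022-jp").encode()

print(repr(naive))
print(proper)
```

The same subject in EUC-JP would have its high octets Q-encoded even by the naive model; it is specifically the 7-bit, escape-based ISO-2022-JP that slips through.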

> Someone else mailed me some links to look over.  It appears that the major 
> stumbling block is that currently the unicode mapper does not carry over 
> stateful information between successive mappings to/from unicode.  SqWebMail 
> first maps the message's text/plain content to Unicode, according to its 
> MIME charset, then from Unicode to the browser client's MIME charset.  To do 
> this correctly with iso-2022-jp it is necessary to keep track of the current 
> character set being encoded in iso-2022-jp, and currently there is no state 
> information carried across successive calls to the unicode functions. 

I don't think this is a significant problem.  We Japanese programmers
are very familiar with this kind of work and can contribute the
necessary code.  Alternatively, you can of course use the iconv library.
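To illustrate the kind of state involved, here is a small Python sketch that uses the standard library's incremental decoder as a stand-in for a stateful converter (iconv keeps comparable shift state in its conversion descriptor).  The byte string is split at an arbitrary character boundary, right after the first kanji.  Decoding the second chunk on its own starts again in the initial ASCII state and misreads the JIS X 0208 octets, while the incremental decoder remembers the earlier ESC $ B.

```python
import codecs

data = "日本語".encode("iso2022_jp")   # ESC $ B <JIS X 0208 pairs> ESC ( B
first, second = data[:5], data[5:]     # split after ESC $ B and the first kanji

# Stateless: the second chunk is decoded from the initial (ASCII) state,
# so its two-byte JIS X 0208 codes come out as plain ASCII characters.
stateless_second = second.decode("iso2022_jp")

# Stateful: the incremental decoder carries the "JIS X 0208 is active"
# state from the first call into the second.
decoder = codecs.getincrementaldecoder("iso2022_jp")()
stateful = decoder.decode(first) + decoder.decode(second, final=True)

print(repr(stateless_second))
print(stateful)
```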

> The other potential issue is text/html content encoded in iso-2022-jp.  The 
> jis-x-0208 octets are in the lower US-ASCII range and they definitely 
> overlap with the HTML markup tags, since they use the < > (and & and other) 
> octets.  I suppose that text/html iso-2022-jp always shifts back to US-ASCII 
> before introducing each < > markup tag.  Even with that, this is going to 
> cause problems for SqWebMail's HTML defanger, which eats HTML markup tags in 
> their raw form. 

The actual problem I encountered is that, when SqWebMail outputs HTML
text for clients, it converts ESC (0x1B) into a character reference
(&#x1B;).  As far as I know, no browser interprets this character
reference as the introducer of an escape sequence that switches the
CCS (coded character set).  Thus, the following string is treated as
US-ASCII text, which shows up as garbage on the screen.
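A short Python sketch of the failure and one possible fix.  `escape_bytes_as_latin1` is a hypothetical model of escaping the raw octets as if they were single-byte text, not SqWebMail's actual routine.  Once each ESC has become &#x1B;, the text after it (for example `$BF|K\8l` where 日本語 was intended) is shown literally.  Decoding with the message's own charset before escaping, assuming the page is then served in a Unicode encoding such as UTF-8, avoids this.  It also sidesteps the defanging concern quoted above, since JIS X 0208 octets such as 0x3C can no longer be mistaken for `<`.

```python
import html
import re

raw = "日本語 <b>".encode("iso2022_jp")


def escape_bytes_as_latin1(data: bytes) -> str:
    """Hypothetical model of the problem: treat each octet as a single-byte
    character, HTML-escape it, then turn control characters (other than
    TAB, LF and CR) into numeric character references such as &#x1B;."""
    text = html.escape(data.decode("latin-1"))
    return re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]",
                  lambda m: "&#x%02X;" % ord(m.group()), text)


# ESC becomes &#x1B;, and "$BF|K\8l" is left to be displayed as ASCII.
broken = escape_bytes_as_latin1(raw)

# Decode with the message's own charset first, then escape the markup.
fixed = html.escape(raw.decode("iso2022_jp"))

print(broken)
print(fixed)
```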

-- 
SUTO, Kiyokazu <[EMAIL PROTECTED]>
http://pub.ks-and-ks.ne.jp/pgp-public-key.html