Re: Japanese Language Support

Sam Varshavchik Mon, 07 May 2001 21:46:39 -0700
Kiyokazu SUTO writes:

> Citation (with leading "> " of each line) from article:
>   <[EMAIL PROTECTED]>
>     by Sam Varshavchik <[EMAIL PROTECTED]> :
>> No, not really.  SqWebMail's only assumption is that a character set can be 
>> mapped to or from Unicode.  Non US-ASCII charsets can generally use 
>> 0x21..0x7E, except for the HTML defanging issue, which I'll mention shortly. 
> 
> There is another exception.  When sending e-mail, SqWebMail Q-encodes
> only octets in the range 0x80..0xFF that appear in message headers,
> and passes escape sequences through even if they appear in a
> structured field.  This might confuse some mailers, because the
> standard does not require a mailer to interpret a CES other than
> US-ASCII in a message header.

That should be simple enough to fix. 
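
For illustration only, here is a minimal Python sketch (not SqWebMail code) of what fixed output could look like: the header text is wrapped in an RFC 2047 encoded-word, so no ESC octets appear in the header itself. It uses Python's standard email.header module. The function names and the sample subject are made up for this example.

```python
from email.header import Header, decode_header

def encode_subject(text, charset="iso-2022-jp"):
    """Return an RFC 2047 encoded-word form of text, safe for a mail header."""
    return Header(text, charset).encode()

def decode_subject(encoded):
    """Decode the encoded-word(s) back to a Python string."""
    parts = []
    for payload, charset in decode_header(encoded):
        if isinstance(payload, bytes):
            parts.append(payload.decode(charset or "us-ascii"))
        else:
            parts.append(payload)
    return "".join(parts)

sample = "日本語"  # hypothetical subject text
encoded = encode_subject(sample)
print(encoded)                  # pure US-ASCII, no raw escape sequences
print(decode_subject(encoded))  # round-trips to the original text
```

The same idea applies to structured fields, where only the display-name or comment parts may be encoded and the addresses themselves stay as plain US-ASCII.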

> 
>> Someone else mailed me some links to look over.  It appears that the major 
>> stumbling block is that currently the unicode mapper does not carry over 
>> stateful information between successive mappings to/from unicode.  SqWebMail 
>> first maps the message's text/plain content to Unicode, according to its 
>> MIME charset, then from Unicode to the browser client's MIME charset.  To do 
>> this correctly with iso-2022-jp it is necessary to keep track of the current 
>> character set being encoded in iso-2022-jp, and currently there is no state 
>> information carried across successive calls to the unicode functions. 
> 
> I don't think this is a significant problem.  We Japanese programmers
> are very familiar with such work, and can contribute the necessary code.
> Otherwise you can use the iconv library, of course.

It's not too difficult, just time-consuming.  The conversion function 
for each charset needs to be modified to accept a transparent context 
pointer, and each charset needs to define a context creation/destruction 
function.  Then, start compiling and fixing stuff that doesn't compile any 
more because the API changed. 
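
To show why the context matters, here is a small Python sketch. It is an analogy, not the SqWebMail API. Python's codecs module offers incremental decoders that play the role of the context object described above: one decoder instance is created, fed successive chunks, and then discarded. The function names and the sample text are assumptions for this example.

```python
import codecs

def decode_stateless(chunks, charset="iso2022_jp"):
    """Decode each chunk independently; the shift state is lost between calls."""
    return "".join(chunk.decode(charset, errors="replace") for chunk in chunks)

def decode_with_context(chunks, charset="iso2022_jp"):
    """Decode chunks through one decoder object that remembers the shift state."""
    decoder = codecs.getincrementaldecoder(charset)()  # context creation
    out = [decoder.decode(chunk) for chunk in chunks]
    out.append(decoder.decode(b"", final=True))        # flush before discarding
    return "".join(out)

sample = "日本語"                      # hypothetical message text
data = sample.encode("iso2022_jp")
# Split after the ESC $ B sequence and the first two-byte character,
# so the second chunk starts in the middle of a JIS X 0208 run.
chunks = [data[:5], data[5:]]

print(decode_stateless(chunks))      # second chunk is misread as US-ASCII
print(decode_with_context(chunks))   # matches the original text
```

The stateless version starts every call in US-ASCII mode, so any chunk that begins inside a JIS X 0208 run is misread. That is the failure the per-charset context pointer is meant to prevent.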


>> The other potential issue is text/html content encoded in iso-2022-jp.  The 
>> jis-x-0208 octets are in the lower US-ASCII range and they definitely 
>> overlap with the HTML markup tags, since they use the < > (and & and other) 
>> octets.  I suppose that text/html iso-2022-jp always shifts back to US-ASCII 
>> before introducing each < > markup tag.  Even with that, this is going to 
>> cause problems for SqWebMail's HTML defanger, which eats HTML markup tags in 
>> their raw form. 
> 
> The actual problem I encountered is that, when SqWebMail outputs HTML
> text for clients, it converts ESC (0x1B) to a character reference
> (&#x1B;).  As far as I know, no browser interprets this character
> reference as the start of an escape sequence that switches the CCS.
> Thus, the following string is treated as US-ASCII text, which shows up
> as garbage on screen.

That's easily changed too.  There's a definite problem with iso-2022-jp 
using < and > characters in multibyte sequences.  This won't be that easy to 
solve. 
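
To make the overlap concrete, the following Python sketch (an illustration, not part of SqWebMail) scans the CJK ideograph range and collects the characters whose two-octet JIS X 0208 code contains the octet for <, > or &. The function name and the scanned range are choices made for this example. It relies on Python's iso2022_jp codec emitting ESC $ B before a kanji.

```python
MARKUP_BYTES = b"<>&"
KANJI_SHIFT = b"\x1b$B"  # ESC $ B: switch to JIS X 0208

def kanji_with_markup_bytes(first=0x4E00, last=0x9FFF):
    """Map each encodable kanji to its 2-octet code if it holds a markup octet."""
    hits = {}
    for cp in range(first, last + 1):
        ch = chr(cp)
        try:
            raw = ch.encode("iso2022_jp")
        except UnicodeEncodeError:
            continue                      # not representable in iso-2022-jp
        if not raw.startswith(KANJI_SHIFT):
            continue                      # encoded via some other designation
        payload = raw[3:5]                # the two octets after ESC $ B
        if any(b in MARKUP_BYTES for b in payload):
            hits[ch] = payload
    return hits

hits = kanji_with_markup_bytes()
print(len(hits), "kanji contain a <, > or & octet")
for ch, payload in list(hits.items())[:5]:
    print(ch, payload)
```

A defanger that scans raw octets for < and > without tracking the current shift state would treat these octets as markup and damage the characters.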


-- 
Sam