Kiyokazu SUTO writes:
> Citation (with leading "> " of each line) from article:
> <[EMAIL PROTECTED]>
> by Sam Varshavchik <[EMAIL PROTECTED]> :
>> No, not really. SqWebMail's only assumption is that a character set can be
>> mapped to or from unicode. Non US-ASCII charsets can generally use
>> 0x21..0x7E, except for the HTML defanging issue, which I'll mention shortly.
>
> There is another exception. When sending e-mail, SqWebMail Q-encodes
> only octets in the range 0x80..0xFF that appear in message headers,
> and passes escape sequences through even if they appear in a
> structured field. This might confuse some mailers, because the
> standard does not require a mailer to interpret any CES other than
> US-ASCII in a message header.

That should be simple enough to fix.

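As a rough illustration of the fix being discussed (not SqWebMail's actual code), a header value could be treated as needing encoding whenever it contains ESC (0x1B) as well as any octet in 0x80..0xFF. The sketch below is Python, the function name q_encode_header is made up for this example, and it ignores the RFC 2047 limit of 75 characters per encoded-word:

```python
def q_encode_header(raw, charset="iso-2022-jp"):
    """Hypothetical helper: return raw as text if it is plain US-ASCII,
    otherwise wrap all of it in one RFC 2047 Q encoded-word."""
    # Treat ESC like an 8-bit octet: either one forces encoding.
    if not any(b >= 0x80 or b == 0x1B for b in raw):
        return raw.decode("ascii")
    # Conservative set of octets that may appear literally in Q encoding.
    safe = set(b"abcdefghijklmnopqrstuvwxyz"
               b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               b"0123456789!*+-/")
    out = []
    for b in raw:
        if b in safe:
            out.append(chr(b))
        elif b == 0x20:
            out.append("_")          # Q encoding writes SPACE as "_"
        else:
            out.append("=%02X" % b)  # everything else, ESC included
    return "=?%s?Q?%s?=" % (charset, "".join(out))


print(q_encode_header(b"Hello"))
print(q_encode_header("日本".encode("iso2022_jp")))
```

With this, the escape sequences never reach the wire in raw form, so a mailer that only understands US-ASCII headers sees a well-formed encoded-word.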
>
>> Someone else mailed me some links to look over. It appears that the major
>> stumbling block is that currently the unicode mapper does not carry over
>> stateful information between successive mappings to/from unicode. SqWebMail
>> first maps the message's text/plain content to Unicode, according to its
>> MIME charset, then from Unicode to the browser client's MIME charset. To do
>> this correctly with iso-2022-jp it is necessary to keep track of the current
>> character set being encoded in iso-2022-jp, and currently there is no state
>> information carried across successive calls to the unicode functions.
>
> I don't think this is a significant problem. We Japanese programmers
> are very familiar with such work, and can contribute the necessary code.
> Otherwise you can use the iconv library, of course.

It's not too difficult, just time-consuming. The conversion function
for each charset needs to be modified to accept a transparent context
pointer, and each charset needs to define a context creation/destruction
function. Then, start compiling and fixing stuff that doesn't compile any
more because the API changed.

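To show why the context matters, here is a small Python sketch, not the SqWebMail unicode API. The names make_context and to_unicode are invented for this example; the only real library call is Python's codecs.getincrementaldecoder, which keeps the iso-2022-jp shift state between calls:

```python
import codecs


def make_context(charset):
    """Hypothetical 'context creation' step: one decoder object per message."""
    return codecs.getincrementaldecoder(charset)()


def to_unicode(context, octets, last=False):
    """Convert one buffer, resuming from whatever state the context holds."""
    return context.decode(octets, final=last)


data = "日本語".encode("iso2022_jp")
# Split after the first double-byte character, so the second buffer starts
# while JIS X 0208 is still the selected character set.
chunks = [data[:5], data[5:]]

# No state carried over: the second buffer is read as if it began in US-ASCII.
stateless = "".join(c.decode("iso2022_jp", errors="replace") for c in chunks)

# State carried over in the context: the second buffer continues in JIS X 0208.
context = make_context("iso2022_jp")
stateful = to_unicode(context, chunks[0]) + to_unicode(context, chunks[1], last=True)

print(repr(stateless))
print(repr(stateful))
```

The stateless result comes out wrong after the first character, while the stateful one round-trips. In this sketch the destruction step is simply dropping the context object.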
>> The other potential issue is text/html content encoded in iso-2022-jp. The
>> jis-x-0208 octets are in the lower US-ASCII range and they definitely
>> overlap with the HTML markup tags, since they use the < > (and & and other)
>> octets. I suppose that text/html iso-2022-jp always shifts back to US-ASCII
>> before introducing each < > markup tag. Even with that, this is going to
>> cause problems for SqWebMail's HTML defanger, which eats HTML markup tags in
>> their raw form.
>
> The actual problem I encountered is that, when SqWebMail outputs HTML
> text for clients, it converts ESC (0x1B) to a numeric character reference.
> As far as I know, no browser interprets this character reference as the
> introduction of an escape sequence to switch CCS. Thus, the succeeding
> string is treated as US-ASCII text, which looks like garbage on screen.

That's easily changed too. There's a definite problem with iso-2022-jp
using < and > characters in multibyte sequences. This won't be that easy to
solve.

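One possible direction, sketched in Python and only as an assumption about how a defanger could cope: track the escape sequences defined for iso-2022-jp in RFC 1468 and treat <, > and & as markup only while a single-byte set is selected. The function name ascii_markup_offsets and the sample octets are made up for this example:

```python
# Escape sequences defined for iso-2022-jp (RFC 1468), mapped to the
# number of octets per character in the set each one selects.
DESIGNATIONS = {
    b"\x1b(B": 1,  # US-ASCII
    b"\x1b(J": 1,  # JIS X 0201 Roman
    b"\x1b$@": 2,  # JIS X 0208-1978
    b"\x1b$B": 2,  # JIS X 0208-1983
}


def ascii_markup_offsets(octets):
    """Hypothetical helper: offsets of <, > and & that are real single-byte
    characters rather than halves of a double-byte character."""
    width = 1  # a message starts in US-ASCII
    found = []
    i = 0
    while i < len(octets):
        seq = octets[i:i + 3]
        if seq in DESIGNATIONS:
            width = DESIGNATIONS[seq]
            i += 3
            continue
        if width == 1 and octets[i] in b"<>&":
            found.append(i)
        i += width
    return found


# One double-byte character whose first octet is 0x3C ("<"), then "<b>".
sample = b"\x1b$B<+\x1b(B<b>"
print(sample.count(b"<"), ascii_markup_offsets(sample))
```

A raw scan finds two "<" octets in the sample, but only the second one is markup. This says nothing about malformed input, which a real defanger would also have to handle.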
--
Sam