On Tue, 17 Dec 2002, Stephane Bortzmeyer wrote:

> On Tue, Dec 17, 2002 at 01:28:00PM +0100,
> Otto Stolz <[EMAIL PROTECTED]> wrote
> > I have seen many messages, originally in ISO-8859-1-encoded French,
> > that got the high-bit of every accented character chopped off, thus
> > replacing "é" with "i", "î" with "n", and so forth.

When was the last time you saw this?

> Last time I saw such problems was something like ten years ago. It was
> almost never the fault of the SMTP server, but of some programs on the
> destination machine (or sometimes the faults of funny gateways like
> X400 servers, something you cannot blame on the Internet).

Although I agree that 8BITMIME is implemented and deployed very widely
these days (it's been more than two years since I received garbled
emails due to a 7bit-only path, and I receive tens of emails in 8bit
encodings every day), I'm afraid it's your unique experience that the
last time you received emails with the MSB stripped off was 10 years
ago. While trying to counter the exaggeration made against the ability
of Internet email to transport UTF-8 emails, you may have gone to the
other extreme.

In 1992, sendmail 4.x/5.x transported more than half of the Internet
email, and it was not 8bit clean. That's why RFC 1468 and RFC 1557 were
written circa 1992 for Japanese and Korean email exchanges in 7bit
ISO-2022-JP and ISO-2022-KR, respectively. (In the case of ISO-2022-JP,
there's another important reason: there are two major encodings used
for Japanese, Shift_JIS on DOS/Windows/Mac and EUC-JP on Unix.)

As recently as 1999, I did receive MSB-stripped emails which didn't go
through a non-SMTP gateway (e.g. X400). Back then, some mail servers
still used 7bit-only sendmail 4.x or 5.x (on old SunOS 4.x, AIX 3.x,
4.x, HP-UX 8.x, IRIX, etc. machines), old versions of PMDF (on old VMS
machines) and smail (on some Unix machines), even though 8bit-clean
sendmail 8.6.x or later had been around since the mid-1990s.
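
To make the MSB stripping concrete, here is a minimal Python sketch of
what a 7bit-only hop effectively does to each byte. The function name
strip_msb is hypothetical and only for illustration; it is not taken
from any actual MTA.

```python
def strip_msb(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a 7bit-only hop would."""
    return bytes(b & 0x7F for b in data)


# ISO-8859-1: "é" is 0xE9 and "î" is 0xEE.
latin1 = "é î".encode("iso-8859-1")
print(strip_msb(latin1).decode("ascii"))  # i n

# UTF-8 suffers too: every non-ASCII character is a sequence of
# bytes that all have the high bit set, so each one is damaged.
utf8 = "é".encode("utf-8")                # b"\xc3\xa9"
print(strip_msb(utf8).decode("ascii"))    # C)
```

Plain ASCII text passes through such a hop unchanged, which is why the
damage only ever showed up in mail using 8bit encodings.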
Besides, some email servers still don't abide by the ESMTP standard and
don't include '8BITMIME' in their response when queried with 'EHLO',
although they support 8bit-clean transport (as you wrote).

Nonetheless, I agree that these days most mail transport paths are 8bit
clean. Even where they are not, Base64 and QP (I don't regard them as a
hack as you do) are well supported by most modern MUAs, so end-users
have little problem exchanging emails in UTF-8 (or in legacy 8bit
encodings). Most of them don't have to care whether 8BITMIME is used in
transit or which C-T-E is used: 8bit, QP, or Base64.

> > take the pains to transform 8-bit MIME to some transfer-encoding
> > supported by the receiving server.
>
> Very bad idea, BTW, since it mangles the mail, which can be a problem
> with applications like cryptographic signatures. I always turn it off
> and it was never a problem. In practice (do note I refer to the real
> world), all SMTP servers accept 8-bits EVEN IF THEY DO NOT ADVERTISE
> IT PROPERLY with the 8BITMIME option.

Doing this type of C-T-E change (from 8bit to QP/Base64) automatically
at the MTA level may be a bad idea, but doing it in MUAs should not be
a problem (that's what end-users choose). With most modern MUAs
supporting the MIME standard very well (notable exceptions being Eudora
and some popular web mail services), the 8bit-cleanness of the
transport path doesn't matter much for UTF-8 email exchange, as I wrote
above.

IMHO, the biggest obstacle to email exchange in UTF-8 is not 7bit-only
SMTP but the fact that people don't feel a strong need to switch,
because they think legacy encodings work just fine for them. (Not many
people need to exchange emails in languages other than their native
ones, let alone multilingual emails that cross the boundaries of legacy
encodings.) Another obstacle is that popular web mail services don't
support UTF-8 well, incorrectly assuming that there's 'the' invariant
mapping between languages and MIME charsets/encodings (e.g. for French,
use ISO-8859-15/1 or Windows-1252; for Japanese, ISO-2022-JP).
Therefore, even though major MUAs have no problem with UTF-8 emails,
some people are reluctant to send all their outgoing emails in UTF-8
for fear that their correspondents with web mail accounts won't be able
to read them without some 'user-intervention'.

Jungshik Shin