Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

Michael (michka) Kaplan Fri, 28 Feb 2003 11:33:26 -0800

From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> When you deal with encodings that need state (ISO-2022, ISO-2022-JP,
> etc.) or variable-length encodings (Shift_JIS, Big5, UTF-8), then the
> situation is different.


Unicode cannot of course speak for those other encodings, but it can
speak for UTF-8. There is a clear definition and it is up to the
application what it wants to do with sequences deemed irregular or
illegal. The decision is application dependent.

EXAMPLE: In the latest versions of Windows, one can convert from UTF-8
using MultiByteToWideChar. If one passes MB_ERR_INVALID_CHARS then
such an errant string will cause the conversion to fail with an
ERROR_NO_UNICODE_TRANSLATION error. If one does not pass the flag,
then the conversion will simply strip the errant characters. Note that
either behavior satisfies the requirement not to interpret the errant
sequences.
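
To make the two policies concrete, here is a rough sketch in Python
rather than Win32. It is only an analogy: it does not call
MultiByteToWideChar, and the helper names decode_strict and
decode_dropping are made up for illustration. Python's built-in UTF-8
decoder happens to offer the same two choices through its "strict" and
"ignore" error handlers.

```python
from typing import Optional


def decode_strict(data: bytes) -> Optional[str]:
    """Fail the whole conversion if any ill-formed sequence is present.

    Roughly analogous to passing MB_ERR_INVALID_CHARS: the caller gets
    no string at all (None here) instead of a partial result.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def decode_dropping(data: bytes) -> str:
    """Drop ill-formed bytes and keep everything else.

    Roughly analogous to omitting the flag, as described above: the
    errant bytes are stripped, never interpreted as characters.
    """
    return data.decode("utf-8", errors="ignore")


# Well-formed input: both policies agree.
good = "caf\u00e9".encode("utf-8")

# 0xC0 0xAF is an overlong (illegal) encoding of "/" in the middle of
# otherwise plain ASCII text.
bad = b"ab\xc0\xafcd"

if __name__ == "__main__":
    print(decode_strict(good), decode_dropping(good))
    print(decode_strict(bad), decode_dropping(bad))
```

In neither case does the overlong sequence come out as "/", which is
the point: failing and stripping both refuse to interpret it.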

What Netscape wants to do here in Mozilla or elsewhere can also be
based on a decision within Netscape for the most appropriate behavior,
given the definition.

MichKa [MS]
