From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> When you deal with encodings which need states (ISO-2022, ISO-2022-JP,
> etc.) or variable-length encodings (Shift_JIS, Big5, UTF-8), then the
> situation is different.
Unicode cannot, of course, speak for those other encodings, but it can speak for UTF-8. There is a clear definition of well-formed UTF-8, and it is up to the application to decide what to do with sequences deemed irregular or illegal. The decision is application-dependent.

EXAMPLE: In the latest versions of Windows, one can convert from UTF-8 using MultiByteToWideChar. If one passes MB_ERR_INVALID_CHARS, such an errant string will cause the conversion to fail with an ERROR_NO_UNICODE_TRANSLATION error. If one does not pass the flag, the conversion will simply strip the errant characters. Note that either behavior satisfies the requirement to refuse to interpret the errant sequences.

What Netscape does here, in Mozilla or elsewhere, can likewise rest on Netscape's own decision about the most appropriate behavior, given the definition.

MichKa [MS]
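The two MultiByteToWideChar policies described above (fail on errant input, or silently strip it) have a rough analogue in Python's standard UTF-8 codec. That codec has a "strict" error handler, which raises on ill-formed input, and an "ignore" handler, which drops the offending bytes. The sketch below uses those handlers and not the Windows API. The helper names decode_strict and decode_stripping are hypothetical and are only there for illustration.

```python
# Sketch only: models the two policies from the post with Python's
# built-in UTF-8 codec, NOT with the Windows MultiByteToWideChar API.
# decode_strict / decode_stripping are hypothetical helper names.

def decode_strict(data: bytes) -> str:
    """Refuse ill-formed UTF-8 outright (analogous to MB_ERR_INVALID_CHARS)."""
    return data.decode("utf-8", errors="strict")


def decode_stripping(data: bytes) -> str:
    """Silently drop ill-formed bytes (analogous to omitting the flag)."""
    return data.decode("utf-8", errors="ignore")


# b"\xc0\xaf" is an overlong (non-shortest-form) encoding of "/",
# which current versions of the UTF-8 definition treat as ill-formed.
sample = b"abc\xc0\xafdef"

try:
    decode_strict(sample)
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first offending byte
    print("strict: rejected at byte", exc.start)  # prints: strict: rejected at byte 3

print("stripping:", decode_stripping(sample))  # prints: stripping: abcdef
```

Neither policy ever yields the "/" that the overlong sequence would decode to under a lax decoder. In both cases the errant sequence is not interpreted, and the application chooses only between failing and discarding.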