From: "Mark Davis" <[EMAIL PROTECTED]>

> 2. Auto-detection does not particularly favor one side or the other.
>
> UTF-8 and UTF-8s are strictly non-overlapping. If you ever encounter a
> supplementary character expressed with two 3-byte values, you know you do
> not have pure UTF-8. If you ever encounter a supplementary character
> expressed with a 4-byte value, you know you don't have pure UTF-8s. If you
> never encounter either one, why does it matter? Every character you read is
> valid and correct.
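The detection rule quoted above can be sketched in Python. This is a minimal illustration that assumes the input is otherwise well-formed (it does not validate continuation bytes), and the function name `classify` and its return labels are hypothetical. It treats a 4-byte lead byte (F0..F7) as evidence of UTF-8, and a 3-byte sequence starting ED A0..ED BF (an encoded surrogate, U+D800..U+DFFF) as evidence of UTF-8s.

```python
def classify(data: bytes) -> str:
    """Return 'utf-8', 'utf-8s', 'either', or 'mixed' for a byte string.

    Sketch only: assumes well-formed input and skips continuation bytes
    without checking them.
    """
    saw_four_byte = False   # supplementary char as one 4-byte sequence
    saw_surrogate = False   # supplementary char as two 3-byte surrogates
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:            # ASCII
            i += 1
        elif 0xC0 <= b < 0xE0:  # 2-byte lead
            i += 2
        elif 0xE0 <= b < 0xF0:  # 3-byte lead
            # ED A0..BF xx encodes a surrogate code point U+D800..U+DFFF
            if b == 0xED and i + 1 < n and data[i + 1] >= 0xA0:
                saw_surrogate = True
            i += 3
        elif 0xF0 <= b < 0xF8:  # 4-byte lead
            saw_four_byte = True
            i += 4
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")

    if saw_four_byte and saw_surrogate:
        return "mixed"   # neither pure UTF-8 nor pure UTF-8s
    if saw_four_byte:
        return "utf-8"
    if saw_surrogate:
        return "utf-8s"
    return "either"      # no supplementary characters: both readings agree


# U+10000 in UTF-8 (F0 90 80 80) versus as a surrogate pair (ED A0 80 ED B0 80)
print(classify("A\U00010000".encode("utf-8")))
print(classify("A\ud800\udc00".encode("utf-8", "surrogatepass")))
```

Note that the classifier assumes an encoded surrogate can only come from a UTF-8s writer; that assumption is exactly what the reply below questions.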

I would have to disagree with this point, since it is considered legal to
accept (but not emit) six-byte supplementary characters in UTF-8 as it
stands today, i.e. a surrogate pair written as two 3-byte sequences. Thus
there is severe overlap between existing implementations: every single
implementation that was not thinking ahead to surrogate pairs, for example.
This would include dozens of MS apps, the versions of Oracle prior to their
adding official UTF-8 support, and a ton of other products.
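To see the overlap concretely, here is a sketch of a lenient decoder of the kind described above. It accepts the standard 4-byte form and also the six-byte form (two 3-byte encoded surrogates), pairing the surrogates back into one code point. The name `lenient_decode` is hypothetical and it is not modeled on any particular product; it relies on Python's `surrogatepass` error handler to let the UTF-8 codec pass encoded surrogates through instead of rejecting them.

```python
def lenient_decode(data: bytes) -> str:
    """Decode UTF-8, also accepting supplementary characters written as
    two 3-byte surrogate sequences (the six-byte form). Hypothetical sketch."""
    # surrogatepass turns ED A0..BF xx sequences into lone surrogate code points
    raw = data.decode("utf-8", "surrogatepass")
    out = []
    i = 0
    while i < len(raw):
        hi = ord(raw[i])
        if (0xD800 <= hi <= 0xDBFF and i + 1 < len(raw)
                and 0xDC00 <= ord(raw[i + 1]) <= 0xDFFF):
            lo = ord(raw[i + 1])
            # Combine a high/low surrogate pair into one supplementary code point
            out.append(chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)))
            i += 2
        else:
            out.append(raw[i])  # ordinary char, or an unpaired surrogate left as-is
            i += 1
    return "".join(out)


four = "\U00010000".encode("utf-8")                      # F0 90 80 80
six = "\ud800\udc00".encode("utf-8", "surrogatepass")    # ED A0 80 ED B0 80
print(lenient_decode(four) == lenient_decode(six))
```

Both byte sequences decode to the same character, so data that passed through readers like this may legitimately contain either form, and seeing a six-byte sequence does not by itself prove the writer intended UTF-8s.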

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


