Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

Michael (michka) Kaplan Mon, 04 Jun 2001 11:47:15 -0700

From: "Mark Davis" <[EMAIL PROTECTED]>

> 2. Auto-detection does not particularly favor one side or the other.
>
> UTF-8 and UTF-8s are strictly non-overlapping. If you ever encounter a
> supplementary character expressed with two 3-byte values, you know you do
> not have pure UTF-8. If you ever encounter a supplementary character
> expressed with a 4-byte value, you know you don't have pure UTF-8s. If you
> never encounter either one, why does it matter? Every character you read is
> valid and correct.
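
The detection rule in the quoted paragraph can be sketched roughly as follows. This is only an illustration, not anything from the proposal itself: the function name `classify` and its return labels are made up here, and the sketch assumes the input is otherwise well-formed, looking only at how supplementary characters are expressed.

```python
def classify(data: bytes) -> str:
    """Guess the form from how supplementary characters are encoded.

    Hypothetical helper: returns 'utf-8', 'utf-8s', 'mixed', or 'either'.
    Assumes the bytes are otherwise well-formed; no validation is done.
    """
    four_byte = False        # saw a 4-byte sequence (lead byte F0..F4)
    surrogate_pair = False   # saw two 3-byte sequences encoding a surrogate pair
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if 0xF0 <= b <= 0xF4:
            four_byte = True
            i += 4
        elif (b == 0xED and i + 5 < n
              and 0xA0 <= data[i + 1] <= 0xAF      # high surrogate D800..DBFF
              and data[i + 3] == 0xED
              and 0xB0 <= data[i + 4] <= 0xBF):    # low surrogate DC00..DFFF
            surrogate_pair = True
            i += 6
        elif b >= 0xE0:
            i += 3           # ordinary 3-byte sequence
        elif b >= 0xC0:
            i += 2           # 2-byte sequence
        else:
            i += 1           # ASCII (or a stray continuation byte)
    if four_byte and surrogate_pair:
        return "mixed"
    if four_byte:
        return "utf-8"
    if surrogate_pair:
        return "utf-8s"
    return "either"          # no supplementary characters at all


print(classify("\U00010000".encode("utf-8")))    # utf-8
print(classify(b"\xed\xa0\x80\xed\xb0\x80"))     # utf-8s
print(classify(b"plain ASCII"))                  # either
```

The "either" case is the one the quoted text says does not matter: with no supplementary characters present, the two forms produce identical bytes.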

I would have to disagree with this point, since it is considered legal to
accept (but not emit) six-byte supplementary characters in UTF-8 as it
stands today. Thus there is some severe overlap between existing
implementations -- every single implementation that was not thinking ahead
to surrogate pairs, for example. This would include dozens of MS apps, the
versions of Oracle prior to them adding official UTF-8 support, and a ton of
other products.
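
To illustrate the kind of overlap meant here, below is a minimal sketch of a lenient reader that accepts both the 4-byte form and the six-byte (paired 3-byte) form. It is not taken from any of the products mentioned; `lenient_decode` is a hypothetical name, and the sketch leans on Python's `surrogatepass` error handler.

```python
def lenient_decode(data: bytes) -> str:
    """Decode bytes, accepting supplementary characters in either form.

    Hypothetical helper for illustration only. An unpaired surrogate in
    the input makes the final step raise UnicodeDecodeError.
    """
    # "surrogatepass" lets the UTF-8 codec accept 3-byte encoded surrogates.
    text = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so that an adjacent high/low surrogate pair
    # is joined into a single supplementary code point.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


four_byte_form = b"\xf0\x90\x80\x80"          # U+10000 as one 4-byte sequence
six_byte_form = b"\xed\xa0\x80\xed\xb0\x80"   # U+10000 as two 3-byte sequences

print(lenient_decode(four_byte_form) == lenient_decode(six_byte_form))  # True
```

A reader written this way cannot tell the two forms apart after decoding, so for it they are not "strictly non-overlapping".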

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
