RE: CESU-8 vs UTF-8

Carl W. Brown Sat, 15 Sep 2001 12:17:51 -0700
Doug,

>
> This was my solution long ago: fix the code that sorts in UCS-2
> order so that
> supplementary characters are sorted correctly.  In case there is any
> disagreement about this, sorting by UCS-2 order has been WRONG ever since
> surrogates and UTF-16 were invented.
>
> However, the database vendors' position is that there is now data
> sorted in
> this way, and it cannot be changed or database integrity will be
> compromised.

It will not be compromised unless they already have characters beyond U+FFFF
in the database indexes.  This is why I think that the Unicode standards
committee should take quick action so that the Unicode world does not get
split between two alternate basic sorting sequences.  It needs to be done
before there is a lot of legacy data to contend with.  Now is the time,
because developers are just starting to convert to provide real surrogate
support.  Collation does not work as a substitute for three reasons.  1) It
is too slow.  2) More importantly, we need a locale-neutral sorting
sequence.  3) Code point order sequencing supports all existing data stored
with UCS-2 binary indexes, but collation does not.
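
To make point 3 concrete, here is a minimal sketch in Python (not taken from
PDUTR #26 or from any vendor's code) of how UTF-16 data can be compared in
code point order.  The helper names utf16_units, fixup and code_point_key
are my own, chosen for illustration.  The idea is to remap each 16-bit code
unit so that surrogates sort above U+E000..U+FFFF.  For data with nothing
beyond U+FFFF the result is the same as plain UCS-2 binary order, which is
why existing indexes would not be disturbed.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fixup(unit):
    """Remap one code unit so that surrogates sort above U+E000..U+FFFF."""
    if unit >= 0xE000:
        return unit - 0x800      # E000..FFFF moves down to D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000     # D800..DFFF (surrogates) moves up to F800..FFFF
    return unit


def code_point_key(s):
    """Sort key that yields code point order from UTF-16 code units."""
    return [fixup(u) for u in utf16_units(s)]


def label(chars):
    return [f"U+{ord(c):04X}" for c in chars]


# One supplementary character (U+10000) mixed in with BMP characters.
words = ["\uFF5E", "\U00010000", "A", "\uE000"]

ucs2_order = sorted(words, key=utf16_units)    # binary UTF-16 / UCS-2 order
cp_order = sorted(words, key=code_point_key)   # code point order

print(label(ucs2_order))  # ['U+0041', 'U+10000', 'U+E000', 'U+FF5E']
print(label(cp_order))    # ['U+0041', 'U+E000', 'U+FF5E', 'U+10000']

# With BMP-only data the two orders are identical.
bmp_only = ["\uFF5E", "A", "\uE000", "\u4E2D"]
print(sorted(bmp_only, key=utf16_units) == sorted(bmp_only, key=code_point_key))
```

The only disagreement between the two orders is where the supplementary
character lands relative to U+E000..U+FFFF, so an index that has never held
a character beyond U+FFFF reads the same either way.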

>
> I suggest, as part of the Proposed Draft stage for this document, that
> Section 4 be deleted and that IANA be informed that CESU-8 is
> intended as an
> internal encoding only and that they are explicitly requested NOT
> to register
> it.

In actuality Section 4 neither adds to nor takes away from PDUTR #26.  They
can apply to IANA or not, whether or not Section 4 is included.  It is merely
a notification that there is no intent to make CESU-8 a private protocol.

PDUTR #26 should be rejected in its entirety.  If it is truly a private
protocol, as they claim, it does not belong in any form in the Unicode
standard.

You may have heard about hijacking legislative bills.  It is taking an
existing bill and amending it to change the entire text of the bill.  I
think that we should hijack PDUTR #26 and replace it with UTF-17.

In actuality we should hijack PDUTR #26 to modify TR27 to specify that at a
minimum, systems that support UTF-16 must provide code point order support
services.  We should delete all references to CESU-8 and reject the idea of
adding CESU-8 to the standard.

Carl