Re: Unicode forms for internal storage - BOCU-1 speed

Philippe Verdy Thu, 22 Jan 2004 15:35:25 -0800

From: <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>
Cc: "Markus Scherer" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, January 22, 2004 10:26 PM
Subject: Re: Unicode forms for internal storage - BOCU-1 speed



> Philippe Verdy scripsit:
>
> > Is the other competing UTF-9 from Jerome Abela this one:
>
> No. Abela's version preserves all of 00-7F and A0-FF, packing all the rest
> of Unicode into sequences beginning with any of 80-9F.

Thanks for pointing this out.

By the way, I don't think that there's an official reference that attributes
the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9"
is used, it should be agreed on by Unicode as the single official
representation. The other forms require another encoding label, one not
starting with "UTF", which should be reserved for encoding forms approved by
Unicode and ISO/IEC 10646.

We have already suffered in the past from the confusion caused by various
interpretations of "UTF-8" (until CESU-8 was documented, and the acronym
"UTF-8" removed from the JNI documentation for Java) and by confusion
between UTF-16/UTF-16BE/UTF-16LE/UCS-2... I think, then, that "UTF-9" is a
bad acronym to refer to a specific unapproved (non-standard) encoding form,
and its use on this mailing list just adds more confusion, because there is
no such "UTF-9" standard until it is documented by an IETF RFC, by
ISO/IEC 10646, or by Unicode.
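
To illustrate the kind of "UTF-8" confusion I mean, here is a minimal sketch
in Python showing how the same supplementary character comes out differently
in standard UTF-8 and in CESU-8. The function name cesu8_encode is only
illustrative (it is not a standard library call), and the sketch assumes
well-formed input with no lone surrogates.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (illustrative helper, not a standard API).

    BMP characters are encoded exactly as in UTF-8. Supplementary
    characters are first split into a UTF-16 surrogate pair, and each
    surrogate is then written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into UTF-16 high and low surrogates.
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            for s in (high, low):
                # 3-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
                out.append(0xE0 | (s >> 12))
                out.append(0x80 | ((s >> 6) & 0x3F))
                out.append(0x80 | (s & 0x3F))
        else:
            # Assumes ch is not a lone surrogate.
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP
print(ch.encode("utf-8").hex())  # f0909080      (4 bytes, standard UTF-8)
print(cesu8_encode(ch).hex())    # eda081edb080  (6 bytes, CESU-8)
```

Both byte streams have at some point been called "UTF-8", yet a strict UTF-8
decoder must reject the second one, which is why a separate label was needed.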
