In a message dated 2001-06-18 12:56:47 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> As a matter of fact, Oracle supported UTF-8 well before surrogates or the
> 4-byte encoding were introduced. As a database vendor, Oracle took full
> advantage of Unicode, and was also a victim of Unicode in the sense of
> compatibility. A database has no font or IME burden when it stores
> Unicode on its server. Oracle supported a very early version of Unicode
> in its Oracle 7 release as the database character set AL24UTFFSS, which
> means a 3-byte encoding for UTF-FSS. When Unicode reached version 2.1,
> we found that AL24UTFFSS had trouble with 2.1 because of the Hangul
> reallocation, and we could not simply update AL24UTFFSS to the 2.1
> definition, as that would corrupt existing users' data. So we came up
> with a new character set, UTF8, which is still a 3-byte encoding and
> supports Unicode 2.1. The choice of a 3-byte encoding was also bound to
> the AL24UTFFSS implementation, so that nothing would break when users
> migrated from AL24UTFFSS to UTF8.
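
For concreteness: every character assigned in Unicode 2.1 lies below
U+10000, so a 3-byte-maximum encoder covers all of them. Here is a
hypothetical sketch in C (my own illustration, not Oracle's actual code)
of what such an encoder implies once supplementary characters arrive as
UTF-16 surrogate pairs: each 16-bit code unit is encoded separately, so a
supplementary character comes out as two 3-byte sequences, six bytes
total, and a 4-byte sequence is never emitted.

    /* Hypothetical sketch of a 3-byte-per-code-unit encoder.
       Each 16-bit value is encoded on its own, including surrogates. */
    #include <stdio.h>

    static int encode16(unsigned int u, unsigned char out[3])
    {
        if (u < 0x80) {                 /* 1 byte: 0xxxxxxx */
            out[0] = (unsigned char)u;
            return 1;
        } else if (u < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
            out[0] = 0xC0 | (unsigned char)(u >> 6);
            out[1] = 0x80 | (unsigned char)(u & 0x3F);
            return 2;
        } else {                        /* 3 bytes, surrogates included */
            out[0] = 0xE0 | (unsigned char)(u >> 12);
            out[1] = 0x80 | (unsigned char)((u >> 6) & 0x3F);
            out[2] = 0x80 | (unsigned char)(u & 0x3F);
            return 3;
        }
    }

    int main(void)
    {
        /* U+10000 as a UTF-16 surrogate pair: D800 DC00 */
        unsigned int pair[2] = { 0xD800, 0xDC00 };
        unsigned char buf[3];
        for (int i = 0; i < 2; i++) {
            int n = encode16(pair[i], buf);
            for (int j = 0; j < n; j++)
                printf("%02X ", buf[j]);
        }
        printf("\n");  /* prints: ED A0 80 ED B0 80 -- six bytes */
        return 0;
    }

Note that this six-byte form is not the byte sequence standard UTF-8
produces for U+10000, which is the heart of the compatibility question.
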
The Hangul mess took place with Unicode 2.0, not 2.1. And this is a red
herring anyway when we are talking about UTF-8. As stated before, UTF-8 has
never changed even though the Unicode beneath it has changed:
* by moving the Hangul block in version 2.0
* by creating the UTF-16 mechanism to support surrogates in 1993 (not 2001)
The mechanism in UTF-8 to encode characters from U+10000 to U+10FFFF
(actually U+1FFFFF) in 4 bytes was part of the original FSS-UTF specified in
1992. Check the records. It was never "added on" at some later date,
causing existing conformant UTF-8 to break. If Oracle or any other vendor
or developer originally interpreted UTF-8 as using a maximum of 3 bytes to
encode a character, that was either a misreading of the specification or a
deliberate subsetting of the problem. In either case, such a company cannot
claim to be a "victim of Unicode" when it has implemented a clearly
specified Unicode standard incorrectly.
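
For the record, the complete byte layout fits in a few lines of C. This is
a minimal sketch of my own (the helper name utf8_encode is mine, not text
from the 1992 specification); the final branch is the 4-byte form covering
U+10000 through U+1FFFFF, present from the start:

    #include <stdio.h>

    /* Encode one scalar value as UTF-8; assumes cp is a valid input. */
    static int utf8_encode(unsigned long cp, unsigned char out[4])
    {
        if (cp < 0x80) {                /* 1 byte: 0xxxxxxx */
            out[0] = (unsigned char)cp;
            return 1;
        } else if (cp < 0x800) {        /* 2 bytes: 110xxxxx 10xxxxxx */
            out[0] = 0xC0 | (unsigned char)(cp >> 6);
            out[1] = 0x80 | (unsigned char)(cp & 0x3F);
            return 2;
        } else if (cp < 0x10000) {      /* 3 bytes: 1110xxxx 10xxxxxx ... */
            out[0] = 0xE0 | (unsigned char)(cp >> 12);
            out[1] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
            out[2] = 0x80 | (unsigned char)(cp & 0x3F);
            return 3;
        } else {                        /* 4 bytes: 11110xxx 10xxxxxx ... */
            out[0] = 0xF0 | (unsigned char)(cp >> 18);
            out[1] = 0x80 | (unsigned char)((cp >> 12) & 0x3F);
            out[2] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
            out[3] = 0x80 | (unsigned char)(cp & 0x3F);
            return 4;
        }
    }

    int main(void)
    {
        unsigned char buf[4];
        int n = utf8_encode(0x10000UL, buf); /* first supplementary char */
        for (int i = 0; i < n; i++)
            printf("%02X ", buf[i]);
        printf("\n");                        /* prints: F0 90 80 80 */
        return 0;
    }
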
-Doug Ewell
Fullerton, California