That is not bad at all for Oracle: our application vendors explicitly
require that surrogates be stored as 6 bytes in the database, so that they
get the same semantics as UTF-16. As for conformance, I don't think there
is any issue for the database client if UTF-16 is used on the client side.
We will also support 4-byte UTF-8 on the client side in the next release,
for conformance, if the user wants it.

Regards,
Jianping.

[EMAIL PROTECTED] wrote:

> > As the Oracle UTF8 character set definition supports surrogates as pairs
> > of two 3-byte sequences, to be in sync with UTF-16 in binary sorting and
> > code point,
>
> This is not a conformant representation.
>
> D29 (p. 46) states that a UTF "transforms each Unicode scalar value into a
> unique sequence of code values". Am I not right in saying that xD800 -
> xDFFF are not valid Unicode scalar values? (If so, then three bytes that
> map to one of these values are not valid UTF-8.) The text after the
> definition states that "...invalid scalar values include... unpaired
> surrogates" and here we'd be dealing with paired surrogates. But the usage
> described above is mapping individual surrogate code values to a UTF-8
> sequence, and that seems to be invalid. Furthermore, D29 requires unique
> mappings. If we allow both 4-byte and 6-byte representations for a given
> non-BMP character, that condition is violated. This also violates the
> specification of D36, which refers to table 3-1, and also the normative
> text below that says, "when converting a Unicode scalar value to UTF-8, the
> shortest form that can represent those values shall be used."
>
> So, if you're representing non-BMP characters in Oracle using quasi-UTF-8
> sequences that are six bytes long, you are not conforming to the spec for
> UTF-8, and your software is not conformant to the Unicode standard (or to
> ISO 10646). Sorry for the bad news...
>
> - Peter
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <[EMAIL PROTECTED]>
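
To make the two representations discussed above concrete, here is a minimal Python sketch. It is an illustration only, not Oracle's implementation: the function names (utf8_standard, utf8_surrogate_pairs, is_strict_utf8) are hypothetical, and the 6-byte form is built by hand on the assumption that each UTF-16 surrogate is simply run through the ordinary 3-byte UTF-8 bit layout.

```python
def utf8_standard(cp: int) -> bytes:
    """Shortest-form UTF-8 for a Unicode scalar value (4 bytes for non-BMP)."""
    return chr(cp).encode("utf-8")


def _encode_3byte(unit: int) -> bytes:
    """Apply the 3-byte UTF-8 bit layout to a 16-bit value (here: a surrogate)."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def utf8_surrogate_pairs(cp: int) -> bytes:
    """Hypothetical 6-byte form: each UTF-16 surrogate encoded as 3 bytes."""
    if cp < 0x10000:
        # BMP characters are assumed to be encoded exactly as in UTF-8.
        return chr(cp).encode("utf-8")
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # leading (high) surrogate
    low = 0xDC00 | (offset & 0x3FF)   # trailing (low) surrogate
    return _encode_3byte(high) + _encode_3byte(low)


def is_strict_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


supplementary = 0x10400   # a non-BMP character
bmp_high = 0xFFFD         # a BMP character above the surrogate range

four_byte = utf8_standard(supplementary)
six_byte = utf8_surrogate_pairs(supplementary)

print(four_byte.hex(" "), is_strict_utf8(four_byte))
print(six_byte.hex(" "), is_strict_utf8(six_byte))

# Binary ordering relative to U+FFFD in each representation.
print(chr(supplementary).encode("utf-16-be") < chr(bmp_high).encode("utf-16-be"))
print(six_byte < utf8_standard(bmp_high))
print(four_byte < utf8_standard(bmp_high))
```

The last three comparisons show the motivation given for the 6-byte form: it orders the supplementary character before U+FFFD, as UTF-16 code unit order does, whereas the 4-byte form orders it after. The strict decoder check shows the conformance problem: the 6-byte sequence is rejected as UTF-8.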