A few items: I agree with your main point, which is that UCS-2 is, for all practical purposes, just a repertoire subset of UTF-16; the code units and bit-width are the same.
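To make the "same code units" point concrete, here is a minimal Java sketch. It assumes the code point methods on java.lang.String that later came out of the supplementary-character work; the class and method names (CodeUnitDemo, codeUnits, codePoints) are made up for illustration. A BMP character is one 16-bit unit under either reading. A supplementary character is two units, which a UCS-2-only consumer sees as two unpaired surrogates and a UTF-16-aware one sees as a single character.

```java
// Sketch: UCS-2 and UTF-16 share the same 16-bit code units; they differ
// only in whether a surrogate pair is read as one supplementary character.
class CodeUnitDemo {
    // Number of 16-bit code units (what a UCS-2 view of the data counts).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (what a UTF-16-aware view counts).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u00E9";         // U+00E9, inside the BMP
        String supp = "\uD800\uDF48";  // U+10348, stored as a surrogate pair

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));    // prints "1 1"
        System.out.println(codeUnits(supp) + " " + codePoints(supp));  // prints "2 1"
        System.out.println(Integer.toHexString(supp.codePointAt(0)));  // prints "10348"
    }
}
```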
> Some Java classes that assume that the "char" arithmetic will automatically roll over after 16 bits are wrong. The JVM spec only requires that char be at least 16 bits wide (but it may be larger). The compiled classes need to store string constants. But these constants are serialized to be platform independent using a UTF-8 encoding scheme.

I'm in the JSR 204 group looking at supplementary character support. Although I won't speak to the details of the discussions in that group, it is quite unlikely that char would be changed to 32 bits. It would break far too much.

> The probable official full support of Unicode 4 and 3.2 will come with new classes derived from Character and String (UChar and UString are their names in the IBM ICU package, but Sun may also keep the class names but place them under the java.text package instead of the core's java.lang package, and a compiler option (such as the target Java version) may allow a class author to compile code against the default java.lang.String or java.text.String class if the package name is not specified by an explicit import).

In ICU4J (which is an add-on package for Java), we don't have classes UChar and UString. For supplementary support, we have:

- UCharacter, which provides property functions based on code points rather than chars. (It also has all the UCD properties instead of just the small fraction that are in the standard JDK.)
- UTF16, which provides utilities for using supplementaries with String, StringBuffer and char[].

The other functionality, such as Normalizer, UnicodeSet, Collator, StringSearch, Transliterator, etc., all handles supplementary characters. See http://oss.software.ibm.com/icu4j/doc/index.html for details.

BTW, I only very quickly scan long documents, such as those that you and a few others are blessed with the ability to produce. So there may be other items that I don't catch.

Marc

> -- Philippe.
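P.S. A small sketch of the two points above, under stated assumptions: it relies on the Java Language Specification's definition of char as an unsigned 16-bit type, and it uses the code point methods in java.lang (Java 5 and later) rather than ICU4J's utilities. The class and method names (SupplementaryDemo, next, codePointsOf) are hypothetical. It shows char arithmetic wrapping at 16 bits, and a loop that walks a string by code point instead of by char.

```java
// Sketch: char stays 16 bits wide; supplementary characters are handled
// as int code points that may span two chars.
class SupplementaryDemo {
    // char arithmetic wraps modulo 2^16 once the result is cast back to char.
    static char next(char c) {
        return (char) (c + 1);
    }

    // Collect the code points of a string, stepping over surrogate pairs.
    static int[] codePointsOf(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);  // advance 1 char for BMP, 2 for supplementary
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println((int) next('\uFFFF'));  // prints "0"

        int[] cps = codePointsOf("a\uD800\uDF48b");
        System.out.println(cps.length);                    // prints "3"
        System.out.println(Integer.toHexString(cps[1]));   // prints "10348"
    }
}
```

A loop written as `for (int i = 0; i < s.length(); i++)` over `charAt(i)` would instead report four items for the same string, with the two surrogates seen separately.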
> ----- Original Message -----
> From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
> To: "Philippe Verdy" <[EMAIL PROTECTED]>
> Sent: Wednesday, June 04, 2003 4:36 PM
> Subject: Re: Encoding converion through JDBC
>
> > From: "Philippe Verdy" <[EMAIL PROTECTED]>
> >
> > Philippe, you went on for quite a while and I admit most of the things you
> > talked about are not things about which I have knowledge. But some of the
> > things you talked about, I do understand, and in those cases you were wrong.
> > Psychologically, it causes me to wonder how much of the rest of this message
> > conveys accurate information.
> >
> > Specifically, you talk about SQL Server but most of what you said about it
> > is inaccurate. You cannot store big endian data without risking corruption,
> > you can only store UCS-2, it is not surrogate aware and can thus be said to
> > truly support only UCS-2, not UTF-16, and the "N" prefix fields *always*
> > mean UCS-2 for MSSQLS, period.
> >
> > You have a gift -- that of being able to speak knowledgeably. But please, use
> > that gift for *good* and do not move past what you know.
> >
> > Please, think about it?
> >
> > MichKa