On top of that, it looks like 950 maps a bogus symbol or punctuation character to U+2574. (U+2574 is one of a set of 4, and only 1 is mapped, for starters.) Fonts covering CP950 give a way different image for that character than you'd expect from either the charts or the names...
I let some people know about this, but fixing it would cause even more problems, one assumes.

A./

At 11:13 PM 12/18/01 -0500, Tex Texin wrote:
>Ken,
>
>Thanks for commiserating.
>Yes, I noticed the differences in mapping tables.
>I am glad Sybase gave different character sets different names.
>I am curious how you deal with Unicode and HKSCS in the private use
>area, sometimes....
>For that matter I wonder what a user in HK does when their Windows
>operating system is upgraded and their files that had HKSCS characters
>in the private use area now expect them in other locations.
>
>With respect to messy tables, and HKSCS and GB18030 in particular, it is
>a damn shame that there is no entity making a case to governments and
>others creating character set standards, that they not consider the set
>defined until it is registered to ISO and Unicode, so some of the silly
>mistakes get worked out first. A little press relations here, with
>recent history and resulting problems as evidence and the corrections
>that came about once registration was attempted, would show that working
>these things out in committee is helpful and not a threat to national
>sovereignty.
>
>Oh well. Surely this won't happen again in 2002....
>tex
>
>
>
>Kenneth Whistler wrote:
> >
> > Tex,
> >
> > >
> > > Thanks for this and the several private responses.
> > >
> > > For anyone interested, in addition to the Microsoft page:
> > > http://www.microsoft.com/hk/hkscs/
> > >
> > > The HK Gov't has a web page, fonts and mapping tables:
> > > http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
> >
> > And to add to the chaos and confusion, note that the HKSCS
> > patch for Windows Code Page 950 does not map exactly the
> > same as the HK Government mapping table. And that the HK
> > Government mapping table has at least a couple of blatant
> > errors in it.
> > And that the HKSCS patch for Windows Code Page 950
> > (like Code Page 950 without the extension, but even more so)
> > has duplicate mappings in it that need to be resolved in
> > order to roundtrip through Unicode. And you have no guarantee
> > that various vendors' attempts to sort out the HK Government
> > mapping table and Windows Code Page 950 + HKSCS patch behavior
> > will themselves produce matching results.
> >
> > >
> > > Oracle gave a nice paper at a recent Unicode conference:
> > > http://www.unicode.org/iuc/iuc18/papers/b19.ppt
> > >
> > > It amazes me that in the year 2000, organizations are still creating
> > > chaos by amending definitions of standards especially code pages,
> > > without giving the new creation its own name or some other way of
> > > distinguishing it, and then on top of that creating multiple mapping
> > > tables.
> > >
> > > I understand the desire to get new functionality into users hands, but
> > > would it have been a problem to rename either big5 or 950 to something
> > > like big-6 or big-5hk or 950HK or 951?
> >
> > Sybase is now supporting "cp950" (+euro, by the way -- another addition
> > that may or may not be supported in a particular Windows implementation,
> > depending on date) and a separate "big5hk", so if you interoperate
> > with Sybase, you should know what you are getting. However, like
> > everybody else, it is hit or miss for us when a platform or other
> > data announces itself to us as "cp950" or "big-5", whether it
> > is with or without the HKSCS extensions.
> >
> > > So now we can't tell if big-5 or 950 will or won't have this data, or
> > > even whether Unicode data will have these characters in the private use
> > > area or elsewhere, or whether software that may be on the other end of
> > > the pipe supports HKSCS or not, or even if their operating system has
> > > the patch or not.
> > >
> > > Although "that which we call a rose by any other name would smell as
> > > sweet",
> > > calling everything a rose, makes it hard to know when you are getting a
> > > rose.
> >
> > I think this was all part of a conspiracy for Chinese to catch up
> > with Japanese, since the Chinese code pages (until now) didn't have
> > a mess the scale of SJIS. But between HKSCS and GB 18030, they are
> > making up for lost time.
> >
> > --Ken
> >
> > >
> > > Here's hoping for less chaos in 2002!
> > > tex
>
>--
>-------------------------------------------------------------
>Tex Texin                    Director, International Business
>mailto:[EMAIL PROTECTED]     Tel: +1-781-280-4271
>the Progress Company         Fax: +1-781-280-4655
>-------------------------------------------------------------
>For a compelling demonstration for Unicode:
>http://www.geocities.com/i18nguy/unicode-example.html
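
The duplicate-mapping problem Ken describes (several legacy codes landing on one Unicode code point, so that a trip through Unicode cannot restore the original bytes) can be checked mechanically. A minimal sketch in Python, assuming the mapping table has already been parsed into (legacy bytes, code point) pairs. The sample entries and the function names `find_duplicates`, `build_encoder` and `roundtrip_failures` are invented for illustration; none of them come from Code Page 950, the HKSCS patch or the HK Government table.

```python
from collections import defaultdict

# Hypothetical excerpt of a legacy-to-Unicode table: (legacy bytes, code point).
# These pairs are made up for illustration and are NOT real CP950/HKSCS data.
SAMPLE_TABLE = [
    (b"\x81\x40", 0x4E00),
    (b"\x81\x41", 0x5341),
    (b"\x81\x42", 0x5341),  # second legacy code for the same code point
    (b"\x81\x43", 0x4E8C),
]


def find_duplicates(table):
    """Return {code_point: [legacy codes]} for code points mapped more than once."""
    by_code_point = defaultdict(list)
    for legacy, code_point in table:
        by_code_point[code_point].append(legacy)
    return {cp: codes for cp, codes in by_code_point.items() if len(codes) > 1}


def build_encoder(table):
    """Build the reverse (Unicode-to-legacy) map.

    Policy assumed here: the first legacy code listed for a code point wins.
    Another vendor could just as well pick the last one, which is exactly
    how two implementations end up disagreeing.
    """
    encoder = {}
    for legacy, code_point in table:
        encoder.setdefault(code_point, legacy)
    return encoder


def roundtrip_failures(table):
    """List legacy codes that do not survive legacy -> Unicode -> legacy."""
    encoder = build_encoder(table)
    return [legacy for legacy, code_point in table if encoder[code_point] != legacy]


if __name__ == "__main__":
    for cp, codes in find_duplicates(SAMPLE_TABLE).items():
        print(f"U+{cp:04X} <- {[c.hex() for c in codes]}")
    print("lost on round trip:", [c.hex() for c in roundtrip_failures(SAMPLE_TABLE)])
```

Running the same check over two vendors' tables and comparing the results would be one way to see where their resolutions of the duplicates differ.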