Frank,

> Sure Unicode defined those planes, but defining planes 
> without defining the characters in it mean not too much 
> to people.

Which is exactly the complacency that Doug Ewell was warning
about. Too many people assumed that even though UTF-16 was
defined in Unicode 2.0 they could ignore it indefinitely,
since no encoded characters had been assigned on the other
planes. But those who weren't preparing their code for it
have been left flat-footed now that suddenly there *are*
48,000 or so characters defined on the other planes.

> How can
> you implement case conversion, property mapping without 
> knowing what is inside.

This is a fundamental misconception about Unicode (and character
encodings in general) that unfortunately seems to be spreading.

There are differences between operations on *characters* (such
as case conversion), which obviously require the characters
themselves to be defined to make any sense, and operations
on *code points* (such as UTF-8 <--> UTF-16 conversion), which
make no reference to characters.
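
To make the distinction concrete, here is a minimal sketch of the
code point side, assuming the standard UTF-16 and UTF-8 bit layouts.
The function names are made up for illustration. Note that nothing
in it consults a character property table; it is pure arithmetic on
integers, so it works the same for a code point with no character
assigned to it.

```python
def _check_scalar(cp):
    # Valid input is any code point 0..10FFFF except the surrogate range.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % (cp,))


def utf16_units(cp):
    """Return the UTF-16 code units (as ints) for one code point."""
    _check_scalar(cp)
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit unit
    v = cp - 0x10000                     # 20 bits left to split
    return [0xD800 | (v >> 10),          # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate: bottom 10 bits


def utf8_bytes(cp):
    """Return the UTF-8 byte sequence for one code point."""
    _check_scalar(cp)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+50000 is used here only as an example of a supplementary code point;
# the conversion never needs to know whether a character lives there.
units = utf16_units(0x50000)
encoded = utf8_bytes(0x50000)
```

Case conversion, by contrast, could not be written this way: it
needs per-character data, which exists only once the characters
themselves are defined.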

Many programmers get hopelessly confused about this distinction,
apparently, since the APIs just pass around the code points
associated with the characters, and not the encoded characters 
per se. (And this is a disease that was inflicted on the world
23 years ago when Kernighan and Ritchie published a certain
language that unfortunately chose to call its 8-bit numeric
data type a "char".)

> In particular, DOES GB18030 define code point to
> code point mapping (beyond BMP) between Unicode?

Yes. Absolutely it does. It is spelled out in the standard
itself.

GB 18030 <--> Unicode conversion is basically like a big
UTF, with an enormous table for all the GBK part of the
encoding, and a bunch of offset ranges to convert all the
other code points.
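
As a rough sketch of the "offset range" part, here is the mapping
for the supplementary planes only, assuming the usual four-byte
layout (lead bytes 0x81..0xFE, digit bytes 0x30..0x39) and the
range that starts at 0x90308130 for U+10000. The function names are
hypothetical, and this is not a full converter: the two-byte GBK
part and the BMP four-byte ranges need the tables from the standard,
which are not reproduced here.

```python
# Linear index of the four-byte sequence 0x90308130, which is assumed
# to correspond to U+10000: ((0x90 - 0x81) * 10 + 0) * 126 * 10.
SUPP_BASE = 189000


def gb18030_supp_to_cp(seq):
    """Map a four-byte GB 18030 sequence in the supplementary range
    to a code point, using only arithmetic."""
    if len(seq) != 4:
        raise ValueError("expected a four-byte sequence")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39 and
            0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    # Treat the four bytes as digits in a mixed radix (126, 10, 126, 10).
    linear = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 \
        + (b4 - 0x30)
    cp = linear - SUPP_BASE + 0x10000
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("outside the supplementary-plane range")
    return cp


def cp_to_gb18030_supp(cp):
    """Inverse of gb18030_supp_to_cp, for U+10000..U+10FFFF only."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points are handled here")
    linear = cp - 0x10000 + SUPP_BASE
    linear, d4 = divmod(linear, 10)
    linear, d3 = divmod(linear, 126)
    d1, d2 = divmod(linear, 10)
    return bytes([d1 + 0x81, d2 + 0x30, d3 + 0x81, d4 + 0x30])


first = cp_to_gb18030_supp(0x10000)
last = cp_to_gb18030_supp(0x10FFFF)
```

As with the UTFs, none of this depends on which characters, if any,
are encoded at those code points.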
 
> Unless you 
> can said that is YES and show me the specification how to 
> map between
> them, there are no way people can implement code set 
> conversion between GB18030 and Unicode.

http://www-106.ibm.com/developerworks/library/u-china.html

Markus Scherer's excellent documentation of GB 18030, with
code snippets and a pointer to a complete ICU implementation.

> 
> That question is not wheather they should define the 
> relationship or not, but have they defined it yet.

They have.

--Ken

