Hi folks, I appreciate the answers to my 6 questions,
some of which came directly from the authors. I think
that’s neat. 

I’m afraid I have a bit of a beef with the Unicode
documentation here; forgive me if this has already
been brought up. How come UAX #27 says that Unicode
3.0 had 34 noncharacters, 32 of which are in
supplementary planes? First of all, there are no
characters defined in supplementary planes in Unicode
3.0. Secondly, where were these 32 noncharacters
published? The Unicode 3.0 book has a section called
“non characters” where it only describes the 2
noncharacters in the Specials block, and the code
charts of that era totally leave out the 32
noncharacter code values in the Arabic presentation
forms block (they don’t say that these are
noncharacters the way they do for the Specials). For
this reason I’m going to refer to these as the
“hidden” noncharacters.

How many planes are defined in Unicode 3.1? UAX #27
seems to indicate that it depends on what
transformation format is used (“A process shall
interpret the Unicode code units in accordance with
the Unicode Transformation Format used.”). UTF-8 seems
to only define 17 planes but UTF-32 seems to have 128
groups of 256 planes. UAX #27 says that Unicode 3.1
defines 3 new supplementary planes... including plane
14. I have difficulty with that statement. Does it
mean that there are only 3 new planes, or that there
are (at least) 14 new planes, only 3 of which have
plane names and characters in them? At least 17 planes
must be defined in order to define the 32
noncharacters in 16 supplementary planes; that’s what
common sense would say, anyway.

This whole “plane” business suffers from a lack of
documentation. UAX #27 talks about planes as if they
were ancient history, but the Unicode 3.0 book does
not mention planes once (they’re not in the index,
anyway). I would like the Unicode documentation to
explain exactly what a plane is without requiring the
10646 documentation, which is only available for a
fee. In fact, according to UAX #27 the planes are
defined in terms of what WILL be in 10646-2.

I’m trying to get a grasp on exactly how many planes
are defined in Unicode in part because it seems to
affect the number of non characters that are defined.
I also want to know the maximum number of characters
that Unicode can encode. So far I reckon there are
1114112 (assuming 17 planes) minus 2048 (surrogate
code values) minus 2 (special noncharacters) minus 32
(“hidden” noncharacters) minus 32 (noncharacters due
to some arbitrary association between code values in
the 16 higher planes and the code values of the
special noncharacters) = 1111998 code positions
available for characters. What’s with this 1114111
number I’ve seen on this list?
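
To double-check the tally above, here is a minimal Python
sketch. It assumes 17 planes of 65536 code values each, as
in the paragraph above; the variable names are only for
illustration and do not come from any Unicode document. It
also shows that, under the same assumption, 1114111 is one
less than the 1114112 total, i.e. the highest code value
(10FFFF hexadecimal).

```python
# Tally of code positions, ASSUMING 17 planes of 65536 code values each.
PLANES = 17
PLANE_SIZE = 0x10000

total = PLANES * PLANE_SIZE                  # all code values in 17 planes
surrogates = 0xDFFF - 0xD800 + 1             # D800..DFFF
special_nonchars = 2                         # FFFE and FFFF in plane 0
hidden_nonchars = 0xFDEF - 0xFDD0 + 1        # FDD0..FDEF
supplementary_nonchars = 2 * (PLANES - 1)    # xxFFFE/xxFFFF in planes 1..16

available = (total - surrogates - special_nonchars
             - hidden_nonchars - supplementary_nonchars)
highest_code_value = total - 1               # last code value of plane 16

print(total, available, highest_code_value, hex(highest_code_value))
```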

BTW, it doesn’t make sense for every code position
ending in FFFF or FFFE to be a noncharacter. Why
isn’t the same rule applied to the “hidden”
noncharacters, so that every code value ending in
FDD0 to FDEF is also a noncharacter? Is it to
contribute to their “hidden” nature?
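
For concreteness, the asymmetry I’m describing can be
sketched in Python as below. This is only my reading of the
rule as discussed in this message, again assuming 17 planes;
the function name is_noncharacter is made up for
illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: noncharacter test as described in this message."""
    # The "hidden" range exists only in plane 0 ...
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # ... while the FFFE/FFFF rule repeats in every plane.
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)

# Count over an ASSUMED 17 planes (code values 0..10FFFF).
count = sum(is_noncharacter(cp) for cp in range(0x110000))
print(count)
```

Note that a code value such as 1FDD0 is not caught by this
test, while 1FFFE is; that is exactly the inconsistency I’m
asking about.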

-Bernard

