Hi folks, I appreciate the answers to my 6 questions, some of which came directly from the authors. I think that’s neat.
I’m afraid I have a bit of a beef with the Unicode documentation here; forgive me if this has already been brought up. How come UAX #27 says that Unicode 3.0 had 34 non characters, 32 of which are in supplementary planes? First of all, there are no characters defined in supplementary planes in Unicode 3.0. Secondly, where were these 32 non characters published? The Unicode 3.0 book has a section called “non characters” that describes only the 2 non characters in the specials block, and the code charts of that era leave out the 32 non character code values in the Arabic presentation forms block entirely (they are not marked as non characters the way the specials are). For this reason I’m going to refer to these as the “hidden” non characters.

How many planes are defined in Unicode 3.1? UAX #27 seems to indicate that it depends on which transformation format is used (“A process shall interpret the Unicode code units in accordance with the Unicode Transformation Format used.”). UTF-8 seems to define only 17 planes, but UTF-32 seems to have 128 groups of 256 planes. UAX #27 says that Unicode 3.1 defines 3 new supplementary planes, including plane 14. I have difficulty with that statement. Does it mean that there are only 3 new planes, or that there are (at least) 14 new planes, only 3 of which have plane names and characters in them? At least 17 planes must be defined in order to define the 32 non characters in 16 supplementary planes; that’s what common sense would say, anyway.

This whole “plane” business suffers from a lack of documentation. UAX #27 talks about planes as if they were ancient history, but the Unicode 3.0 book does not mention planes once (the term is not in the index, anyway). I would like the Unicode documentation to explain exactly what a plane is without requiring the 10646 documentation, which is only available for a fee. In fact, according to UAX #27, the planes are defined in terms of what WILL be in 10646-2.
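To make the plane arithmetic concrete, here is a minimal Python sketch of how I am reading it. It assumes 17 planes of 65536 code values each, which is my interpretation rather than something I can point to in the 3.0 book, and the names `plane_of` and `plane_end_noncharacters` are made up for illustration.

```python
PLANE_SIZE = 0x10000   # 65536 code values per plane
PLANE_COUNT = 17       # assumption: planes 0 through 16

def plane_of(code_value):
    """Return the plane number of a code value (hypothetical helper)."""
    return code_value >> 16

def plane_end_noncharacters():
    """List the code values ending in FFFE or FFFF in every plane."""
    return [plane * PLANE_SIZE + low
            for plane in range(PLANE_COUNT)
            for low in (0xFFFE, 0xFFFF)]

nonchars = plane_end_noncharacters()
# Keep only the ones that fall outside plane 0.
supplementary = [cv for cv in nonchars if plane_of(cv) > 0]

print(len(nonchars), len(supplementary))  # 34 32
print(hex(nonchars[-1]))                  # 0x10ffff
```

Under that assumption the 34 and 32 in UAX #27 fall out directly, which is why I think 17 planes must be what is meant.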
I’m trying to get a grasp on exactly how many planes are defined in Unicode, in part because it seems to affect the number of non characters that are defined. I also want to know the maximum number of characters that Unicode can encode. So far I reckon there are 1114112 (assuming 17 planes) minus 2048 (half surrogates) minus 2 (special non characters) minus 32 (“hidden” non characters) minus 32 (non characters due to some arbitrary association between code values in the 16 higher planes and the special non character code values) = 1111998 code positions available for characters. What’s with this 1114111 number I’ve seen on this list?

BTW, it doesn’t make sense to me for every code position ending in FFFF or FFFE to be a non character. Why isn’t the same rule applied to the “hidden” non characters, so that every code value ending in FDD0 to FDEF is also a non character? Is it to contribute to their “hidden” nature?

-Bernard
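P.S. Here is the tally above as a quick Python sketch, so anyone can check my subtraction. It rests on the same assumption of 17 planes, and the variable names are just mine.

```python
PLANE_COUNT = 17                        # assumption: planes 0 through 16

total = PLANE_COUNT * 0x10000           # 1114112 code values in all
surrogates = 0xDFFF - 0xD800 + 1        # 2048 half surrogates
specials = 2                            # FFFE and FFFF in plane 0
hidden = 0xFDEF - 0xFDD0 + 1            # 32 "hidden" non characters
higher_planes = 2 * (PLANE_COUNT - 1)   # 32: xFFFE and xFFFF in 16 planes

available = total - surrogates - specials - hidden - higher_planes
print(available)  # 1111998
```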