Ernest suggested:

> There are currently some 10 totally unused planes, with not even any
> tentative plans for them, Allocating one or two those into additional
> Private Use Areas with a variety of default characteristics instead of
> the monotonous default characteristics of the existing Private Use
> Areas should not prove too difficult.

Fine. Make your formal proposal to the UTC and to SC2/WG2 and see
whether it is "difficult" or not to convince the committees of the
appropriateness of your approach.

> For example, 26 blocks of 128
> Private Use Combining Marks each, each block corresponding to
> one of the existing canonical combining classes (with perhaps a
> larger block for class 0) would amply satisfy the needs of most
> private use scripts for combining marks. Similarly, blocks for
> additional characters that would have other properties
                                        ^^^^^^^^^^^^^^^^
which would be what, exactly?

> should
> be simple to define and for most combinations of property values,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
which would be what, exactly?

As of Unicode 4.0.1, PropertyAliases.txt now lists 82 distinct
character properties. Some of those, particularly those most
relevant to complex script behavior and rendering, such as
General_Category, Bidi_Class, Canonical_Combining_Class,
Joining_Type, etc., are multi-valued. Do you have any idea how
big the numbers start getting when combinatorics start to get
involved here?

Or are you planning to do the research first, via a comprehensive
implementation of character properties such as ICU, to first
determine what the actual existing number of combinations of
property values is for the existing repertoire and properties,
and then make a principled projection of that into the uncertain
world of characters for scripts which have not yet been encoded
or modeled?

> 128 characters should also prove to be exceedingly ample

For what?

> I'd have to take the time to list them, but a quick glance convinces
> me that there are at most several hundred combinations that would
> need to be supported if we limit things to just those combinations
> already in use.

This may be correct, but you'd have to make the case based on
the existing data from property implementations.
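
To give a rough idea of what "making the case from existing data"
could look like, here is a minimal sketch in Python. It is only an
illustration under loud assumptions: it uses the four properties that
Python's standard unicodedata module happens to expose (a small
fraction of the 82 in PropertyAliases.txt), it reflects whatever
Unicode version your Python ships rather than 4.0.1, and the names
property_tuple, count_combinations and naive_upper_bound are made up
for this example. It counts the combinations actually in use among
assigned, non-private-use characters and compares that with the naive
product of the per-property value counts.

```python
import sys
import unicodedata
from collections import Counter
from math import prod

# Assumption: only the properties the standard library exposes are sampled.
# The real UCD has many more, so real counts would be larger.
FIELDS = (
    "General_Category",
    "Bidi_Class",
    "Canonical_Combining_Class",
    "Bidi_Mirrored",
)


def property_tuple(ch):
    """Return the sampled property values for one character."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.combining(ch),
        unicodedata.mirrored(ch),
    )


def count_combinations():
    """Count how many characters share each combination of sampled values.

    Unassigned (Cn), private use (Co) and surrogate (Cs) code points
    are skipped, so only the encoded repertoire is measured.
    """
    combos = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
            continue
        combos[property_tuple(ch)] += 1
    return combos


def naive_upper_bound(combos):
    """Product of the number of distinct values seen for each property."""
    return prod(
        len({combo[i] for combo in combos}) for i in range(len(FIELDS))
    )


if __name__ == "__main__":
    combos = count_combinations()
    print("combinations in use:", len(combos))
    print("naive combinatorial bound:", naive_upper_bound(combos))
```

Even with only four properties, the gap between the two numbers shows
why the answer has to come from data and not from a quick glance, and
why every additional multi-valued property makes the projection harder.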

> (it might take more, if for example all 256 potential
> combining classes were supported instead of the 26 listed in
> UCD.html), At 128 characters per combination plus more for a
> few that might need them, it should prove possible to handle this
> in 1 or 2 planes.

Which still begs the fundamental questions:

Why this scheme instead of a much more flexible scheme, as
outlined by Rick, for having an implementation with API support
for establishing PUA properties on an as-needed basis? (Which
requires *no* action by the UTC at all, by the way.)

What makes you think, once you have such a scheme of property
combinations worked out, and once you convinced the UTC of it
(which I doubt), that you could also convince SC2/WG2 to do
something comparable in 10646 to keep the standards in synch?
Recall that SC2/WG2 has almost *no* concept of character
properties -- those are added by the Unicode Standard. Bring
in a proposal that says, "We need to add two more planes of
private use characters, with these special properties, because
XYZ..." and you'll get a row of blank stares from the national
body representatives.

Finally, assuming that you could get something like this into
the standards, what makes you think that the platform vendors
would complicate and expand their character property tables to
support this speculative scheme? They have the option to not
support all characters in the standard, and a new plane or two
full of PUA characters with a checkerboard of speculative
property assignments strike me as prime candidates for the kind
of stuff they would simply say, "We have no interest in
supporting these things."

I think you're spitting into the wind if you think you can
force, through the character standardization process, the major
platform vendors to support the kind of PUA functionality you
are after, when they could do so *today* via much more
extensible and architecturally sensible means given the
existing PUA characters, but have not yet chosen to do so.

--Ken
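
For concreteness, the kind of as-needed PUA property API mentioned
above might be sketched as follows. This is a hypothetical
illustration, not Rick's actual proposal and not any vendor's API: the
names Props, PrivateUseRegistry, define and lookup are invented here,
and only three properties are modeled. The idea is simply that an
application registers overrides for ranges of the existing Private Use
Areas, and every other code point falls through to the standard
property data.

```python
import unicodedata
from dataclasses import dataclass

# The three Private Use Areas defined by the standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


@dataclass(frozen=True)
class Props:
    """Hypothetical bundle of three properties; a real API would carry more."""
    category: str = "Co"
    bidi: str = "L"
    combining: int = 0


def is_private_use(cp):
    """True if the code point lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


class PrivateUseRegistry:
    """Application-level overrides for PUA code points (illustrative only)."""

    def __init__(self):
        self._overrides = {}

    def define(self, first, last, props):
        """Assign props to every code point in first..last (inclusive)."""
        if first > last or not all(
            is_private_use(cp) for cp in range(first, last + 1)
        ):
            raise ValueError("range must lie entirely inside a Private Use Area")
        for cp in range(first, last + 1):
            self._overrides[cp] = props

    def lookup(self, ch):
        """Return the override if one exists, else the standard values."""
        override = self._overrides.get(ord(ch))
        if override is not None:
            return override
        return Props(
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.combining(ch),
        )


if __name__ == "__main__":
    registry = PrivateUseRegistry()
    # A private script decides that U+E000..U+E07F are above-base marks.
    registry.define(0xE000, 0xE07F, Props("Mn", "NSM", 230))
    print(registry.lookup("\ue000"))
    print(registry.lookup("a"))
```

Nothing in such a scheme needs new planes or any committee action: the
overrides live in the implementation, and two parties exchanging PUA
data would have to agree on them privately, as they already must for
the characters themselves.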