2011/9/13 Kent Karlsson <kent.karlsso...@telia.com>:
> I'm not at all sure the suggested workaround works in general, and not just
> in a few examples.
>
> Another possibility, as long as we are just "brain-storming" a bit here, is
> to use the bidi category S (Segment Separator) for the LEVEL DIRECTION MARK
> (which would be a normally invisible (bidi) format control character). I.e.
> it would work just like TAB (as specified in the UBA), except that it
> wouldn't do tabbing. But then it would work only for the paragraph bidi
> direction. However, the idea that TAB (and the other bidi S characters)
> magically cuts through *all* nested bidi levels seems a bit strange to me...
> Going just to the closest explicit embedding/(override) level seems less
> drastic. Without formally subdividing "S", one could treat different "bidi
> S" (old and new) to reset to different levels (to the embedding bidi level
> for the new one, and to the paragraph bidi level for the three old ones). (I
> know, this would be a form of "option 1" in the PRI.)
However you turn it, this is still a splitting of the bidi class if you change the behavior of class S like this. Once again, if you want to encode new characters, why would you restrict yourself to reusing an existing bidi class, only to break it?

Think of it simply: the stability policy is just meant to NOT break the bidi rendering of existing text that uses assigned characters. For currently unassigned code points, no stability has ever been guaranteed for any property, so you can assign their properties much more freely.

I am convinced that if you need new characters, the only good question is which ones:

(1) Either you duplicate the encoding of existing whitespace, punctuation and symbols to give them a different bidi class (then you can reuse one of the existing classes). But many characters would have to be duplicated if you start this way (and WG2 will most probably strongly oppose such a UTC proposal).

(2) Or you encode new bidi controls, to which you assign new bidi classes. This does not break ANY existing text rendered with any existing renderer. Of course you'll need an updated renderer (but not new fonts); otherwise existing implementations will display a .notdef glyph, and the user will see that there is something in the encoded text which may be important to render it correctly.

The second option is certainly the least disruptive (and the most economical in terms of encoding, and the most likely to be accepted without much trouble by the voting NBs in WG2). It does not break the policy for ANY existing encoded text. It gives NO surprise to users, or at least they know that something is missing, and their decision about what to do will be exactly the same as when they are presented with newly encoded text containing newly assigned characters for which they do not yet have a supporting font, or for which their existing renderer lacks the complex shaping/layout features required by a newly encoded script.
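To make the comparison concrete, here is a minimal sketch in Python of what "does not break any existing text" would mean as a mechanical check. It only relies on the standard `unicodedata.bidirectional()` call; the helper names, the placeholder code point and the class name `X_NEW` are purely illustrative assumptions, not anything proposed in the PRI.

```python
import unicodedata

# The three characters that currently carry bidi class S (Segment Separator).
SEGMENT_SEPARATORS = [0x0009, 0x000B, 0x001F]


def bidi_snapshot(code_points):
    """Map each code point to the bidi class reported by the local Unicode database."""
    return {cp: unicodedata.bidirectional(chr(cp)) for cp in code_points}


def immutability_violations(old, new):
    """Return code points present in `old` whose bidi class differs, or is missing, in `new`."""
    return sorted(cp for cp, bidi_class in old.items() if new.get(cp) != bidi_class)


old = bidi_snapshot(SEGMENT_SEPARATORS)

# Hypothetical future version, option (2): a new control gets a new class.
# Both the code point and the class name are arbitrary placeholders.
HYPOTHETICAL_CP = 0x2065
extended = dict(old)
extended[HYPOTHETICAL_CP] = "X_NEW"

# Hypothetical future version that changes the class of an assigned character.
changed = dict(old)
changed[0x0009] = "X_NEW"

print(immutability_violations(old, extended))  # prints []
print(immutability_violations(old, changed))   # prints [9]
```

Adding a code point with a new class leaves every previously assigned entry untouched, so the check reports nothing; only a change to an already assigned character such as TAB is flagged.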
In other words, the UTC policy on the stability of bidi classes should be minimally relaxed, by rewording it into something like:

« The bidi class property value of any assigned code point is IMMUTABLE (and will never change for the same assigned code point in any subsequent version of the UCS). »

instead of speaking about the poorly defined concept of « splitting the bidi classes ». In fact, if you add a new bidi class for new characters, you never actually split any existing bidi class, and you don't break the IMMUTABILITY rule I have just given (which is similar to the immutability rule for other normative character properties of assigned code points, such as the code point value, the character name, the decomposition mapping and the combining class used by the four standard normalization forms, and even the age).

I can accept that the full set of possible values for the general category is restricted and inextensible, because these categories are frequently used in algorithms that assume the GC is a complete partition with a constant number of elements (a fixed enumeration), and many other algorithms and derived properties are implemented on top of that assumption. But the bidi class of characters is just meant for rendering, and has no other use than implementing the UBA itself; it should never be used for any exclusive yes/no decision.

-- Philippe.