> [Original Message]
> From: Peter Kirk <[EMAIL PROTECTED]>
>
> On 24/04/2004 15:16, Ernest Cline wrote:
> >
> >In order to get Variation Selectors even able to be applied to
> >other combining marks one would need to change the way
> >Variation Selectors work, and doing that is what would
> >complicate things too much.
>
> I agree that a change is necessary. I disagree that it would
> complicate things too much.
>
> >There are tons of problems once one adds in other combining marks
> >being applied to the character as well, because then under normalization,
> >unless the mark you were applying the variation selector to is of
> >combining class 0, you can't assure that the variation selector will
> >stay with the mark. Having the existing Variation Selectors behave
> >in that way would break the normalization stability guarantee, ...
>
> This is untrue. Normalisation stability does not apply when the text is
> changed, and inserting a variation selector is a change to the text. I
> have never suggested changing the combining class or other normalisation
> properties of existing VSs. The way to ensure that a VS stays with the
> mark it applies to is to ensure that in the part of the combining
> character sequence before the VS all combining characters are already in
> canonical order. Well, I can see that there are potential problems where
> there are canonical decompositions (which are not composition
> exclusions), but that does not apply to the cases I am interested in.
>
> >... so that
> >can't be done, so you would need to introduce new Variation
> >Selectors that would behave in this novel fashion.
> >
> >In order to do so, under the existing combining class framework you
> >would need to add variation selectors with the same combining class
> >as the mark it works with. An alternative would be to add yet another
> >property for these new Variation Selectors so as to have it go outside
> >the existing canonical combining class rules when it comes to
> >canonical ordering.. Either way, it won't work properly with existing
> >implementations, involves a lot more work than adding another
> >vowel mark, and will not solve the problem of legacy data using the
> >vowel mark for both the main version and its variant. ...
> >
>
> The former, VSs with various combining classes, would work perfectly
> well with existing implementations as soon as they have been updated
> with character data for these new characters. Adding a new mark has no
> advantage over this, as it also cannot be used until the character data
> is updated, and the disadvantage that (once the character data has been
> updated) the VS, being default ignorable, is simply ignored when a font
> which does not support it is used, whereas the new mark is supported
> only when it is included in a font. There will always be a legacy data
> problem, but the VS mechanism was defined precisely to minimise this
> problem, and as such it has the potential of minimising it for combining
> characters just as it does for base characters.

There are problems. Suppose we define a new variation selector that will stay with the preceding mark under normalization. Now consider what happens when an implementation conforming to a version of Unicode that does not know about the new character normalizes the sequence

BC CM180 CM160 NVS

BC = Base Character
CM# = Combining Mark of ccc #
NVS = New Variation Selector

As far as it knows, the new variation selector is an undefined character with a ccc of 0, so when normalizing this it will reorder it as:

BC CM160 CM180 NVS

Now let's have this "normalized" string be passed on to an implementation which knows about this NVS. There were two schemes I proposed for implementing this NVS. Both have problems, as I will point out below.

One involves giving it the novel characteristic of ignoring the canonical combining classes and always sticking with the mark it follows. Under this scheme the NVS will stick with the CM180 rather than the CM160 it was applied to, which means that the character sequence the implementation receives will not be the one originally intended. This problem is too severe to be ignored. This scheme would have made sense if it had been available from the start of Unicode, but to add it now would cause too many problems with the interoperability of data.

The other, and the one which you preferred anyway, involves using different variation selectors for each combining class. At least with this solution, when an implementation that did know about the character encountered the data "normalized" by an unaware implementation, it would be able to renormalize it. Given the nature of Hebrew vowel points, where each existing point has a canonical combining class all to itself, employing variation selectors with non-zero ccc's will require just as many new characters as adding new vowel marks would. If you want them in an appropriate block, and want unaware implementations to treat them as Default Ignorable, that will require placing them in the SSP.
Even then they will only be ignored by implementations conforming to Unicode 3.2 or later. Adding Variation Selectors with non-zero canonical combining classes is possible, but I fail to see how the benefits of adding new Variation Selectors on the SSP outweigh the benefits of defining new vowel marks in the Hebrew block. It's not as if the Hebrew block lacks the space for additional vowel points, and frankly, anything on Plane 0 is likelier to be implemented sooner and on a wider set of platforms.
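
For anyone who wants to check the reordering argument by hand, here is a minimal Python sketch of it. It is only an illustration under stated assumptions: the tokens BC, CM180, CM160 and NVS are the placeholders from the example above rather than real code points, the ccc tables are made up for the purpose, and `canonical_order` and `mark_before` are hypothetical helper names. The only real rule it relies on is canonical ordering, that is, a stable sort by ccc of each run of characters with non-zero ccc.

```python
from itertools import groupby

def canonical_order(seq, ccc):
    """Stably sort each run of non-zero-ccc marks by ccc; starters (ccc 0) stay put."""
    out = []
    for is_starter, run in groupby(seq, key=lambda ch: ccc.get(ch, 0) == 0):
        run = list(run)
        if not is_starter:
            run.sort(key=lambda ch: ccc[ch])  # list.sort is stable
        out.extend(run)
    return out

def mark_before(seq, selector="NVS"):
    """Scheme 1 (hypothetical): the selector applies to whatever immediately precedes it."""
    return seq[seq.index(selector) - 1]

# Illustrative ccc tables (assumptions, not real Unicode data).
UNAWARE = {"CM180": 180, "CM160": 160}              # NVS unknown, so treated as ccc 0
AWARE_SCHEME2 = {"CM180": 180, "CM160": 160, "NVS": 160}  # NVS shares the ccc of its mark

original = ["BC", "CM180", "CM160", "NVS"]

# An unaware implementation "normalizes" the text.
after_unaware = canonical_order(original, UNAWARE)

# Scheme 1: the selector now follows a different mark than intended.
intended = mark_before(original)
received = mark_before(after_unaware)

# Scheme 2: an aware implementation can renormalize and recover the intended form.
renormalized = canonical_order(after_unaware, AWARE_SCHEME2)
direct = canonical_order(original, AWARE_SCHEME2)

print(after_unaware, intended, received, renormalized, direct, sep="\n")
```

Under these assumptions the first scheme ends up with the selector attached to CM180 instead of CM160, while the second gives the same result whether or not the text passed through the unaware implementation first.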