From: Peter Kirk <[EMAIL PROTECTED]>
>
> On 24/04/2004 06:30, Peter Constable wrote:
>
> >If data is always encoded in canonical order, then having a VS within
> >the combining mark sequence wouldn't create any normalization problems,
> >that's true. But you well know that people do not want their Hebrew data
> >in canonical order. Even if they did, it couldn't be guaranteed.
>
> Yes, canonical ordering cannot be guaranteed. But ordering rules can be
> specified, and departures from them treated as spelling errors.
>
> >There's a problem not only in cases of the form B M1 M2 VS, but also in
> >cases of the form B M1 VS M2. Of course, the issues are different. The
> >first may normalize to B M2 M1 VS; the second perhaps *ought* to
> >normalize to B M2 M1 VS, but that won't happen.
>
> Well, perhaps the best thing here is to specify that the mark to which
> the VS applies should always come first after the base character and
> followed by the VS, irrespective of the normal canonical order. At least
> that would be unambiguous, and stable under normalisation (since the
> only relevant precomposed characters are composition exceptions).
> Other orderings should simply be considered spelling errors.

As someone who has put a lot of thought into variation selectors, let me point out something. In the case of B M1 M2 VS, what would the variation selector indicate as being varied, if such a thing were to be allowed? Since variation selectors are combining marks, they should, just like any other combining mark, be viewed as applying to the entire combining sequence up to that point. The VS should therefore be viewed as indicating a variant of B M1 M2, and not of just the preceding mark. Any other treatment complicates things too much.

Thus, in the case of the vowel marks, one could add a series of variation sequences, one for each base character that the variant vowel mark would be used with. If this causes too many other problems, then adding a new mark for the vowel variant, instead of trying to adapt variation selectors to the task, would seem to be the best solution.
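
The two orderings discussed in the quoted text can be checked with a rough sketch like the one below. It assumes Python's standard `unicodedata` module, and the choice of BET as B, DAGESH as M1, PATAH as M2 and VS1 as the VS is purely illustrative; none of these specific characters come from the thread, and no such variation sequence is actually defined.

```python
import unicodedata

# Illustrative choices only (not taken from the thread):
BET = "\u05d1"     # B:  HEBREW LETTER BET, combining class 0
DAGESH = "\u05bc"  # M1: combining class 21
PATAH = "\u05b7"   # M2: combining class 17
VS1 = "\ufe00"     # VS: VARIATION SELECTOR-1, combining class 0


def combining_classes(s):
    """Canonical combining class of each character in s."""
    return [unicodedata.combining(c) for c in s]


def code_points(s):
    """Readable list of code points, e.g. ['U+05D1', ...]."""
    return ["U+%04X" % ord(c) for c in s]


vs_last = BET + DAGESH + PATAH + VS1    # B M1 M2 VS
vs_middle = BET + DAGESH + VS1 + PATAH  # B M1 VS M2

for label, text in (("B M1 M2 VS", vs_last), ("B M1 VS M2", vs_middle)):
    nfd = unicodedata.normalize("NFD", text)
    print(label)
    print("  classes:", combining_classes(text))
    print("  input:  ", code_points(text))
    print("  NFD:    ", code_points(nfd))
```

In the first case the two marks are swapped into canonical order and the VS ends up after M1 instead of M2, so a reading in which the VS varies only the immediately preceding mark is not stable under normalization. In the second case the VS has combining class 0, which blocks reordering, so the sequence is left unchanged.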