On 24/04/2004 15:16, Ernest Cline wrote:




[Original Message]
From: Peter Kirk <[EMAIL PROTECTED]>

On 24/04/2004 11:22, Ernest Cline wrote:



...


As someone who has put a lot of thought into variation selectors, let me
point out something. In the case of B M1 M2 VS, what would the variation
selector be indicating as being varied if such a thing were to be allowed? ...



I have re-read section 15.6 of the standard. It is absolutely clear that a VS applies only to the immediately preceding character, and not to a complete combining sequence:


A variation sequence, which always consists of a base character followed by the variation selector,...

There is no suggestion that more than a single character may precede the VS.


...Since variation selectors are combining marks, then just like any other
combining marks they should be viewed as being applied to the entire
combining sequence up to that point, and hence should be viewed as
indicating a variant of B M1 M2, and not of just the preceding mark. ...



Whether or not this applies to other combining marks, it explicitly does not apply to VSs. Well, it is of course also explicit that any sequence of a combining mark followed by a VS is not sanctioned for standard use.


... Any other treatment complicates things too much.



Some other treatment is clearly what the UTC had in mind.


I always assumed that VS's are intended to apply to just the immediately preceding character, and not to a whole combining character sequence. In my opinion, "Any other treatment complicates things too much." But perhaps there are others who can tell us what the UTC intended for this.



Which is why as things currently stand, the standard calls for the only legal sequences to involve base characters only. To quote from Section 15.6:

"The base character in a variation sequence is never a combining
character or a decomposable character. The variation selectors
themselves are combining marks of combining class 0 ..."

In order to get Variation Selectors even able to be applied to
other combining marks one would need to change the way
Variation Selectors work, and doing that is what would complicate
things too much.



I agree that a change is necessary. I disagree that it would complicate things too much.

Thus in the case of the vowel marks, one could add a series of variation
sequences with one for each base character that the variant vowel
mark would be used with. If this causes too many other problems, ...


It would indeed if someone considers that every such combining sequence has to be enumerated and defined individually. But if one simply says that every combining sequence containing e.g. the sequence <QAMATS, VS1> is legal and represents use of the variant qamats glyph, then there is no problem.



There are tons of problems once other combining marks are applied to the character as well. Under normalization, unless the mark you are applying the variation selector to has combining class 0, you cannot ensure that the variation selector will stay with the mark. Having the existing Variation Selectors behave in that way would break the normalization stability guarantee, ...


This is untrue. Normalisation stability does not apply when the text is changed, and inserting a variation selector is a change to the text. I have never suggested changing the combining class or other normalisation properties of existing VSs. The way to ensure that a VS stays with the mark it applies to is to ensure that in the part of the combining character sequence before the VS all combining characters are already in canonical order. Well, I can see that there are potential problems where there are canonical decompositions (which are not composition exclusions), but that does not apply to the cases I am interested in.
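
The reordering at issue can be checked with Python's standard unicodedata module. This is only a sketch: the choice of BET, QAMATS and DAGESH, and of VS1 (U+FE00) as the selector, is an illustrative assumption, and no such variation sequence is defined in the standard.

```python
import unicodedata

BET = "\u05d1"     # HEBREW LETTER BET, combining class 0
QAMATS = "\u05b8"  # HEBREW POINT QAMATS, combining class 18
DAGESH = "\u05bc"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21
VS1 = "\ufe00"     # VARIATION SELECTOR-1, combining class 0


def classes(text):
    """Return the canonical combining class of each character."""
    return [unicodedata.combining(ch) for ch in text]


# Marks before the VS are NOT in canonical order (21 precedes 18).
unordered = BET + DAGESH + QAMATS + VS1
# Marks before the VS are already in canonical order.
ordered = BET + QAMATS + VS1 + DAGESH

nfd_unordered = unicodedata.normalize("NFD", unordered)
nfd_ordered = unicodedata.normalize("NFD", ordered)

# Does the VS still immediately follow QAMATS after normalisation?
stays_unordered = (QAMATS + VS1) in nfd_unordered
stays_ordered = (QAMATS + VS1) in nfd_ordered

print(classes(unordered), stays_unordered)
print(classes(ordered), stays_ordered)
```

In the first string the dagesh and qamats are swapped by canonical ordering, so the VS ends up after the dagesh. In the second the VS, having combining class 0, blocks any reordering across it and stays with the qamats.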


... so that can't be done, so you would need to introduce new Variation
Selectors that would behave in this novel fashion.


In order to do so, under the existing combining class framework you
would need to add variation selectors with the same combining class
as the marks they work with. An alternative would be to add yet another
property for these new Variation Selectors so that they fall outside
the existing canonical combining class rules when it comes to
canonical ordering. Either way, it won't work properly with existing
implementations, involves a lot more work than adding another
vowel mark, and will not solve the problem of legacy data using the
vowel mark for both the main version and its variant. ...


The former, VSs with various combining classes, would work perfectly well with existing implementations as soon as they have been updated with character data for these new characters. Adding a new mark has no advantage over this, as it also cannot be used until the character data is updated. It also has a disadvantage: once the character data has been updated, the VS, being default ignorable, is simply ignored when a font which does not support it is used, whereas the new mark is supported only when it is included in a font. There will always be a legacy data problem, but the VS mechanism was defined precisely to minimise this problem, and as such it has the potential of minimising it for combining characters just as it does for base characters.
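
As a rough illustration of how a VS with a non-zero combining class would behave, here is a sketch of the canonical ordering step alone (no decomposition or composition). The selector is entirely hypothetical: a private-use code point, U+E000, stands in for it, and its combining class of 18 (matching QAMATS) is an assumption supplied through an override table, not real character data.

```python
import unicodedata

BET = "\u05d1"     # HEBREW LETTER BET, combining class 0
QAMATS = "\u05b8"  # HEBREW POINT QAMATS, combining class 18
DAGESH = "\u05bc"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21

# Hypothetical: private-use placeholder for a new VS with class 18.
HYPOTHETICAL_VS = "\ue000"
OVERRIDES = {HYPOTHETICAL_VS: 18}


def ccc(ch):
    """Combining class, with the hypothetical override applied."""
    return OVERRIDES.get(ch, unicodedata.combining(ch))


def canonical_reorder(text):
    """Stable-sort each run of non-zero-class characters by class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


reordered = canonical_reorder(BET + DAGESH + QAMATS + HYPOTHETICAL_VS)
pair_kept = (QAMATS + HYPOTHETICAL_VS) in reordered
print([f"U+{ord(ch):04X}" for ch in reordered], pair_kept)
```

Because the sort is stable and the hypothetical selector shares the class of the qamats, the pair moves together ahead of the dagesh. The sketch does not cover the case where another mark of the same class intervenes.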


... I just don't
see the benefits justifying the costs. If there were a number of use
cases for doing this, it might justify the effort required, but for only
a couple of vowel marks, I can't see it.



Well, it is more than a couple, and anyway I don't see the costs as being high. On the Hebrew list I listed yesterday six candidates for definition as variation sequences, each of one Hebrew combining mark plus a variation selector. Five of these sequences have the potential of solving an issue for which a proposal either has been made or is being considered, and for which the alternative would probably be to define a new character. (The sixth had apparently been rejected as too marginal: it probably doesn't merit a separate character but might be worth defining as a variation sequence.) So potentially we save five new characters by using either an already defined VS or a special one defined for Hebrew. I have just thought of a seventh possible sequence, although in this case the alternate glyph is already encoded as an alphabetic presentation form (U+FB1E). There is also the possibility of using VSs to indicate alternative pointing schemes. These are all in Hebrew. There may well be similar examples in other scripts - in fact I vaguely remember seeing that some texts (German black letter, I think) distinguish umlaut from diaeresis, and this is something which could be handled by a combining character VS (although here there are problems with normalisation composition). So this is potentially a large field!
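
The composition problem mentioned for umlaut and diaeresis can be seen with a short check. This is a sketch only: using VS1 (U+FE00) after U+0308 is an assumption made for illustration, and no such sequence is defined.

```python
import unicodedata

U = "u"               # LATIN SMALL LETTER U
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230
VS1 = "\ufe00"        # VARIATION SELECTOR-1, combining class 0

sequence = U + DIAERESIS + VS1

nfd = unicodedata.normalize("NFD", sequence)
nfc = unicodedata.normalize("NFC", sequence)

# After NFC, is the VS still immediately preceded by the combining mark?
vs_follows_mark_nfd = (DIAERESIS + VS1) in nfd
vs_follows_mark_nfc = (DIAERESIS + VS1) in nfc

print([f"U+{ord(ch):04X}" for ch in nfc], vs_follows_mark_nfc)
```

Under NFC the u and the diaeresis compose to U+00FC, so the VS is left following a precomposed letter rather than the mark it was meant to vary. Under NFD the sequence is unchanged.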


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



