On 24/04/2004 06:30, Peter Constable wrote:

...

> If data is always encoded in canonical order, then having a VS within
> the combining mark sequence wouldn't create any normalization problems,
> that's true. But you well know that people do not want their Hebrew data
> in canonical order. Even if they did, it couldn't be guaranteed.

Yes, canonical ordering cannot be guaranteed. But ordering rules can be specified, and departures from them treated as spelling errors. I can't help thinking that it would have been much simpler for everybody if Unicode had taken that approach rather than permitting canonical reordering, but that battle is obviously already lost.


> There's a problem not only in cases of the form B M1 M2 VS, but also in
> cases of the form B M1 VS M2. Of course, the issues are different. The
> first may normalize to B M2 M1 VS; the second perhaps *ought* to
> normalize to B M2 M1 VS, but that won't happen.
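
The two cases can be checked directly with Python's standard `unicodedata` module. This is a sketch under stated assumptions: BET (U+05D1) stands in for B, DAGESH (U+05BC, combining class 21) for M1, QAMATS (U+05B8, combining class 18) for M2, and VS1 (U+FE00) is used purely as a placeholder, since no Hebrew variation sequences are assumed to be defined. A variation selector has combining class 0, so it ends the run of marks that canonical reordering is allowed to permute.

```python
import unicodedata

# Placeholder characters (assumption): B = BET, M1 = DAGESH (ccc 21),
# M2 = QAMATS (ccc 18), VS = VS1 used only as a stand-in selector.
BET, DAGESH, QAMATS, VS1 = "\u05D1", "\u05BC", "\u05B8", "\uFE00"

def show(s):
    """List code points with their canonical combining classes."""
    return " ".join(f"U+{ord(c):04X}({unicodedata.combining(c)})" for c in s)

# Case 1: B M1 M2 VS. The two marks are reordered by combining class,
# while the VS (ccc 0) stays at the end, so it now follows DAGESH.
case1 = BET + DAGESH + QAMATS + VS1
nfd1 = unicodedata.normalize("NFD", case1)

# Case 2: B M1 VS M2. The VS (ccc 0) blocks reordering across it,
# so normalization leaves the string unchanged.
case2 = BET + DAGESH + VS1 + QAMATS
nfd2 = unicodedata.normalize("NFD", case2)

for label, before, after in [("B M1 M2 VS", case1, nfd1),
                             ("B M1 VS M2", case2, nfd2)]:
    print(label)
    print("  input:", show(before))
    print("  NFD:  ", show(after))
```

In the first case the marks swap and the VS ends up after DAGESH rather than QAMATS; in the second the VS acts as a barrier and nothing moves.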

Well, perhaps the best thing here is to specify that the mark to which the VS applies should always come immediately after the base character, followed directly by the VS, irrespective of the normal canonical order. At least that would be unambiguous, and stable under normalisation (since the only relevant precomposed characters are composition exclusions). Other orderings should simply be considered spelling errors.
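
A rough sketch of how such a rule might be checked mechanically, again with Python's `unicodedata`. The helper names `is_vs`, `is_mark` and `follows_vs_rule` are hypothetical, only VS1 to VS16 (U+FE00..U+FE0F) are considered, and a VS that follows a base character directly is left alone. A VS that follows a mark is accepted only when that mark sits directly on the base; anything else is flagged as a spelling error.

```python
import unicodedata

# Placeholder characters (assumption), as in the earlier example.
BET, DAGESH, QAMATS, VS1 = "\u05D1", "\u05BC", "\u05B8", "\uFE00"

def is_vs(c):
    # Hypothetical helper: only VS1..VS16 (U+FE00..U+FE0F) are considered.
    return "\uFE00" <= c <= "\uFE0F"

def is_mark(c):
    # Any combining mark (general category M*) other than a selector.
    return unicodedata.category(c).startswith("M") and not is_vs(c)

def follows_vs_rule(s):
    """Hypothetical check: a VS that follows a mark is valid only if that
    mark comes immediately after the base character."""
    for i, c in enumerate(s):
        if is_vs(c) and i >= 1 and is_mark(s[i - 1]):
            # Reject if anything other than the base precedes the mark.
            if i < 2 or is_mark(s[i - 2]) or is_vs(s[i - 2]):
                return False
    return True

samples = {
    "B QAMATS VS DAGESH": BET + QAMATS + VS1 + DAGESH,
    "B DAGESH VS QAMATS": BET + DAGESH + VS1 + QAMATS,
    "B DAGESH QAMATS VS": BET + DAGESH + QAMATS + VS1,
}

for label, s in samples.items():
    nfd_same = unicodedata.normalize("NFD", s) == s
    nfc_same = unicodedata.normalize("NFC", s) == s
    print(f"{label}: valid={follows_vs_rule(s)}, "
          f"NFD unchanged={nfd_same}, NFC unchanged={nfc_same}")
```

With the target mark first, both NFD and NFC leave the string unchanged: the VS blocks reordering, and BET plus DAGESH does not compose because U+FB31 HEBREW LETTER BET WITH DAGESH is a composition exclusion. The same rule would not be NFC-stable for a mark that does compose with its base, such as U+0061 followed by U+0301, which NFC turns into U+00E1.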

-- 
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/