Peter responded:

> Ken Whistler wrote on 06/25/2003 06:57:56 PM:
>
> > People could consider, for example, representation
> > of the required sequence:
> >
> > <lamed, qamets, hiriq, final mem>
> >
> > as:
> >
> > <lamed, qamets, ZWJ, hiriq, final mem>
>
> So, we want to introduce yet *another* distinct semantic for ZWJ?
Actually, no, I don't. That was just the first candidate that came to mind.

> We've
> got one for Indic, another for Arabic, another for ligatures (similar to
> that for Arabic, but slightly different). Now another that is "don't
> affect any visual change, just be there to inhibit reordering under
> canonical ordering / normalization"?

As I pointed out in a separate response, just putting the ZWJ there would *already* interrupt the reordering of the sequence. There is nothing new about that. The problem is that you might not be able to count on it having no visual effect, because the generic meaning of ZWJ is now intended to be ligation-requesting, which does have visual consequences.

I now prefer the suggestions of RLM or WJ for this. Both of those format controls, by *definition*, should have no impact on visual display in this context: the RLM because it would be inserted between two NSMs that pick up strong R-to-L directionality from the consonant, and the WJ because it would be inserted at a position where there already is no word/line break opportunity. But either of them, by their current definition and properties, would break the sequences for canonical reordering. So they already have the semantics of the putative new control in question: no effect on visual display, while inhibiting the canonical reordering of the point sequence.

> > The presence of a ZWJ (cc=0) in the sequence would block
> > the canonical reordering of the sequence to hiriq before
> > qamets. If that is the essence of the problem needing to
> > be addressed, then this is a much simpler solution which would
> > impact neither the stability of normalization nor require
> > mass cloning of vowels in order to give them new combining
> > classes.
>
> Yes, it would accomplish all that; and is groanable kludge.

Why is making use of the existing behavior of existing characters a "groanable kludge", if it has the desired effect and makes the required distinctions in text?
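For anyone who wants to see the blocking behavior for themselves, Python's standard `unicodedata` module applies canonical ordering as part of normalization. This is a minimal sketch, assuming the interpreter's Unicode database carries the usual classes (HIRIQ U+05B4 = 14, QAMETS U+05B8 = 18, and 0 for ZWJ U+200D, RLM U+200F and WJ U+2060); the helper `ccc_list` is just an illustrative name.

```python
import unicodedata

LAMED, QAMETS, HIRIQ, FINAL_MEM = "\u05DC", "\u05B8", "\u05B4", "\u05DD"
ZWJ, RLM, WJ = "\u200D", "\u200F", "\u2060"

def ccc_list(s):
    """Return (code point, canonical combining class) pairs for a string."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in s]

# The required sequence: qamets written before hiriq on the lamed.
plain = LAMED + QAMETS + HIRIQ + FINAL_MEM
print(ccc_list(plain))

# Canonical ordering sorts adjacent marks with nonzero class, so hiriq (14)
# ends up in front of qamets (18) and the written order is lost.
print(unicodedata.normalize("NFD", plain) == LAMED + HIRIQ + QAMETS + FINAL_MEM)

# Any class-0 character between the two points splits the run of marks,
# so both NFD and NFC leave the sequence exactly as written.
for name, blocker in [("ZWJ", ZWJ), ("RLM", RLM), ("WJ", WJ)]:
    s = LAMED + QAMETS + blocker + HIRIQ + FINAL_MEM
    print(name, unicodedata.normalize("NFD", s) == s,
          unicodedata.normalize("NFC", s) == s)
```

The three blockers behave identically as far as normalization is concerned; which one is preferable turns on display semantics, which this check does not test.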
If there is no rendering-system or font-lookup showstopper here, I'm inclined to think it's a rather elegant way out of the problem.

> At least with
> having distinct vowel characters for Biblical Hebrew, we'd come to a point
> we could forget about it, and wouldn't be wincing every time we considered
> it.

Au contraire. We'll be wincing forever over this one. There's no getting around the fact that this is merely a cloning of the whole set of points in order to have candidates for a reassigned set of combining classes.

You're stuck between a rock and a hard place on this one. The UTC cannot entertain merely fixing the existing combining class assignments, because that would break the normalization stability guarantee. We've all come to acknowledge and most to accept that, even though it still elicits groans. But in the 10646 WG2 context, coming in with a duplicate set of Hebrew points is not going to make any sense, because, as someone (John Cowan?) has already pointed out, 10646 doesn't assign combining classes, so trying to justify character cloning on the basis of distinct combining class assignments won't hold up there.

You can always come in with the proposal to encode BIBLICAL HEBREW POINT PATAH and say, even though the glyph is identical, see, the name is different, so the character is different. But this is a pretty thin disguise, and is vulnerable to simple questioning: What is it for? Well, to point Biblical Hebrew texts. But what was U+05B7 HEBREW POINT PATAH for? Well, to point Biblical Hebrew texts (or any Hebrew text, for that matter...). Well, then, what is the difference? Uh, the combining classes for the two are different. What is a combining class? ... and so on.
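To make the stability point concrete, here is a toy model of the canonical ordering step: a stable sort of each maximal run of characters with nonzero combining class, with the class table passed in as a parameter. It is a simplified sketch, not a normalizer (there is no decomposition or composition, and characters missing from the table are treated as class 0). `canonical_order`, `CURRENT_CCC` and `HYPOTHETICAL_CCC` are illustrative names, and the hypothetical table, which gives both points the same class so that their written order survives, is invented purely for the demonstration and is not anyone's proposal.

```python
from typing import Dict

LAMED, QAMETS, HIRIQ, FINAL_MEM = "\u05DC", "\u05B8", "\u05B4", "\u05DD"

# Actual classes of the two points involved (as in UnicodeData.txt).
CURRENT_CCC: Dict[str, int] = {HIRIQ: 14, QAMETS: 18}
# Hypothetical "fixed" assignment, invented for illustration only.
HYPOTHETICAL_CCC: Dict[str, int] = {HIRIQ: 18, QAMETS: 18}

def canonical_order(s: str, ccc: Dict[str, int]) -> str:
    """Stable-sort every maximal run of nonzero-class characters by class."""
    out = []
    run = []
    for ch in s:
        if ccc.get(ch, 0) == 0:
            # A starter ends the current run of marks: flush it sorted.
            out.extend(sorted(run, key=lambda c: ccc[c]))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return "".join(out)

source = LAMED + QAMETS + HIRIQ + FINAL_MEM
old = canonical_order(source, CURRENT_CCC)
new = canonical_order(source, HYPOTHETICAL_CCC)
print(old == LAMED + HIRIQ + QAMETS + FINAL_MEM)  # True: reordered today
print(new == source)                              # True: order preserved
print(old == new)                                 # False: two different "normalized" forms
```

The same input would normalize one way under the old table and another way under the new one, which is exactly what the stability guarantee rules out; cloning the points sidesteps that only by moving the problem into the character repertoire.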
I'm trying to find a way, using existing characters and a simple set of text representation conventions, to make the distinctions and preserve the order relations that you need for decent font lookup, without the whole enterprise washing up on either of those two rocks.

--Ken