Peter responded:

> Ken Whistler wrote on 06/25/2003 06:57:56 PM:
> 
> > People could consider, for example, representation
> > of the required sequence:
> > 
> >   <lamed, qamets, hiriq, final mem>
> > 
> > as:
> > 
> >   <lamed, qamets, ZWJ, hiriq, final mem>
> 
> So, we want to introduce yet *another* distinct semantic for ZWJ?

Actually, no, I don't. That was just the first candidate that
came to mind.
 
> We've got one for Indic, another for Arabic, another for ligatures
> (similar to that for Arabic, but slightly different). Now another that
> is "don't affect any visual change, just be there to inhibit reordering
> under canonical ordering / normalization"?

As I pointed out in a separate response, just putting the ZWJ
there would *already* interrupt the reordering of the sequence.
There is nothing new about that. The problem is that you might
not be able to count on it not effecting a visual change,
because the generic meaning of ZWJ is now intended to be
ligation requesting, which does have visual consequences.

I now like better the suggestions of RLM or WJ for this. Both
of those format controls, by *definition*, should have no
impact on visual display in this context, the RLM because it
would be inserted between two NSM's that pick up strong
R-to-L directionality from the consonant, and the WJ
because it would be inserted at a position where there already
is no word/line break opportunity. But either of them,
by their current definition and properties, would break the
sequences for canonical reordering. So they already have
the semantics of the putative new control in question: no
effect on visual display, while inhibiting the canonical
reordering of the point sequence.
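
For anyone who wants to check this behavior, here is a minimal sketch
in Python using only the standard unicodedata module. The helper name
nfd_points is just something I made up for the illustration, and the
code points are the standard ones for the characters named above
(U+05B8 is spelled QAMATS in the character name). It shows the points
being swapped under normalization, and then left alone once any of the
three cc=0 format controls sits between them:

```python
import unicodedata

LAMED, QAMATS, HIRIQ, FINAL_MEM = "\u05DC", "\u05B8", "\u05B4", "\u05DD"
ZWJ, RLM, WJ = "\u200D", "\u200F", "\u2060"


def nfd_points(text):
    """Return the NFD form of text as a list of U+XXXX labels (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


# <lamed, qamets, hiriq, final mem>: hiriq (cc=14) is moved ahead of qamets (cc=18).
plain = LAMED + QAMATS + HIRIQ + FINAL_MEM
print(nfd_points(plain))

# With a cc=0 character between the two points, each point is in its own
# run of combining marks, so there is nothing for canonical ordering to swap.
for control in (ZWJ, RLM, WJ):
    blocked = LAMED + QAMATS + control + HIRIQ + FINAL_MEM
    print(unicodedata.name(control), unicodedata.combining(control), nfd_points(blocked))
```

This only demonstrates the normalization side of the argument. Whether
a given rendering engine really leaves the display untouched is a
separate question that the script cannot answer.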

> > The presence of a ZWJ (cc=0) in the sequence would block
> > the canonical reordering of the sequence to hiriq before
> > qamets. If that is the essence of the problem needing to
> > be addressed, then this is a much simpler solution which would
> > neither impact the stability of normalization nor require
> > mass cloning of vowels in order to give them new combining
> > classes.
> 
> Yes, it would accomplish all that; and is a groanable kludge.

Why is making use of the existing behavior of existing characters
a "groanable kludge", if it has the desired effect and makes
the required distinctions in text? If there is not some
rendering system or font lookup showstopper here, I'm inclined
to think it's a rather elegant way out of the problem.

> At least with having distinct vowel characters for Biblical Hebrew,
> we'd come to a point we could forget about it, and wouldn't be wincing
> every time we considered it.

Au contraire. We'll be wincing forever for this one. There's
no way of getting around the fact that this is merely a cloning
of the whole set of points in order to have candidates for
a reassigned set of combining classes.

You're stuck between a rock and a hard place on this one.

The UTC cannot entertain merely fixing the existing combining
class assignments, because it breaks the normalization stability
guarantee. We've all come to acknowledge and most to accept that,
even though it still elicits groans.
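
To make the stability problem concrete, here is a rough Python sketch
of the canonical ordering step on its own: a stable sort of each run of
combining marks by combining class. It is a toy, not a full normalizer
(no decomposition), and the overrides parameter is purely hypothetical.
It stands in for a "fixed" combining class table, and the value 13 for
qamets is invented for the illustration, not taken from any proposal.

```python
import unicodedata


def canonical_reorder(text, overrides=None):
    """Stable-sort each run of combining marks by combining class.

    Toy version of the canonical ordering step only (no decomposition).
    `overrides` maps a character to a hypothetical replacement class.
    """
    overrides = overrides or {}

    def cc(ch):
        return overrides.get(ch, unicodedata.combining(ch))

    out, run = [], []
    for ch in text:
        if cc(ch) == 0:
            # A cc=0 character ends the current run of marks.
            out.extend(sorted(run, key=cc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=cc))
    return "".join(out)


text = "\u05DC\u05B8\u05B4\u05DD"  # lamed, qamets, hiriq, final mem

today = canonical_reorder(text)
# Hypothetical reassignment: give qamets a class below hiriq's 14.
changed = canonical_reorder(text, overrides={"\u05B8": 13})

print(today == unicodedata.normalize("NFD", text))  # toy agrees with the real table
print(today == changed)  # False: same input, different "normalized" result
```

The same stored text would normalize to two different strings
depending on which version of the table a process happened to use,
and that is exactly what the stability guarantee rules out.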

But in the 10646 WG2 context, coming in with a duplicate set
of Hebrew points is not going to make any sense, because, as
someone (John Cowan?) has already pointed out, 10646 doesn't
assign combining classes, and so trying to justify character
cloning on the basis of distinct combining class assignments
isn't going to make any sense there. You can always come in
with the proposal to encode BIBLICAL HEBREW POINT PATAH and
say, even though the glyph is identical, see, the name is
different, so the character is different. But this is a pretty
thin disguise, and is vulnerable to simple questioning:
What is it for? Well, to point Biblical Hebrew texts. But
what was U+05B7 HEBREW POINT PATAH for? Well, to point Biblical
Hebrew texts (or any Hebrew text, for that matter...). Well,
then, what is the difference? Uh, the combining classes for
the two are different. What is a combining class?  ... and
so on.

I'm trying to find a way, using existing characters and a
simple set of text representational conventions, to make
the distinctions and preserve the order relations that you
need for decent font lookup, without the whole enterprise
washing up on either of those two rocks.

--Ken

