John Hudson wrote:

> This idea of Hebrew vowels as 'fixed' marks is problematical, because in
> Biblical Hebrew they are not fixed: they move relative to additional
> marks (other vowels or cantillation marks).
>
> > It may be more *difficult* for applications to do correct rendering,
> > but there was never any intention in the standard that I know
> > of that a sequence <hiriq, patah> would render differently
> > than a sequence <patah, hiriq>.
>
> Yes, this is what I am saying is wrong: <hiriq, patah> *should* render
> differently from <patah, hiriq>. This example is particularly important,
> because it occurs in the spelling of yerushalaim, the Masoretic
> approximation of yerushalayim. Correct rendering requires that the hiriq
> follow the patah, and not vice versa.
Understood. See my separate response on the Biblical Hebrew thread.

> They are necessary to render Biblical Hebrew text correctly using
> current font and layout engine technologies. These technologies work
> perfectly for Biblical Hebrew so long as Unicode canonical ordering is
> ignored. I think there is very little impetus to change or develop new
> implementations to take into account what strikes most of those involved
> with Biblical Hebrew text processing as an error in Unicode.

"So long as Unicode canonical ordering is ignored" -- but as you and Peter
point out, you cannot actually ignore canonical ordering, since in the
Internet context it is outside of the end user's control. Once text
escapes your own system for interchange, it may be subject to
normalization, and you are kaputt.

As stated, this is also turning into a typical -- dare I say, religious --
confrontation of "I'm right and you're wrong", with no compromise in
prospect and people getting ready to shoot themselves in the foot to prove
they are right.

You say there is little impetus to change or develop new implementations,
and yet the very solutions being proposed, e.g. by Peter, would force
reencoding of all the Biblical Hebrew text to work at all and would, ipso
facto, require new implementations and new fonts to work right.

The alternative I suggested -- agreeing on a text representational
convention of <vowel, ZWJ, vowel> for those sequences which should not
reorder -- could be implemented *now* with existing characters and only
minor extensions to the fonts and to keyboard methods. Any existing corpus
could be updated en masse (and more easily than switching over to Peter's
scheme), or incrementally, as appropriate.

The other alternative that some seem to prefer -- just changing the
combining classes and being done with it -- is *not* going to happen. It
would fly in the face of the stability guarantees to which the UTC is
politically committed and which the IETF and W3C require.
An inconvenience for Biblical Hebrew implementations is not going to
outweigh that, for any of the committees involved. And even if, by some
miracle, it *were* to happen, you would still be awaiting the rollout of
new implementations, since you'd have to sit through the chaotic
transition while everyone updated their normalization algorithms.

Just picking up the marbles and going home isn't an option, either. As you
indicate, "so long as Unicode canonical ordering is ignored" the existing
layout technologies work just fine. So address the problem with an
appropriate fix. Insert a ZWJ (for instance) at the point where canonical
reordering needs to be blocked in a vowel sequence; then, even though you
are not ignoring canonical ordering (which in distributed systems you
cannot), you end up preserving the order you need anyway.

> I've spent nine months working on Biblical Hebrew rendering for the
> major user community (the Society of Biblical Literature and their Font
> Foundation partners), and their take on this is that a) they want a
> solution that works with today's technology, and b) they will avoid
> Unicode canonical ordering like the plague and use custom normalisations
> instead.

And how is implementing a custom normalization not a matter of "developing
a new implementation"? It doesn't even begin to deal with the problem of
what happens if the text "escapes" out into the Internet context, which
won't be using the same custom normalization. Implementing a "custom" text
representational convention seems like a much more straightforward task to
me.

> When we conducted normalisation tests, switching from Unicode
> normalisation to a custom normalisation that does not re-order vowels or
> meteg*, we increased the number of unique consonant + mark(s) sequences
> encoded in the Old Testament text by more than 340.
> This means that Unicode normalisation was creating 340 textual
> ambiguities by treating lexically distinct sequences as canonically
> equivalent. I don't think that kind of textual ambiguity is 'overblown'.

Introduce a canonical reordering blocker (ccc=0) into the textual
sequences which get ordered in ways that lead to textual ambiguities, and
the textual ambiguities should go away.

> * Meteg re-ordering is in some respects even more problematic than
> multi-vowel re-ordering; certainly it is a more common problem. The
> meteg can occur to the left or right of a vowel (sometimes the
> distinction is the result of editorial intervention (see Kittel's
> original Biblia Hebraica edition); left, right and hataf-intermediary
> meteg positioning are all found in the ben Asher manuscripts). Unicode
> canonical ordering treats meteg as a fixed-position mark with a
> combining class higher than the vowels', which suggests that it always
> appears in the same position relative to vowels. This is incorrect.

This particular case might be amenable to the cloning of a Biblical meteg
with different behavior from the existing meteg, or possibly something
along the lines I have suggested above for the vowel ordering
distinctions.

If, however, you wait for a cloned meteg, then solutions await Unicode 4.1
(or Unicode 5.0), and any application will certainly be requiring the
"development of a new implementation", since it will have to await the
gradual rollout of generalized support for the new repertoire. In any
case, any such approach requires reencoding of existing text and the
establishment of new text representational conventions. Why not seek a
solution which can make the appropriate distinctions using the existing
repertoire, as well as the existing tools and implementations?

--Ken