John Hudson wrote:

> This idea of Hebrew vowels as 'fixed' marks is problematical, because in
> Biblical Hebrew they are not fixed: they move relative to additional
> marks (other vowels or cantillation marks).
>
> > It may be more *difficult* for applications to do correct rendering,
> > but there was never any intention in the standard that I know
> > of that a sequence <hiriq, patah> would render differently
> > than a sequence <patah, hiriq>.
>
> Yes, this is what I am saying is wrong: <hiriq, patah> *should* render
> differently from <patah, hiriq>. This example is particularly important,
> because it occurs in the spelling of yerushalaim, the Masoretic
> approximation of yerushalayim. Correct rendering requires that the hiriq
> follow the patah, and not vice versa.
Understood. See my separate response on the Biblical Hebrew thread.

> They are necessary to render Biblical Hebrew text correctly using
> current font and layout engine technologies. These technologies work
> perfectly for Biblical Hebrew so long as Unicode canonical ordering is
> ignored. I think there is very little impetus to change or develop new
> implementations to take into account what strikes most of those involved
> with Biblical Hebrew text processing as an error in Unicode.

"So long as Unicode canonical ordering is ignored" -- but as you and Peter
point out, you cannot actually ignore canonical ordering, since in the
Internet context it is outside of the end user's control. Once text
escapes your own system for interchange, it may be subject to
normalization, and you are kaputt.

As stated, this is also turning into a typical -- dare I say, religious --
confrontation of "I'm right and you're wrong", with no compromise in
prospect and people getting ready to shoot themselves in the foot to prove
they are right.

You say there is little impetus to change or develop new implementations,
and yet the very solutions being proposed, e.g. by Peter, would force
reencoding of all the Biblical Hebrew text to work at all and would, ipso
facto, require new implementations and new fonts to work right.

The alternative I suggested -- agreeing on a text representational
convention of <vowel, ZWJ, vowel> for those sequences which should not
reorder -- could be implemented *now* with existing characters and only
minor extensions to the fonts and to keyboard methods. Any existing corpus
could be updated en masse (and more easily than switching over to Peter's
scheme), or incrementally, as appropriate.

The other alternative that some seem to prefer -- just changing the
combining classes and being done with it -- is *not* going to happen. It
would fly in the face of the stability guarantees to which the UTC is
politically committed and which the IETF and W3C require.
An inconvenience for Biblical Hebrew implementations is not going to
outweigh that, for any of the committees involved. And even if, by some
miracle, it *were* to happen, you would still be awaiting the rollout of
new implementations, since you'd have to sit through the chaotic
transition while everyone updated their normalization algorithms.

Just picking up the marbles and going home isn't an option, either. As you
indicate, "so long as Unicode canonical ordering is ignored" the existing
layout technologies work just fine. So address the problem with an
appropriate fix. Insert a ZWJ (for instance) at the point where canonical
reordering needs to be blocked in a vowel sequence; then, even though you
are not ignoring canonical ordering (which in distributed systems you
cannot), you end up preserving the order you need anyway.

> I've spent nine months working on Biblical Hebrew rendering for the
> major user community (the Society of Biblical Literature and their Font
> Foundation partners), and their take on this is that a) they want a
> solution that works with today's technology, and b) they will avoid
> Unicode canonical ordering like the plague and use custom normalisations
> instead.

And how is implementing a custom normalization not a matter of "developing
a new implementation"? It doesn't even begin to deal with the problem of
what happens if the text "escapes" out into the Internet context, which
won't be using the same custom normalization. Implementing a "custom" text
representational convention seems like a much more straightforward task to
me.

> When we conducted normalisation tests, switching from Unicode
> normalisation to a custom normalisation that does not re-order vowels or
> meteg*, we increased the number of unique consonant + mark(s) sequences
> encoded in the Old Testament text by more than 340.
> This means that Unicode normalisation was creating 340 textual
> ambiguities by treating lexically distinct sequences as canonically
> equivalent. I don't think that kind of textual ambiguity is 'overblown'.

Introduce a canonical reordering blocker (ccc=0) into the textual
sequences which get ordered in ways that lead to textual ambiguities, and
the textual ambiguities should go away.

> * Meteg re-ordering is in some respects even more problematic than
> multi-vowel re-ordering; certainly it is a more common problem. The
> meteg can occur to the left or right of a vowel (sometimes the
> distinction is the result of editorial intervention (see Kittel's
> original Biblia Hebraica edition); left, right and hataf-intermediary
> meteg positioning are all found in the ben Asher manuscripts). Unicode
> canonical ordering treats meteg as a fixed-position mark with a
> combining class higher than the vowels', which suggests that it always
> appears in the same position relative to vowels. This is incorrect.

This particular case might be amenable to the cloning of a Biblical meteg
with different behavior from the existing meteg, or possibly something
along the lines I have suggested above for the vowel ordering
distinctions.

If, however, you wait for a cloned meteg, then solutions await Unicode 4.1
(or Unicode 5.0), and any application will certainly be requiring the
"development of a new implementation", since it will have to await the
gradual rollout of generalized support for the new repertoire. In any
case, any such approach requires reencoding of existing text and the
establishment of new text representational conventions. Why not seek a
solution which can make the appropriate distinctions using the existing
repertoire, as well as the existing tools and implementations?

--Ken