On Wednesday, July 30, 2003 8:21 AM, Peter Kirk wrote:
... The vowel form, Ted's holam male, is encoded as holam followed by vav, and the consonant vav with holam is encoded simply as that.
Encoding 05B9 before the vav to create a kholam male can be a complicated business. Consider the (non-authentic) spelling used in the hugely popular "501 Hebrew Verbs" by Shmuel Bolozky (Barron's), where vowels and ketiv male (plene spelling) are mixed. (This is frequently done for pedagogical applications.) A particularly striking word is borrowers (f): <lamed-kholam male-vav-kholam male-tav>. Under the proposal, that would be encoded [05DC.05B9.05D5.05D5.05B9.05D5.05EA] -- somewhat difficult to parse, if you ask me. ...
This is complicated, but not actually ambiguous. To simplify, let's use the CCAT encoding, in which this would be written LOWWOWT. By the algorithm used in Ezra SIL and in SBL Hebrew, each O before a W is shifted from the left of the preceding consonant to the right of the W, i.e. treated as holam male, as long as the W has no (other) vowel. This rule applies to both of these O's, so this will be rendered correctly. Test - view with Ezra SIL or SBL Hebrew (there is a known bug with the latest beta version of the latter):
לֹווֹות
Result: nearly right in Ezra SIL, but the second holam has not shifted on to the following vav. Maybe the shift from vav to vav is disabled for some reason. SBL Hebrew has the same problem; it also fails to distinguish the two positions of vav (known bug).
... There will also be a bad ambiguity for the present, female, plural of borrow: <lamed-kholam male-vav-kholam chaser-tav>. The resulting encoding under the proposal is [05DC.05B9.05D5.05D5.05B9.05EA]. This could also be interpreted <lamed-kholam chaser-vav-vav-cholam khaser-tav> (with the reasonable but incorrect interpretation that the double-vav is to indicate a consonantal vav, ...
This also comes out correctly. We have LOWWOT. The first O shifts to make holam male. The second one does not, as O does not shift on to T. So we have the two different positionings of holam on vav next to one another, something which by the way never happens in the Hebrew Bible. Test:
לֹווֹת
Result: exactly right in Ezra SIL; SBL Hebrew fails to distinguish the two positions of vav (known bug).
I suppose an alternative form which might appear would be LOWOWT, with the first vowel holam haser and the second holam male. In this case the first O would stay with the L as the following W has an O, but the second O would shift to the top right of the second W. Test:
לֹוֹות
Result: again exactly right in Ezra SIL and in SBL Hebrew.
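For anyone who wants to check the rule by hand, here is a minimal sketch of the shifting rule as I stated it above, applied to CCAT-like strings. It is not the actual logic inside Ezra SIL or SBL Hebrew; the function name classify_holams and the VOWELS set are assumptions made for illustration, and the vowel set is not the full CCAT table.

```python
# Illustrative vowel letters for a CCAT-like transliteration (an assumption,
# not the complete CCAT table). O is holam, W is vav.
VOWELS = set("AEFIOU:")

def classify_holams(word):
    """Return (index, action) for every O in word.

    'shift' means the holam moves onto the following W (holam male);
    'stay' means it remains on the preceding consonant.
    """
    result = []
    for i, ch in enumerate(word):
        if ch != "O":
            continue
        next_is_vav = i + 1 < len(word) and word[i + 1] == "W"
        # The W must carry no vowel of its own for the holam to shift onto it.
        vav_has_vowel = i + 2 < len(word) and word[i + 2] in VOWELS
        action = "shift" if next_is_vav and not vav_has_vowel else "stay"
        result.append((i, action))
    return result

for w in ("LOWWOWT", "LOWWOT", "LOWOWT"):
    print(w, classify_holams(w))
```

For LOWWOWT both O's shift, for LOWWOT only the first, and for LOWOWT only the second, which matches the three cases walked through above.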
Then how would Jony Rosenne's preferred encoding fare here? He would encode the former LWOWOT. After the L, my suggested (unimplemented, so I can't test it) algorithm to distinguish expects a vowel and so interprets WO as holam male, and after holam male it expects a consonant and so interprets the next WO as vav plus holam. Correct. The second form he would encode as LOWWOT, with holam haser first. No problem with that. Then vav on its own, a consonant so expecting a vowel to follow. So the following holam vav is interpreted as holam male. Correct.
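Since that algorithm is unimplemented, the following is only a rough sketch of the idea: a two-state scan that remembers whether a vowel is expected. The function name parse_vav_first and the token labels are hypothetical, and the sketch knows only O as a vowel, so any other vowel letter would wrongly be treated as a consonant.

```python
def parse_vav_first(word):
    """Tokenize a string in which holam male is written as W followed by O."""
    tokens = []
    expect_vowel = False  # True right after a consonant that has no vowel yet
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == "W" and word[i + 1 : i + 2] == "O":
            # W+O after a bare consonant is read as the vowel holam male;
            # otherwise it is consonantal vav carrying its own holam.
            tokens.append("holam male" if expect_vowel else "vav+holam")
            expect_vowel = False
            i += 2
        elif ch == "O":
            tokens.append("holam haser")
            expect_vowel = False
            i += 1
        else:
            tokens.append(ch)  # a consonant, including a bare vav
            expect_vowel = True
            i += 1
    return tokens

print(parse_vav_first("LWOWOT"))
print(parse_vav_first("LOWWOT"))
```

The first call reads L, holam male, vav plus holam, T; the second reads L, holam haser, consonantal W, holam male, T, as in the two cases described above.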
... analogous to the past tense, female, second person of borrow: <lamed-qamats-vav-vav-qamats-he>.).
To me as a reader of biblical Hebrew, this form looks like an error. I would expect either sheva under the first vav, or the two vavs to be combined into one with dagesh. Nowhere in the Bible do two consonantal vavs occur together without a full vowel between them.
How would one interpret: [05E7.05B9.05D5.05B9.05D5]? This is how the proposed scheme would encode a word that appears in Brown-Driver-Briggs under entry I for kavah (qof-qamats, vav-qamats, he). (It should be interpreted <qof-kholam khaser-vav-kholam male>. How'd you do?)
QOWOW. First W is followed by O, so first O doesn't shift and W is taken as a consonant. Second W is not followed by a vowel so second O shifts, holam male. Yes, I think it's right. Test:
קֹוֹו
Result: correct in Ezra SIL and in SBL Hebrew.
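The dotted hex sequences quoted in this thread are easier to follow when expanded into character names. A small sketch using Python's standard unicodedata module does this; the helper names decode and describe are made up for illustration.

```python
import unicodedata

def decode(dotted):
    """Turn a dotted hex sequence such as '05E7.05B9' into a string."""
    return "".join(chr(int(cp, 16)) for cp in dotted.split("."))

def describe(dotted):
    """List the Unicode character name of each code point in the sequence."""
    return [unicodedata.name(ch) for ch in decode(dotted)]

for name in describe("05E7.05B9.05D5.05B9.05D5"):
    print(name)
```

This prints the names in storage order (qof, holam, vav, holam, vav). Deciding which holam belongs to which vav is then a matter for the shifting rule, not for the encoding itself.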
Jony would encode QOWWO. That would also come out correct.
It seems to me that it will be difficult-to-impossible to develop a parsing algorithm for this kind of thing, ...
I think we need to congratulate Joan, John H, and those who worked with them for successfully doing the impossible. It works now, Ted. Well, very nearly. The small problems I identified are easily fixable. The version of the algorithm which works with Jony's encoding is less simple so I am not yet sure if it is possible.
... even without considering things like transliterations and other irregular applications. Combining characters should follow their base characters. We just have to live without the kholam male for now (or create it using "markup", which can apparently solve all problems).
Actually "markup" solves no problems at all, it just passes the buck and reinforces the impression many already have that Unicode is a waste of time because it can't do what they need.
But why live without the holam male? After all, if it is a separate form in Hebrew (and we have established, I think, that it has been for 1000 years), and since you don't like the way which some have used to encode it, why not add it to Unicode as a separate new character? After all, if the French had found that one of their accented characters was not in Unicode, I don't think they would have said that they could live without it or use markup. They would have fought tooth and nail to get it added to the standard. Why don't you suggest that? That's not a breach of the stability policy. (Maybe the preferred addition would be a new combining mark, right holam, rather than a new precomposed character, but that is a detail.)
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/