On 30/07/2003 09:25, Ted Hopp wrote:

On Wednesday, July 30, 2003 8:21 AM, Peter Kirk wrote:


... The vowel form,
Ted's holam male, is encoded as holam followed by vav, and the consonant
vav with holam is encoded simply as that.



Encoding 05B9 before the vav to create a kholam male can be a complicated business. Consider the (non-authentic) spelling used in the hugely popular "501 Hebrew Verbs" by Shmuel Bolozky (Barron's), where vowels and ketiv male (plene spelling) are mixed. (This is frequently done for pedagogical applications.) A particularly striking word is borrowers (f): <lamed-kholam male-vav-kholam male-tav>. Under the proposal, that would be encoded [05DC.05B9.05D5.05D5.05B9.05D5.05EA] -- somewhat difficult to parse, if you ask me. ...

This is complicated, but not actually ambiguous. To simplify, let's use the CCAT encoding, in which this would be written LOWWOWT. By the algorithm used in Ezra SIL and in SBL Hebrew, each O before a W is shifted from the left of the preceding consonant to the right of the W, i.e. treated as holam male, as long as the W has no (other) vowel. This rule applies to both of these O's, so the word should be rendered correctly. Test - view with Ezra SIL or SBL Hebrew (there is a known bug with the latest beta version of the latter):

לֹווֹות
Result: nearly right in Ezra SIL, but the second holam has not shifted on to the following vav. Maybe shift from vav to vav is disabled for some reason. SBL Hebrew has the same problem; it also fails to distinguish the two positions of holam on vav (known bug).
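The shifting rule as stated above can be checked by hand with a short Python sketch. This is only a reading of the rule as described in this message, not the actual Ezra SIL or SBL Hebrew logic; the function name, the "shifted"/"unshifted" labels and the VOWELS set are made up for illustration, and the vowel set is a simplification rather than the full CCAT table.

```python
# Assumed, simplified set of vowel letters; only O (holam) matters below.
VOWELS = set("AEFIOU")

def classify_holams(word):
    """Return (index, label) for each O in a CCAT-style string.

    An O is "shifted" (holam male) when the next letter is W and that W
    is not itself followed by a vowel; otherwise it is "unshifted".
    """
    result = []
    for i, ch in enumerate(word):
        if ch != "O":
            continue
        next_is_vav = word[i + 1:i + 2] == "W"
        vav_has_vowel = word[i + 2:i + 3] in VOWELS and i + 2 < len(word)
        shifted = next_is_vav and not vav_has_vowel
        result.append((i, "shifted" if shifted else "unshifted"))
    return result

print(classify_holams("LOWWOWT"))  # [(1, 'shifted'), (4, 'shifted')]
```

On this reading both O's of LOWWOWT shift, which is the rendering the rule predicts, whatever a particular font version actually does.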


... There will also be a bad ambiguity for the present, female, plural
of borrow: <lamed-kholam male-vav-kholam chaser-tav>. The resulting encoding
under the proposal is [05DC.05B9.05D5.05D5.05B9.05EA]. This could also be
interpreted <lamed-kholam chaser-vav-vav-cholam khaser-tav> (with the
reasonable but incorrect interpretation that the double-vav is to indicate a
consonantal vav, ...

This also comes out correctly. We have LOWWOT. The first O shifts to make holam male. The second one does not, as O does not shift on to T. So we have the two different positionings of holam on vav next to one another, something which by the way never happens in the Hebrew Bible. Test:

לֹווֹת

Result: exactly right in Ezra SIL; SBL Hebrew fails to distinguish the two positions of holam on vav (known bug).
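The step from the bracketed code point lists to the simplified letters used here is mechanical. A minimal Python sketch, assuming only the five characters that occur in these examples; the table and function names are illustrative, and the letters follow the simplified transcription used in this message rather than a complete CCAT table.

```python
# Only the characters that occur in the examples in this message.
CCAT = {
    0x05DC: "L",  # HEBREW LETTER LAMED
    0x05D5: "W",  # HEBREW LETTER VAV
    0x05EA: "T",  # HEBREW LETTER TAV
    0x05E7: "Q",  # HEBREW LETTER QOF
    0x05B9: "O",  # HEBREW POINT HOLAM
}

def code_points(dotted):
    """Parse a list such as '[05DC.05B9.05D5]' into integers."""
    return [int(part, 16) for part in dotted.strip("[]").split(".")]

def to_ccat(dotted):
    """Simplified letter transcription of a dotted code point list."""
    return "".join(CCAT[cp] for cp in code_points(dotted))

def to_text(dotted):
    """The actual Unicode string, e.g. for pasting into a font test."""
    return "".join(chr(cp) for cp in code_points(dotted))

print(to_ccat("[05DC.05B9.05D5.05D5.05B9.05EA]"))  # LOWWOT
```

to_text gives the string in encoded order, so the same list can be used both for the transcription and for a rendering test.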

I suppose an alternative form which might appear would be LOWOWT, with the first vowel holam haser and the second holam male. In this case the first O would stay with the L as the following W has an O, but the second O would shift to the top right of the second W. Test:

לֹוֹות

Result: again exactly right in Ezra SIL and in SBL Hebrew.

Then how would Jony Rosenne's preferred encoding fare here? He would encode the former as LWOWOT. After the L, my suggested algorithm for telling the two apart (unimplemented, so I can't test it) expects a vowel and so interprets WO as holam male; after holam male it expects a consonant and so interprets the next WO as vav plus holam. Correct. The second form he would encode as LOWWOT, with holam haser first. No problem with that. Then comes vav on its own, a consonant, so a vowel is expected to follow, and the following WO is interpreted as holam male. Correct.
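Since that algorithm is unimplemented, the following Python sketch is only one possible reading of the description above, reduced to a two-state scan ("a vowel is expected" or not). It handles holam as the only vowel, and the function name and token labels are invented for illustration.

```python
def parse_vav_holam(word):
    """Tokenise a string in which holam male is written W then O.

    After a bare consonant a vowel is expected, so WO is read as holam
    male; otherwise WO is read as consonantal vav carrying holam.
    """
    tokens = []
    expect_vowel = False  # a word starts with a consonant
    i = 0
    while i < len(word):
        if word[i] == "W" and word[i + 1:i + 2] == "O":
            tokens.append("holam male" if expect_vowel else "vav+holam")
            expect_vowel = False
            i += 2
        elif word[i] == "O":
            tokens.append("holam")  # holam haser on the preceding consonant
            expect_vowel = False
            i += 1
        else:
            tokens.append(word[i])  # bare consonant, vowel should follow
            expect_vowel = True
            i += 1
    return tokens

print(parse_vav_holam("LWOWOT"))
# ['L', 'holam male', 'vav+holam', 'T']
```

A real implementation would have to track every vowel, not just holam, which is part of why this version is less simple than the one for holam before vav.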

... analogous to the past tense, female, second person of
borrow: <lamed-qamats-vav-vav-qamats-he>.).

To me as a reader of biblical Hebrew, this form looks like an error. I would expect either sheva under the first vav, or the two vavs to be combined into one with dagesh. Nowhere in the Bible do two consonantal vavs occur together, without a full vowel between them.


How would one interpret: [05E7.05B9.05D5.05B9.05D5]? This is how the
proposed scheme would encode a word that appears in Brown-Driver-Briggs under
entry I for kavah (qof-qamats, vav-qamats, he). (It should be interpreted
<qof-kholam khaser-vav-kholam male>. How'd you do?)


QOWOW. First W is followed by O, so first O doesn't shift and W is taken as a consonant. Second W is not followed by a vowel so second O shifts, holam male. Yes, I think it's right. Test:

קֹוֹו

Result: correct in Ezra SIL and in SBL Hebrew.

Jony would encode QOWWO. That would also come out correct.

It seems to me that it will be difficult-to-impossible to develop a parsing
algorithm for this kind of thing, ...

I think we need to congratulate Joan, John H, and those who worked with them for successfully doing the impossible. It works now, Ted. Well, very nearly. The small problems I identified are easily fixable. The version of the algorithm which works with Jony's encoding is less simple so I am not yet sure if it is possible.

... even without considering things like
transliterations and other irregular applications. Combining characters
should follow their base characters. We just have to live without the kholam
male for now (or create it using "markup", which can apparently solve all
problems).

Actually "markup" solves no problems at all, it just passes the buck and reinforces the impression many already have that Unicode is a waste of time because it can't do what they need.

But why live without the holam male? After all, if it is a separate form in Hebrew (and we have established, I think, that it has been for 1000 years), and since you don't like the way which some have used to encode it, why not add it to Unicode as a separate new character? After all, if the French had found that one of their accented characters was not in Unicode, I don't think they would have said that they could live without it or use markup. They would have fought tooth and nail to get it added to the standard. Why don't you suggest that? That's not a breach of the stability policy. (Maybe the preferred addition would be a new combining mark, right holam, rather than a new precomposed character, but that is a detail.)

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




