Re: [hebrew] Re: Hebrew Issues

John Hudson Sun, 24 Aug 2003 18:46:23 +0000

[Bcc'd to the SBL BibLit project discussion list.]

At 03:13 PM 8/23/2003, Peter Kirk wrote:

2.2 Holam Alef

...

Although the rules concerning this case are fairly straightforward, the rendering engine should not need to know so much grammar.

I'm a little surprised, Jony, that you came to this conclusion. It seems to me that this one is a rendering issue. You have argued before that in most typesetting this shift is not made. It has been demonstrated (in Ezra SIL and SBL Hebrew with Uniscribe) that it is feasible for a rendering engine to implement these rules, in the cases where this shift is required for high quality e.g. biblical publications. The biblical text already contains sufficient information to guide the rendering engine, except possibly for a few special cases, and in the spirit of "thou shalt not add thereto" I prefer not to do so when, as here, it is not absolutely necessary.

I agree with Peter: it is not a problem for the rendering (in this case font lookups) to handle this holam repositioning contextually.

A possible solution is to use ZWJ to indicate the shifting of the Holam forward. For example, Bet Dagesh Holam ZWJ Alef.
Agreed, if a mechanism is required. My preference is to use this encoding only for special cases where the shift takes place as an exception to the regular rules, and to use ZWNJ instead of ZWJ to inhibit such shifting in cases where it is not required.

Again, I agree:

<bet, dagesh, holam, alef> = holam repositioned on alef

<bet, dagesh, holam, ZWNJ, alef> = holam retained on bet

By the way, your example is not in canonical order (although it is in logical order, see my comments on 2.8 below), and will be reordered to <bet, holam, dagesh, ZWJ, alef>.

Thankfully, this is one of the mark reordering cases that the font lookups can handle: we just need to make sure that the context is large enough for other marks to fall between the holam and the alef. However, this does raise the question of what happens to the ZWNJ in reordering:

<bet, dagesh, holam, ZWNJ, alef>

If the holam ends up reordered before the dagesh, where does the ZWNJ end up? If it remains immediately in front of the alef, that's fine.
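On the ZWNJ question: as far as I can tell, ZWJ and ZWNJ both have canonical combining class 0, so canonical reordering should only swap the dagesh and the holam and leave the joiner immediately in front of the alef. A minimal Python sketch to check this, assuming the standard unicodedata module reflects the relevant Unicode data (the helper name canonical is mine, purely for illustration):

```python
import unicodedata

BET, ALEF = "\u05D1", "\u05D0"
DAGESH, HOLAM = "\u05BC", "\u05B9"   # combining classes 21 and 19
ZWJ, ZWNJ = "\u200D", "\u200C"       # both have combining class 0


def canonical(text: str) -> str:
    """Return the canonically ordered (NFD) form of text."""
    return unicodedata.normalize("NFD", text)


# Logical order as typed: dagesh before holam.
with_zwj = canonical(BET + DAGESH + HOLAM + ZWJ + ALEF)
with_zwnj = canonical(BET + DAGESH + HOLAM + ZWNJ + ALEF)

for label, seq in (("ZWJ", with_zwj), ("ZWNJ", with_zwnj)):
    print(label, [f"U+{ord(c):04X}" for c in seq])
```

If that holds in the rendering pipeline too, the font lookup context only needs to allow for the dagesh falling between the holam and the joiner.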

For simpler cases, such as Yerushala(y)im, a zero width invisible base character could be used. Various possibilities had been discussed. CGJ is not appropriate because it is not a base character. ZWNBSP would have been suitable, except that it has been taken over by the BOM.
I fail to see a good reason not to use CGJ in such a case. The Unicode distinction between a base character and a combining character is a technical one which does not need to align perfectly with every user's perceptions.

I agree. I understand the logic in inserting an invisible base character in a place where readers 'know' there is a missing consonant, but the consonant *is* missing: it is not there and should not be there. CGJ works fine in this instance, because the only important thing to do is to make sure that the two vowels are not reordered.
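A small Python sketch of the reordering that CGJ is meant to prevent. I am assuming the usual spelling with patah and hiriq on the lamed as the illustration; the variable names are mine:

```python
import unicodedata

LAMED = "\u05DC"
PATAH, HIRIQ = "\u05B7", "\u05B4"   # combining classes 17 and 14
CGJ = "\u034F"                      # COMBINING GRAPHEME JOINER, class 0

# Without CGJ, canonical ordering moves hiriq (14) ahead of patah (17).
plain = unicodedata.normalize("NFD", LAMED + PATAH + HIRIQ)

# With CGJ between the vowels, the class-0 character blocks the swap.
guarded = unicodedata.normalize("NFD", LAMED + PATAH + CGJ + HIRIQ)

print([f"U+{ord(c):04X}" for c in plain])
print([f"U+{ord(c):04X}" for c in guarded])
```

The second sequence should come back unchanged, which is all that is needed here.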

The medial Meteg in the Hataf vowels could be a rendering issue, a combining marks ligature. However, in this case we would need a CGNJ when a left Meteg is needed together with a Hataf.
In the absence of a CGNJ, and since CGJ does not have defined joining properties despite its misleading name, I have suggested using CGJ for this.

Since actual glyph ligation is occurring, the ZWNJ should be used to inhibit ligation. This is consistent with the Unicode 4.0 description of ZWJ and ZWNJ behaviour. A question remains, however: should medial meteg with hataf be the default rendering of <hataf..., meteg>, or should such ligation require <hataf..., ZWJ, meteg>? This is a rendering issue, but one which affects encoding: if one set of fonts treats ligation as default and another set doesn't, users will produce documents with conflicting encoding conventions depending on the rendering of the fonts they are using (one can even imagine a single document, set in multiple fonts, using different character sequences to obtain the same rendering). Personally, I favour having the medial meteg as default rendering for <hataf..., meteg>, requiring <hataf..., ZWNJ, meteg> in order to obtain a left meteg, because the medial meteg appears to be the most common positioning in the manuscript tradition.

For the right Meteg, a new character is needed.

...

But I disagree that a new character is needed. This is essentially an alternative positioning of the same combining character relative to other combining characters with which it interferes typographically. This should have been dealt with by appropriate allocation of combining classes. As it was not, the appropriate mechanism seems to be to use CGJ to inhibit canonical reordering. Thus my suggestion (= indicates canonical equivalence):
left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
right meteg: <meteg, CGJ, vowel>
medial meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
left meteg (hataf vowel): <vowel, CGJ, meteg>
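The equivalences claimed in this list can be checked with a short Python sketch. It assumes the standard unicodedata module; patah and hataf patah stand in for "vowel", and the names are illustrative only:

```python
import unicodedata

METEG = "\u05BD"         # combining class 22
PATAH = "\u05B7"         # combining class 17 (non-hataf vowel)
HATAF_PATAH = "\u05B2"   # combining class 12 (hataf vowel)
CGJ = "\u034F"           # combining class 0


def nfd(text: str) -> str:
    """Canonical decomposition, which applies canonical reordering."""
    return unicodedata.normalize("NFD", text)


# <vowel, meteg> = <meteg, vowel>: the two orders are canonically equivalent.
same_after_reorder = nfd(METEG + PATAH) == nfd(PATAH + METEG)

# <meteg, CGJ, vowel> and <vowel, CGJ, meteg> survive normalization as typed.
right_meteg = nfd(METEG + CGJ + PATAH)
left_hataf_meteg = nfd(HATAF_PATAH + CGJ + METEG)

print(same_after_reorder)
print([f"U+{ord(c):04X}" for c in right_meteg])
print([f"U+{ord(c):04X}" for c in left_hataf_meteg])
```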

I basically agree, with the following modification:

left meteg (hataf vowel): <vowel, ZWNJ, meteg>

Does this mean that we are agreed that the medial meteg rendering should be normative?
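To make the convention concrete, here is a rough Python sketch of how a renderer might classify the meteg position under the scheme above, with ZWNJ rather than CGJ for the left meteg on a hataf vowel. The function meteg_position and its return labels are hypothetical, not part of any font or engine, and the sketch ignores other marks that may intervene:

```python
import unicodedata

METEG, CGJ, ZWNJ = "\u05BD", "\u034F", "\u200C"
# Hataf segol, hataf patah, hataf qamats.
HATAF_VOWELS = {"\u05B1", "\u05B2", "\u05B3"}


def meteg_position(marks: str) -> str:
    """Classify the meteg in the marks that follow one base letter.

    Hypothetical helper: returns "right", "medial", "left", or "none".
    """
    # Normalize first so <meteg, vowel> is treated like <vowel, meteg>.
    marks = unicodedata.normalize("NFD", marks)
    i = marks.find(METEG)
    if i < 0:
        return "none"
    # <meteg, CGJ, vowel>: CGJ kept the meteg ahead of the vowel.
    if marks[i + 1 : i + 2] == CGJ:
        return "right"
    has_hataf = any(c in HATAF_VOWELS for c in marks[:i])
    # <hataf, meteg> ligates by default; ZWNJ before the meteg inhibits it.
    if has_hataf and marks[i - 1 : i] != ZWNJ:
        return "medial"
    return "left"


print(meteg_position("\u05B2\u05BD"))        # hataf patah, meteg
print(meteg_position("\u05B2\u200C\u05BD"))  # hataf patah, ZWNJ, meteg
print(meteg_position("\u05BD\u034F\u05B7"))  # meteg, CGJ, patah
```

If medial is not made the default, the test on ZWNJ would flip to a test for ZWJ, which is exactly the kind of divergence between fonts that worries me.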

2.9 Inverted Nun

In the Bible there are a few cases of a special mark known as "Inverted Nun". It is probably not an inverted letter Nun, and requires its own character, HEBREW MARK INVERTED NUN.
Agreed.

Agreed. Who wants to write the proposal? I have some good graphics showing various manuscript forms of this letter, clearly distinguished in form from the nun.

2.10 Extraordinary Points

The SII encoded only the upper extraordinary point, as 05C4 HEBREW MARK UPPER DOT. A character for the lower dot could be added, although it appears only a few times.
Agreed. Although this latter character is rare, it is in regular and undisputed use in a widely used text, and so probably does need to be encoded.

I am content either to have the lower punctum encoded or to use a generic combining mark (U+0323), although the latter raises issues for multiscript fonts in applications that do not support writing system-specific glyph substitution (currently all applications). What I am most keen to have is a clear statement from the UTC identifying 05C4 HEBREW MARK UPPER DOT as the upper punctum, as Jony indicates was intended by SII, and specifying a codepoint for the Hebrew number / masoretic note dot, which requires its own glyph and cannot be harmonised with the upper punctum character. Again, this could mean a new Hebrew block character or U+0307 could be used.

Note that until Jony's note on SII's intent, I had presumed U+05C4 to be the number / masoretic note dot, because of the absence of a corresponding lower mark to indicate that it was the upper punctum. Now I would like a definitive ruling from the UTC, to avoid future confusion.

2.12 Number Dots

An old practice was to use dots and double dots above to distinguish "non words", such as numbers and acronyms. For several centuries this usage has been replaced by the use of Geresh and Gershayim.

The dots always appear on unpointed texts. There is nothing special about them, so the existing Unicodes 0307 and 0308 could be used.
Agreed.

Okay, that's fine with me, but I'd still like to see a note in the standard re. U+05C4.
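For reference, the characters mentioned in 2.10 and 2.12 can be inspected with Python's unicodedata module. This is only a lookup sketch; the dictionary and its role labels are my own shorthand for the uses discussed above, not anything defined by the standard:

```python
import unicodedata

# Role labels are informal shorthand for this discussion only.
CANDIDATES = {
    "\u05C4": "upper punctum (per SII intent)",
    "\u0307": "number / masoretic note dot (generic option)",
    "\u0308": "double number dot (generic option)",
    "\u0323": "lower punctum (generic option)",
}

for char, role in CANDIDATES.items():
    print(
        f"U+{ord(char):04X} {unicodedata.name(char)} "
        f"ccc={unicodedata.combining(char)}: {role}"
    )
```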

John Hudson

Tiro Typeworks          www.tiro.com
Vancouver, BC           [EMAIL PROTECTED]

The sight of James Cox from the BBC's World at One,
interviewing Robin Oakley, CNN's man in Europe,
surrounded by a scrum of furiously scribbling print
journalists will stand for some time as the apogee of
media cannibalism.
                        - Emma Brockes, at the EU summit
