On 29/10/2003 10:26, Philippe Verdy wrote:
...
The problem I see here is that ZWJ is not intended to create ligatures
between diacritics, only between clusters that would otherwise still be a
single combining sequence.
Normally CGJ would have fitted better there, but this conflicts with the
intent to address the canonical combining order with CGJ.
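To illustrate what "addressing the canonical combining order with CGJ"
means in practice, here is a minimal Python sketch using only the standard
unicodedata module. The choice of alef, patah and meteg is mine, purely for
illustration; the variable names are not from any proposal.

```python
import unicodedata

ALEF = "\u05D0"    # HEBREW LETTER ALEF, combining class 0
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17
METEG = "\u05BD"   # HEBREW POINT METEG, combining class 22
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0

# Meteg typed before the vowel, with and without an intervening CGJ.
plain = ALEF + METEG + PATAH
guarded = ALEF + METEG + CGJ + PATAH

nfd_plain = unicodedata.normalize("NFD", plain)
nfd_guarded = unicodedata.normalize("NFD", guarded)


def names(s):
    """Return the Unicode character names of a string, for display."""
    return [unicodedata.name(c) for c in s]


# Without CGJ the class-22 meteg is moved after the class-17 patah;
# with CGJ (class 0) between them, the typed order survives.
print(names(nfd_plain))
print(names(nfd_guarded))
```

The CGJ only blocks reordering here; whether a renderer does anything
useful with the preserved order is a separate question.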
In the sil.org proposal, ...
I'm not sure if this is a good way to describe this proposal. It was a
joint proposal from several people, including Peter Constable who was
then with SIL International, and was put on the SIL web site for
convenience. Peter and others of the original proposers have had second
thoughts about it. Don't assume that it represents the current thinking
of SIL or of anyone else.
... the medial meteg is missing, but not the right and
left meteg, as they are encoded within the same class and their order is
preserved when attached to a vowel.
Logically, ...
Graphically, yes. In origin, probably. But I know that Unicode has
rejected any notion of these hataf vowels being combinations, even at
the compatibility level, between sheva and another vowel.
... the hataf vowel is made of two parts: hataf and a second vowel,
and the medial meteg is put in the middle. There are two solutions:
- either encode <hataf, meteg, second part of the vowel> with the sil.org
proposed new biblical vowels, which all belong to the same class
In view of the above this would probably not be acceptable.
- or add a medial meteg that combines and modifies the hataf vowel, and will
be normally coded after that vowel.
Or use some kind of character, existing or new, to promote ligation
between the vowel and the meteg.
As the sil.org proposal also keeps cantillation marks in the same combining
class 220 as vowels and meteg, the order will be significant, as the medial
meteg must combine with the hataf vowel, not with the cantillation mark.
Requests to ligate meteg with any other mark would simply be ignored, in
the same way as ZWJ is ignored when between base characters that cannot
be ligated. While it is obviously not good to sprinkle text with
superfluous ligation marks, CGJs etc, these need not be made illegal or
automatically removed any more than it is necessary to remove
superfluous ZWJs between characters which cannot be ligated. As the
characters are default ignorable, applications and renderers should
simply ignore them when they are superfluous - but they should not
delete them as no one application can be sure which character pairs
actually have ligatures in any particular font.
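A rough Python sketch of that behaviour, under loudly stated assumptions:
the shape() function, the IGNORABLE set and the per-font ligature tables
are all hypothetical, and real renderers are far more involved. The point
is only that a superfluous joiner is skipped at display time while the
stored text is left alone.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
# Illustrative subset only, not the full Default_Ignorable_Code_Point set.
IGNORABLE = {ZWJ, CGJ}


def shape(text, ligatures):
    """Map text to a list of glyph names for one (hypothetical) font.

    `ligatures` maps a (char, char) pair to a ligature glyph name.
    The input string is never modified.
    """
    glyphs = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 2]) if i + 2 < len(text) else None
        if pair and text[i + 1] == ZWJ and pair in ligatures:
            glyphs.append(ligatures[pair])  # font has this ligature
            i += 3
        elif text[i] in IGNORABLE:
            i += 1  # superfluous joiner: ignored for display only
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


HATAF_PATAH = "\u05B2"
METEG = "\u05BD"
text = "\u05D0" + HATAF_PATAH + ZWJ + METEG

font_with_ligature = {(HATAF_PATAH, METEG): "hatafpatah_meteg"}
font_without = {}

with_lig = shape(text, font_with_ligature)
without_lig = shape(text, font_without)
```

The same stored text gives a ligature in one font and plain marks in the
other, which is why no single application can safely strip the joiner.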
This is not a problem because other cantillation marks that combine below
are not separated in two halves like hataf vowels. This means that no medial
meteg could occur within a cantillation mark and so only a normal meteg
could eventually occur; but this causes a rendering problem if this is not
normalized directly on input, as no NF form will reorder them.
So the sil.org proposal PDF leaves open the choice of the combining class to
use for the new vowels and meteg that combine below, and they could be given
class 28 as well, allowing the new meteg to be reordered before cantillation
marks.
So I do think that the new vowels and meteg proposed by sil.org should not
be given the same class 220 as cantillation marks that should be reordered
after all vowels and meteg, and that a class 28 for them would be
preferable, unless there is some proof that vowels or meteg can follow
cantillation marks (meaning that there would be a second logical vowel group
on the same consonant, and in that case we still have a problem because not
all cantillation marks share the same class 220).
But there is such proof. A few days ago I posted some text describing
how meteg and certain other low accents (class 220 ones, fortunately)
occur together and in both orders. Also there are cases of
vowel-accent-vowel in that order below a single base character, see
http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section
3.2. So the best thing is to put every low mark into class 220, except
for the two (dehi and yetiv) which are always positioned to the right.
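As a sanity check on the classes involved, here is a small Python sketch
using the standard unicodedata module. It looks only at the characters as
currently encoded; the pairing of etnahta with merkha is just a convenient
example of two existing class 220 marks, not a claim that they occur
together in real text.

```python
import unicodedata

MARKS = {
    "hataf patah": "\u05B2",
    "meteg": "\u05BD",
    "etnahta": "\u0591",
    "merkha": "\u05A5",
    "yetiv": "\u059A",
    "dehi": "\u05AD",
}

# Canonical combining class of each mark as currently encoded.
classes = {name: unicodedata.combining(ch) for name, ch in MARKS.items()}

BET = "\u05D1"


def nfd(s):
    return unicodedata.normalize("NFD", s)


# Two marks of the same class: normalization keeps either typed order.
a = BET + MARKS["etnahta"] + MARKS["merkha"]
b = BET + MARKS["merkha"] + MARKS["etnahta"]
same_class_kept = nfd(a) == a and nfd(b) == b

# Existing meteg (class 22) typed after a class 220 accent is moved
# in front of it, so the accent-then-meteg order is lost.
c = BET + MARKS["etnahta"] + MARKS["meteg"]
reordered = nfd(c)

for name, ccc in classes.items():
    print(f"{name}: {ccc}")
```

With a meteg in class 220, both orders relative to the low accents would
survive normalization in the same way as the etnahta and merkha pair does.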
Here again the sil.org proposal does not solve everything, and there remains
the need to encode an ignorable control with class 0 to separate two vowel
groups applied to the same consonant group (I call a "consonant group" the
Unicode sequence made of: a single base consonant letter, with an optional
sin/shin dot above right or left, and an optional dagesh/rafe/varika point
inside or centered above).
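The "consonant group" defined above can be sketched as a regular
expression. This is only my reading of that definition: the name
CONSONANT_GROUP and the helper consonant_group() are hypothetical, and the
pattern accepts the dot and the point in either order because canonical
ordering places dagesh (class 21) before the shin/sin dot (classes 24 and
25), the reverse of the order in the prose description.

```python
import re

CONSONANT_GROUP = re.compile(
    "[\u05D0-\u05EA]"                          # one base consonant letter
    "(?:[\u05C1\u05C2][\u05BC\u05BF\uFB1E]?"   # shin/sin dot, then optional point
    "|[\u05BC\u05BF\uFB1E][\u05C1\u05C2]?)?"   # or dagesh/rafe/varika, then optional dot
)


def consonant_group(text, pos=0):
    """Return the consonant group starting at pos, or "" if none starts there."""
    m = CONSONANT_GROUP.match(text, pos)
    return m.group(0) if m else ""


SHIN, SHIN_DOT, DAGESH, QAMATS = "\u05E9", "\u05C1", "\u05BC", "\u05B8"

# Vowels and accents are not part of the group, so the match stops before them.
typed_order = consonant_group(SHIN + SHIN_DOT + DAGESH + QAMATS)
canonical_order = consonant_group(SHIN + DAGESH + SHIN_DOT + QAMATS)
bare = consonant_group("\u05D0\u05B7")
```

Everything after the matched group (vowels, meteg, accents) would then
belong to the first vowel group on that consonant group.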
That's why the choice between 220 and 28 classes in the sil.org proposal is
not important.
As I see it, the "sil.org proposal" with class 220 for all low marks
except dehi and yetiv does successfully do away with the need for a CGJ
type control. The objections to this proposal are of a quite different type.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/