Peter replied to Karljürgen:

> Karljürgen Feuerherm wrote on 06/25/2003 08:31:41 PM:
>
> > I was going to suggest something very similar, a ZW-pseudo-consonant of
> > some kind, which would force each vowel to be associated with one
> > consonant.
>
> An invisible *consonant* doesn't make sense because the problem involves
> more than just multiple written vowels on one consonant;

I agree that we don't want to go inventing invisible consonants for this.

BTW, there's already an invisible vowel (in fact a pair of them) that is
unwanted by the stakeholders of the script it was originally invented for:

U+17B4 KHMER VOWEL INHERENT AQ

This also has cc=0, so it would serve to block canonical reordering if
placed between two Hebrew vowel points. But I'm sure that if Peter thought
the suggestion of the ZWJ for this was a "groanable kludge", Biblical
Hebraicists would probably not take kindly to the importation of an
invisible Khmer character into their text representations. ;-)

> in fact, that is
> a small portion of the general problem. If we want such a character, it
> would notionally be a zero-width-canonical-ordering-inhibiter, and nothing
> more.

The fact is that any of the zero-width format controls has the side-effect
of inhibiting (or rather interrupting) canonical reordering if inserted in
the middle of a target sequence, because its own combining class is cc=0.

I'm not particularly campaigning for ZWJ, by the way. ZWNJ or even
U+FEFF ZWNBSP would accomplish the same. I just suggested ZWJ because
it seemed in the ballpark. ZWNBSP would likely have fewer possible other
consequences, since notionally it means just "don't break here", and you
wouldn't break in the middle of a Hebrew combining character sequence
anyway.

> And I don't particular want to think about what happens when people start
> sticking this thing into sequences other than Biblical Hebrew ("in
> unicode, any sequence is legal").

But don't forget that these cc=0 zero-width format controls can already be
stuck into sequences other than Biblical Hebrew. In some instances they
have defined semantics there (as for Arabic and Indic scripts), but in all
cases they would *already* have the effect of interrupting canonical
reordering of combining character sequences if inserted there.

--Ken
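
The effect described above can be checked with a minimal sketch, assuming
Python's standard unicodedata module. The lamed + patah + hiriq sequence and
the helper names are chosen here purely for illustration and do not come from
the thread; the combining class values are whatever the Unicode data bundled
with the Python build reports.

```python
import unicodedata

# Illustrative sequence (an assumption, not taken from the thread):
LAMED = "\u05DC"  # HEBREW LETTER LAMED, cc=0
PATAH = "\u05B7"  # HEBREW POINT PATAH, cc=17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, cc=14


def classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def survives_normalization(text, form="NFC"):
    """True if normalization leaves the code point order untouched."""
    return unicodedata.normalize(form, text) == text


# Patah (17) written before hiriq (14): canonical ordering swaps them.
plain = LAMED + PATAH + HIRIQ
print(classes(plain))                 # [0, 17, 14]
print(survives_normalization(plain))  # False

# Any cc=0 character between the two points interrupts the reordering.
BLOCKERS = {
    "ZWJ": "\u200D",
    "ZWNJ": "\u200C",
    "ZWNBSP": "\uFEFF",
    "KHMER VOWEL INHERENT AQ": "\u17B4",
}
for name, blocker in BLOCKERS.items():
    guarded = LAMED + PATAH + blocker + HIRIQ
    print(name, unicodedata.combining(blocker), survives_normalization(guarded))
```

The sketch only shows that the written order survives normalization; it says
nothing about rendering, line breaking, or any other semantics the inserted
character may carry.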