> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jony Rosenne
> > But it *is* a piece of text, however malformed it might seem
> > from normal lexicographic understanding. It may not be a word.
> > It may, in fact, be two words merged into a unit. But it is most
> > certainly text.
>
> Sure it is text, but it is not plain text.
>
> Qere and Ketiv are not malformed. I don't think anyone disagrees that they
> are the juxtaposition of the letters of one word with the vowel points of
> another.
>
> That most cases can be visibly reproduced by Unicode is a hack...

Jony, where you and I have had a different worldview is that, it seems to me, you view characters as encoding language, and I view characters as encoding letterforms; or, put another way, for you, text is necessarily linguistic, whereas for me text is text, independent of linguistic interpretation.

To make this concrete, the fact that a qere sequence involves the vowel points of word A rather than word B is linguistically interesting, but irrelevant as far as encoding is concerned. If the displayed letterforms consist of a lamed with two vowel points, then the encoded character sequence IMO should be lamed with two vowel points -- and I would not consider that a hack.

> and is not a
> sufficient justification to extend Unicode to support cases that cannot be
> reproduced.
>
> There is the case of Yerushala(y)im, for which the plain text hack would
> require an invisible RTL letter to represent the omitted Yod, or to allow
> pointing an RLM. The CGJ hack may work too but it is based on a
> misunderstanding, as if the Lamed has two vowels.

The only hackish thing about needing CGJ is that the combining classes for vowel points that occupy the same space relative to a base should never have been different from one another, but since we cannot revise that detail, we need to come up with another mechanism to deal with it. I agree that using CGJ is a hack, but not because the text involves one base letterform with two combining vowel points.
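
To make the combining-class point concrete, here is a minimal sketch in Python using the standard unicodedata module. It assumes, purely for illustration, that the two points on the lamed are patah (U+05B7) followed by hiriq (U+05B4); the variable names and the helper function are hypothetical, and this is only one way to show the effect, not a recommendation for how any text should be encoded.

```python
import unicodedata

# Assumed example characters (illustrative choice, not taken from a specific text).
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def codepoints(text):
    """Return the code points of text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two points have different non-zero combining classes, so canonical
# ordering is free to swap them.
print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))

# Lamed with patah then hiriq, with nothing between the two points.
plain = LAMED + PATAH + HIRIQ
nfc_plain = unicodedata.normalize("NFC", plain)

# The same sequence with CGJ between the points. CGJ has class 0, so
# canonical reordering does not move marks across it.
with_cgj = LAMED + PATAH + CGJ + HIRIQ
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

print(codepoints(plain), "->", codepoints(nfc_plain))
print(codepoints(with_cgj), "->", codepoints(nfc_cgj))
```

Under normalization the unprotected sequence comes back with hiriq before patah, while the sequence containing CGJ keeps the order in which it was entered. That is the whole job CGJ is doing here: it has nothing to do with how many vowels the lamed is thought to carry.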
> > But I'm now, as always, happy to hear alternate suggestions
> > as to how things might be handled in either encoding or display.
> > So if you think merged Ketiv/Qere forms should be
> > handled by markup, perhaps you can explain how, so that I
> > might better understand. Thank you.
>
> This is the Unicode list, not the markup - SGML etc. list. And I do not know
> too much about markup.

It's not a list dedicated to discussion of markup, but if people contend that a solution to a problem lies in something other than plain text, then it is germane to this list to have that alternative solution elaborated.

Peter Constable