On 18/07/2004 12:51, Michael Everson wrote:
At 13:00 +0300 2004-07-18, Jony Rosenne wrote:
> Jony is arguing to extend AccentFolding to Hebrew (fold to
> unpointed). His suggestion is to fold *all* combining marks used with
> Hebrew in that case. I want to double check that he really means all
> combining marks in the Hebrew block, or just some of them.
I did mean all. All points and cantillation marks in Hebrew are optional.
In the Hebrew language, perhaps. But in other languages, like Yiddish, which use the Hebrew script, at least some points are NOT optional, and "dropping" them causes textual corruption and loss of data.
The same is of course true of accent removal in Latin script, in many European languages. The general accent folding, like DUCET, has to make the best compromise between preferred usage in the most widely used languages; or it can be tailored to the needs of specific languages. Indeed in some sense every folding involves loss of data; that is the nature of a folding. That doesn't stop generic accent removal being a useful folding, in Latin and Hebrew scripts.
The question in one sense is whether accent and diacritic folding is a graphical process or a logical one. If it is a logical process, it has to take into account all sorts of potentially language-specific variables such as the phonetic function of each combining mark. But it makes more sense, within the scope of Unicode folding, for it to be specified as a graphical process, the removal of auxiliary glyphs and glyph modifiers from base characters without regard for their phonetic effect or their status within the orthography of particular languages.
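To make the "graphical" reading concrete, here is a minimal Python sketch of such a folding. It is only an illustration under assumptions of my own, not anything specified by Unicode: it treats "auxiliary glyphs and glyph modifiers" as characters of general category Mn (nonspacing mark) after canonical decomposition, and the function name `fold_marks` and its `keep` parameter are hypothetical. The `keep` set stands in for a language-specific tailoring, for example retaining patah (U+05B7) for Yiddish.

```python
import unicodedata

def fold_marks(text, keep=frozenset()):
    """Strip nonspacing combining marks, ignoring their phonetic role.

    Assumption: "mark" means general category Mn. Characters listed in
    `keep` are retained, as a stand-in for a language-specific tailoring.
    """
    # Decompose first so precomposed letters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" or ch in keep
    )
    # Recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", stripped)

# Latin: e with acute folds to plain e.
latin = fold_marks("caf\u00e9")

# Hebrew: shin + shin dot + qamats, lamed, vav + holam, final mem.
hebrew = fold_marks("\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd")

# Tailored: keep patah so that alef + patah survives the folding.
yiddish = fold_marks("\u05d0\u05b7", keep=frozenset({"\u05b7"}))
```

With the default empty `keep` set this removes every Hebrew point and cantillation mark, which is the generic folding discussed above; passing a non-empty set is one possible way to express a tailoring without changing the underlying graphical rule.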
Anyway, is Yiddish in fact never written completely unpointed? That would surprise me.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

