RE: Folding algorithm and canonical equivalence

2004-07-20 Thread E. Keown
Elaine Keown Tucson Hi, Asmus wrote: >Only very few foldings make sense to apply on a >permanent basis. Think of casefolding for example. >Such a folding is mostly useful for searches, where >it is applied *transiently*. Is it possible that Hebrew script needs more than one

RE: Folding algorithm and canonical equivalence

2004-07-20 Thread Jony Rosenne
[mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk > Sent: Monday, July 19, 2004 8:53 PM > To: Mark E. Shoulson > Cc: Jony Rosenne; 'Unicode List' > Subject: Re: Folding algorithm and canonical equivalence > > > On 19/07/2004 03:20, Mark E. Shoulson wrote: >

Re: Folding algorithm and canonical equivalence

2004-07-20 Thread Mark E. Shoulson
Peter Kirk wrote: On 19/07/2004 03:20, Mark E. Shoulson wrote: ... Jony's right: when it's down to brass tacks in Hebrew, it's consonants and whitespace (and punctuation, I guess). Agreed. But then there are a few characters which are not combining marks but which are really part of the accent s

Re: Folding algorithm and canonical equivalence

2004-07-20 Thread Simon Montagu
Mark E. Shoulson wrote: Even so, there's probably some language out there that requires some diacritics left in place on Hebrew letters (I don't know much about other languages written in Hebrew letters; Elaine Keown knows that better). I have printed texts in Ladino and Arabic in Hebrew script w

Re: Folding algorithm and canonical equivalence

2004-07-19 Thread E. Keown
Elaine Keown Tucson Dear Mark and List: I have even less of an idea than usual what on earth you are all talking about, but Today I am working on the 6th set of Hebrew diacritics. They are called 'Palestinian' and are found exclusively in the Cairo Genizah material. The 'Cai

Re: Back to the subject: Folding algorithm and canonical equivalence

2004-07-19 Thread Peter Kirk
On 19/07/2004 23:23, Asmus Freytag wrote: At 01:56 PM 7/19/2004, Mark Davis wrote: You did point out an oversight; Asmus and I have been working on the issue. ‎Mark As Mark wrote, your point is taken and we've taken that onboard. However, we won't try to *edit* text on the list, that's why we

Re: Back to the subject: Folding algorithm and canonical equivalence

2004-07-19 Thread Asmus Freytag
At 01:56 PM 7/19/2004, Mark Davis wrote: You did point out an oversight; Asmus and I have been working on the issue. ‎Mark As Mark wrote, your point is taken and we've taken that onboard. However, we won't try to *edit* text on the list, that's why we are not engaging in a long discussion on the

Re: Back to the subject: Folding algorithm and canonical equivalence

2004-07-19 Thread Mark Davis
You did point out an oversight; Asmus and I have been working on the issue. Mark - Original Message - From: "Peter Kirk" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Monday, July 19, 2004 13:21 Subject: Back to the subject:

Back to the subject: Folding algorithm and canonical equivalence

2004-07-19 Thread Peter Kirk
There has been extensive discussion in this thread on the specifics of accent and diacritic folding. But no one has answered my point, repeated below, that there seems to be a conflict between the folding algorithm (rather than the details of specific foldings) and the principle of canonical eq

RE: Folding algorithm and canonical equivalence

2004-07-19 Thread Jony Rosenne
ECTED] On Behalf Of Peter Kirk > Sent: Monday, July 19, 2004 8:53 PM > To: Mark E. Shoulson > Cc: Jony Rosenne; 'Unicode List' > Subject: Re: Folding algorithm and canonical equivalence > > > On 19/07/2004 03:20, Mark E. Shoulson wrote: > > > ... > >

Re: Folding algorithm and canonical equivalence

2004-07-19 Thread Asmus Freytag
At 02:38 AM 7/19/2004, Michael Everson wrote: At 22:25 -0400 2004-07-18, Mark E. Shoulson wrote: Though for all that, a lot of Yiddish I've seen is also written without vowel-points. So the patah-alef and qamats-alef vowels, and the yod-yod-patah vs. yod yod diphthongs, must be distinguished fro

Re: Folding algorithm and canonical equivalence

2004-07-19 Thread Peter Kirk
On 19/07/2004 03:20, Mark E. Shoulson wrote: ... Jony's right: when it's down to brass tacks in Hebrew, it's consonants and whitespace (and punctuation, I guess). Agreed. But then there are a few characters which are not combining marks but which are really part of the accent system and so shoul

Re: Folding algorithm and canonical equivalence

2004-07-19 Thread Michael Everson
At 22:25 -0400 2004-07-18, Mark E. Shoulson wrote: Though for all that, a lot of Yiddish I've seen is also written without vowel-points. So the patah-alef and qamats-alef vowels, and the yod-yod-patah vs. yod yod diphthongs, must be distinguished from context, like everything else. For much of

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 07:53 PM 7/18/2004, Jony Rosenne wrote: By this logic, I cannot see why you lump Latin/Greek/Cyrillic together. Latin/Greek/Cyrillic share the fact that for searches you may want to remove accents, but, except for very unusual circumstances, it's not a good idea to transform text permanently.

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Jony Rosenne
Sent: Monday, July 19, 2004 12:16 AM > To: Peter Kirk > Cc: John Cowan; Unicode List; jony Rosenne > Subject: Re: Folding algorithm and canonical equivalence > > > At 05:25 AM 7/18/2004, Peter Kirk wrote: > >I accept that there might be some script-specific cases in which

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Mark E. Shoulson
Michael Everson wrote: At 13:00 +0300 2004-07-18, Jony Rosenne wrote: > Jony is arguing to extend AccentFolding to Hebrew (fold to unpointed). His suggestion is to fold *all* combining marks used with Hebrew in that case. I want to double check that he really means all combining marks in the

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Mark E. Shoulson
Jony Rosenne wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Asmus Freytag Sent: Sunday, July 18, 2004 10:53 AM To: John Cowan Cc: Peter Kirk; Unicode List; jony Rosenne Subject: Re: Folding algorithm and canonical equivalence Jony is

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread John Cowan
Asmus Freytag scripsit: > There are two options for a starting set: > select all 'accents' (note, not baseforms) that occur in some > precomposed character. And then add additional ones on a case by case > basis (e.g. stroke overlay). > > Or, start with all gc=Mn from the 0300 and 1DC0 blocks (
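The second option quoted above (start from every gc=Mn character in the 0300 and 1DC0 blocks) can be enumerated mechanically. The following is only a rough sketch, assuming Python's standard `unicodedata` module; the names `BLOCKS` and `nonspacing_marks` are invented for illustration, and the counts for the 1DC0 block depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Block ranges taken from the Unicode block list; the dictionary name
# and the helper below are illustrative, not part of UTR #30.
BLOCKS = {
    "Combining Diacritical Marks": range(0x0300, 0x0370),
    "Combining Diacritical Marks Supplement": range(0x1DC0, 0x1E00),
}

def nonspacing_marks(code_points):
    """Return the characters in the range whose General_Category is Mn."""
    return [chr(cp) for cp in code_points
            if unicodedata.category(chr(cp)) == "Mn"]

# Candidate starting set for an accent folding, grouped by block.
starting_set = {name: nonspacing_marks(r) for name, r in BLOCKS.items()}

for name, marks in starting_set.items():
    print(f"{name}: {len(marks)} characters with gc=Mn")
```

A set built this way would still need the case-by-case additions and exclusions discussed in the thread.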

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread John Cowan
Peter Kirk scripsit: > Anyway, is Yiddish in fact never written completely unpointed? That > would surprise me. It might have happened at some point, but the standard (YIVO) Yiddish orthography would become illegible if points were stripped. -- Principles. You can't say A is John Cowa

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Peter Kirk
On 18/07/2004 22:15, Asmus Freytag wrote: At 05:25 AM 7/18/2004, Peter Kirk wrote: I accept that there might be some script-specific cases in which particular accents should not be removed. The breve in Cyrillic i kratkoe might be an example; but then this might be rather too language-specific a

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 05:25 AM 7/18/2004, Peter Kirk wrote: I accept that there might be some script-specific cases in which particular accents should not be removed. The breve in Cyrillic i kratkoe might be an example; but then this might be rather too language-specific as well. But these should be clearly define

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 05:28 AM 7/18/2004, Peter Kirk wrote: I can see that there might be cases when the Hebrew folding should be invoked without other scripts being affected. But I think that anyone applying a general accent or diacritic folding would expect this to include all Hebrew (and Arabic, Syriac etc) com

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 10:43 AM 7/18/2004, Jony Rosenne wrote: If folding is not suitable for Yiddish texts or Biblical texts or ancient Greek texts or any other text then I suggest that the user of said text seriously considers not using folding. Only very few foldings make sense to apply on a permanent basis. Think
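The casefolding remark (a folding applied transiently for searches rather than permanently to the text) can be illustrated with a small sketch. It assumes Python's built-in `str.casefold`; the function name `find_matches` and the sample strings are made up for the example.

```python
def find_matches(needle, documents):
    """Return the documents containing needle, compared under case folding.

    The folded strings exist only for the duration of the comparison;
    the stored documents are never modified.
    """
    key = needle.casefold()
    return [doc for doc in documents if key in doc.casefold()]

docs = ["Straße in Berlin", "STRASSE map", "Main street"]
matches = find_matches("strasse", docs)
print(matches)
```

The same pattern would apply to an accent or diacritic folding: fold both the query and the candidate at comparison time, and leave the stored text as written.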

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Michael Everson
At 20:43 +0300 2004-07-18, Jony Rosenne wrote: > In the Hebrew language, perhaps. But in other languages, like Yiddish, which use the Hebrew script, at least some points are NOT optional, and "dropping" them causes textual corruption and loss of data. Dropping them always causes loss of data. T

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Jony Rosenne
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Michael Everson > Sent: Sunday, July 18, 2004 2:51 PM > To: 'Unicode List' > Subject: RE: Folding algorithm and canonical equivalence > > > At 13:00

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Peter Kirk
On 18/07/2004 12:51, Michael Everson wrote: At 13:00 +0300 2004-07-18, Jony Rosenne wrote: > Jony is arguing to extend AccentFolding to Hebrew (fold to unpointed). His suggestion is to fold *all* combining marks used with Hebrew in that case. I want to double check that he really means all com

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Peter Kirk
On 18/07/2004 08:56, Asmus Freytag wrote: At 11:17 PM 7/17/2004, John Cowan wrote: Peter Kirk scripsit: > But I think the best thing to do is to drop *all* Hebrew > combining marks; the result of this is valid unpointed Hebrew. I agree. OK, in my last message I was confused, this was Peter's sugges

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Peter Kirk
On 18/07/2004 08:52, Asmus Freytag wrote: At 11:15 PM 7/17/2004, John Cowan wrote: I agree that in the TR#30 context, the Right Thing is to remove the character pair mappings altogether, and all of the single-character mappings that have canonical decompositions In other words, in your opinion, th

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Michael Everson
At 13:00 +0300 2004-07-18, Jony Rosenne wrote: > Jony is arguing to extend AccentFolding to Hebrew (fold to unpointed). His suggestion is to fold *all* combining marks used with Hebrew in that case. I want to double check that he really means all combining marks in the > Hebrew block, or jus

RE: Folding algorithm and canonical equivalence

2004-07-18 Thread Jony Rosenne
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Asmus Freytag > Sent: Sunday, July 18, 2004 10:53 AM > To: John Cowan > Cc: Peter Kirk; Unicode List; jony Rosenne > Subject: Re: Folding algorithm and canonical equivalence >

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Marcin 'Qrczak' Kowalczyk
On Sat, 17-07-2004 at 16:46 -0700, Asmus Freytag wrote: > I wonder whether that's truly intended, or whether it could be replaced > by a combination of > > AccentFolding > OtherDiacriticFolding > > where AccentFolding removes *all* nonspacing marks following Latin, Greek > or Cyri

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 11:17 PM 7/17/2004, John Cowan wrote: Peter Kirk scripsit: > But I think the best thing to do is to drop *all* Hebrew > combining marks; the result of this is valid unpointed Hebrew. I agree. OK, in my last message I was confused, this was Peter's suggestion and Jony had seconded it. I take it

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 11:15 PM 7/17/2004, John Cowan wrote: I agree that in the TR#30 context, the Right Thing is to remove the character pair mappings altogether, and all of the single-character mappings that have canonical decompositions In other words, in your opinion, the reasonable thing to do would be for some

Re: Folding algorithm and canonical equivalence

2004-07-17 Thread John Cowan
Asmus Freytag scripsit: > John, you proposed the initial set. Do you have any suggestion here? My original submission had only the single-character mappings, not the character pair mappings, which are just the result of decomposing the precomposed set and don't IMHO make much sense: they are too

Re: Folding algorithm and canonical equivalence

2004-07-17 Thread John Cowan
Peter Kirk scripsit: > But I think the best thing to do is to drop *all* Hebrew > combining marks; the result of this is valid unpointed Hebrew. I agree. -- Schlingt dreifach einen Kreis vom dies!John Cowan <[EMAIL PROTECTED]> Schliesst euer Aug vor heiliger Schau, http://www.reuters
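As a rough sketch of the suggestion quoted above (drop all Hebrew combining marks to get unpointed text), the following assumes Python's `unicodedata` module and treats "Hebrew combining marks" as the gc=Mn characters in the Hebrew block U+0590..U+05FF. That reading, and the name `fold_hebrew_to_unpointed`, are assumptions made for the example, not definitions from UTR #30.

```python
import unicodedata

# Hebrew block; assumed here to be the scope of the proposed folding.
HEBREW_BLOCK = range(0x0590, 0x0600)

def fold_hebrew_to_unpointed(text):
    """Remove nonspacing marks (gc=Mn) that belong to the Hebrew block."""
    return "".join(
        ch for ch in text
        if not (ord(ch) in HEBREW_BLOCK and unicodedata.category(ch) == "Mn")
    )

# shin + qamats + shin dot, lamed, vav + holam, final mem
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
unpointed = fold_hebrew_to_unpointed(pointed)
print(unpointed)
```

Characters such as maqaf (U+05BE) or sof pasuq (U+05C3) are not gc=Mn and so survive this filter, which is the kind of case raised elsewhere in the thread. Precomposed presentation forms outside the Hebrew block would need to be decomposed first, and, as other messages note, Yiddish and other languages written in Hebrew script may not tolerate this folding at all.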

RE: Folding algorithm and canonical equivalence

2004-07-17 Thread Jony Rosenne
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Asmus Freytag > Sent: Sunday, July 18, 2004 2:46 AM > To: Peter Kirk; Unicode List > Cc: [EMAIL PROTECTED] > Subject: Re: Folding algorithm and canonical equivalence > >

Re: Folding algorithm and canonical equivalence

2004-07-17 Thread Peter Kirk
On 18/07/2004 00:46, Asmus Freytag wrote: Thank you for reviewing this. DiacriticFolding (unlike AccentFolding) is selective about which combining marks it removes for which base character. I wonder whether that's truly intended, or whether it could be replaced by a combination of AccentFolding

Re: Folding algorithm and canonical equivalence

2004-07-17 Thread Asmus Freytag
Thank you for reviewing this. DiacriticFolding (unlike AccentFolding) is selective about which combining marks it removes for which base character. I wonder whether that's truly intended, or whether it could be replaced by a combination of AccentFolding OtherDiacriticFolding where AccentFolding

Folding algorithm and canonical equivalence

2004-07-17 Thread Peter Kirk
I was just reviewing the UTR #30 draft in response to Rick's notice about it. And I believe I may have found a point in which the folding algorithm as given may violate the principle of canonical equivalence. But I would like some clarification from list members before providing formal input on
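The preview is cut off before the specific point is stated, so the following is only a generic illustration of how a folding can fail to respect canonical equivalence, not a reconstruction of the issue raised here. It assumes Python's `unicodedata` module; the table `NAIVE_TABLE` and both function names are hypothetical and are not taken from the UTR #30 draft.

```python
import unicodedata

# Hypothetical single-character folding table: precomposed e-acute -> e.
NAIVE_TABLE = {"\u00e9": "e"}

def naive_fold(text):
    """Apply the table character by character, without normalizing."""
    return "".join(NAIVE_TABLE.get(ch, ch) for ch in text)

def fold_marks(text):
    """Decompose first, drop nonspacing marks, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

composed = "caf\u00e9"      # precomposed e-acute
decomposed = "cafe\u0301"   # e followed by combining acute accent

# The two inputs are canonically equivalent...
print(unicodedata.normalize("NFC", decomposed) == composed)
# ...but the naive folding gives different results for them,
print(naive_fold(composed) == naive_fold(decomposed))
# while the normalization-aware folding treats them alike.
print(fold_marks(composed) == fold_marks(decomposed))
```

A folding respects canonical equivalence only if canonically equivalent inputs always yield canonically equivalent outputs, which is why normalization before and after the folding step matters.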