Normalisation stability, was: Compression through normalization

2003-11-25 Thread Peter Kirk
On 24/11/2003 16:56, Philippe Verdy wrote: Peter Kirk writes: If conformance clause C10 is taken to be operable at all levels, this makes a nonsense of the concept of normalisation stability within databases etc. I don't think that the stability of normalization influences this: as long

RE: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Philippe Verdy
So it's the absence of stability which would make this rearrangement of normalization forms impossible... Canonical equivalence is unaffected if combining classes are rearranged, though not if they are split or joined. It is only the normalised forms of strings which may be changed. So

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Peter Kirk
On 25/11/2003 07:22, Philippe Verdy wrote: ... Composition exclusions have a lower impact as well as the relative orders of canonical classes, as they don't affect canonical equivalence of strings, and thus won't affect applications based on the Unicode C10 definition; they are important only to

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Doug Ewell
Normalization may or may not have an effect on compression. It has definitely been shown to have an effect on Hebrew combining marks. I must ask, however, that we try to keep these issues separate in discussion, and not let the compression topic, if there is to be any, degenerate into a wing of

RE: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Philippe Verdy
From: Peter Kirk [mailto:[EMAIL PROTECTED] Sent: Tuesday 25 November 2003 17:06 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Normalisation stability, was: Compression through normalization On 25/11/2003 07:22, Philippe Verdy wrote: ... Composition exclusions have

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread John Cowan
Philippe Verdy scripsit: I just wonder however why it was crucial (as Unicode says in its Definitions chapter) to expect a relative order of distinct non-zero combining classes. For me these combining classes are arbitrary not only on their absolute value as they are now, but even their

RE: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Philippe Verdy
John Cowan writes: Since it adds efficiency to normalize only once, it is worthwhile to define a few normalization forms and urge people to produce text in one of them, so that receivers need not normalize but need only check for normalization, typically much cheaper. I'm not convinced that

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Doug Ewell
Peter Kirk peterkirk at qaya dot org wrote: Well, Doug, I see your point; different topics should be kept separate. But I changed the subject line precisely because the thread has shifted from discussion of compression to a general discussion of normalisation stability. That's true; most

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Peter Kirk
On 25/11/2003 10:03, John Cowan wrote: ... And as for canonical equivalence, the most efficient way to compare strings for it is to normalize both of them in some way and then do a raw binary compare. Since it adds efficiency to normalize only once, it is worthwhile to define a few
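The comparison John Cowan describes (normalize both strings, then do a raw compare) can be sketched in Python with the standard unicodedata module. The function name canonically_equivalent and the choice of NFD as the default form are assumptions made for this illustration, not something proposed in the thread.

```python
import unicodedata


def canonically_equivalent(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two strings for canonical equivalence.

    Both inputs are brought to the same canonical normalization form
    (NFD or NFC) and the results are compared code point by code point.
    """
    if form not in ("NFD", "NFC"):
        # NFKD/NFKC would test compatibility equivalence instead.
        raise ValueError("use a canonical form: NFD or NFC")
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    # Precomposed e-acute versus e followed by combining acute.
    print(canonically_equivalent("\u00e9", "e\u0301"))
    # Same marks in a different order (dot above, dot below).
    print(canonically_equivalent("q\u0307\u0323", "q\u0323\u0307"))
```

If one side is already known to be normalized, only the other side needs to be processed, which is the efficiency argument made in the message.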

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: I'm not convinced that there's a significant improvement when only checking for normalization but not performing it. It requires at least a list of the characters that are acceptable in a normalization form, and as well their combining

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread John Cowan
Peter Kirk scripsit: If receivers are expected to check for normalisation, they are presumably expected also to normalise Not so. An alternative behavior, which is preferred in certain circumstances, is to reject the input, or at least to advise higher layers that the input may be invalid.
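The check-and-reject behavior mentioned here can be sketched as follows. This is only an illustration under stated assumptions: the function name accept_if_nfc is hypothetical, NFC is an arbitrary choice of form, and unicodedata.is_normalized requires Python 3.8 or later.

```python
import unicodedata


def accept_if_nfc(text: str) -> str:
    """Return text unchanged if it is already in NFC, otherwise reject it.

    The receiver only checks for normalization; it never rewrites the
    input, and leaves the decision about invalid input to the caller.
    """
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("input is not in Normalization Form C")
    return text


if __name__ == "__main__":
    print(accept_if_nfc("caf\u00e9"))      # already NFC, accepted
    try:
        accept_if_nfc("cafe\u0301")        # decomposed, rejected
    except ValueError as err:
        print("rejected:", err)
```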

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Peter Kirk
On 25/11/2003 11:15, John Cowan wrote: Peter Kirk scripsit: If receivers are expected to check for normalisation, they are presumably expected also to normalise Not so. An alternative behavior, which is preferred in certain circumstances, is to reject the input, or at least to advise

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Mark Davis
__ http://www.macchiato.com - Original Message - From: Doug Ewell [EMAIL PROTECTED] To: Unicode Mailing List [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; John Cowan [EMAIL PROTECTED] Sent: Tue, 2003 Nov 25 11:18 Subject: Re: Normalisation stability, was: Compression through

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Rick McGowan
John Cowan suggested... We will never come close to exceeding this limit. Essentially all new combining characters are either class 0 or fall into one of the 200-range positional classes. Or 9, for viramas. One take-home point is that there won't be any more fixed position classes added
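The kinds of combining class named in this message can be inspected with Python's unicodedata.combining. The sample characters below are chosen for this illustration and do not come from the thread; the helper name combining_class is likewise an assumption.

```python
import unicodedata


def combining_class(ch: str) -> int:
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


# Illustrative samples: a base letter (class 0), a virama (class 9),
# and two marks from the 200-range positional classes.
SAMPLES = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "DEVANAGARI SIGN VIRAMA": "\u094d",
    "COMBINING DOT BELOW": "\u0323",
    "COMBINING ACUTE ACCENT": "\u0301",
}

if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name}: class {combining_class(ch)}")
```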

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Peter Kirk
On 25/11/2003 08:55, Doug Ewell wrote: Normalization may or may not have an effect on compression. It has definitely been shown to have an effect on Hebrew combining marks. I must ask, however, that we try to keep these issues separate in discussion, and not let the compression topic, if there

Re: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Rick McGowan
Of course, as usual, this is my opinion. UTC hasn't actually made any proclamations about what will or won't be done in terms of the classes or what kinds of classes might be assigned in the future. Rick John Cowan suggested... We will never come close to exceeding this limit.

RE: Normalisation stability, was: Compression through normalization

2003-11-25 Thread Philippe Verdy
Rick McGowan writes: John Cowan suggested... We will never come close to exceeding this limit. Essentially all new combining characters are either class 0 or fall into one of the 200-range positional classes. Or 9, for viramas. Or 1, for overlays. Don't forget them... Or 7, for