On 24/11/2003 16:56, Philippe Verdy wrote:
Peter Kirk writes:
If conformance clause C10 is taken to be operable at all levels, this
makes a nonsense of the concept of normalisation stability within
databases etc.
I don't think that the stability of normalization influences this: as long
So it's the absence of stability which would make this rearrangement
of normalization forms impossible...
Canonical equivalence is unaffected if combining classes are rearranged,
though not if they are split or joined. It is only the normalised forms
of strings which may be changed. So
On 25/11/2003 07:22, Philippe Verdy wrote:
...
Composition exclusions have a lower impact as well as the relative orders of
canonical classes, as they don't affect canonical equivalence of strings,
and thus won't affect applications based on the Unicode C10 definition; they
are important only to
Normalization may or may not have an effect on compression. It has
definitely been shown to have an effect on Hebrew combining marks.
I must ask, however, that we try to keep these issues separate in
discussion, and not let the compression topic, if there is to be any,
degenerate into a wing of
From: Peter Kirk [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 25 November 2003 17:06
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Normalisation stability, was: Compression through
normalization
On 25/11/2003 07:22, Philippe Verdy wrote:
...
Composition exclusions have
Philippe Verdy scripsit:
I just wonder however why it was crucial (as Unicode says in its
Definitions chapter) to expect a relative order of distinct non-zero
combining classes. For me these combining classes are arbitrary not only on
their absolute value as they are now, but even their
John Cowan writes:
Since it adds efficiency to normalize only once,
it is worthwhile to define a few normalization forms and urge
people to produce text in one of them, so that receivers need not
normalize but need only check for normalization, typically much cheaper.
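A rough sketch of the two operations described here, comparing for canonical equivalence by normalizing both sides, and merely checking whether text is already normalized. It assumes Python 3.8 or later and its standard `unicodedata` module; the function names are made up for illustration and are not from the thread.

```python
import unicodedata


def canonically_equivalent(a: str, b: str, form: str = "NFD") -> bool:
    """Normalize both strings to the same form, then compare them directly."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def already_normalized(text: str, form: str = "NFC") -> bool:
    """Check for normalization without building a normalized copy."""
    return unicodedata.is_normalized(form, text)


if __name__ == "__main__":
    # Dot below (class 220) and dot above (class 230) in either order.
    s1 = "q\u0323\u0307"
    s2 = "q\u0307\u0323"
    print(s1 == s2)                        # raw comparison of code points
    print(canonically_equivalent(s1, s2))  # comparison after normalization
    print(already_normalized("e\u0301"))   # decomposed e + acute, checked against NFC
```

Whether the check really is cheaper than normalizing depends on the implementation and on the input, which is the point disputed in the replies.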
I'm not convinced that
Peter Kirk peterkirk at qaya dot org wrote:
Well, Doug, I see your point; different topics should be kept
separate. But I changed the subject line precisely because the thread
has shifted from discussion of compression to a general discussion of
normalisation stability.
That's true; most
On 25/11/2003 10:03, John Cowan wrote:
... And as for
canonical equivalence, the most efficient way to compare strings for
it is to normalize both of them in some way and then do a raw
binary compare. Since it adds efficiency to normalize only once,
it is worthwhile to define a few
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
I'm not convinced that there's a significant improvement when only
checking for normalization but not performing it. It requires at least
a list of the characters that are acceptable in a normalization form, and
as well their combining
Peter Kirk scripsit:
If receivers are expected to check for normalisation, they are
presumably expected also to normalise
Not so. An alternative behavior, which is preferred in certain circumstances,
is to reject the input, or at least to advise higher layers that the input
may be invalid.
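The "reject rather than normalise" behaviour could look something like the sketch below. It assumes Python 3.8 or later and that NFC is the form the receiver expects; the exception class and function name are hypothetical, invented for this example.

```python
import unicodedata


class NotNormalizedError(ValueError):
    """Hypothetical error used to tell higher layers the input is suspect."""


def accept_if_normalized(text: str, form: str = "NFC") -> str:
    """Pass normalized input through unchanged; refuse anything else.

    The receiver never repairs the text itself, it only checks it.
    """
    if not unicodedata.is_normalized(form, text):
        raise NotNormalizedError(f"input is not in {form}")
    return text


if __name__ == "__main__":
    print(accept_if_normalized("caf\u00e9"))   # precomposed, accepted as-is
    try:
        accept_if_normalized("cafe\u0301")     # decomposed, refused
    except NotNormalizedError as err:
        print("rejected:", err)
```

A caller that prefers to be lenient can catch the error and normalize at that point, which keeps the policy decision in the higher layer.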
On 25/11/2003 11:15, John Cowan wrote:
Peter Kirk scripsit:
If receivers are expected to check for normalisation, they are
presumably expected also to normalise
Not so. An alternative behavior, which is preferred in certain circumstances,
is to reject the input, or at least to advise
__
http://www.macchiato.com
- Original Message -
From: Doug Ewell [EMAIL PROTECTED]
To: Unicode Mailing List [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; John Cowan [EMAIL PROTECTED]
Sent: Tue, 2003 Nov 25 11:18
Subject: Re: Normalisation stability, was: Compression through
John Cowan suggested...
We will never come close to exceeding this limit. Essentially all new
combining characters are either class 0 or fall into one of the 200-range
positional classes.
Or 9, for viramas.
One take-home point is that there won't be any more fixed position
classes added
On 25/11/2003 08:55, Doug Ewell wrote:
Normalization may or may not have an effect on compression. It has
definitely been shown to have an effect on Hebrew combining marks.
I must ask, however, that we try to keep these issues separate in
discussion, and not let the compression topic, if there
Of course, as usual, this is my opinion. UTC hasn't actually made any
proclamations about what will or won't be done in terms of the classes or
what kinds of classes might be assigned in the future.
Rick
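For anyone wanting to look at the combining classes being discussed, here is a small sketch using Python's standard `unicodedata` module. The sample characters are my own picks for illustration, not ones named in the thread, and the values printed come from whatever Unicode version the Python build ships with.

```python
import unicodedata

# Illustrative sample characters (an assumption of this sketch).
SAMPLES = [
    "\u0903",  # DEVANAGARI SIGN VISARGA
    "\u0338",  # COMBINING LONG SOLIDUS OVERLAY
    "\u093C",  # DEVANAGARI SIGN NUKTA
    "\u094D",  # DEVANAGARI SIGN VIRAMA
    "\u05B7",  # HEBREW POINT PATAH
    "\u05BC",  # HEBREW POINT DAGESH OR MAPIQ
    "\u0323",  # COMBINING DOT BELOW
    "\u0301",  # COMBINING ACUTE ACCENT
]


def combining_class(ch: str) -> int:
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


def is_fixed_position(ch: str) -> bool:
    """True for classes 10..199, the range of fixed position classes."""
    return 10 <= combining_class(ch) <= 199


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"class {combining_class(ch)}, fixed={is_fixed_position(ch)}")
```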
John Cowan suggested...
We will never come close to exceeding this limit.
Rick McGowan writes:
John Cowan suggested...
We will never come close to exceeding this limit. Essentially all new
combining characters are either class 0 or fall into one of the
200-range positional classes.
Or 9, for viramas.
Or 1, for overlays. Don't forget them...
Or 7, for