Well, maybe 3 things ;-)

Mark
________
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Markus Scherer" <[EMAIL PROTECTED]>; "unicode" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, March 13, 2003 13:04
Subject: "Deterministic Sorting" (was Re: ZWNJ & Persian Collation)


> I want to point out two things.
>
> 1. UCA provides a mechanism for producing a "deterministic" sort (there
> called semi-stable). See step 3.10
> (http://www.unicode.org/reports/tr10/#Step_3).
>
> 2. A "deterministic" sort is actually not needed very often; people
> confuse it with a stable sort. See http://www.unicode.org/reports/tr10/#Stability
>
> 3. If someone did customize the UCA for numeric sorting, the difference
> between 002 and 2 could be a tertiary difference. So even without using
> 3.10, they would be distinguished at level 3.
>
> Mark
>
> ----- Original Message -----
> From: "Markus Scherer" <[EMAIL PROTECTED]>
> To: "unicode" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Wednesday, March 12, 2003 08:48
> Subject: Re: ZWNJ & Persian Collation
>
>
> > Roozbeh Pournader wrote:
> > > Well, anything that is completely ignored in collation creates
> > > problems with deterministic sorting.
> >
> > I don't think you mean "deterministic". UCA is deterministic; it just
> > sorts many strings as equal.
> >
> > > There are certain words in Persian, with completely different
> > > meanings, that differ only in a ZWNJ[1]. Having ZWNJ ignored by
> > > default means they may appear in this or that order, possibly based
> > > on the original order of input. I guess this is not what we want for
> > > deterministic collation.
> > >
> > > The desired behavior for ZWNJ is to be treated like punctuation:
> > > ignored at the first levels, but considered at the end. (Personal
> > > note: write something for the UTC on this.)
> >
> > Possible. I assume that ZWNJ is ignored in UCA because that is the
> > expected behavior for many other languages. Not ignoring ZWNJ is
> > possible with a tailoring that gives it some non-zero weights.
> >
> > Note that many languages require tailorings for at least a couple of
> > characters to follow national standards.
> >
> > markus
> >
> > --
> > Opinions expressed here may not reflect my company's positions unless
> > otherwise noted.
> >
> >
> >
>
>
>
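A small sketch of the difference between a stable sort and a "deterministic" one (points 1 and 2 in Mark's list). It assumes a hypothetical toy key, toy_key(), that stands in for a real UCA sort key and simply drops ZWNJ, so "x<ZWNJ>y" and "xy" compare as equal. A stable sort then returns them in whatever order they arrived; adding a final tie-breaker on the NFD form of the string itself makes the output independent of input order. Using NFD as the tie-breaker follows the spirit of the UCA step Mark cites, but treat the exact choice as an assumption here.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def toy_key(s: str) -> str:
    """Hypothetical stand-in for a real UCA sort key: ZWNJ is completely ignorable."""
    return s.replace(ZWNJ, "")

def stable_sort(words):
    # Python's sorted() is stable: strings whose keys compare equal keep
    # their relative input order, so the output depends on the input order.
    return sorted(words, key=toy_key)

def deterministic_sort(words):
    # Break ties on the NFD form of the string itself, so strings that are
    # equal under toy_key always come out in the same order.
    return sorted(words, key=lambda s: (toy_key(s), unicodedata.normalize("NFD", s)))

a, b = "x" + ZWNJ + "y", "xy"
print(stable_sort([a, b]) == [a, b], stable_sort([b, a]) == [b, a])  # True True
print(deterministic_sort([a, b]) == deterministic_sort([b, a]))     # True
```

As Mark notes, most applications only need the stable behavior; the extra tie-break matters when two runs over differently ordered input must produce identical output.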

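Point 3 can be illustrated with a toy numeric key. This is a hypothetical sketch, not a real UCA tailoring: numeric_key() compares runs of ASCII digits by value at the primary level and only looks at leading zeros at a later, tertiary-like level, so "2" and "002" are primary-equal but still distinguished. Putting fewer leading zeros first is an arbitrary assumption; a real tailoring could choose either order.

```python
import re

DIGITS = "0123456789"

def numeric_key(s: str):
    """Hypothetical numeric tailoring: digit runs compare by value at the
    primary level; leading zeros only matter at a later, tertiary-like level."""
    parts = re.findall(r"[0-9]+|[^0-9]+", s)
    # Primary: numbers by value (sorted before text runs), text runs as-is.
    primary = tuple((0, int(p)) if p[0] in DIGITS else (1, p) for p in parts)
    # Tertiary: count of leading zeros in each digit run (0 for text runs).
    tertiary = tuple(len(p) - len(p.lstrip("0")) if p[0] in DIGITS else 0 for p in parts)
    return (primary, tertiary)

print(sorted(["item10", "item002", "item2"], key=numeric_key))
# ['item2', 'item002', 'item10']
```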

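The two ZWNJ options discussed in the thread can be contrasted with toy keys. Both functions are hypothetical sketches. shifted_style_key() loosely follows Roozbeh's proposal: ZWNJ is ignored at the first levels (collapsed here into one plain-string comparison) and only consulted in a final level, roughly the way UCA's "shifted" option handles punctuation. tailored_key() loosely follows Markus's suggestion of a tailoring that gives ZWNJ a non-zero primary weight, here assumed to be lower than every other character.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def shifted_style_key(s: str):
    """Ignore ZWNJ at the first levels, then consult it in a final level."""
    base = s.replace(ZWNJ, "")  # stands in for levels 1-3
    # Final level: ZWNJ gets a low weight and every other character a high
    # one, loosely mimicking how "shifted" treats punctuation.
    last = tuple(1 if ch == ZWNJ else 0xFFFF for ch in s)
    return (base, last)

def tailored_key(s: str):
    """Give ZWNJ a non-zero primary weight, lower than any other character."""
    return tuple((0, ch) if ch == ZWNJ else (1, ch) for ch in s)

words = ["xy", "x" + ZWNJ + "z", "x" + ZWNJ + "y"]
print([w.replace(ZWNJ, "<ZWNJ>") for w in sorted(words, key=shifted_style_key)])
# ['x<ZWNJ>y', 'xy', 'x<ZWNJ>z']
print([w.replace(ZWNJ, "<ZWNJ>") for w in sorted(words, key=tailored_key)])
# ['x<ZWNJ>y', 'x<ZWNJ>z', 'xy']
```

Both keys keep the Persian word pairs apart, but they differ in how much ZWNJ affects the overall order: under the shifted-style key it only breaks ties between otherwise equal words, while a primary weight lets it move a word ahead of words with different letters.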