Well, maybe 3 things ;-)

Mark
________
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Markus Scherer" <[EMAIL PROTECTED]>; "unicode" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, March 13, 2003 13:04
Subject: "Deterministic Sorting" (was Re: ZWNJ & Persian Collation)

> I want to point out two things.
>
> 1. UCA provides a mechanism for producing a "deterministic" sort (there
> called semi-stable). See step 3.10
> (http://www.unicode.org/reports/tr10/#Step_3).
>
> 2. A "deterministic" sort is actually not needed very often; people confuse
> it with a stable sort. See http://www.unicode.org/reports/tr10/#Stability
>
> 3. If someone did customize the UCA for numeric sorting, the difference
> between 002 and 2 could be a tertiary difference. So even without using
> 3.10, they would be distinguished at level 3.
>
> Mark
>
> ----- Original Message -----
> From: "Markus Scherer" <[EMAIL PROTECTED]>
> To: "unicode" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Wednesday, March 12, 2003 08:48
> Subject: Re: ZWNJ & Persian Collation
>
> > Roozbeh Pournader wrote:
> > > Well, anything that is completely ignored in collation creates problems
> > > with deterministic sorting.
> >
> > I don't think you mean "deterministic". UCA is deterministic, it just
> > sorts many strings as equal.
> >
> > > There are certain words in Persian, with
> > > completely different meanings, that only differ in a ZWNJ[1]. Having ZWNJ
> > > ignored by default, means they may appear in this or that order, possibly
> > > based on the original order of input. I guess this is not what we want
> > > for deterministic collation.
> > >
> > > The desired behavior for ZWNJ, is being treated like punctuations.
> > > Ignored in the first levels, but considered at the end. (Personal Note:
> > > write something for UTC on this.)
> >
> > Possible. I assume that ZWNJ is ignored in UCA because that is the
> > expected behavior for many other languages. Not ignoring ZWNJ is possible
> > with a tailoring that gives it some non-zero weights.
> >
> > Note that many languages require tailorings for at least a couple of
> > characters to follow national standards.
> >
> > markus
> >
> > --
> > Opinions expressed here may not reflect my company's positions unless
> > otherwise noted.
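
To make the stable vs. "deterministic" distinction in the thread concrete, here is a minimal Python sketch. It is not the UCA and does not use real collation weights: primary_key, deterministic_key and numeric_key are hypothetical helpers invented for illustration, Latin letters stand in for the Persian words, and the code point tie-breaker only imitates the idea behind step 3.10.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def primary_key(s):
    """Toy key that ignores ZWNJ completely (an assumption, not UCA weights)."""
    return s.replace(ZWNJ, "").casefold()


def deterministic_key(s):
    """Toy key plus the raw string as a final tie-breaker, so two
    different strings never compare as equal."""
    return (primary_key(s), s)


def numeric_key(s):
    """Hypothetical numeric tailoring: the numeric value decides first,
    the number of digits (leading zeros) only breaks ties afterwards."""
    return (int(s), len(s))


a = "ab" + ZWNJ + "c"
b = "abc"

# Both strings get the same key once ZWNJ is ignored.
assert primary_key(a) == primary_key(b)

# Python's sort is stable: items with equal keys keep their input order,
# so the output depends on how the input happened to be ordered.
assert sorted([a, b], key=primary_key) == [a, b]
assert sorted([b, a], key=primary_key) == [b, a]

# With the tie-breaker, every input order yields the same output.
assert sorted([a, b], key=deterministic_key) == sorted([b, a], key=deterministic_key)

# "002" and "2" are equal at the first level but still distinguished.
assert sorted(["10", "002", "2"], key=numeric_key) == ["2", "002", "10"]
```

The first pair of assertions shows what Roozbeh describes (order "based on the original order of input"), and the tie-breaker shows why that is a property of equal keys rather than of a non-deterministic algorithm.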