In the latest ICU, we took the work we did for Java collation and extended
it substantially (and made it many times faster). It also allows arbitrary
customization at runtime.

I happen to be giving a presentation on it in a few hours at the conference.
For more information, see the draft collation chapter in the User Guide at
http://oss.software.ibm.com/icu/. The presentation (a slightly older draft)
is on my site at www.macchiato.com.

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Όμήρου Μαργίτῃ
[http://www.macchiato.com]
----- Original Message -----
From: "David Gallardo" <[EMAIL PROTECTED]>
To: "Edward Cherlin" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Thursday, September 13, 2001 8:35 AM
Subject: Re: Collation (was RE: [OT] o-circumflex)


> Java's collation class has a rule-based collator that is in effect
> programmable using a little language. Here is an example from Sun's
> API doc for Norwegian:
>
> String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
>                  "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
>                  "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
>                  // å is identical to a + U+030A (combining ring above)
>                  "< å=a\u030A,Å=A\u030A" +
>                  ";aa,AA< æ,Æ< ø,Ø";
>  RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
>
> There is also syntax for things such as specifying reverse order (for
> French accents, for example), contraction and expansion.
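To make the rule syntax concrete, here is a minimal sketch using java.text.RuleBasedCollator. The rule string is a made-up toy tailoring, not a real locale, and the class and method names (TailoringSketch, collator, sorted) are hypothetical. It places c before b and treats "ch" as a contraction, that is, two characters that sort as a single unit.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class TailoringSketch {
    // Hypothetical toy rules, not a real locale:
    //   "<" starts a primary (letter-level) difference,
    //   "," starts a tertiary (case-level) difference,
    //   "ch" is a contraction: two characters that sort as one unit.
    // Here c is placed before b, and ch sits between b and d.
    static final String RULES = "< a, A < c, C < b, B < ch, CH < d, D";

    // Build a collator from the rule string at runtime.
    static RuleBasedCollator collator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    // Return a sorted copy of the words, ordered by the tailored collator.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, collator());
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted("da", "cha", "ba", "ca", "aa")));
    }
}
```

Under these rules "ca" should sort before "ba", and "cha" should land between "ba" and "da", which plain code point order would not give you.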
>
> - David Gallardo
>
> ----- Original Message -----
> From: "Edward Cherlin" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, September 13, 2001 3:40 AM
> Subject: Collation (was RE: [OT] o-circumflex)
>
>
> > English and several other languages have dozens of collations. Compare
> > telephone books, library catalogs, book indexes (sic), and other sorted
> > data. Knuth vol. 3, Sorting and Searching, gives an example of a set of
> > library sorting rules that runs to more than a page, and suggests
> > programming it as an exercise. ;-) One of the rules is to spell out
> > numbers.
> > For example,
> >
> > 1984 (Nineteen Eighty Four)
> > 1066 and all that (Ten Sixty Six)
> > 3001 (Three Thousand One)
> > 2050 (Twenty Fifty)
> > 2010 (Twenty Ten)
> > 2001, A Space Odyssey (Two Thousand One)
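The spell-out rule in the list above can be sketched as a sort on a derived filing key. This is only an illustration: the spoken form of a number ("Nineteen Eighty Four" versus "Three Thousand One") cannot be derived mechanically, so the sketch assumes a hand-made lookup table seeded with the examples from the list. The names SpokenNumberFiling, SPOKEN, filingKey and sortForFiling are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpokenNumberFiling {
    // Hypothetical hand-made table: how each leading number is spoken.
    // The entries are the examples from the list above.
    static final Map<String, String> SPOKEN = new HashMap<>();
    static {
        SPOKEN.put("1984", "Nineteen Eighty Four");
        SPOKEN.put("1066", "Ten Sixty Six");
        SPOKEN.put("3001", "Three Thousand One");
        SPOKEN.put("2050", "Twenty Fifty");
        SPOKEN.put("2010", "Twenty Ten");
        SPOKEN.put("2001", "Two Thousand One");
    }

    // Replace a leading run of digits with its spoken form, if known;
    // titles without a known leading number are filed as written.
    static String filingKey(String title) {
        int end = 0;
        while (end < title.length() && Character.isDigit(title.charAt(end))) {
            end++;
        }
        String spoken = SPOKEN.get(title.substring(0, end));
        return spoken == null ? title : spoken + title.substring(end);
    }

    // Return a copy of the titles sorted by filing key, ignoring case.
    static List<String> sortForFiling(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort(Comparator.comparing(SpokenNumberFiling::filingKey,
                                      String.CASE_INSENSITIVE_ORDER));
        return out;
    }
}
```

Sorting on a derived key rather than on the title itself keeps the displayed text unchanged, which is the same idea a collator's sort keys rely on.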
> >
> > Bell Labs invented a whole programming language, Snobol, to deal with
> > telephone listing conversions, matches, and sorts. Many phone books sort
> > Mc- and Mac- together, others one after the other but separate from other
> > names.
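As a rough sketch of the first phone book convention (Mc- and Mac- filed together), one could normalize the prefix in a filing key before sorting. The rule used here, a leading "Mc" followed by a capital letter is filed as "Mac", is an assumption made for illustration, not a documented standard, and the names McMacFiling, filingKey and sortForFiling are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class McMacFiling {
    // Assumed rule for illustration: a leading "Mc" followed by a capital
    // letter is filed as if it were spelled "Mac".
    static String filingKey(String surname) {
        if (surname.length() > 2 && surname.startsWith("Mc")
                && Character.isUpperCase(surname.charAt(2))) {
            return "Mac" + surname.substring(2);
        }
        return surname;
    }

    // Return a copy of the surnames sorted by filing key, ignoring case.
    static List<String> sortForFiling(List<String> surnames) {
        List<String> out = new ArrayList<>(surnames);
        out.sort(Comparator.comparing(McMacFiling::filingKey,
                                      String.CASE_INSENSITIVE_ORDER));
        return out;
    }
}
```

With this key McAdam, MacBride and McCarthy interleave between Mabry and Madison, whereas a plain alphabetical sort would push every Mc- name after Madison.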
> >
> > Edward Cherlin
> > Generalist
> > "A knot! Oh, do let me help to undo it."
> > Alice in Wonderland
> >

