Collation (was RE: [OT] o-circumflex)

2001-09-13 Thread Edward Cherlin
English and several other languages have dozens of collations. Compare telephone books, library catalogs, book indexes (sic), and other sorted data. Knuth vol. 3 Sorting and Searching gives an example of a set of library sorting rules that runs to more than a page, and suggests programming it

Collation (was RE: [OT] o-circumflex)

2001-09-13 Thread てんどうりゅうじ
Whoever invented English number words, then, had a very sick sense of humour. Why doesn't the word for "one" start with "a", the word for "two" with "b", etc.? じゅういっちゃん (Juuitchan) Well, I guess what you say is true, I could never be the right kind of

Re: PDUTR #26 posted

2001-09-13 Thread Marcin 'Qrczak' Kowalczyk
Wed, 12 Sep 2001 11:08:41 -0700, Julie Doll Allen wrote: Proposed Draft Unicode Technical Report #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) is now available at: http://www.unicode.org/unicode/reports/tr26/ IMHO Unicode would have been a better standard if

Re: Collation (was RE: [OT] o-circumflex)

2001-09-13 Thread David Gallardo
Java's collation class has a rule-based collator that is in effect programmable using a little language. Here is an example from Sun's API doc for Norwegian: String Norwegian = a,A b,B c,C d,D e,E f,F g,G h,H i,I j,J k,K l,L m,M n,N o,O p,P q,Q r,R s,S t,T
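
To make the "little language" concrete, here is a minimal sketch using java.text.RuleBasedCollator. The rule string is a shortened, hypothetical stand-in for the Norwegian rules quoted above, not Sun's full example; the class name CollationSketch and the helper sort are likewise made up for illustration. In the rule syntax, "<" introduces a primary difference and "," a tertiary (case) difference.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollationSketch {
    // Hypothetical, shortened rule set (an assumption, not the full Norwegian rules):
    // letters are ordered a < b < z < ae-ligature < o-slash, lowercase before uppercase.
    static final String RULES =
        "< a,A < b,B < z,Z < \u00E6,\u00C6 < \u00F8,\u00D8";

    // Returns a new list holding the words sorted by the custom rules.
    static List<String> sort(String... words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            List<String> result = new ArrayList<>(Arrays.asList(words));
            result.sort(collator); // Collator implements Comparator<Object>
            return result;
        } catch (ParseException e) {
            // Thrown by the constructor when the rule string is malformed.
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sort("b", "A", "\u00F8", "a", "B", "z", "\u00E6"));
    }
}
```

Characters that the rules do not mention are placed after everything the rules define, so a real tailoring would list the whole alphabet, as the quoted example does.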

Re: Collation (was RE: [OT] o-circumflex)

2001-09-13 Thread Mark Davis
In the latest ICU, we took the work we did for Java collation and extended it substantially (and made it many times faster). It also allows arbitrary customization at runtime. I happen to be giving a presentation on it in a few hours at the conference. For more information, see the draft

Re: What code point is assigned for the Newton unit?

2001-09-13 Thread Asmus Freytag
Your letter makes clear that Unicode needs to do a better job of identifying the preferred character code for many situations. The information is there to a large extent, but buried in the fine print or in data tables. You will see that there is a canonical decomposition from U+212B to
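
The snippet above is cut off before it names the target of the decomposition. As a hedged illustration only: in the Unicode Character Database, U+212B ANGSTROM SIGN has a singleton canonical decomposition to U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, which a normalizer makes visible. The sketch below uses java.text.Normalizer; the class name AngstromSketch and its helpers are hypothetical.

```java
import java.text.Normalizer;

public class AngstromSketch {
    // Canonical composition (NFC): singletons such as U+212B map to their preferred form.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition (NFD): fully decomposed base letter plus combining mark.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Formats each code point of s as U+XXXX for display.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String angstromSign = "\u212B";
        System.out.println("NFC: " + codePoints(toNfc(angstromSign)));
        System.out.println("NFD: " + codePoints(toNfd(angstromSign)));
    }
}
```

Because the decomposition is canonical, normalized text never retains U+212B, which is one way to read which code point is preferred.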

Re: PDUTR #26 posted

2001-09-13 Thread Asmus Freytag
At 11:42 AM 9/13/01 +, Marcin 'Qrczak' Kowalczyk wrote: IMHO Unicode would have been a better standard if UTF-16 hadn't existed. Decidedly not. In fact, Unicode would not be widely implemented today. Just UTF-8 and UTF-32, code points in the range U+..7FFF, no surrogates, no

Re: Alternative sorting for digraphs (Was Re: [OT] o-circumflex)

2001-09-13 Thread Roozbeh Pournader
On Mon, 10 Sep 2001, Mark Davis wrote: A ZWNJ will break ligatures and cursive connections. While probably safe in Danish or Dutch, it is unclear to me that that is safe in all languages where this situation occurs. There are digraphs in Urdu, for example. While I don't know their sorting