English and several other languages have dozens of collations. Compare telephone
books, library catalogs, book indexes (sic), and other sorted data. Knuth, vol. 3,
Sorting and Searching, gives an example of a set of library sorting rules that runs to
more than a page, and suggests programming it
Whoever invented English number words, then, had a very sick sense of humour. Why
doesn't the word for "one" start with "a", the word for "two" with "b", etc.?
じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of
Wed, 12 Sep 2001 11:08:41 -0700, Julie Doll Allen [EMAIL PROTECTED] writes:
Proposed Draft Unicode Technical Report #26: Compatibility Encoding
Scheme for UTF-16: 8-Bit (CESU-8) is now available at:
http://www.unicode.org/unicode/reports/tr26/
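As the report's title suggests, CESU-8 is an 8-bit scheme layered on UTF-16: each 16-bit code unit, including each half of a surrogate pair, is written with the ordinary UTF-8 byte layout, so a supplementary character takes six bytes instead of four. Here is a minimal sketch of that idea in Java. The class and method names (Cesu8Sketch, encode, hex) are made up for illustration and do not come from the report or any library, and the code makes no attempt to validate unpaired surrogates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any library.
class Cesu8Sketch {

    // Encode a Java string (UTF-16 internally) as CESU-8: every 16-bit
    // code unit is written with the 1-, 2- or 3-byte UTF-8 bit layout.
    // Surrogates are not combined, so each half becomes 3 bytes.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int u = s.charAt(i);
            if (u < 0x80) {
                out.write(u);
            } else if (u < 0x800) {
                out.write(0xC0 | (u >> 6));
                out.write(0x80 | (u & 0x3F));
            } else {
                out.write(0xE0 | (u >> 12));
                out.write(0x80 | ((u >> 6) & 0x3F));
                out.write(0x80 | (u & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\uD801\uDC00"; // U+10400 as a UTF-16 surrogate pair
        System.out.println("CESU-8: " + hex(encode(s)));
        // prints "CESU-8: ED A0 81 ED B0 80"
        System.out.println("UTF-8:  " + hex(s.getBytes(StandardCharsets.UTF_8)));
        // prints "UTF-8:  F0 90 90 80"
    }
}
```

For characters in the Basic Multilingual Plane the two encodings produce the same bytes; they differ only for supplementary characters.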
IMHO Unicode would have been a better standard if
Java's collation class has a rule-based collator that is in effect
programmable using a little language. Here is an example from Sun's API
doc for Norwegian:
String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
                   "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
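To show how such a rule string is used, here is a small self-contained sketch built on java.text.RuleBasedCollator. The rule string is a shortened, made-up subset (an assumption for illustration, not the Norwegian table from Sun's documentation), and the class name RuleCollatorSketch and its sorted helper are hypothetical. In the rule language, "<" starts a new primary level and "," marks a case-level (tertiary) difference.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

// Hypothetical demo class; only RuleBasedCollator is a real JDK API.
class RuleCollatorSketch {

    // Illustrative rules only: lower case before upper case within each
    // letter, and ae (U+00E6) and o-with-stroke (U+00F8) placed after z.
    static final String RULES =
        "< a,A < b,B < z,Z < \u00E6,\u00C6 < \u00F8,\u00D8";

    // Return a sorted copy of the input using the tailored rules.
    static String[] sorted(String... words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            String[] copy = words.clone();
            Arrays.sort(copy, collator);
            return copy;
        } catch (ParseException e) {
            // The rule string is a constant, so this indicates a typo in it.
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        String[] input = {"z", "B", "\u00E6", "a", "b", "A", "\u00F8"};
        System.out.println(String.join(" ", sorted(input)));
        // Tailored order: a A b B z, then ae, then o-with-stroke.
        String[] plain = input.clone();
        Arrays.sort(plain); // code-unit order puts A and B before a and b
        System.out.println(String.join(" ", plain));
    }
}
```

The sketch only uses letters that appear in its rule string. Characters missing from the rules are handled by the collator's defaults, which this example does not explore.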
In the latest ICU, we took the work we did for Java collation and extended
it substantially (and made it many times faster). It also allows arbitrary
customization at runtime.
I happen to be giving a presentation on it in a few hours at the conference.
For more information, see the draft
Your letter makes clear that Unicode needs to do a better job of
identifying the preferred character code for many situations. The
information is there to a large extent, but buried in the fine print or in
data tables.
You will see that there is a canonical decomposition from U+212B to
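One way to inspect the canonical decomposition of U+212B (ANGSTROM SIGN) without reading the data tables by hand is to run it through a normalizer. The sketch below uses Java's standard java.text.Normalizer, which the message itself does not mention. The class name AngstromSketch and the codePoints helper are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; java.text.Normalizer is the real JDK API.
class AngstromSketch {

    // Render a string as a space-separated list of U+XXXX code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String angstromSign = "\u212B";
        // Full canonical decomposition (NFD): base letter plus combining ring.
        System.out.println("NFD: " + codePoints(
            Normalizer.normalize(angstromSign, Normalizer.Form.NFD)));
        // prints "NFD: U+0041 U+030A"
        // Canonical composition (NFC): the preferred precomposed letter.
        System.out.println("NFC: " + codePoints(
            Normalizer.normalize(angstromSign, Normalizer.Form.NFC)));
        // prints "NFC: U+00C5"
    }
}
```

Normalizing to NFC never yields U+212B itself, which is one practical sense in which U+00C5 is the preferred code for that character.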
At 11:42 AM 9/13/01 +0000, Marcin 'Qrczak' Kowalczyk wrote:
IMHO Unicode would have been a better standard if UTF-16
hadn't existed.
Decidedly not. In fact, Unicode would not be widely implemented today.
Just UTF-8 and UTF-32, code points in the range
U+..7FFF, no surrogates, no
On Mon, 10 Sep 2001, Mark Davis wrote:
A ZWNJ will break ligatures and cursive connections. While probably safe in
Danish or Dutch, it is unclear to me that that is safe in all languages
where this situation occurs. There are digraphs in Urdu, for example. While
I don't know their sorting
8 matches