Re: UCA and Russian letter Ё

Markus Scherer Fri, 21 Dec 2012 11:02:03 -0800

Resending my earlier reply. Apparently, by default, Gmail sends subject
lines in KOI8-R if they contain Cyrillic, and unicode.org rejects those as
likely spam. I just changed my Gmail settings to "Use Unicode (UTF-8)
encoding for outgoing messages" and hope this goes through. (*Please change
the subject line* if you want to discuss *this* issue.)


My earlier reply was:

Theoretically, it is possible to select collation elements based on the
proximity of word boundaries or other criteria. However, I don't know if
there is an implementation that has that built in. ICU (one of the commonly
used implementations of UCA+CLDR) does not.

It sounds like the secondary difference is OK for sorting, but you want to
customize an alphabetic index so that there is a separate "bucket" for
words beginning with Ё. I think the best approach would be some custom code
that checks for Ё as the first character, in addition to the regular
bucketing and sorting.
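
A minimal sketch of that kind of custom code, in Python. The function names
(bucket_label, sort_key, build_index) are hypothetical, and sort_key is only
a rough stand-in for a real UCA/CLDR collator: it treats Ё as Е at the
primary level and uses the original spelling as a tie-breaker. It is not
ICU's API and not a full collation implementation.

```python
import unicodedata
from collections import defaultdict


def bucket_label(word):
    """Return the index bucket for a word, giving Ё a bucket of its own."""
    # NFC composes "Е" + U+0308 (combining diaeresis) into the single "Ё".
    text = unicodedata.normalize("NFC", word.strip())
    first = text[:1].upper()
    if first == "Ё":
        # Custom rule: do not fold Ё into the Е bucket.
        return "Ё"
    # Placeholder for the regular bucketing (assumption: first letter).
    return first


def sort_key(word):
    """Approximate 'Ё differs from Е only at the secondary level'."""
    text = unicodedata.normalize("NFC", word).lower()
    return (text.replace("ё", "е"), text)


def build_index(words):
    """Sort the words, then distribute them into first-letter buckets."""
    buckets = defaultdict(list)
    for word in sorted(words, key=sort_key):
        buckets[bucket_label(word)].append(word)
    return dict(buckets)


index = build_index(["ель", "ёж", "еда", "жук"])
```

With this input, "ёж" sorts between "еда" and "ель" but lands in its own Ё
bucket, while the Е bucket keeps only "еда" and "ель". In a real
application the sorted() call would use an actual collator instead of
sort_key.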

Best regards,
markus
-- 
Google Internationalization Engineering
