On 18/01/11 07:41, Amir E. Aharoni wrote:
> 2011/1/17 Tim Starling <tstarl...@wikimedia.org>:
>> * It automatically drops accents, since accented letters sort the same
>> as unaccented letters (at the primary level).
> 
> How locale aware is it? For example, in Swedish accented letters come
> at the end of the alphabet and in Lithuanian I, Į and Y are collated
> together as if they were one letter. There are many quirks of this
> kind in other languages.

It's not locale-aware. As I said, it's a compromise collation. I was
hoping that other people might be interested in adding support for
specific locales; that's part of the reason for my post. ICU supports
lots of different locales, and there is locale-specific collation data
in the CLDR.
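
As a rough sketch of what "drops accents at the primary level" means:
this assumes PHP's intl extension, and it assumes ICU's root locale is a
fair stand-in for a locale-independent collation, which is my guess here
rather than a description of the actual code. The variable names are
only for illustration.

```php
<?php
// Sketch only: requires the intl extension. The 'root' locale is an
// assumption, standing in for a locale-independent collation.
$collator = new Collator('root');
$collator->setStrength(Collator::PRIMARY); // ignore accent and case differences

// At primary strength an accented letter compares equal to its base letter.
$sameLetters = $collator->compare('Klaipėda', 'Klaipeda');
print $sameLetters . "\n"; // 0

// The binary sort keys are identical too, so a stored key carries no accents.
$sameKeys = $collator->getSortKey('Klaipėda') === $collator->getSortKey('Klaipeda');
var_dump($sameKeys);
```

Swapping 'root' for a locale code such as 'sv' or 'lt' is where the
locale-specific CLDR data would come in.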

> And i don't know what to do when in the Lithuanian Wikipedia you sort
> names of places in the UK - should Islington come before or after
> York? 

Before.

> $collator = new Collator('lt')
> print $collator->compare( 'Islington', 'York' )
-1

But more interestingly, York goes before London, since the Lithuanian
rules put Y next to I:

> print $collator->compare( 'York', 'London' )
-1

I think attempting to do it any other way would be a lot of trouble,
and not what is wanted anyway. To put the question another way: on the
English Wikipedia, should Kybartai sort before Klaipėda? I would think
not.
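
To make that concrete, here is a sketch that runs the same names through
an English and a Lithuanian collator. It assumes the intl extension, and
the variable names are mine. Results depend on the ICU/CLDR data
installed, so I have only annotated the outputs I'm sure of.

```php
<?php
// Sketch only: requires the intl extension; results depend on the
// ICU/CLDR data that PHP was built against.
$english = new Collator('en');
$lithuanian = new Collator('lt');

// compare() returns -1, 0 or 1.
$enYorkLondon = $english->compare('York', 'London');    // 1: Y sorts after L
$ltYorkLondon = $lithuanian->compare('York', 'London'); // -1: Y sorts next to I

// The pair in question: two Lithuanian towns under each locale.
$enTowns = $english->compare('Kybartai', 'Klaipėda');
$ltTowns = $lithuanian->compare('Kybartai', 'Klaipėda');

printf("en: York/London %d, Kybartai/Klaipėda %d\n", $enYorkLondon, $enTowns);
printf("lt: York/London %d, Kybartai/Klaipėda %d\n", $ltYorkLondon, $ltTowns);
```

Whichever locale the collator is built with decides the order for every
title on the wiki, regardless of where the place actually is.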

> (But hey, there's at least one Lithuanian MediaWiki developer,
> so i don't know whether my help is really needed here.)

If you mean Domas, I don't think this is the kind of thing he's
interested in.

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l