Re: [Wikitech-l] Category sorting and first letters

Maciej Jaros Tue, 18 Jan 2011 06:43:24 -0800

Tim Starling (2011-01-18 02:03):
> On 18/01/11 07:41, Amir E. Aharoni wrote:
>> 2011/1/17 Tim Starling<tstarl...@wikimedia.org>:
>>> * It automatically drops accents, since accented letters sort the same
>>> as unaccented letters (at the primary level).
>> How locale aware is it? For example, in Swedish accented letters come
>> at the end of the alphabet and in Lithuanian I, Į and Y are collated
>> together as if they were one letter. There are many quirks of this
>> kind in other languages.
> It's not locale-aware. As I said, it's a compromise collation. I was
> hoping that other people might be interested in adding support for
> specific locales, that's part of the reason for my post. ICU supports
> lots of different locales, and there is locale-specific collation data
> in the CLDR.
>
>> And i don't know what to do when in the Lithuanian Wikipedia you sort
>> names of places in the UK - should Islington come before or after
>> York?
> Before.
>
>> $collator = new Collator('lt')
>> print $collator->compare( 'Islington', 'York' )
> -1
>
> But more interestingly, York goes before London:
>
>> print $collator->compare( 'York', 'London' )
> -1
>
> I think attempting to do it any other way would be a lot of trouble,
> and not what is wanted anyway. To put the question another way: on the
> English Wikipedia, should Kybartai sort before Klaipėda? I would think
> not.


I've seen accent-insensitive sorting, where for example "Bańka" would be 
sorted as if it were "Banka", but I haven't yet seen phonetically 
insensitive sorting, or whatever you would call it. What I mean is that in 
Polish "rz" is pronounced the same (or almost the same) as "ż", but "rz" is 
nowhere near "ż" when it comes to sorting. In fact, sorting them together 
would be very counterintuitive for me (as would 'York' < 'London'), and I 
think it would not be helpful, especially for foreigners. Note also that I 
said I've _seen_ accent-insensitive dictionaries, but _most_ are accent 
sensitive, so "ą" > "a", not "ą" = "a". Also, when it comes to the first 
letter, all dictionaries I know keep "Ż" separate from "Z". You might 
describe our collation as: the unaccented letter first, the accented letter 
second. This is the way we say our ABC. And it would be intuitive to have 
English collation follow its ABC, with Y coming just before Z.
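
To make the difference concrete, here is a rough Python sketch (not what MediaWiki or ICU does, just an illustration). It compares a key that follows the Polish ABC, with each accented letter placed right after its base letter, against an accent-insensitive key that simply strips combining marks. The names POLISH_ALPHABET, polish_key and accent_insensitive_key are made up for this example, and putting q, v and x into the alphabet string is my own simplification.

```python
import unicodedata

# Polish alphabet order: each accented letter follows its base letter.
# q, v and x are included so that loanwords still sort sensibly (an assumption).
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH_ALPHABET)}


def polish_key(word):
    """Sort key that treats accented letters as separate letters after their base."""
    normalized = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after all letters (a simplification).
    return [RANK.get(ch, len(POLISH_ALPHABET)) for ch in normalized]


def accent_insensitive_key(word):
    """Sort key that drops combining marks, so 'ż' compares equal to 'z'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["żaba", "zebra", "rzeka", "zupa", "bańka", "banka", "bank"]
polish_sorted = sorted(words, key=polish_key)
folded_sorted = sorted(words, key=accent_insensitive_key)

print(polish_sorted)
print(folded_sorted)
```

With the Polish key, "żaba" ends up after every word starting with "z", and "rzeka" stays under "r" however it is pronounced. With the accent-insensitive key, "żaba" lands between "rzeka" and "zebra", and "bańka" and "banka" get identical keys, so their relative order only depends on the input order.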

I think the problem only needs solving for letters that are not just 
a Latin character + accent, that is, how to sort those letters among 
the Latin (and Latin-based) characters.
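
One possible way to tell the two groups apart, as a sketch only: check whether Unicode canonically decomposes the letter into a plain Latin letter followed by combining marks. The helper name is_base_plus_accent is invented for this example, and I am assuming that "Latin character" means an ASCII letter here.

```python
import unicodedata


def is_base_plus_accent(ch):
    """True if ch canonically decomposes into an ASCII letter plus combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        # No canonical decomposition: either a plain letter or a letter of its own.
        return False
    base, marks = decomposed[0], decomposed[1:]
    return (
        base.isascii()
        and base.isalpha()
        and all(unicodedata.combining(m) for m in marks)
    )


for ch in "ążńłøßþ":
    print(ch, is_base_plus_accent(ch))
```

Letters such as "ą", "ż" and "ń" decompose into a base letter plus a mark, while "ł", "ø", "ß" and "þ" have no canonical decomposition, so they would be the ones needing an explicit decision about where they sort.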

Regards,
Nux.


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
