Maciej Jaros (2011-01-18 15:42):
> Tim Starling (2011-01-18 02:03):
>> On 18/01/11 07:41, Amir E. Aharoni wrote:
>>> 2011/1/17 Tim Starling<tstarl...@wikimedia.org>:
>>>> * It automatically drops accents, since accented letters sort the same
>>>> as unaccented letters (at the primary level).
>>> How locale aware is it? For example, in Swedish accented letters come
>>> at the end of the alphabet and in Lithuanian I, Į and Y are collated
>>> together as if they were one letter. There are many quirks of this
>>> kind in other languages.
>> It's not locale-aware. As I said, it's a compromise collation. I was
>> hoping that other people might be interested in adding support for
>> specific locales, that's part of the reason for my post. ICU supports
>> lots of different locales, and there is locale-specific collation data
>> in the CLDR.
>>
>>> And i don't know what to do when in the Lithuanian Wikipedia you sort
>>> names of places in the UK - should Islington come before or after
>>> York?
>> Before.
>>
>>> $collator = new Collator('lt')
>>> print $collator->compare( 'Islington', 'York' )
>> -1
>>
>> But more interestingly, York goes before London:
>>
>>> print $collator->compare( 'York', 'London' )
>> -1
>>
>> I think attempting to do it any other way would be a lot of trouble,
>> and not what is wanted anyway. To put the question another way: on the
>> English Wikipedia, should Kybartai sort before Klaipėda? I would think
>> not.
> I've seen sorting accent insensitive and so for example "Bańka" would be
> sorted as if it was "Banka", but I haven't yet seen phone insensitive or
> whatever you call it. What I mean is in Poland "rz" is pronounced the
> same (almost the same) as "ż", but "rz" is nowhere near "ż" when it
> comes to sorting. In fact it would be very counter intuitive for me (as
> would be 'York'< 'London'). I think it would not be helpful especially
> for foreigners.
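The thread contrasts two behaviours. One is an accent-insensitive "compromise" collation. The other is a locale tailoring such as Lithuanian, under which Tim reports York sorting before London.

Below is a minimal Python sketch of both, using only the standard library. It is not ICU or MediaWiki code.

- `primary_key` roughly approximates a primary-level comparison: it decomposes text to NFD and drops combining marks, so "Bańka" and "Banka" compare equal.
- `alphabet_key` and the `LITHUANIAN` table are hypothetical. They rank letters by the traditional Lithuanian alphabet, in which Y follows I and Į and precedes J. That ordering is an assumption for illustration, not ICU's actual `lt` rules, but it reproduces the two `-1` results from the quoted session.

```python
import unicodedata
from collections.abc import Callable, Sequence
from functools import partial


def primary_key(word: str) -> str:
    """Case- and accent-insensitive key: a rough stand-in for a primary-level comparison."""
    # NFD splits "ń" into "n" + U+0301 COMBINING ACUTE ACCENT, so the mark can be dropped.
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical tailoring table: the traditional Lithuanian alphabet, in order.
# Y comes after I and Į and before J; this is NOT ICU's actual "lt" rule set.
LITHUANIAN = "a ą b c č d e ę ė f g h i į y j k l m n o p r s š t u ų ū v z ž".split()


def alphabet_key(word: str, alphabet: Sequence[str]) -> list[int]:
    """Rank each letter by its position in `alphabet`; unlisted characters sort last."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    text = unicodedata.normalize("NFC", word.casefold())
    return [rank.get(ch, len(alphabet)) for ch in text]


def compare(a: str, b: str, key: Callable) -> int:
    """Return -1, 0 or 1, mirroring the compare() results in the quoted PHP session."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


lithuanian_key = partial(alphabet_key, alphabet=LITHUANIAN)

if __name__ == "__main__":
    print(compare("Bańka", "Banka", primary_key))        # 0: the accent is ignored
    print(compare("Islington", "York", lithuanian_key))  # -1
    print(compare("York", "London", lithuanian_key))     # -1: Y sorts before L
    print(compare("York", "London", primary_key))        # 1: plain Latin order
```

A real UCA-based collator with CLDR data also handles contractions, expansions and ignorable characters, which a single-letter lookup table like this cannot express.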
> I've also said that I've _seen_ accent insensitive
> dictionaries, but _most_ are case sensitive and so "ą"> "a" not "ą"="a"
> also when it comes to the first letter all dictionaries I know have "Ż"
> separate from "Z". You might see our collation as - without accent first
> and with accent second. /This is the why we say are ABC. And it would be
> intuitive for to have English collation by it's ABC with Y coming just
> before Z./
Sorry, sometimes I type phonetically :-). The last sentences were supposed to be:
This is the way we say our ABC. And it would be intuitive for me to have English collation by its ABC with Y coming just before Z.

> I think the problem should only be solved for letters which are not just
> Latin character + accent. How to sort them in Latin (and Latin based)
> characters.
>
> Regards,
> Nux.
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
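Nux's argument can be sketched the same way. In Polish, "ż" and "ą" are separate letters that follow their base letters, while "rz" simply sorts as r followed by z.

This self-contained Python sketch assumes a hypothetical `POLISH` table written from the 32-letter Polish alphabet (not ICU's `pl` data). It compares that table with an accent-folding key like the compromise collation discussed earlier in the thread. Under folding, "żaba" and "źle" are interleaved with "zupa". Under the alphabet table they come after every plain z-word.

```python
import unicodedata

# Hypothetical table written from the 32-letter Polish alphabet: each accented
# letter is a letter of its own, placed right after its base letter.
POLISH = "a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż".split()
RANK = {letter: i for i, letter in enumerate(POLISH)}


def polish_key(word: str) -> list[int]:
    """Rank letters by the Polish alphabet; unlisted characters (e.g. q, v, x) sort last."""
    text = unicodedata.normalize("NFC", word.casefold())
    return [RANK.get(ch, len(POLISH)) for ch in text]


def folded_key(word: str) -> str:
    """Accent-insensitive key: "ż" and "ź" fold to "z", "ń" folds to "n"."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["żaba", "rzeka", "zupa", "Bańka", "źle", "ryba", "Banka"]
print(sorted(words, key=folded_key))
# ['Bańka', 'Banka', 'ryba', 'rzeka', 'żaba', 'źle', 'zupa']
print(sorted(words, key=polish_key))
# ['Banka', 'Bańka', 'ryba', 'rzeka', 'zupa', 'źle', 'żaba']
```

Both keys keep "rzeka" among the r-words, matching the observation that "rz" sorts nowhere near "ż" even though the two sound alike. The keys differ only in where the accented letters go: folded in with their base letters, or as separate letters right after them.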