2009/5/11 Lars Aronsson <l...@aronsson.se>:
> Category sorting in MediaWiki has always been done wrong.
> Categories are not sorted alphabetically, but in Unicode order.

Sure thing. See https://bugzilla.wikimedia.org/show_bug.cgi?id=164
with 95 votes…

> Another example of broken sorting is when whitespace is compared
> to letters.  In ASCII and Unicode, whitespace (position 32) sorts
> ahead of all printable characters.  This means Moon illusion sorts
> ahead of Moonbow in http://en.wikipedia.org/wiki/Category:Moon
> because the whitespace before "illusion" is compared to the b in
> Moonbow.  I'm not sure if this is correct in English, but in
> Swedish it is wrong; bow should sort before illusion, regardless
> of the whitespace.

In Czech, this is correct: Czech collation works on individual words.
As you can see, the rules are language-specific.
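
To make the two behaviours concrete, here is a minimal Python sketch. It
assumes plain code point comparison for the current behaviour; the helper
ignore_spaces is a hypothetical name for the letter-by-letter ordering
Lars describes, and nothing here is taken from MediaWiki itself.

```python
titles = ["Moonbow", "Moon illusion"]

# Plain code point order: the space (U+0020) compares lower than "b" (U+0062),
# so "Moon illusion" comes first. This is the word-by-word result.
by_code_point = sorted(titles)

# Hypothetical letter-by-letter key: drop spaces before comparing,
# so "Moonbow" is compared against "Moonillusion".
def ignore_spaces(title):
    return title.replace(" ", "")

letter_by_letter = sorted(titles, key=ignore_spaces)

print(by_code_point)     # ['Moon illusion', 'Moonbow']
print(letter_by_letter)  # ['Moonbow', 'Moon illusion']
```

Which of the two outputs counts as correct depends on the language, which
is the point.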

> There is a way to avoid all such problems, namely by a more
> aggressive use of DEFAULTSORT that removes from sorting all upper
> case letters (except the initial one), all whitespace and all
> commas.

The problem is much more difficult than that (see the linked bug).
Commas, case sensitivity and whitespace are a trivial problem in
comparison with non-ASCII letters.
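
A rough Python sketch of why: aggressive_key below is a hypothetical
stand-in for the proposed DEFAULTSORT normalisation (I am reading "removes
upper case letters" as lower-casing everything after the first character),
and the titles are made up for illustration.

```python
def aggressive_key(title):
    # Keep the first character as-is, lower-case the rest,
    # then drop all spaces and commas.
    rest = title[1:].lower().replace(" ", "").replace(",", "")
    return title[:1] + rest

titles = ["Zebra", "Čapek, Karel", "Cuba", "Dvořák, Antonín"]

# Keys are still compared by code point, so "Č" (U+010C) lands after "Z".
print(sorted(titles, key=aggressive_key))
# ['Cuba', 'Dvořák, Antonín', 'Zebra', 'Čapek, Karel']
```

In the Czech alphabet Č comes right after C, so the expected order would
be Cuba, Čapek, Dvořák, Zebra. No amount of stripping commas and
whitespace gets you there.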

-- [[cs:User:Mormegil | Petr Kadlec]]

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
