There are currently 94 WMF wikis using UCA category collation rather than
the default "uppercase" collation. The Unicode Collation Algorithm (UCA) is
the official standard for how to sort Unicode characters, and generally
follows how a human would typically alphabetize strings. For example,
uppercase collation sorts Aztec, Ärsenik, Zoo, Aardvark as "Aardvark,
Aztec, Zoo, Ärsenik", but uca-default collation sorts them as "Aardvark,
Ärsenik, Aztec, Zoo". UCA collation also (optionally) supports natural
numeric sorting so that 100, 1, 99 sorts as "1, 99, 100" rather than "1,
100, 99". The WMF Community Tech team has recently posted proposals on
English Wikipedia and several Wiktionaries asking if these communities
would support switching to UCA collation. The proposal on English Wikipedia
has received unanimous support so far.[1] We thought that Wiktionaries
would be more skeptical of the change, but so far we have received only
positive responses.[2]
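
To make the ordering difference concrete, here is a rough Python sketch of the two examples above. It is not the real UCA, and it is not how MediaWiki implements either collation: approx_uca_key only imitates the effect by ignoring diacritics and case, and all three function names are made up for illustration.

```python
import re
import unicodedata

def uppercase_key(title):
    # Imitates "uppercase" collation: uppercase the title, then compare
    # by code point, so "Ä" (U+00C4) lands after "Z" (U+005A).
    return title.upper()

def approx_uca_key(title):
    # Crude stand-in for UCA (an assumption, not the real algorithm):
    # decompose, drop combining marks, and ignore case, so "Ä" sorts with "A".
    # The original title is kept as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), title)

def numeric_key(title):
    # Natural numeric sorting: runs of digits compare as integers.
    parts = re.split(r"([0-9]+)", title)
    return [
        (0, int(p), "") if p.isdecimal() else (1, 0, p.casefold())
        for p in parts
        if p
    ]

# "\u00c4" is the precomposed letter Ä.
titles = ["Aztec", "\u00c4rsenik", "Zoo", "Aardvark"]
by_uppercase = sorted(titles, key=uppercase_key)
# -> ['Aardvark', 'Aztec', 'Zoo', 'Ärsenik']
by_approx_uca = sorted(titles, key=approx_uca_key)
# -> ['Aardvark', 'Ärsenik', 'Aztec', 'Zoo']

numbers = ["100", "1", "99"]
plain = sorted(numbers)
# -> ['1', '100', '99']
natural = sorted(numbers, key=numeric_key)
# -> ['1', '99', '100']
```

A real implementation would rely on a proper collation library such as ICU rather than on stripping accents, which loses the tailoring that language-specific collations provide.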

Since most wikis seem receptive to switching to UCA, maybe we should just
make it the default rather than waiting for each wiki to request it
separately. Of the large Wikipedias, French, Dutch, Polish,
Portuguese, and Russian are already using UCA, and German is in the process
of switching.[3] For non-Latin scripts, my understanding is that UCA will
be a big improvement, especially if we switch them to language-specific
implementations, like uca-ja, uca-zh, uca-ar, etc.

Three questions:
1. Does switching the default collation from "uppercase" to "uca-default"
sound like a good idea?
2. Should this be proposed on meta or is it too technical?
3. Are there any wikis that would need to opt out of this for some reason?
(I know there are issues with Kurdish,[4] but that's the only one I know
about.)

1. https://en.wikipedia.org/wiki/Wikipedia_talk:Categorization#OK_to_switch_English_Wikipedia.27s_category_collation_to_uca-default.3F
2. https://phabricator.wikimedia.org/T128502
3. https://phabricator.wikimedia.org/T128806
4. https://phabricator.wikimedia.org/T48235
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
