Re: [Wikitech-l] Architectural revisions to improve category sorting

Nikola Smolenski Tue, 17 Aug 2010 13:46:52 -0700

On Tuesday 17 August 2010 22:11:32 Aryeh Gregor wrote:
> On Tue, Aug 17, 2010 at 4:06 PM, Nikola Smolenski <smole...@eunet.rs> wrote:
> > For some time now, I am thinking about a stupidly simple solution:
> >
> > php -r 'for($i = 0; $i < 65536; $i++) { echo pack("nx", $i); echo "\n";
> > }'| iconv -f ucs-2be -t utf8 | sort | php -r 'foreach(file("php://stdin")
> > as $v) { echo var_export(substr($v, 0, -1)) . " => \"" .
> > str_pad(base_convert($i, 10, 36), 4, 0, STR_PAD_LEFT) . "\",\n"; $i++; }'
>
> This doesn't account for how complicated proper locale-specific
> sorting is.  Multi-character strings do not sort just based on
> splitting them into characters and sorting those.  You can have the
> same character sorting differently in different contexts.  There are
> well-established libraries for Unicode sorting, and we certainly
> should not try to reinvent the wheel here.
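The objection can be made concrete with a minimal Python sketch of what the one-liner builds. Every character gets a fixed-width base-36 key from its position in a sorted list, and a string's key is the concatenation of its characters' keys. As an assumption, a short Czech alphabet stands in for the locale-sorted list of all BMP characters, and `base36`, `build_table` and `sort_key` are hypothetical helper names. Czech is a useful test because its digraph "ch" counts as a single letter that sorts after "h", which no per-character table can express.

```python
# Sketch of the per-character key table from the quoted one-liner.
# Assumption: a short hand-ordered Czech alphabet replaces the locale-sorted
# list of BMP characters that the original pipeline got from `sort`.

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36(n: int, width: int = 4) -> str:
    """Zero-padded base-36 string, like str_pad(base_convert(...), 4, 0, STR_PAD_LEFT)."""
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out.rjust(width, "0")


def build_table(ordered_chars: str) -> dict[str, str]:
    """Map each character to a fixed-width key encoding its rank."""
    return {ch: base36(i) for i, ch in enumerate(ordered_chars)}


def sort_key(word: str, table: dict[str, str]) -> str:
    # A word's key is the concatenation of its per-character keys, so
    # characters are ranked one at a time with no look-ahead.
    return "".join(table[ch] for ch in word)


# Czech letters in order, omitting most accented letters for brevity.
# The digraph "ch" is a letter of its own sorting after "h", but a
# one-character table can only rank "c" and "h" separately.
CZECH = "abcčdefghijklmnopqrstuvwxyz"
table = build_table(CZECH)

words = ["hrad", "chata", "cesta"]
print(sorted(words, key=lambda w: sort_key(w, table)))
# Prints ['cesta', 'chata', 'hrad']; Czech dictionaries order them
# cesta, hrad, chata.
```

Collation libraries implementing the Unicode Collation Algorithm handle such contractions; the per-character table only gets the order of isolated letters right.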


All right, but right now we aren't paying attention to character context 
either, and we aren't even sorting single characters properly. I mean, we are 
sorting Ђ before А! Surely this would be an improvement?
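The Ђ/А problem follows from code-point order: Ђ is U+0402 and А is U+0410, so a plain binary comparison puts Ђ first, even though Ђ is the sixth letter of the Serbian Cyrillic alphabet. A minimal Python sketch, assuming the current sort is such a binary comparison (for UTF-8 strings, comparing bytes gives the same result as comparing code points); the sample titles and the `alphabet_key` helper are hypothetical:

```python
# Serbian Cyrillic capitals in alphabetical order (30 letters).
SERBIAN_CYRILLIC = "АБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ"
RANK = {ch: i for i, ch in enumerate(SERBIAN_CYRILLIC)}


def alphabet_key(title: str) -> list[int]:
    # Ranked capitals get their alphabet position; any other character falls
    # back to its code point, offset so it sorts after every ranked letter.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in title]


titles = ["Београд", "Ђурђевдан", "Аранђеловац"]

# Binary order: comparing UTF-8 bytes gives the same result as code points.
by_code_point = sorted(titles, key=lambda t: t.encode("utf-8"))
by_alphabet = sorted(titles, key=alphabet_key)

print(by_code_point)  # ['Ђурђевдан', 'Аранђеловац', 'Београд']
print(by_alphabet)    # ['Аранђеловац', 'Београд', 'Ђурђевдан']
```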

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
