bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales

2018-10-29 Thread Assaf Gordon
tags 32472 notabug
close 32472
stop

On 2018-08-18 11:34 a.m., Paul Eggert wrote:
Vaayda Yaasra wrote:
Here’s an example in Syriac:
ܡܠܬܐ
ܒܝܬܐ
ܒܪܢܫܐ
ܡܠܬܐ
Sort produces the following:
ܡܠܬܐ
ܒܝܬܐ
ܡܠܬܐ
ܒܪܢܫܐ

This is a property of your locale, so I suggest sending a bug report to whoever

bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales

2018-08-18 Thread Paul Eggert
Vaayda Yaasra wrote:
Here’s an example in Syriac:
ܡܠܬܐ
ܒܝܬܐ
ܒܪܢܫܐ
ܡܠܬܐ
Sort produces the following:
ܡܠܬܐ
ܒܝܬܐ
ܡܠܬܐ
ܒܪܢܫܐ

This is a property of your locale, so I suggest sending a bug report to whoever maintains your locale. You should be able to reproduce the problem by bypassing GNU

bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales

2018-08-18 Thread Vaayda Yaasra
I’ve found that sort doesn’t sort strings in many non-Latin scripts at all when the locale in use is en_US.UTF-8, fr_FR.UTF-8 or fi_FI.UTF-8 (probably others, too, but these are the ones I have tested). In the “C” and ko_KR.UTF-8 locales, things work as expected. Here’s a test case:
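
As an illustration of the locale dependence described above, here is a minimal sketch. It is not the test case from the report; it assumes a POSIX shell with sort and uniq, reuses the Syriac words quoted elsewhere in the thread, and the variable names (words, sorted_c, unique_count) are made up for this example. It only exercises the “C” locale, where comparison is byte-wise; the line for another locale is left commented out because that locale may not be installed.

```shell
# Hypothetical sketch: the four Syriac words quoted in the thread, one per line.
words=$(printf '%s\n' 'ܡܠܬܐ' 'ܒܝܬܐ' 'ܒܪܢܫܐ' 'ܡܠܬܐ')

# In the "C" locale, sort compares raw bytes, so UTF-8 text is
# ordered by code point.
sorted_c=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$sorted_c"

# After a byte-wise sort, uniq should drop only the exact duplicate line.
unique_count=$(printf '%s\n' "$sorted_c" | LC_ALL=C uniq | wc -l | tr -d ' ')
echo "distinct lines in C locale: $unique_count"

# To compare against another locale, set LC_ALL to one installed on the
# system (en_US.UTF-8 is named in the report; assumed, not guaranteed).
# printf '%s\n' "$words" | LC_ALL=en_US.UTF-8 sort
```

If the commented-out line yields a different order, or uniq run under that locale reports fewer distinct lines, the difference comes from the locale’s collation rules rather than from the input.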