Public bug reported:

(Please humor my limited understanding of these packages.)
I was having issues sorting with COLLATE "en_US.UTF-8" on Ubuntu 16.04 and was told it was related to glibc. On Ubuntu 14.04 (with eglibc 2.19) I could sort a file of 2 million lines of international text (fewer than 40 characters per line) in 20 seconds. On 16.04 (with glibc 2.23), sorting the same file with the same COLLATE took 10+ minutes. My only theory is that glibc 2.22 added new Unicode 7.0 support (?), but I don't really have a grasp of what's going on here.

I came upon this issue when trying to index a database table of over 400M rows. What should have taken 4 hours ran for over 24 hours and never finished. I created a subset of that table to test sorting.

I'm not sure how to replicate it easily; I tried creating subsets to show the issue, without success. Instead, I put 5000 lines into a pastebin that you can try sorting yourself on 14.04 vs. 16.04: http://pastebin.com/r47uD690

If you put that into a file and run the following, you can see the discrepancy between 14.04 and 16.04:

    LC_COLLATE="en_US.UTF-8" sort /path/to/file > /dev/null

LC_COLLATE="C" has no problems (it should be much faster anyway, but there is no noticeable difference between 14.04 and 16.04). With en_US.UTF-8, the command above takes under 1 second on a fresh 14.04 build and 8+ seconds on 16.04. It's a small example, but the gap appeared to get worse the larger the file (e.g. the earlier example of 20 seconds vs. 10 minutes).

That's all the information I have at the moment. If you need more, just ask. I'm not technically familiar with many of the packages involved; I'm only posting here because I was pointed to glibc as a potential cause of the sorting difference between COLLATE settings.

** Affects: glibc (Ubuntu)
     Importance: Undecided
         Status: New
https://bugs.launchpad.net/bugs/1648641

Title:
  COLLATE "en_US.UTF-8" sorting takes 30x longer on newer builds