Public bug reported:

(Please bear with my limited understanding of these packages.)

I was having issues sorting with COLLATE "en_US.UTF-8" on Ubuntu 16.04
and was told it was related to glibc.

On Ubuntu 14.04 (with eglibc 2.19) I could sort a file of 2 million
lines of international text (< 40 chars per line) in 20 seconds. On 16.04
(with glibc 2.23) sorting the same file with the same COLLATE took 10+
minutes. My only theory is that glibc 2.22 added new Unicode 7.0
support (?), but I don't have a real grasp of what's going on here.

I came upon this issue when trying to index a database table of over
400M rows. What should have taken 4 hours ran for over 24 hours and never
finished, so I created a subset of that table to test sorting.

I'm not sure how to replicate it easily; I tried creating smaller subsets
that show the issue, without success. Instead, I put 5000 lines into a
pastebin that you can try sorting yourself on 14.04 vs 16.04:
http://pastebin.com/r47uD690

If you put that into a file and run the following you can see the discrepancy 
between 14.04 and 16.04:
LC_COLLATE="en_US.UTF-8" sort /path/to/file > /dev/null

LC_COLLATE="C" has no problems (it should be much faster anyway, and the
difference between 14.04 and 16.04 is not noticeable).
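
To compare the two settings side by side, something like the following
sketch could be used. The helper name time_sort is made up for
illustration, and it only measures whole seconds. It clears LC_ALL for the
sort call because a set LC_ALL overrides LC_COLLATE, which would hide the
difference being measured.

```shell
#!/bin/sh
# Hypothetical helper: time `sort` on one file under a given LC_COLLATE value.
# Arguments: $1 = locale name (e.g. en_US.UTF-8 or C), $2 = input file.
time_sort() {
    locale_name=$1
    input_file=$2
    start=$(date +%s)
    # An empty LC_ALL is treated as unset, so LC_COLLATE takes effect.
    LC_ALL= LC_COLLATE="$locale_name" sort "$input_file" > /dev/null
    end=$(date +%s)
    echo "$locale_name: $((end - start))s"
}
```

With the pastebin contents saved to a file, running
time_sort en_US.UTF-8 /path/to/file and then time_sort C /path/to/file on
each release should show whether the gap is specific to the UTF-8 collation.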

On a fresh 14.04 build it takes < 1 second; on 16.04 it takes 8+
seconds. This is a small example, but the slowdown appeared to get worse
as the file grows (e.g. the earlier example of 20 seconds vs 10+ minutes).
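
One way to check how the time grows with input size is to sort
progressively longer prefixes of the same file. This is only a sketch: the
helper name time_prefix_sort is made up for illustration, the timing is in
whole seconds, and it assumes the en_US.UTF-8 locale is installed.

```shell
#!/bin/sh
# Hypothetical helper: time a locale-aware sort of the first N lines of a file.
# Arguments: $1 = number of lines to take, $2 = input file.
time_prefix_sort() {
    line_count=$1
    input_file=$2
    start=$(date +%s)
    # Clear LC_ALL so that LC_COLLATE is the setting that applies to sort.
    head -n "$line_count" "$input_file" \
        | LC_ALL= LC_COLLATE="en_US.UTF-8" sort > /dev/null
    end=$(date +%s)
    echo "$line_count lines: $((end - start))s"
}
```

Calling it with, say, 1000, 2000 and 5000 lines of the pastebin file on
both releases would show whether the 16.04 time grows faster than the
14.04 time.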

That's about all the info I have at this moment. If you need more
information, just ask. I am not very technically familiar with many of
the packages involved; I am only posting here because I was directed to
glibc as a potential cause of the sorting slowdown under different
COLLATE settings.

** Affects: glibc (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1648641

Title:
  COLLATE "en_US.UTF-8" sorting takes 30x longer on newer builds

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1648641/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
