bug#7489: [coreutils] over aggressive threads in sort

2018-10-30 Assaf Gordon
(triaging old bugs)
Hello,
This long thread ( http://bugs.gnu.org/7489 ) deals with multiple parallel-sort bugs, resulting in many commits:
1d0a12037 Paul Eggert 2010-12-22 sort: minor performance tweak with num_processors
41159f960 Pádraig Brady 2010-12-20 maint: fix a typo in sort --p

2010-12-07 Jim Meyering
Chen Guo wrote:
> Hi Professor Eggert,
> On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote:
>> On 12/05/2010 09:16 PM, Chen Guo wrote:
>>> Before saying anything else, I should note that for mutexes, on 4
>>> threads 20% of the time there's a segfault on a seemingly innocuous
>>> line in queue_in

2010-12-07 Jim Meyering
Chen Guo wrote:
...
> I've attached the patch (inlined at the bottom). Here's the GDB
> crash, with backtrace. I also printed node->queued in GDB, so it's
> evident that it's accessible.
>
> (gdb) run --parallel 2 rec_1M > /dev/null
> Starting program: /data/chen/Coding/Coreutils/test/sort-mutex

2010-12-06 Paul Eggert
On 12/05/10 03:21, Jim Meyering wrote:
> seq -w 20 > exp && tac exp > in
> PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out
>
> That gets stuck in waitpid (from sort.c's reap), waiting for a
> dzip invocation that appears will never terminate. This is also
> on that same 4-core

2010-12-06 Chen Guo
Hi Professor Eggert,
On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote:
> On 12/05/2010 09:16 PM, Chen Guo wrote:
>> Before saying anything else, I should note that for mutexes, on 4
>> threads 20% of the time there's a segfault on a seemingly innocuous
>> line in queue_insert ():
>>   node->queu

2010-12-05 Paul Eggert
On 12/05/2010 09:16 PM, Chen Guo wrote:
> Before saying anything else, I should note that for mutexes, on 4
> threads 20% of the time there's a segfault on a seemingly innocuous
> line in queue_insert ():
> node->queued = true
It does sound like mutexes are the way to go, and that this bug needs

2010-12-05 Chen Guo
Hi Professor Eggert,
On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert wrote:
> On 12/03/10 12:18, Chen Guo wrote:
> Either option (either switch to mutexes everywhere, or have the top-level
> merge go to memory) should work.  Perhaps we should try both and benchmark
> them.
Test machine is 4 core

2010-12-05 Jim Meyering
Paul Eggert wrote:
> On 11/29/2010 02:46 PM, Paul Eggert wrote:
>> My current guess, by the way,
>> is that it's not a bug that can be triggered: it's merely
>> useless code that is harmless and can safely be removed.
>
> I removed it as part of the following series of cleanup
> patches. These are

2010-12-04 Paul Eggert
On 11/29/2010 02:46 PM, Paul Eggert wrote:
> My current guess, by the way,
> is that it's not a bug that can be triggered: it's merely
> useless code that is harmless and can safely be removed.
I removed it as part of the following series of cleanup patches. These are intended merely to refactor

2010-12-03 Chen Guo
Thanks Jim, that helped a lot. I'll try out Professor Eggert's suggestion, of switching to mutexes only at the top level merge. Of the following approaches, which would you guys consider better practice?
1) void pointer, cast as either mutex or spinlock in lock function
2) union of mutex and spin

2010-12-02 Chen Guo
Hi Professor Eggert,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote:
>  (for i in $(seq 12); do read line; echo $i; sleep .1; done
>  cat > /dev/null) < fifo &
>  (ulimit -t 1; ./sort in > fifo \
>  || echo killed via $(env kill -l $(expr $? - 128)))
I ran this 10 times or so on an i7 and co

2010-11-30 Paul Eggert
On 11/30/2010 04:19 PM, Chen Guo wrote: > could you detail how you can trigger the divide-by-zero bug? Invoke MAX_MERGE(total, level) with level == 15. 2 << level yields 65536, and 65536 * 65536 overflows to zero.

2010-11-30 Jim Meyering
Jim Meyering wrote:
> Paul Eggert wrote:
>> Could you please try this little patch? It should fix your
>> problem. I came up with this fix in my sleep (literally!
>> I woke up this morning and the patch was in my head), but
>> haven't had time to look at the code in this area to see
>> if it's th

2010-11-30 Chen Guo
Hi guys,
Is something up with Savannah? I just tried a git clone and got connection time out; I can't even reach git.sv.gnu.org via ping. I'll try again at work tomorrow.

2010-11-30 Paul Eggert
On 11/29/2010 08:32 PM, Chen Guo wrote:
> Hi guys,
> Is something up with Savannah? I just tried a git clone and got
> connection time out; I can't even reach git.sv.gnu.org via ping.
There was a break-in, which led to leaking of encrypted account passwords, some of them discovered via a brute-f

2010-11-30 Paul Eggert
On 11/30/10 13:41, Jim Meyering wrote:
> Is there anything you'd like to add?
No, thanks, that looks good. I have some other patches to clean things up in this area, but they can wait. I hate to tease, so here is a draft of the cleanup patches. Most of this stuff is cleanup, but the first line of

2010-11-30 Chen Guo
On Tue, Nov 30, 2010 at 2:22 PM, Paul Eggert wrote:
Hi Professor Eggert,
> Anyway, perhaps Chen can review them (I don't have time
> to test them right now).
I'll look at it as soon as Savannah's back up; I never actually pulled from coreutils after the original patch was submitted. Professor Egge

2010-11-29 Paul Eggert
On 11/29/10 16:34, Chen Guo wrote:
> The only way this would work is if, when a struct is locked via mutex the only
> threads trying to acquire the struct are trying to do so via mutex,
> and no threads
> are looking to lock via spinlock.
Yes, that's definitely the idea. Under either of my propos

2010-11-29 Chen Guo
Hi all,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote:
> entirely and use mutexes instead.  Perhaps a better fix would be to
> use mutexes at the top level (where threads can write to a file and
> therefore can wait) and to use spin locks at lower levels (where
> threads are merely storing

2010-11-29 Paul Eggert
On 11/28/10 23:14, DJ Lucas wrote:
> http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html
Ah, sorry, I didn't understand that message and thought Pádraig had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced the problem with the stated test case:
(for i in $(seq 12); do r

2010-11-29 Pádraig Brady
On 29/11/10 07:14, DJ Lucas wrote:
> On 11/27/2010 08:18 PM, DJ Lucas wrote:
>
>>
>> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
>> /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $?
>> 0
>> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
>>
>> Appears to work

2010-11-28 DJ Lucas
On 11/27/2010 08:18 PM, DJ Lucas wrote:
>
> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
> /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $?
> 0
> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
>
> Appears to work as expected. Thanks for jumping on this so

2010-11-27 DJ Lucas
On 11/27/2010 06:57 PM, Paul Eggert wrote:
> Could you please try this little patch? It should fix your
> problem. I came up with this fix in my sleep (literally!
> I woke up this morning and the patch was in my head), but
> haven't had time to look at the code in this area to see
> if it's the b

2010-11-27 Pádraig Brady
On 28/11/10 00:57, Paul Eggert wrote:
> Could you please try this little patch? It should fix your
> problem. I came up with this fix in my sleep (literally!
> I woke up this morning and the patch was in my head), but
> haven't had time to look at the code in this area to see
> if it's the best f

2010-11-27 Paul Eggert
Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best fix. Clearly there's at least one more bug as no

2010-11-27 Paul Eggert
Following up on my previous email, it appears to me that the following line in mergelines_node is weird: node->dest -= lo_orig - node->lo + hi_orig - node->hi; Surely there should be a "*" in front of that line? (This does not fix the bug; perhaps it is a different bug?)

2010-11-27 Paul Eggert
On 11/26/2010 06:52 PM, Pádraig Brady wrote:
> Hmm, seems like multiple threads are racing to update the
> static "saved" variable in write_unique() ?
I don't think it's as simple as that. write_unique is generating output, and when it is run it is supposed to have exclusive access to the output

2010-11-26 Pádraig Brady
On 26/11/10 18:01, DJ Lucas wrote:
> Sent to bug-coreutils too (no bug id currently AFAICT).
>
> Bug only affects multi-byte locales. Take the following samples:
>
>
>
> bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file &&
> echo $?
> sort: using `en_US.UTF-8' sorting rules

2010-11-26 DJ Lucas
On 11/26/2010 05:24 PM, Paul Eggert wrote:
> Thanks for the bug report. Unfortunately,
> I cannot reproduce the problem with coreutils 8.7, either on
> RHEL 5.5 x86-64 or on Ubuntu 10.10 x86.
>
> Which version of coreutils are you running?
8.7. Haven't tested on 8.6 or 8.5. 8.4 worked correc

2010-11-26 Paul Eggert
Thanks for the bug report. Unfortunately, I cannot reproduce the problem with coreutils 8.7, either on RHEL 5.5 x86-64 or on Ubuntu 10.10 x86. Which version of coreutils are you running? And on what platform? How did you build it? Can you reproduce it with --parallel=2? If not, which value of

2010-11-26 DJ Lucas
Sent to bug-coreutils too (no bug id currently AFAICT).
Bug only affects multi-byte locales. Take the following samples:
bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file && echo $?
sort: using `en_US.UTF-8' sorting rules
Segmentation fault
bash-4.1# echo $?
139
bash-4.1#