(triaging old bugs)
Hello,
This long thread ( http://bugs.gnu.org/7489 )
deals with multiple parallel-sort bugs, resulting in many commits:
1d0a12037 Paul Eggert 2010-12-22 sort: minor performance tweak with
num_processors
41159f960 Pádraig Brady 2010-12-20 maint: fix a typo in sort --p
Chen Guo wrote:
> Hi Professor Eggert,
> On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote:
>> On 12/05/2010 09:16 PM, Chen Guo wrote:
>>> Before saying anything else, I should note that for mutexes, on 4
>>> threads 20% of the time there's a segfault on a seemingly innocuous
>>> line in queue_in
Chen Guo wrote:
...
> I've attached the patch (inlined at the bottom). Here's the GDB
> crash, with backtrace. I also printed node->queued in GDB, so it's
> evident that it's accessible.
>
> (gdb) run --parallel 2 rec_1M > /dev/null
> Starting program: /data/chen/Coding/Coreutils/test/sort-mutex
On 12/05/10 03:21, Jim Meyering wrote:
> seq -w 20 > exp && tac exp > in
> PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out
>
> That gets stuck in waitpid (from sort.c's reap), waiting for a
> dzip invocation that appears will never terminate. This is also
> on that same 4-core
Hi Professor Eggert,
On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote:
> On 12/05/2010 09:16 PM, Chen Guo wrote:
>> Before saying anything else, I should note that for mutexes, on 4
>> threads 20% of the time there's a segfault on a seemingly innocuous
>> line in queue_insert ():
>> node->queu
On 12/05/2010 09:16 PM, Chen Guo wrote:
> Before saying anything else, I should note that for mutexes, on 4
> threads 20% of the time there's a segfault on a seemingly innocuous
> line in queue_insert ():
> node->queued = true
It does sound like mutexes are the way to go, and that this bug
needs
Hi Professor Eggert,
On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert wrote:
> On 12/03/10 12:18, Chen Guo wrote:
> Either option (either switch to mutexes everywhere, or have the top-level
> merge go to memory) should work. Perhaps we should try both and benchmark
> them.
Test machine is 4 core
Paul Eggert wrote:
> On 11/29/2010 02:46 PM, Paul Eggert wrote:
>> My current guess, by the way,
>> is that it's not a bug that can be triggered: it's merely
>> useless code that is harmless and can safely be removed.
>
> I removed it as part of the following series of cleanup
> patches. These are
On 11/29/2010 02:46 PM, Paul Eggert wrote:
> My current guess, by the way,
> is that it's not a bug that can be triggered: it's merely
> useless code that is harmless and can safely be removed.
I removed it as part of the following series of cleanup
patches. These are intended merely to refactor
Thanks Jim, that helped a lot.
I'll try out Professor Eggert's suggestion, of switching to mutexes
only at the top level merge. Of the following approaches, which would
you guys consider better practice?
1) void pointer, cast as either mutex or spinlock in lock function
2) union of mutex and spin
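
For illustration only, the second approach (a union of the two lock types) could look roughly like the sketch below. Every name here (`demo_lock`, `demo_lock_init`, and so on) is hypothetical and does not come from sort.c; the sketch assumes a POSIX system that provides `pthread_spinlock_t`. A tag field records which union member is live, so each lock is only ever used as one kind.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L
#endif
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper; none of these names come from sort.c.  */
struct demo_lock
{
  bool use_mutex;               /* Tag: which union member is live.  */
  union
  {
    pthread_mutex_t mutex;      /* For callers that may block.  */
    pthread_spinlock_t spin;    /* For short, in-memory critical sections.  */
  } u;
};

/* Each function returns 0 on success, like the pthread calls it wraps.  */
static int
demo_lock_init (struct demo_lock *lock, bool use_mutex)
{
  lock->use_mutex = use_mutex;
  return (use_mutex
          ? pthread_mutex_init (&lock->u.mutex, NULL)
          : pthread_spin_init (&lock->u.spin, PTHREAD_PROCESS_PRIVATE));
}

static int
demo_lock_acquire (struct demo_lock *lock)
{
  return (lock->use_mutex
          ? pthread_mutex_lock (&lock->u.mutex)
          : pthread_spin_lock (&lock->u.spin));
}

static int
demo_lock_release (struct demo_lock *lock)
{
  return (lock->use_mutex
          ? pthread_mutex_unlock (&lock->u.mutex)
          : pthread_spin_unlock (&lock->u.spin));
}

static int
demo_lock_destroy (struct demo_lock *lock)
{
  return (lock->use_mutex
          ? pthread_mutex_destroy (&lock->u.mutex)
          : pthread_spin_destroy (&lock->u.spin));
}
```

Compared with a void pointer that is cast inside the lock function, the tagged union keeps the storage inline and lets the compiler check the member types, at the cost of one extra branch per operation.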
Hi Professor Eggert,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote:
> (for i in $(seq 12); do read line; echo $i; sleep .1; done
> cat > /dev/null) < fifo &
> (ulimit -t 1; ./sort in > fifo \
> || echo killed via $(env kill -l $(expr $? - 128)))
I ran this 10 times or so on an i7 and co
On 11/30/2010 04:19 PM, Chen Guo wrote:
> could you detail how you can trigger the divide-by-zero bug?
Invoke MAX_MERGE(total, level) with level == 15.
2 << level yields 65536, and 65536 * 65536 overflows to zero.
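
The arithmetic above can be checked with a small sketch. This is not the MAX_MERGE macro from sort.c; `demo_merge_divisor` is a hypothetical helper that assumes the product is computed in 32-bit unsigned arithmetic, where results wrap modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the macro from sort.c: squares (2 << level)
   in 32-bit unsigned arithmetic, which wraps modulo 2^32.  */
static uint32_t
demo_merge_divisor (unsigned int level)
{
  assert (level < 32);          /* Larger shift counts are undefined.  */
  uint32_t width = (uint32_t) 2 << level;
  return width * width;
}

/* Stores the quotient and returns 1 when the divisor is nonzero;
   returns 0 instead of dividing by zero when it has wrapped.  */
static int
demo_checked_divide (uint32_t total, unsigned int level, uint32_t *quotient)
{
  uint32_t divisor = demo_merge_divisor (level);
  if (divisor == 0)
    return 0;
  *quotient = total / divisor;
  return 1;
}
```

With level == 15 the helper returns 0, because 65536 * 65536 equals 2^32, so any unguarded division by that value would trap; with level == 14 the product still fits.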
Jim Meyering wrote:
> Paul Eggert wrote:
>> Could you please try this little patch? It should fix your
>> problem. I came up with this fix in my sleep (literally!
>> I woke up this morning and the patch was in my head), but
>> haven't had time to look at the code in this area to see
>> if it's th
Hi guys,
Is something up with Savannah? I just tried a git clone and got a
connection timeout; I can't even reach git.sv.gnu.org via ping.
I'll try again at work tomorrow.
On 11/29/2010 08:32 PM, Chen Guo wrote:
> Hi guys,
> Is something up with Savannah? I just tried a git clone and got a
> connection timeout; I can't even reach git.sv.gnu.org via ping.
There was a break-in, which led to leaking of encrypted account
passwords, some of them discovered via a brute-f
On 11/30/10 13:41, Jim Meyering wrote:
> Is there anything you'd like to add?
No, thanks, that looks good. I have some other patches
to clean things up in this area, but they can wait.
I hate to tease, so here is a draft of the cleanup patches.
Most of this stuff is cleanup, but the first line of
On Tue, Nov 30, 2010 at 2:22 PM, Paul Eggert wrote:
Hi Professor Eggert,
> Anyway, perhaps Chen can review them (I don't have time
> to test them right now).
I'll look at it as soon as Savannah's back up; I never actually pulled
from coreutils after the original patch was submitted. Professor
Egge
On 11/29/10 16:34, Chen Guo wrote:
> The only way this would work is if, when a struct is locked via mutex, the
> only threads trying to acquire the struct are trying to do so via mutex,
> and no threads are looking to lock via spinlock.
Yes, that's definitely the idea. Under either of my
propos
Hi all,
On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote:
> entirely and use mutexes instead. Perhaps a better fix would be to
> use mutexes at the top level (where threads can write to a file and
> therefore can wait) and to use spin locks at lower levels (where
> threads are merely storing
On 11/28/10 23:14, DJ Lucas wrote:
> http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html
Ah, sorry, I didn't understand that message and thought Pádraig
had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced
the problem with the stated test case:
(for i in $(seq 12); do r
On 29/11/10 07:14, DJ Lucas wrote:
> On 11/27/2010 08:18 PM, DJ Lucas wrote:
>
>>
>> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
>> /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $?
>> 0
>> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
>>
>> Appears to work
On 11/27/2010 08:18 PM, DJ Lucas wrote:
>
> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat
> /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $?
> 0
> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$
>
> Appears to work as expected. Thanks for jumping on this so
On 11/27/2010 06:57 PM, Paul Eggert wrote:
> Could you please try this little patch? It should fix your
> problem. I came up with this fix in my sleep (literally!
> I woke up this morning and the patch was in my head), but
> haven't had time to look at the code in this area to see
> if it's the b
On 28/11/10 00:57, Paul Eggert wrote:
> Could you please try this little patch? It should fix your
> problem. I came up with this fix in my sleep (literally!
> I woke up this morning and the patch was in my head), but
> haven't had time to look at the code in this area to see
> if it's the best f
Could you please try this little patch? It should fix your
problem. I came up with this fix in my sleep (literally!
I woke up this morning and the patch was in my head), but
haven't had time to look at the code in this area to see
if it's the best fix.
Clearly there's at least one more bug as no
Following up on my previous email, it appears to me that
the following line in mergelines_node is weird:
node->dest -= lo_orig - node->lo + hi_orig - node->hi;
Surely there should be a "*" in front of that line?
(This does not fix the bug; perhaps it is a different bug?)
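
To make the question concrete, here is a sketch of how the two spellings differ. The types are hypothetical stand-ins (`demo_line`, `demo_node`), and the sketch assumes the `dest` field is a pointer to a pointer; whether that matches the real struct in sort.c is not confirmed by this message.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the types from sort.c.  */
struct demo_line { int key; };
struct demo_node { struct demo_line **dest; };

/* Without the "*": moves the dest field itself back by N slots.  */
static void
demo_retreat_field (struct demo_node *node, ptrdiff_t n)
{
  node->dest -= n;
}

/* With the "*": leaves dest alone and moves the pointer it refers to.
   "*node->dest" parses as "*(node->dest)".  */
static void
demo_retreat_target (struct demo_node *node, ptrdiff_t n)
{
  *node->dest -= n;
}
```

The first form changes which slot the node points at; the second changes the contents of the slot, which is visible to anyone else holding a pointer to that slot.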
On 11/26/2010 06:52 PM, Pádraig Brady wrote:
> Hmm, seems like multiple threads are racing to update the
> static "saved" variable in write_unique() ?
I don't think it's as simple as that. write_unique
is generating output, and when it is run it is supposed
to have exclusive access to the output
On 26/11/10 18:01, DJ Lucas wrote:
> Sent to bug-coreutils too (no bug id currently AFAICT).
>
> Bug only affects multi-byte locales. Take the following samples:
>
>
>
> bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file &&
> echo $?
> sort: using `en_US.UTF-8' sorting rules
On 11/26/2010 05:24 PM, Paul Eggert wrote:
> Thanks for the bug report. Unfortunately,
> I cannot reproduce the problem with coreutils 8.7, either on
> RHEL 5.5 x86-64 or on Ubuntu 10.10 x86.
>
> Which version of coreutils are you running?
8.7. Haven't tested on 8.6 or 8.5. 8.4 worked correc
Thanks for the bug report. Unfortunately,
I cannot reproduce the problem with coreutils 8.7, either on
RHEL 5.5 x86-64 or on Ubuntu 10.10 x86.
Which version of coreutils are you running? And on what
platform? How did you build it?
Can you reproduce it with --parallel=2? If not, which value
of
Sent to bug-coreutils too (no bug id currently AFAICT).
Bug only affects multi-byte locales. Take the following samples:
bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file &&
echo $?
sort: using `en_US.UTF-8' sorting rules
Segmentation fault
bash-4.1# echo $?
139
bash-4.1#