On Dec 30, 2005, at 4:15 AM, Graziano Giuliani wrote:
#0 0xb7ca2599 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:716
716         if (mca_pls_rsh_component.debug) {
which means we have a memory corruption somewhere else...
Agreed.
Investigating from outside on what
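A rough sketch of the shape of the crash site, reconstructed from the quoted
gdb lines (illustrative only, not the actual Open MPI source): the component
is a statically-defined global, so reading its .debug member touches the
program's own data segment and cannot fault by itself, which is why a segv
reported on this line points at corruption, or a bogus pointer, somewhere
else.

    /* Illustrative reconstruction only -- the struct layout is invented;
     * the identifier names come from the gdb output quoted above. */
    struct pls_rsh_component_t {
        int debug;                 /* the flag tested at the crash line */
        /* ... more members ... */
    };
    static struct pls_rsh_component_t mca_pls_rsh_component = { 0 };

    int orte_pls_rsh_launch(int jobid)
    {
        (void)jobid;
        if (mca_pls_rsh_component.debug) {   /* the quoted line 716 */
            /* ... verbose logging ... */
        }
        return 0;
    }

    int main(void) { return orte_pls_rsh_launch(1); }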
Ok Brian,
for the build part, attached is my config.log.
About the stacktrace: with my compile options, gdb gives
#0 0xb7d105b9 in orte_pls_rsh_launch ()
from /home/cluster/openmpi/lib/openmpi/mca_pls_rsh.so
and after recompiling with -g:
#0 0xb7ca2599 in orte_pls_rsh_launch (jobid=1) at pls_
On Dec 28, 2005, at 4:50 AM, Graziano Giuliani wrote:
Hi all,
I can confirm this bug also on Linux Debian testing with kernel 2.6.14 and
gcc (GCC) 4.0.3 20051201 (prerelease) (Debian 4.0.2-5), running the WRF
atmospheric model compiled with Portland pgf90. For whoever cares about this,
it needs just a little patch in the RSL layer of the model to convert fort
Yes, it appears to be the exact same error.
Greg
On Dec 22, 2005, at 5:25 AM, Jeff Squyres wrote:
Blast! Is it still a segv in the rsh component?
(I should be able to try this myself on an FC4 machine in a day or
two)
On Dec 21, 2005, at 2:11 PM, Greg Watson wrote:
I just tried 1.0.2a1r8580 but the problem is still there...
Greg
On Dec 20, 2005, at 5:02 PM, Jeff Squyres wrote:
I think we found the problem and committed a fix this afternoon to
both the trunk and v1.0 branch. Anything after r8564 should have the
fix.
Greg -- could you try again?
On Dec 19, 2005, at 4:59 PM, Paul H. Hargrove wrote:
Jeff,
I have an FC4 x86 w/ OSCAR bits on it :-). Let me know if you want
access.
-Paul
Jeff Squyres wrote:
Yoinks. Let me try to scrounge up an FC4 box to reproduce this on.
If it really is an -O problem, this segv may just be the symptom, not
the cause (seems likely, because mca_pls_rsh_component is a
statically-defined variable -- accessing a member on it should
definitely not cause a segv).
Sure seems like it:
(gdb) p *mca_pls_rsh_component.argv@4
$12 = {0x90e0428 "ssh", 0x90e0438 "-x", 0x0, 0x11 <Address 0x11 out of bounds>}
(gdb) p mca_pls_rsh_component.argc
$13 = 2
(gdb) p local_exec_index
$14 = 3
Greg
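Those numbers tell the whole story, and fit Jeff's point about the
statically-defined component: the argv vector holds argc == 2 entries plus a
NULL terminator, but local_exec_index is 3, one slot past the terminator. A
minimal sketch of that failure mode (mine, not the actual Open MPI code):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int argc = 2;
        /* Room for the two entries plus the NULL terminator only. */
        char **argv = calloc((size_t)argc + 1, sizeof(char *));
        argv[0] = "ssh";
        argv[1] = "-x";
        /* argv[2] stays NULL: calloc zeroed it. */

        int local_exec_index = 3;            /* one past the terminator */
        char *exec = argv[local_exec_index]; /* out-of-bounds read:
                                                heap garbage, e.g. 0x11 */
        printf("argv[%d] = %p\n", local_exec_index, (void *)exec);
        /* Any dereference of exec here (strlen, execvp, "%s") is the segv. */
        free(argv);
        return 0;
    }

With -O in play, the source line gdb reports can also be slightly off, which
would square with the fault apparently landing on the harmless if test.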
On Dec 18, 2005, at 4:56 AM, Rainer Keller wrote:
Hello Greg,
I don't know whether it's segfaulting at that particular line, but could you
please print the argv? I suspect the local_exec_index into the argv might be
wrong.
Thanks,
Rainer
On Saturday 17 December 2005 19:16, Greg Watson wrote:
Here's the stacktrace:
#0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
714 if (mca_pls_rsh_component.debug) {
(gdb) where
#0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
#1 0x00a29642 in orte_rmgr_urm_spawn ()
from /usr/l
On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:
I finally worked out why I couldn't reproduce the problem. You're not
going to like it though.
You're right -- this kind of buglet is among the most un-fun. :-(
Here's the stacktrace from the core file:
#0 0x00e93fe8 in orte_pls_rsh_launch (
Jeff,
I finally worked out why I couldn't reproduce the problem. You're not
going to like it though.
As before, this is running on FC4 and I'm using 1.0.1r8453 (the 1.0.1
release version).
First test:
$ ./configure --with-devel-headers --prefix=/usr/local/ompi
$ make
$ make install
$ mpi
On Dec 1, 2005, at 10:58 AM, Greg Watson wrote:
@#$%^& it! I can't get the problem to manifest for either branch now.
Well, that's good for me. :-)
FWIW, the problem existed on systems that could/would return different
addresses in different processes from mmap() for memory that was common
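Jeff's message is cut off above, but the mmap() pitfall he is describing is
a classic: two processes mapping the same region may see it at different
virtual addresses, so a raw pointer stored inside the shared memory by one
process is meaningless to the other, while base-relative offsets survive. A
hypothetical sketch (not Open MPI's actual shared-memory code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/shared.dat", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, 4096) != 0)
            return 1;

        /* First argument NULL lets the kernel pick the address, so each
         * process mapping this file may get a different 'base'. */
        char *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED)
            return 1;

        char *absolute = base + 128;   /* valid only in this process   */
        long offset = absolute - base; /* meaningful in every process  */

        printf("base=%p absolute=%p offset=%ld\n",
               (void *)base, (void *)absolute, offset);
        munmap(base, 4096);
        close(fd);
        return 0;
    }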
On 30/11/2005, at 2:12 PM, Jeff Squyres wrote:
On Nov 30, 2005, at 2:12 PM, Greg Watson wrote:
Fedora Core 4 on x86. I installed the overnight snapshot from trunk
first and immediately got the error. Then I tried 1.0.x and it
worked. Want debugging info?
Blah!
Yes, please send any debugging info that you have -- SVN r number and a
backtr
On 30/11/2005, at 10:14 AM, Jeff Squyres wrote:
No, I was not aware of this -- I migrated all the changes from the
trunk to the v1.0 branch (not the other way around).
What kind of systems are you running into this on?
On Nov 30, 2005, at 10:37 AM, Greg Watson wrote:
You probably already know this, but this problem is still present in
the trunk. It appears to be fixed in the 1.0.x tree.
Greg