Not really. This is the backtrace of the process that gets killed because mpirun detects that the other one died. What I need is the backtrace of the process that generates the segfault. Second, in order to understand the backtrace, it's better to run a debug version of Open MPI. Without the debug version we only see the address where the fault occurs, without access to the line number.

  Thanks,
    george.

On Mon, 8 Jan 2007, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

PS: Is there any way you can attach to the processes with gdb? I
would like to see the backtrace as shown by gdb in order to be able
to figure out what's wrong there.


I found out that all processes on the 2nd node crash so I just put a 30
second wait before MPI_Init in order to attach gdb and go from there.
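
For anyone wanting to reproduce the wait-before-MPI_Init trick, here is a minimal sketch in C. It is not the actual change made to cpi; the helper name wait_for_debugger is hypothetical and only illustrates the idea of printing the PID and sleeping so there is time to attach.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of cpi or Open MPI): print this
 * process's PID to stderr, then sleep so gdb can be attached before
 * MPI_Init runs. Returns the value of sleep(), which is 0 when the
 * full delay elapsed and the number of unslept seconds when the
 * sleep was interrupted by a signal. */
static unsigned int wait_for_debugger(unsigned int seconds)
{
    fprintf(stderr, "PID %ld: waiting %u s for gdb to attach\n",
            (long)getpid(), seconds);
    fflush(stderr);
    return sleep(seconds);
}
```

Calling wait_for_debugger(30) as the first statement of main, before MPI_Init(&argc,&argv), gives a 30 second window to run "gdb -p <pid>" on the node where the process lives. Because gdb stops the process on attach, it ends up inside the sleep, which matches the nanosleep frame at the top of the session below.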

The code in cpi starts off as follows (in order to show where the
SIGTERM below is coming from).

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);
   MPI_Get_processor_name(processor_name,&namelen);

---

Attaching to process 11856
Reading symbols from /home/ggrobe/Projects/ompi/cpi/cpi...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from
/usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
Reading symbols from
/usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0
Reading symbols from
/usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46974166086512 (LWP 11856)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00002ab90661e880 in nanosleep () from /lib/libc.so.6
(gdb) break MPI_Init
Breakpoint 1 at 0x2ab905c0c880
(gdb) break MPI_Comm_size
Breakpoint 2 at 0x2ab905c01af0
(gdb) continue
Continuing.
[Switching to Thread 46974166086512 (LWP 11856)]

Breakpoint 1, 0x00002ab905c0c880 in PMPI_Init ()
  from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
(gdb) n
Single stepping until exit from function PMPI_Init,
which has no line number information.
[New Thread 1082132816 (LWP 11862)]

Program received signal SIGTERM, Terminated.
0x00002ab906643f47 in ioctl () from /lib/libc.so.6
(gdb) backtrace
#0  0x00002ab906643f47 in ioctl () from /lib/libc.so.6
Cannot access memory at address 0x7fffa50102f8
---

Does this help in any way?


"We must accept finite disappointment, but we must never lose infinite
hope."
                                  Martin Luther King
