FWIW, pstack <pid> is a gdb wrapper that displays the stack trace of a running process.
PADB (http://padb.pittman.org.uk) is a great OSS tool that automatically collects the stack traces of all the MPI tasks (and can do some grouping, similar to dshbak).

Cheers,

Gilles

Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:

> On Dec 1, 2017, at 8:10 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>
>> On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>> I have attached my slurm job script; it will simply do an mpirun
>>> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so,
>>> for instance, vader is enabled.
>>
>> I have tested again, with
>>     mpirun --mca btl "^vader" IMB-MPI1
>> and it made no difference.
>
> I've lost track of the earlier parts of this thread, but has anyone
> suggested logging into the nodes it's running on, doing "gdb -p PID"
> for each of the MPI processes, and doing "where" to see where it's
> hanging?
>
> I use this script (trace_all), which depends on a variable $process
> that is a grep regexp matching the MPI executable:
>
>     echo "where" > /tmp/gf
>
>     pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print $2}'`
>     for pid in $pids; do
>         echo $pid
>         prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
>         gdb -x /tmp/gf -batch $prog $pid
>         echo ""
>     done
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users