FWIW,

pstack <pid>
is a gdb wrapper that prints the stack trace of the given process.
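
For example, a minimal sketch (the executable name IMB-MPI1 and the use of
pgrep are assumptions; adjust for your own job):

    # print the stack of every local rank of IMB-MPI1 on this node
    for pid in $(pgrep -f IMB-MPI1); do
        echo "=== PID $pid ==="
        pstack $pid
    done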

PADB (http://padb.pittman.org.uk) is a great open-source tool that automatically
collects the stack traces of all the MPI tasks (and can do some grouping,
similar to dshbak).
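
For example (a sketch: --all, --stack-trace, and --tree are documented padb
options, but exact flag support may vary with the padb version and resource
manager):

    # collect a stack trace from every rank of the running job and
    # merge ranks with identical traces into a tree view
    padb --all --stack-trace --tree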

Cheers,

Gilles

Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>
>
>On Dec 1, 2017, at 8:10 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>
>
>On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>
>I have attached my Slurm job script; it simply does an mpirun of
>IMB-MPI1 with 1024 processes. I haven't set any MCA parameters, so,
>for instance, vader is enabled.
>
>I have tested again with
>   mpirun --mca btl "^vader" IMB-MPI1
>and it made no difference.
>
>
>I’ve lost track of the earlier parts of this thread, but has anyone suggested
>logging into the nodes it’s running on, running “gdb -p PID” for each of the
>MPI processes, and issuing “where” to see where it’s hanging?
>
>
>I use this script (trace_all), which depends on a shell variable, process,
>holding a grep regexp that matches the MPI executable:
>
># write a gdb command file that just prints a backtrace
>echo "where" > /tmp/gf
>
># pids of the matching MPI processes (excluding grep and this script itself)
>pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print \$2}'`
>
>for pid in $pids; do
>   echo $pid
>   # recover the path of the executable for this pid
>   prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
>   # attach gdb in batch mode and run the "where" command
>   gdb -x /tmp/gf -batch $prog $pid
>   echo ""
>done
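>
>A usage sketch (IMB-MPI1 here is just an example pattern; the variable and
>script names are as above):
>
>   export process=IMB-MPI1
>   ./trace_all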
>