> On Dec 1, 2017, at 8:10 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
> 
> On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>> I have attached my slurm job script, it will simply do an mpirun
>> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
>> instance, vader is enabled.
> I have tested again, with
>    mpirun --mca btl "^vader" IMB-MPI1
> it made no difference.

I’ve lost track of the earlier parts of this thread, but has anyone suggested 
logging into the nodes the job is running on, attaching with “gdb -p PID” to 
each of the MPI processes, and running “where” to see where each one is hanging?
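For a single process the manual version looks like this (the PID here is just 
illustrative; use the actual PID from ps):

   gdb -p 12345
   (gdb) where
   (gdb) detach
   (gdb) quit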

I use the following script (trace_all); it depends on a shell variable, 
process, set to a grep regexp that matches the MPI executable:
echo "where" > /tmp/gf

pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print 
\$2}'`
for pid in $pids; do
   echo $pid
   prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
   gdb -x /tmp/gf -batch $prog $pid
   echo ""
done
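Assuming you save that as trace_all and make it executable on each node, an 
invocation for this case might look like (passing process via the environment):

   process=IMB-MPI1 ./trace_all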
