I feel a little funny posting this but I have observed this problem now over 
three different versions of OpenMPI (1.10.2, 2.0.3, 3.0.0) and have refrained 
from asking about it before now because we always had a work-around.  That may 
not be the case now, and I feel like I’m missing something obvious.

I’ve tried to summarize our system configuration as succinctly as possible 
below but it is a pretty standard Linux cluster with an IB interconnect 
(mellanox).

In short, we run many MPI applications (LAMMPS, VASP, NAMD, AMBER, ENZO, etc) 
successfully.  However, the astrophysical galaxy modeling codes Arepo and Gizmo 
(both Gadget derivatives) seem to give us fits - deadlocking randomly after 
running for hours or days.  I’ve tracked this down to a deadlock with some 
processes in MPI_Waitall() and others in MPI_Sendrecv().  I’ve looked at the 
code where the processes deadlock and can’t see any obvious issue.  I also know 
that the same versions of the same codes are run on other, similar platforms at 
other sites (TACC, NASA, for example).  
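
In case it helps anyone reproduce the diagnosis: stack traces like the ones 
described above can be grabbed by attaching gdb to one of the hung ranks on a 
compute node, roughly as below (the binary name GIZMO is just a placeholder 
for whatever the job actually runs).

         # attach to one stuck rank and dump backtraces for all threads
         pid=$(pgrep -u $USER GIZMO | head -1)
         gdb -p "$pid" -batch -ex "thread apply all bt"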

While trying various things over the last few days I have learned that setting

         export OMPI_MCA_btl_openib_flags="send,fetching-atomics,need-ack,need-csum,hetero-rdma"

seems to avoid the deadlocks.  In other words, disabling RDMA read/write seems 
to avoid the deadlocks.  Perhaps some RDMA read/write tuning is in order but 
I’ve had no success with that so far.
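
If anyone wants to test this on their own system, the same setting can also 
be given per-job on the mpirun command line (or in an MCA params file) 
instead of through the environment - the executable and rank count below are 
just placeholders:

         mpirun --mca btl_openib_flags \
                "send,fetching-atomics,need-ack,need-csum,hetero-rdma" \
                -np 128 ./GIZMO params.txt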

There are a couple of MPI-related ifdefs in the code with regard to 
MPI_IN_PLACE and asynchronous send/receive.  I’ve experimented with both.  Prior to 
OpenMPI 3.0.0 the gizmo code would run without deadlocking if 
-DNO_ISEND_IRECV_IN_DOMAIN was used at build time.  Under OpenMPI 3.0.0 that is 
no longer the case.  
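
For reference, that flag can be toggled roughly as follows - the exact 
mechanism depends on the code's own build machinery (GIZMO's Config.sh vs. a 
Gadget-style Makefile OPT line), so treat this as a sketch rather than the 
documented build procedure:

         # enable the synchronous-communication fallback and rebuild
         echo "NO_ISEND_IRECV_IN_DOMAIN" >> Config.sh
         make clean && make -j 8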

FWIW, I also know that Gizmo runs (on our system) using Intel MPI (5.1.1) but 
I’m trying to avoid making that generally available since every other app we 
have works just fine with OpenMPI.

Anyone else have experience with these codes using OpenMPI (or otherwise)?  Any 
comments or suggestions would be appreciated. 

Regards,

Charlie Taylor
UF Research Computing



Applications: Gadget derivatives Gizmo and Arepo
Problem:      Random deadlocks in MPI_Waitall, MPI_Sendrecv
Platform:     RedHat EL7 (and RedHat EL6 previously)
Systems:      Dell SOS6320, Haswell (2 x CPU E5-2698 v3 @ 2.30GHz)
Interconnect: Mellanox ConnectX-3 FDR (OpenSM fabric manager)
IB Stack:     RedHat EL7.4 native
OpenMPI:      3.0.0 (currently - see configure options below, but the problem
              has been persistent across versions)
Compilers:    Intel Suite (various versions - 2016, 2017, 2018)

Build time configure options.
--------------------------------------
CFG_OPTS="$CFG_OPTS C=icc CXX=icpc FC=ifort FFLAGS=\"-O2 -g -warn -m64\" 
LDFLAGS=\"\" "
CFG_OPTS="$CFG_OPTS --enable-static"
CFG_OPTS="$CFG_OPTS --enable-orterun-prefix-by-default"
CFG_OPTS="$CFG_OPTS --with-slurm=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-pmix=/opt/pmix"
CFG_OPTS="$CFG_OPTS --with-libevent=external"
CFG_OPTS="$CFG_OPTS --with-hwloc=external"
CFG_OPTS="$CFG_OPTS --with-verbs=/usr"
CFG_OPTS="$CFG_OPTS --with-verbs-libdir=/usr/lib64"
CFG_OPTS="$CFG_OPTS --with-mxm=no"
CFG_OPTS="$CFG_OPTS --with-cuda=${HPC_CUDA_DIR}"
CFG_OPTS="$CFG_OPTS --enable-openib-udcm"
CFG_OPTS="$CFG_OPTS --enable-openib-rdmacm"
