Good Morning

We are running OpenMPI 4.0.2 on several CentOS 7.8 nodes with V100s, driver
version 418.87.01, persistence mode enabled, and the nv_peer_mem (NVIDIA peer
memory) module loaded.

We are seeing a performance penalty when running jobs across multiple GPU
nodes.  I am relatively new to tuning OpenMPI, so I wanted to ask the community
whether the configuration below could be optimized further.

We have an FDR InfiniBand interconnect (with plans to upgrade), using
ConnectX-3 cards running the latest OFED and firmware.

Our openmpi-mca-params.conf:

btl=openib,self
btl_openib_if_include=mlx3_0,mlx4_0,mlx5_0
btl_openib_warn_nonexistent_if=0
btl_tcp_if_include=ib0
btl_openib_allow_ib=true
orte_base_help_aggregate=0
btl_openib_want_cuda_gdr=true
oob_tcp_if_include=ib0
#mpi_common_cuda_verbose=100
#opal_cuda_verbose=10
#btl_openib_verbose=true
orte_keep_fqdn_hostname=true
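For context, these are the sanity checks we run on each node (a rough sketch; paths and exact parameter names may vary with your OpenMPI build):

```shell
# Confirm OpenMPI was built with CUDA-aware support
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# Confirm the openib BTL detected GPUDirect RDMA support
ompi_info --all | grep btl_openib_have_cuda_gdr

# Confirm the NVIDIA peer memory kernel module is loaded
lsmod | grep nv_peer_mem
```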


Thank you!
Doug


--
Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit<https://scu.med.cornell.edu/>
Weill Cornell Medicine
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690
