Good morning,

We are running OpenMPI 4.0.2 on several CentOS 7.8 nodes with V100s, driver version 418.87.01, persistence mode enabled, and the nvidia peer memory module loaded.
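In case it helps anyone reproduce our setup, these are roughly the checks we run to confirm CUDA support and GPUDirect RDMA are in place (a sketch, assuming ompi_info is on PATH and the peer memory module is named nv_peer_mem on your system):

```shell
# 1. Was this OpenMPI build compiled with CUDA support?
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# 2. Does the openib BTL report GPUDirect RDMA capability?
ompi_info --all | grep btl_openib_have_cuda_gdr

# 3. Is the peer memory kernel module loaded?
lsmod | grep nv_peer_mem

# 4. Is persistence mode actually enabled on the GPUs?
nvidia-smi -q | grep -i "Persistence Mode"
```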
We are seeing a performance penalty when running jobs across multiple GPU nodes. I am relatively new to tuning OpenMPI, so I wanted to ask the community whether our configuration could be optimized further. We have an FDR interconnect, with plans to upgrade, and ConnectX-3 cards running the latest OFED and firmware.

Our openmpi-mca-params.conf:

btl=openib,self
btl_openib_if_include=mlx3_0,mlx4_0,mlx5_0
btl_openib_warn_nonexistent_if=0
btl_tcp_if_include=ib0
btl_openib_allow_ib=true
orte_base_help_aggregate=0
btl_openib_want_cuda_gdr=true
oob_tcp_if_include=ib0
#mpi_common_cuda_verbose=100
#opal_cuda_verbose=10
#btl_openib_verbose=true
orte_keep_fqdn_hostname=true

Thank you!
Doug

--
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit <https://scu.med.cornell.edu/>
Weill Cornell Medicine
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690