Hello, I am trying to run OpenMPI on AWS's new p4d instances. These instances have four 100 Gb/s network interfaces, each with its own IPv4 address.
I am primarily testing the bandwidth with the osu_micro_benchmarks test suite. Specifically, I am running the osu_bibw and osu_mbw_mr tests to measure the peak aggregate bandwidth I can achieve between two instances.

I have found that the osu_bibw test only achieves the throughput of a single network interface (100 Gb/s). This is the command I am using:

/opt/amazon/openmpi/bin/mpirun -v -x FI_EFA_USE_DEVICE_RDMA=1 -x FI_PROVIDER="efa" -np 2 -host host1,host2 --map-by node --mca btl_base_verbose 30 --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 ./osu_bw -m 40000000

As far as I understand it, OpenMPI should be detecting the four interfaces and striping data across them, correct?

I have found that the osu_mbw_mr test can achieve 4x the bandwidth of a single network interface if the configuration is correct. For example, I am using the following command:

/opt/amazon/openmpi/bin/mpirun -v -x FI_EFA_USE_DEVICE_RDMA=1 -x FI_PROVIDER="efa" -np 8 -hostfile hostfile5 --map-by node --mca btl_base_verbose 30 --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 ./osu_mbw_mr

This runs four pairs of send/recv calls across the different nodes. hostfile5 contains all 8 local IPv4 addresses associated with the four nodes. I believe this is why I am getting the expected performance.

Now I want to run a real use case, but I can't use --map-by node. I want to run two ranks per IPv4 address (interface), with the ranks ordered sequentially according to the hostfile: the first 8 ranks will belong to the first host, but they will be divided among its four IPv4 addresses to utilize the full network bandwidth. However, OpenMPI won't allow me to assign slots=2 to each IPv4 address because they all belong to the same host.

Any recommendations would be greatly appreciated.

Thanks,
John