Hello,

I am trying to run Open MPI on AWS's new p4d instances. These instances have
four 100 Gb/s network interfaces, each with its own IPv4 address.

I am primarily testing the bandwidth with the osu_micro_benchmarks test
suite. Specifically I am running the osu_bibw and osu_mbw_mr tests to
calculate the peak aggregate bandwidth I can achieve between two instances.

I have found that the osu_bibw test only reaches the throughput of a single
network interface (100 Gb/s). This is the command I am using:
/opt/amazon/openmpi/bin/mpirun -v -x FI_EFA_USE_DEVICE_RDMA=1 -x
FI_PROVIDER="efa" -np 2 -host host1,host2 --map-by node --mca
btl_base_verbose 30 --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0
 ./osu_bw -m 40000000

As far as I understand it, Open MPI should detect the four interfaces and
stripe data across them. Is that correct?

I have found that the osu_mbw_mr test can achieve 4x the bandwidth of a
single network interface, if the configuration is correct. For example, I
am using the following command:
/opt/amazon/openmpi/bin/mpirun -v -x FI_EFA_USE_DEVICE_RDMA=1 -x
FI_PROVIDER="efa" -np 8 -hostfile hostfile5 --map-by node --mca
btl_base_verbose 30 --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0
 ./osu_mbw_mr
This will run four pairs of send/recv calls across the different nodes.
hostfile5 contains all 8 local IPv4 addresses associated with the two
nodes. I believe this is why I am getting the expected performance.

Now I want to run a real use case, but I can't use --map-by node. I want
to run two ranks per IPv4 address (interface), with the ranks ordered
sequentially according to the hostfile (the first 8 ranks will belong to
the first host, but the ranks will be divided among its four IPv4 addresses
to utilize the full network bandwidth). But Open MPI won't allow me to assign
slots=2 to each IPv4 address because they all belong to the same host.
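
To make the intended layout concrete, here is a minimal Python sketch of the
rank-to-address mapping I am after. The host names and addresses are
placeholders chosen for illustration (assumptions, not my real addresses),
and the script only prints the desired mapping; it is not an Open MPI
configuration and does not solve the slots problem.

```python
# Hypothetical layout: 2 hosts, 4 interfaces each, 2 ranks per interface.
# All host names and addresses are placeholders for illustration only.
HOSTS = {
    "host1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "host2": ["10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4"],
}
RANKS_PER_ADDRESS = 2


def desired_layout(hosts, ranks_per_address):
    """Return (rank, host, address) tuples in sequential rank order.

    Ranks fill one address completely before moving to the next address,
    and fill one host completely before moving to the next host.
    """
    layout = []
    rank = 0
    for host, addresses in hosts.items():
        for address in addresses:
            for _ in range(ranks_per_address):
                layout.append((rank, host, address))
                rank += 1
    return layout


if __name__ == "__main__":
    for rank, host, address in desired_layout(HOSTS, RANKS_PER_ADDRESS):
        print(f"rank {rank:2d} -> {host} via {address}")
```

With these placeholder values, ranks 0 to 7 land on host1 (two per address)
and ranks 8 to 15 land on host2.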

Any recommendation would be greatly appreciated.

Thanks,
John
