[OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

Fei Mao Wed, 17 Jun 2015 13:08:04 -0400 (EDT)

Hi there,

I am running benchmarks on a GPU cluster in which each node has two CPU 
sockets and four K80 cards. Two K80s are attached to CPU socket 0 and the 
other two to socket 1. An IB ConnectX-3 (FDR) adapter is also attached to 
socket 1. We are using the OFED stack from our Linux distribution, so I know 
there is no way to do GPU RDMA for inter-node communication. I can do 
intra-node IPC for MPI_Send and MPI_Recv between two K80s (4 GPUs in total) 
that are connected under the same socket (PCI-e switch). So I thought I could 
also do an intra-node MPI_Reduce with IPC support in Open MPI 1.8.5.


The benchmark I was using is osu-micro-benchmarks-4.4.1, and I got the same 
results whether the two GPUs were under the same socket or under different 
sockets. The result was the same even when I used two GPUs in different nodes.
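
For reference, the three placements compared above can be written down as a 
small helper. This is only a sketch under assumptions: the gpu_loc struct, 
the classify() function and the device numbering (CUDA devices 0-3 under 
socket 0, devices 4-7 under socket 1, two devices per K80 card) are 
hypothetical and were not taken from the actual cluster configuration.

```c
/* Assumed layout, not confirmed for the real nodes: each K80 card exposes
   two CUDA devices, so a node has 8 devices, 4 under each CPU socket. */
#define GPUS_PER_SOCKET 4

typedef struct {
    int node;    /* hypothetical node index */
    int device;  /* CUDA device index on that node, 0..7 */
} gpu_loc;

typedef enum {
    SAME_SOCKET,       /* same node, same socket (same PCI-e switch) */
    DIFFERENT_SOCKET,  /* same node, other socket */
    DIFFERENT_NODE     /* separate nodes, traffic goes over InfiniBand */
} placement;

/* Classify where two GPUs sit relative to each other. */
placement classify(gpu_loc a, gpu_loc b)
{
    if (a.node != b.node)
        return DIFFERENT_NODE;
    if (a.device / GPUS_PER_SOCKET == b.device / GPUS_PER_SOCKET)
        return SAME_SOCKET;
    return DIFFERENT_SOCKET;
}

/* Human-readable label for a placement, e.g. for tagging benchmark logs. */
const char *placement_name(placement p)
{
    switch (p) {
    case SAME_SOCKET:      return "same socket";
    case DIFFERENT_SOCKET: return "different socket";
    default:               return "different node";
    }
}
```

With this numbering, devices 0 and 3 on one node classify as "same socket", 
devices 0 and 4 as "different socket", and device 0 on two separate nodes as 
"different node". On a real system the mapping should be checked rather than 
assumed, for example with cudaDeviceCanAccessPeer().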

Does MPI_Reduce use IPC for intra-node communication? Do I have to install the 
Mellanox OFED stack to support GPU RDMA reductions, even for GPUs that are 
under the same PCI-e switch?

Thanks,

Fei Mao
High Performance Computing Technical Consultant 
SHARCNET | http://www.sharcnet.ca
Compute/Calcul Canada | http://www.computecanada.ca
