Hi Rolf,

Thank you very much for clarifying the problem. Is there any plan to support 
GPU RDMA for reduction in the future?

On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Hi Fei:
>  
> CUDA-aware reduction support in Open MPI is rather simple.  The GPU 
> buffers are copied into temporary host buffers, and the reduction is then 
> done on the host buffers.  When the host reduction completes, the data is 
> copied back into the GPU buffers.  So the reduction uses neither CUDA IPC 
> nor GPU Direct RDMA.
>  
> Rolf
>  
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
> Sent: Wednesday, June 17, 2015 1:08 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5
>  
> Hi there,
>  
> I am running benchmarks on a GPU cluster with two CPU sockets and four K80 
> GPUs per node. Two K80s are attached to CPU socket 0, the other two to 
> socket 1. An IB ConnectX-3 (FDR) is also under socket 1. We are using 
> Linux’s OFED, so I know there is no way to do inter-node GPU RDMA 
> communication. I can do intra-node IPC for MPI_Send and MPI_Recv between 
> two K80s (4 GPUs in total) that are connected under the same socket (PCI-e 
> switch). So I thought I could do intra-node MPI_Reduce with IPC support in 
> Open MPI 1.8.5.
>  
> The benchmark I used is osu-micro-benchmarks-4.4.1, and I got the same 
> results whether the two GPUs were under the same socket or under different 
> sockets. The results were the same even when I used two GPUs in different 
> nodes.
>  
> Does MPI_Reduce use IPC for intra-node communication? Do I have to install 
> the Mellanox OFED stack to support GPU RDMA reduction on GPUs even when 
> they are under the same PCI-e switch?
>  
> Thanks,
>  
> Fei Mao
> High Performance Computing Technical Consultant 
> SHARCNET | http://www.sharcnet.ca
> Compute/Calcul Canada | http://www.computecanada.ca
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27147.php
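
The staging path Rolf describes (GPU buffer to temporary host buffer, 
reduction on the host, result copied back to the GPU) can be illustrated with 
a small host-only model in C. This is a sketch under stated assumptions, not 
Open MPI's actual code: "device" memory is ordinary heap memory here, and the 
names staged_sum_reduce, copy_device_to_host and copy_host_to_device are 
hypothetical helpers invented for this example. In a real CUDA-aware build the 
two copy helpers would correspond to cudaMemcpy calls in each direction.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for device<->host copies. In this host-only
 * model, "device" memory is plain heap memory, so memcpy is enough. */
static void copy_device_to_host(float *host, const float *dev, size_t n)
{
    memcpy(host, dev, n * sizeof *host);
}

static void copy_host_to_device(float *dev, const float *host, size_t n)
{
    memcpy(dev, host, n * sizeof *dev);
}

/* Sum-reduce nranks "device" send buffers into the root's "device"
 * receive buffer, staging everything through temporary host buffers.
 * Returns 0 on success, -1 if a host buffer cannot be allocated. */
int staged_sum_reduce(const float *const *dev_send, int nranks,
                      float *dev_recv_root, size_t count)
{
    assert(nranks > 0 && count > 0);

    float *host_in  = malloc(count * sizeof *host_in);
    float *host_acc = calloc(count, sizeof *host_acc); /* starts at 0.0f */
    if (host_in == NULL || host_acc == NULL) {
        free(host_in);
        free(host_acc);
        return -1;
    }

    for (int r = 0; r < nranks; ++r) {
        /* Step 1: copy this rank's GPU buffer into a temporary host buffer. */
        copy_device_to_host(host_in, dev_send[r], count);

        /* Step 2: perform the reduction on the host buffers. */
        for (size_t i = 0; i < count; ++i)
            host_acc[i] += host_in[i];
    }

    /* Step 3: copy the reduced result back into the root's GPU buffer. */
    copy_host_to_device(dev_recv_root, host_acc, count);

    free(host_in);
    free(host_acc);
    return 0;
}
```

In this model every element crosses host memory on the way in and on the way 
out, and no GPU-to-GPU transfer takes place. That would be consistent with 
the benchmark showing the same numbers regardless of which sockets the GPUs 
sit under, although the model says nothing about actual timings.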
