Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 14, 2011, at 3:13 PM, Shamis, Pavel wrote: >> That can easily be a run-time check during startup. > > It could be fixed. My point was that in the existing code, it's a compile-time > decision and not run time. I agree; I mentioned the same issue in my review, too. Some of the code can cl...
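A minimal sketch of the compile-time vs. run-time distinction under discussion: instead of deciding the protocol entirely at configure time, the CUDA path can be compiled in and gated on a flag set during startup. The flag name ompi_cuda_runtime_enabled is a made-up placeholder, not anything in the patch:

    #include <stdbool.h>

    extern bool ompi_cuda_runtime_enabled;   /* hypothetical: set once at init */

    static void pick_protocol(void)
    {
    #if OMPI_CUDA_SUPPORT
        if (ompi_cuda_runtime_enabled) {     /* run-time decision */
            /* take the CUDA staging path */
            return;
        }
    #endif
        /* default: keep the zero-copy path */
    }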

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
> >> Actually I'm not sure that it is a good idea to enable CUDA by default, since >> it disables the zero-copy protocol, which is critical for good performance. > > That can easily be a run-time check during startup. It could be fixed. My point was that in the existing code, it's a compile-time decisi...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 14, 2011, at 12:41 PM, Brice Goglin wrote: > hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a > CUDA device, see > https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and > https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 14, 2011, at 12:37 PM, Brice Goglin wrote: > GPUDirect is only about using the same host buffer for DMA from/to both > the NIC and the GPU. Without GPUDirect, you have a host buffer for the > GPU and another one for IB (looks like some strange memory registration > problem to me...), and yo...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 14, 2011, at 11:48 AM, Shamis, Pavel wrote: > Actually I'm not sure that it is a good idea to enable CUDA by default, since > it disables the zero-copy protocol, which is critical for good performance. That can easily be a run-time check during startup. -- Jeff Squyres jsquy...@cisco.com For c...
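A sketch of the startup probe Jeff suggests, using only CUDA driver API calls that exist as of CUDA 4.0; the surrounding policy (when to disable zero-copy) is an assumption, not part of the patch:

    #include <cuda.h>
    #include <stdbool.h>

    static bool cuda_usable_at_runtime(void)
    {
        int ndev = 0;
        if (CUDA_SUCCESS != cuInit(0)) {
            return false;        /* no usable driver: keep zero-copy */
        }
        if (CUDA_SUCCESS != cuDeviceGetCount(&ndev)) {
            return false;
        }
        return ndev > 0;         /* only pay the CUDA cost if a GPU exists */
    }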

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Brice Goglin
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a CUDA device; see https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h Do you need anything else? Brice On 14/04/2011 at 17:44, ...
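For reference, a minimal sketch of the hwloc query Brice points at (hwloc >= 1.1, Linux), printing the CPU set close to a given CUDA device; it assumes an already-loaded topology and an initialized CUDA driver:

    #include <stdio.h>
    #include <stdlib.h>
    #include <hwloc.h>
    #include <hwloc/cuda.h>
    #include <cuda.h>

    static void print_cpus_near(hwloc_topology_t topo, CUdevice dev)
    {
        hwloc_bitmap_t set = hwloc_bitmap_alloc();
        if (0 == hwloc_cuda_get_device_cpuset(topo, dev, set)) {
            char *str;
            hwloc_bitmap_asprintf(&str, set);
            printf("CPUs near CUDA device: %s\n", str);
            free(str);
        }
        hwloc_bitmap_free(set);
    }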

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Brice Goglin
On 14/04/2011 at 17:58, George Bosilca wrote: > On Apr 13, 2011, at 20:07, Ken Lloyd wrote: > > >> George, Yes. GPUDirect eliminated an additional (host) memory buffering step >> between the HCA and the GPU that took CPU cycles. >> > If this is the case, then why do we need to use special...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread George Bosilca
On Apr 13, 2011, at 20:07, Ken Lloyd wrote: > George, Yes. GPUDirect eliminated an additional (host) memory buffering step > between the HCA and the GPU that took CPU cycles. If this is the case, then why do we need to use special memcpy functions to copy the data back into the host memory pri...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
> >> By default, the code is disabled and has to be configured into the library. >> --with-cuda(=DIR)  Build cuda support, optionally adding DIR/include, >> DIR/lib, and DIR/lib64 >> --with-cuda-libdir=DIR  Search for cuda libraries in DIR > > My...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Ken Lloyd
I'd suggest supporting CUDA device queries in carto and hwloc. Ken On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote: > On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote: > > > By default, the code is disabled and has to be configured into the library. > > --with-cuda(=DIR)  Build...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote: > An initial implementation can be viewed at: > https://bitbucket.org/rolfv/ompi-trunk-cuda-3 Random comments on the code... 1. I see changes like this: mca_btl_sm_la_LIBADD += \ $(top_ompi_builddir)/ompi/mca/common/cuda/libmca_common_cuda...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Jeff Squyres
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote: > By default, the code is disabled and has to be configured into the library. > --with-cuda(=DIR)  Build cuda support, optionally adding DIR/include, > DIR/lib, and DIR/lib64 > --with-cuda-lib...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
Hello Rolf, CUDA support is always welcome. Please see my comments below. +#if OMPI_CUDA_SUPPORT +fl->fl_frag_block_alignment = 0; +fl->fl_flags = 0; +#endif [pasha] It seems that "fl_flags" is a hack that allows you to do the second (CUDA) registration in mpool_rdma: +#if OMPI_CUD...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread Ken Lloyd
George, Yes. GPUDirect eliminated an additional (host) memory buffering step between the HCA and the GPU that took CPU cycles. I was never very comfortable with the kernel patch necessary, nor the patched OFED required to make it all work. Having said that, it did provide a ~14% improvement in th...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread George Bosilca
On Apr 13, 2011, at 14:48, Rolf vandeVaart wrote: > This work does not depend on GPUDirect. It is making use of the fact that > one can malloc memory, register it with IB, and register it with CUDA via the > new CUDA 4.0 cuMemHostRegister API. Then one can copy device memory into this mem...
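A minimal sketch of the staging scheme Rolf describes: one host buffer registered with both IB verbs and the CUDA driver, then filled from device memory before it goes on the wire. Error handling is elided and the function is illustrative, not code from the patch:

    #include <stdlib.h>
    #include <cuda.h>
    #include <infiniband/verbs.h>

    static void *stage_from_device(struct ibv_pd *pd, CUdeviceptr devbuf,
                                   size_t len, struct ibv_mr **mr_out)
    {
        void *stage = malloc(len);
        /* 1. register with IB so the HCA can DMA out of it */
        *mr_out = ibv_reg_mr(pd, stage, len, IBV_ACCESS_LOCAL_WRITE);
        /* 2. register the same pages with CUDA (cuMemHostRegister, CUDA 4.0) */
        cuMemHostRegister(stage, len, 0);
        /* 3. copy device memory into the doubly-registered host buffer */
        cuMemcpyDtoH(stage, devbuf, len);
        return stage;   /* ready to hand to the BTL for sending */
    }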

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread Rolf vandeVaart
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Brice Goglin Sent: Wednesday, April 13, 2011 1:00 PM To: de...@open-mpi.org Subject: Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly > Hello Rolf, This "CUDA device memory" isn'...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread Ken Lloyd
Rolf, I haven't had a chance to review the code yet, but how do these changes relate to CUDA 4.0, especially the UVA and GPUDirect 2.0 implementation? Ken On Wed, 2011-04-13 at 09:47 -0700, Rolf vandeVaart wrote: > WHAT: Add support to send data directly from CUDA device memory via > MPI calls. ...

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread Brice Goglin
Hello Rolf, This "CUDA device memory" isn't memory mapped in the host, right? Then what does its address look like? When you say "when it is detected that a buffer is CUDA device memory", if the actual device and host address spaces are different, how do you know that device addresses and usual h...
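One plausible answer to Brice's question, based on CUDA 4.0's unified virtual addressing rather than on anything stated in the patch: under UVA, host and device allocations share one address space, and the driver can be asked what kind of memory a pointer names. A sketch:

    #include <stdbool.h>
    #include <stdint.h>
    #include <cuda.h>

    static bool is_device_pointer(const void *buf)
    {
        CUmemorytype mtype = 0;
        CUresult rc = cuPointerGetAttribute(&mtype,
                                            CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                            (CUdeviceptr)(uintptr_t)buf);
        /* plain malloc'ed host memory is unknown to the driver, so the
         * query fails -- and that failure is itself the answer */
        return CUDA_SUCCESS == rc && CU_MEMORYTYPE_DEVICE == mtype;
    }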

[OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-13 Thread Rolf vandeVaart
WHAT: Add support to send data directly from CUDA device memory via MPI calls. TIMEOUT: April 25, 2011 DETAILS: When programming in a mixed MPI and CUDA environment, one cannot currently send data directly from CUDA device memory. The programmer first has to move the data into host memory, and...
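To illustrate the usage change the RFC proposes (buffer names and sizes are invented for the example), today device data must be staged by hand, whereas under the RFC the device pointer goes straight into MPI:

    #include <stdlib.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    static void send_today(void *devbuf, size_t len, int peer)
    {
        void *host = malloc(len);
        cudaMemcpy(host, devbuf, len, cudaMemcpyDeviceToHost); /* extra hop */
        MPI_Send(host, (int)len, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
        free(host);
    }

    static void send_with_rfc(void *devbuf, size_t len, int peer)
    {
        /* after the RFC: the CUDA device pointer is passed directly */
        MPI_Send(devbuf, (int)len, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
    }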