On Apr 14, 2011, at 3:13 PM, Shamis, Pavel wrote:
>> That can easily be a run-time check during startup.
>
> It could be fixed. My point was that in the existing code, it's a compile-time
> decision and not a run-time one.
I agree; I mentioned the same issue in my review, too. Some of the code can
cl
>
>> Actually I'm not sure that it is good idea to enable CUDA by default, since
>> it disables zero-copy protocol, that is critical for good performance.
>
> That can easily be a run-time check during startup.
It could be fixed. My point was that in the existing code, it's a compile-time
decision and not a run-time one.
On Apr 14, 2011, at 12:41 PM, Brice Goglin wrote:
> hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a
> CUDA device, see
> https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and
> https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
On Apr 14, 2011, at 12:37 PM, Brice Goglin wrote:
> GPUDirect is only about using the same host buffer for DMA from/to both
> the NIC and the GPU. Without GPUDirect, you have a host buffer for the
> GPU and another one for IB (looks like some strange memory registration
> problem to me...), and yo
On Apr 14, 2011, at 11:48 AM, Shamis, Pavel wrote:
> Actually I'm not sure that it is good idea to enable CUDA by default, since
> it disables zero-copy protocol, that is critical for good performance.
That can easily be a run-time check during startup.
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to
a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h
and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
Do you need anything else?
Brice
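For reference, a minimal sketch of how these hwloc helpers could be used (my example, not code from the thread; it assumes hwloc >= 1.1 built on Linux with the CUDA runtime available, and abbreviates error handling):

#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>
#include <hwloc/cudart.h>   /* hwloc_cudart_get_device_cpuset() */

int main(void)
{
    hwloc_topology_t topo;
    hwloc_cpuset_t set = hwloc_bitmap_alloc();
    char *str;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Ask hwloc which CPUs are physically close to CUDA runtime device 0. */
    if (hwloc_cudart_get_device_cpuset(topo, 0, set) == 0) {
        hwloc_bitmap_asprintf(&str, set);
        printf("CUDA device 0 is close to cpuset %s\n", str);
        free(str);
    }

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}

The cpuset returned there is what a carto/hwloc-based selection logic in the BTLs could bind against.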
On 14/04/2011 17:44,
On 14/04/2011 17:58, George Bosilca wrote:
> On Apr 13, 2011, at 20:07 , Ken Lloyd wrote:
>
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
>> between the HCA and the GPU that took CPU cycles.
>>
> If this is the case then why do we need to use special
On Apr 13, 2011, at 20:07 , Ken Lloyd wrote:
> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
> between the HCA and the GPU that took CPU cycles.
If this is the case then why do we need to use special memcpy functions to copy
the data back into the host memory pri
>
>> By default, the code is disabled and has to be configured into the library.
>> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
>> DIR/lib, and DIR/lib64
>> --with-cuda-libdir=DIR Search for cuda libraries in DIR
>
> My
I'd suggest supporting CUDA device queries in carto and hwloc.
Ken
On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote:
> On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
>
> > By default, the code is disabled and has to be configured into the library.
> > --with-cuda(=DIR) Build
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
> An initial implementation can be viewed at:
> https://bitbucket.org/rolfv/ompi-trunk-cuda-3
Random comments on the code...
1. I see changes like this:
mca_btl_sm_la_LIBADD += \
$(top_ompi_builddir)/ompi/mca/common/cuda/libmca_common_cuda
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
> By default, the code is disabled and has to be configured into the library.
> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
> DIR/lib, and DIR/lib64
> --with-cuda-lib
Hello Rolf,
CUDA support is always welcome.
Please see my comments below.
+#if OMPI_CUDA_SUPPORT
+fl->fl_frag_block_alignment = 0;
+fl->fl_flags = 0;
+#endif
[pasha] It seems that the "fl_flags" field is a hack that allows you to do the second
(CUDA) registration in
mpool_rdma:
+#if OMPI_CUD
George, Yes. GPUDirect eliminated an additional (host) memory buffering
step between the HCA and the GPU that took CPU cycles.
I was never very comfortable with the kernel patch necessary, nor the
patched OFED required to make it all work. Having said that, it did
provide a ~14% improvement in th
On Apr 13, 2011, at 14:48 , Rolf vandeVaart wrote:
> This work does not depend on GPU Direct. It is making use of the fact that
> one can malloc memory, register it with IB, and register it with CUDA via the
> new CUDA 4.0 cuMemHostRegister API. Then one can copy device memory into this
> mem
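A minimal sketch of the staging scheme Rolf describes (my illustration, not his branch; the function name stage_device_to_host is hypothetical, a current CUDA context is assumed, and the IB registration on the same buffer is only indicated by a comment):

#include <stdlib.h>
#include <cuda.h>            /* CUDA 4.0 driver API */

#define STAGE_SIZE (1 << 20) /* arbitrary staging buffer size */

int stage_device_to_host(CUdeviceptr devbuf, size_t len, void **hostbuf_out)
{
    void *hostbuf;

    if (len > STAGE_SIZE) return -1;
    hostbuf = malloc(STAGE_SIZE);
    if (NULL == hostbuf) return -1;

    /* Page-lock the malloc'ed buffer so the GPU can DMA into it. */
    if (CUDA_SUCCESS != cuMemHostRegister(hostbuf, STAGE_SIZE, 0)) {
        free(hostbuf);
        return -1;
    }

    /* ... the same buffer would also be registered with the HCA here,
     * e.g. via ibv_reg_mr(), so both devices share one host buffer ... */

    /* Copy the GPU data into the shared host buffer for the BTL to send. */
    if (CUDA_SUCCESS != cuMemcpyDtoH(hostbuf, devbuf, len)) {
        cuMemHostUnregister(hostbuf);
        free(hostbuf);
        return -1;
    }

    *hostbuf_out = hostbuf;
    return 0;
}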
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf
Of Brice Goglin
Sent: Wednesday, April 13, 2011 1:00 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory
directly
Hello Rolf,
This "CUDA device memory" isn'
Rolf,
I haven't had a chance to review the code yet, but how do these changes
relate to CUDA 4.0 - especially the UVA and GPUDirect 2.0
implementation?
Ken
On Wed, 2011-04-13 at 09:47 -0700, Rolf vandeVaart wrote:
> WHAT: Add support to send data directly from CUDA device memory via
> MPI calls.
Hello Rolf,
This "CUDA device memory" isn't memory mapped in the host, right? Then
what does its address look like? When you say "when it is detected that
a buffer is CUDA device memory", if the actual device and host address
spaces are different, how do you know that device addresses and usual
h
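One possible answer (an assumption on my part, not necessarily what the proposed code does): with CUDA 4.0's unified virtual addressing, the driver API can be asked what kind of memory an arbitrary pointer refers to, e.g.:

#include <stdint.h>
#include <cuda.h>

/* Hypothetical helper: returns non-zero if buf is CUDA device memory. */
static int is_cuda_device_pointer(const void *buf)
{
    CUmemorytype memtype;
    CUresult res = cuPointerGetAttribute(&memtype,
                                         CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                         (CUdeviceptr)(uintptr_t)buf);

    /* Pointers the CUDA driver does not know about return an error and can
     * be treated as ordinary host memory. */
    return (CUDA_SUCCESS == res && CU_MEMORYTYPE_DEVICE == memtype);
}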
WHAT: Add support to send data directly from CUDA device memory via MPI calls.
TIMEOUT: April 25, 2011
DETAILS: When programming in a mixed MPI and CUDA environment, one cannot
currently send data directly from CUDA device memory. The programmer first has
to move the data into host memory, and
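To illustrate what the RFC would enable from the application side (my sketch; send_device_buffer is a hypothetical helper, and both MPI and the CUDA runtime are assumed to be initialized):

#include <mpi.h>
#include <cuda_runtime.h>

void send_device_buffer(int peer, size_t count)
{
    double *d_buf;
    cudaMalloc((void **)&d_buf, count * sizeof(double));
    /* ... a kernel fills d_buf ... */

    /* Today the application must cudaMemcpy d_buf to host memory first;
     * with the proposed support the device pointer is passed directly. */
    MPI_Send(d_buf, (int)count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);

    cudaFree(d_buf);
}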