I've run across an interesting issue for which I don't have a ready answer.
If an MPI process aborts, we automatically abort the entire job.
If an MPI process returns a non-zero exit status, indicating that there was
something abnormal about its termination, we ignore it and let the job
continu
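To make the distinction above concrete, here is a minimal POSIX sketch of how a launcher could tell an aborted process (killed by a signal such as SIGABRT) apart from one that exited on its own with a non-zero status. This is not the Open MPI code; the names proc_end_t, classify_status, and run_child are hypothetical and exist only for this illustration.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of how a child process ended. */
typedef enum {
    PROC_OK,            /* exited normally with status 0 */
    PROC_NONZERO_EXIT,  /* exited normally with a non-zero status */
    PROC_ABORTED,       /* terminated by a signal, e.g. SIGABRT */
    PROC_UNKNOWN        /* stopped, or an error occurred */
} proc_end_t;

/* Classify a status value as filled in by waitpid(). */
proc_end_t classify_status(int status)
{
    if (WIFSIGNALED(status)) {
        return PROC_ABORTED;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status) == 0 ? PROC_OK : PROC_NONZERO_EXIT;
    }
    return PROC_UNKNOWN;
}

/* Fork a child that either calls abort() or exits with exit_code,
 * then wait for it and classify how it ended. */
proc_end_t run_child(int do_abort, int exit_code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return PROC_UNKNOWN;
    }
    if (pid == 0) {
        if (do_abort) {
            abort();          /* raises SIGABRT in the child */
        }
        _exit(exit_code);     /* normal termination with the given status */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return PROC_UNKNOWN;
    }
    return classify_status(status);
}
```

A launcher that wanted to treat both cases as fatal could abort the job on either PROC_ABORTED or PROC_NONZERO_EXIT; whether that is the right policy is exactly the open question here.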
George, Yes. GPUDirect eliminated an additional (host) memory buffering
step between the HCA and the GPU that took CPU cycles.
I was never very comfortable with the required kernel patch or the
patched OFED needed to make it all work. Having said that, it did
provide a ~14% improvement in th
On Apr 13, 2011, at 14:48, Rolf vandeVaart wrote:
> This work does not depend on GPU Direct. It is making use of the fact that
> one can malloc memory, register it with IB, and register it with CUDA via the
> new CUDA 4.0 cuMemHostRegister API. Then one can copy device memory into this
> mem
[Answering both questions with this email]
These changes depend on new features in CUDA 4.0. With CUDA 4.0, there is the
concept of Unified Virtual Addressing (UVA), so host and device addresses do
not overlap. They are all unique within the process. There is an API in CUDA 4.0 that one
can use to query wh
Rolf,
I haven't had a chance to review the code yet, but how do these changes
relate to CUDA 4.0 - especially the UVA and GPUDirect 2.0
implementation?
Ken
On Wed, 2011-04-13 at 09:47 -0700, Rolf vandeVaart wrote:
> WHAT: Add support to send data directly from CUDA device memory via
> MPI calls.
Hello Rolf,
This "CUDA device memory" isn't memory-mapped on the host, right? Then
what does its address look like? When you say "when it is detected that
a buffer is CUDA device memory", if the actual device and host address
spaces are different, how do you know that device addresses and usual
h
WHAT: Add support to send data directly from CUDA device memory via MPI calls.
TIMEOUT: April 25, 2011
DETAILS: When programming in a mixed MPI and CUDA environment, one cannot
currently send data directly from CUDA device memory. The programmer first has
to move the data into host memory, and
When the proc restarts, it calls orte_routed.init_routes. If you look in
the routed cm component, you should see a call to "register_sync" - this is where the proc
sends a message to the local daemon, allowing it to "learn" the port/address
where the proc resides.
I've done this. I had a problem because when i