George --
Unfortunately, this didn't automatically create CMRs (I'm not sure why). :-(
Begin forwarded message:
> From: bosi...@osl.iu.edu
> Date: April 14, 2011 5:50:07 PM EDT
> To: svn-f...@open-mpi.org
> Subject: [OMPI svn-full] svn:open-mpi r24617
> Reply-To: de...@open-mpi.org
>
> Author: bosilca
Interesting, this issue exists in 2 out of 3 functions defined in the
ompi_datatype_create_indexed.c file. Based on your patch I created one that
fixes all the issues with the indexed type creation. The patch is attached.
I'll push it to the trunk and create CMRs.
Thanks,
george.
Index: o
On Apr 14, 2011, at 3:13 PM, Shamis, Pavel wrote:
>> That can easily be a run-time check during startup.
>
> It could be fixed. My point was that in the existing code, it's a compile-time
> decision and not a run-time one.
I agree; I mentioned the same issue in my review, too. Some of the code can
clearly be converted to run-time checks.
>
>> Actually I'm not sure that it is a good idea to enable CUDA by default, since
>> it disables the zero-copy protocol, which is critical for good performance.
>
> That can easily be a run-time check during startup.
It could be fixed. My point was that in the existing code, it's a compile-time
decision and not a run-time one.
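For illustration, here is a minimal sketch of what such a run-time check could
look like, using the 1.5-era mca_base_param API; the parameter name, variable,
and function are hypothetical, not taken from Rolf's patch:

#include "opal/mca/base/mca_base_param.h"

/* Hypothetical: decide at startup via an MCA parameter instead of at
 * compile time via #if OMPI_CUDA_SUPPORT. */
static int btl_want_cuda = 0;

static void register_cuda_param(void)
{
    mca_base_param_reg_int_name("btl", "want_cuda",
                                "Enable CUDA buffer support "
                                "(may disable the zero-copy protocol)",
                                false, false,  /* internal, read-only */
                                0,             /* default: off */
                                &btl_want_cuda);
}

/* On the data path, branch instead of compiling one protocol out:
 *   if (btl_want_cuda) { staged CUDA protocol }
 *   else               { zero-copy protocol   }
 */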
On Apr 14, 2011, at 12:41 PM, Brice Goglin wrote:
> hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a
> CUDA device, see
> https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and
> https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
On Apr 14, 2011, at 12:37 PM, Brice Goglin wrote:
> GPUDirect is only about using the same host buffer for DMA from/to both
> the NIC and the GPU. Without GPUDirect, you have a host buffer for the
> GPU and another one for IB (looks like some strange memory registration
> problem to me...), and yo
On Apr 14, 2011, at 11:48 AM, Shamis, Pavel wrote:
> Actually I'm not sure that it is a good idea to enable CUDA by default, since
> it disables the zero-copy protocol, which is critical for good performance.
That can easily be a run-time check during startup.
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to
a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h
and https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
Do you need anything else?
Brice
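For example, querying the locality of every CUDA device through hwloc/cudart.h
could look like this minimal sketch (assuming Linux, hwloc >= 1.1, and the
CUDA runtime; link with -lhwloc -lcudart):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <hwloc.h>
#include <hwloc/cudart.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, ndev = 0;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    cudaGetDeviceCount(&ndev);

    for (i = 0; i < ndev; i++) {
        hwloc_cpuset_t set = hwloc_bitmap_alloc();
        /* fills 'set' with the CPUs close to CUDA device i */
        if (0 == hwloc_cudart_get_device_cpuset(topo, i, set)) {
            char *str;
            hwloc_bitmap_asprintf(&str, set);
            printf("CUDA device %d is close to CPUs %s\n", i, str);
            free(str);
        }
        hwloc_bitmap_free(set);
    }
    hwloc_topology_destroy(topo);
    return 0;
}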
On 14/04/2011 17:44,
On 14/04/2011 17:58, George Bosilca wrote:
> On Apr 13, 2011, at 20:07, Ken Lloyd wrote:
>
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
>> between the HCA and the GPU that took CPU cycles.
>>
> If this is the case then why do we need to use special memcpy functions to
> copy the data back into the host memory prior to sending?
On Apr 13, 2011, at 20:07, Ken Lloyd wrote:
> George, Yes. GPUDirect eliminated an additional (host) memory buffering step
> between the HCA and the GPU that took CPU cycles.
If this is the case then why do we need to use special memcpy functions to copy
the data back into the host memory prior to sending?
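For reference, the staging path being questioned looks roughly like the sketch
below; this is illustrative only, not Rolf's code, and the function and buffer
names are made up:

#include <cuda_runtime.h>
#include <mpi.h>

/* Without GPUDirect-style buffer sharing, device data is staged through
 * a host buffer before MPI can send it. */
static void send_gpu_buffer(const void *dev_buf, size_t len,
                            int dest, MPI_Comm comm)
{
    void *host_buf;
    cudaMallocHost(&host_buf, len);          /* pinned host staging buffer */
    cudaMemcpy(host_buf, dev_buf, len,
               cudaMemcpyDeviceToHost);      /* the "special memcpy" */
    MPI_Send(host_buf, (int)len, MPI_BYTE, dest, 0, comm);
    cudaFreeHost(host_buf);
}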
>
>> By default, the code is disabled and has to be configured into the library.
>> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
>> DIR/lib, and DIR/lib64
>> --with-cuda-libdir=DIR Search for cuda libraries in DIR
>
> My
I'd suggest supporting CUDA device queries in carto and hwloc.
Ken
On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote:
> On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
>
> > By default, the code is disabled and has to be configured into the library.
> > --with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
> > DIR/lib, and DIR/lib64
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
> An initial implementation can be viewed at:
> https://bitbucket.org/rolfv/ompi-trunk-cuda-3
Random comments on the code...
1. I see changes like this:
mca_btl_sm_la_LIBADD += \
$(top_ompi_builddir)/ompi/mca/common/cuda/libmca_common_cuda
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
> By default, the code is disabled and has to be configured into the library.
> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include,
> DIR/lib, and DIR/lib64
> --with-cuda-libdir=DIR Search for cuda libraries in DIR
That looks reasonable to me, but I'd also re-indent the body of the else{}
(i.e., remove 4 spaces from each line).
George?
On Apr 14, 2011, at 10:48 AM, Pascal Deveze wrote:
> Calling MPI_Type_create_hindexed(int count, int array_of_blocklengths[],
> MPI_Aint array_of_displacements[], MPI_Datatype oldtype,
> MPI_Datatype *newtype)
Calling MPI_Type_create_hindexed(int count, int array_of_blocklengths[],
MPI_Aint array_of_displacements[], MPI_Datatype oldtype,
MPI_Datatype *newtype)
with a count parameter of 1 causes a memory leak, detected by valgrind:
==2053== 576 (448 direct, 128 indirect) bytes i
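A minimal reproducer, assuming the leak follows the description above (compile
with mpicc and run under valgrind):

#include <mpi.h>

int main(int argc, char **argv)
{
    int blocklen = 4;
    MPI_Aint disp = 0;
    MPI_Datatype newtype;

    MPI_Init(&argc, &argv);
    /* count == 1 is the case that leaks */
    MPI_Type_create_hindexed(1, &blocklen, &disp, MPI_INT, &newtype);
    MPI_Type_commit(&newtype);
    MPI_Type_free(&newtype);
    MPI_Finalize();
    return 0;
}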
On Apr 14 2011, Jeff Squyres wrote:
I think Ralph's point is that OMPI is providing the run-time environment
for the application, and it would probably behoove us to support both
kinds of behaviors since there are obviously people in both camps out
there.
It's pretty easy to add a non-default MCA param / orterun CLI option to satisfy
both camps.
Hello Rolf,
CUDA support is always welcome.
Please see my comments below.
+#if OMPI_CUDA_SUPPORT
+fl->fl_frag_block_alignment = 0;
+fl->fl_flags = 0;
+#endif
[pasha] It seems that "fl_flags" is a hack that allows you to do the second
(CUDA) registration in mpool_rdma:
+#if OMPI_CUDA_SUPPORT
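If I read it right, the intent is something like the following sketch; the
flag name and the exact registration call are my guesses, not the actual
patch:

#if OMPI_CUDA_SUPPORT
    /* A flag on the free list tells the rdma mpool to perform a second,
     * CUDA-specific registration of each chunk (so the GPU can DMA into
     * it) in addition to the normal IB registration.
     * Requires <cuda.h> (CUDA 4.0 driver API). */
    if (fl->fl_flags & FL_FLAG_CUDA_REGISTER) {
        cuMemHostRegister(chunk_base, chunk_size,
                          CU_MEMHOSTREGISTER_PORTABLE);
    }
#endif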
I think Ralph's point is that OMPI is providing the run-time environment for
the application, and it would probably behoove us to support both kinds of
behaviors since there are obviously people in both camps out there.
It's pretty easy to add a non-default MCA param / orterun CLI option to satisfy
both camps.
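For instance, with a hypothetical parameter name, a user in the "abort" camp
could opt in per-job:

    mpirun --mca orte_abort_on_non_zero_status 1 -np 4 ./a.out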
Point well made, Nick. In other words, irrespective of OS or language,
are we citing the need for "application-correcting code" from Open MPI
(relocate and/or retry), similar to ECC in memory?
Ken
On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote:
> On Apr 14 2011, Ralph Castain wrote:
> >>
>
On Apr 14 2011, Ralph Castain wrote:
... It's hopeless, and whatever you do will be wrong for many
people. ...
I think that sums it up pretty well. :-)
It does seem a little strange that the scenario you describe somewhat
implies that one process is calling MPI_Finalize long before the others.
On Apr 14, 2011, at 9:13 AM, Ralph Castain wrote:
> I figure this last is the best option. My point was just that we abort the
> job if someone calls "abort". However, if they indicate their program is
> exiting with "something is wrong", we ignore it.
Another option for the user is to kill(get
On Apr 14, 2011, at 5:33 AM, Jeff Squyres wrote:
> On Apr 14, 2011, at 4:02 AM, N.M. Maclaren wrote:
>
>> ... It's hopeless, and whatever you do will be wrong for many
>> people. ...
>
> I think that sums it up pretty well. :-)
>
> It does seem a little strange that the scenario you describe somewhat implies
> that one process is calling MPI_Finalize long before the others.
On Apr 14, 2011, at 4:02 AM, N.M. Maclaren wrote:
> ... It's hopeless, and whatever you do will be wrong for many
> people. ...
I think that sums it up pretty well. :-)
It does seem a little strange that the scenario you describe somewhat implies
that one process is calling MPI_Finalize long before the others.
On Apr 14 2011, Ralph Castain wrote:
I've run across an interesting issue for which I don't have a ready answer.
If an MPI process aborts, we automatically abort the entire job.
If an MPI process returns a non-zero exit status, indicating that there
was something abnormal about its termination, should we do the same?