Ramps up over time, we had a bunch of locked up nodes over the weekend and have 
traced it back to this.

Let me see if I can share more details,

I will review with everyone tomorrow and get back to you,



Rolf vandeVaart <rvandeva...@nvidia.com> wrote:


Hi Steven,
Thanks for the report.  Very little has changed between 1.8.5 and 1.8.6 within 
the CUDA-aware specific code so I am perplexed.  Also interesting that you do 
not see the issue with 1.8.5 and CUDA 7.0.
You mentioned that it is hard to share the code on this but maybe you could 
share how you observed the behavior.  Does the code need to run for a while to 
see this?
Any suggestions on how I could reproduce this?

Thanks,
Rolf


From: Steven Eliuk [mailto:s.el...@samsung.com]
Sent: Tuesday, June 30, 2015 6:05 PM
To: Rolf vandeVaart
Cc: Open MPI Users
Subject: 1.8.6 w/ CUDA 7.0 & GDR Huge Memory Leak

Hi All,

Looks like we have found a large memory leak,

Very difficult to share code on this but here are some details,

1.8.5 w/ Cuda 7.0 — no memory leak
1.8.5 w/ cuda 6.5 — no memory leak
1.8.6 w/ cuda 7.0 — large memory leak
1.8.5 w/ cuda 6.5 — no memory leak
mvapich2 2.1 GDR — no issue on either flavor of CUDA.

We have a relatively basic program that reproduces the error and have even 
narrowed it back to a single machine w/ multiple gpus and only two slaves. 
Looks like something in the IPC within a single node,

We don’t have many free cycles at the moment but less us know if we can help w/ 
something basic,

Heres our config flag for 1.8.5,

./configure FC=gfortran --without-mx --with-openib=/usr 
--with-openib-libdir=/usr/lib64/ --enable-openib-rdmacm --without-psm 
--with-cuda=/cm/shared/apps/cuda70/toolkit/current 
--prefix=/cm/shared/OpenMPI_1_8_5_CUDA70

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Project Lead,
Computing Science Innovation Center,
SRA - SV,
Samsung Electronics,
665 Clyde Avenue,
Mountain View, CA 94043,
Work: +1 650-623-2986,
Cell: +1 408-819-4407.

________________________________
This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.
________________________________

Reply via email to