Let me clarify, as that wasn't very clear: enabling or disabling GDR makes no 
difference. The issue seems to be in the base code.
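
(For concreteness, by enabling/disabling GDR I mean flipping the run-time MCA 
setting, roughly as below; the parameter name is from memory, so please 
double-check it:

    mpirun --mca btl_openib_want_cuda_gdr 1 ...
    mpirun --mca btl_openib_want_cuda_gdr 0 ...

Neither setting changes the behaviour.)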

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Rolf vandeVaart <rvandeva...@nvidia.com>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Thursday, November 6, 2014 at 10:18 AM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

The CUDA person is now responding. I will try to reproduce it. I looked through 
the zip file but did not see the mpirun command. Can this be reproduced with 
-np 4 running across four nodes?
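For example, something like this (hostfile and binary names are placeholders):

    mpirun -np 4 -hostfile hosts ./ibcast_test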
Also, in your original message you wrote “Likewise, it doesn't matter if I 
enable CUDA support or not. “  Can you provide more detail about what that 
means?
Thanks

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 06, 2014 1:05 PM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

I was hoping our CUDA person would respond, but in the interim I would suggest 
trying the nightly 1.8.4 tarball, as we are getting ready to release it and I 
know there have been some CUDA-related patches since 1.8.1:

http://www.open-mpi.org/nightly/v1.8/
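
Roughly (the exact tarball name changes nightly, and the CUDA install path 
below is just an example):

    tar xjf openmpi-v1.8-<nightly>.tar.bz2
    cd openmpi-v1.8-<nightly>
    ./configure --prefix=$HOME/ompi-1.8-nightly --with-cuda=/usr/local/cuda
    make -j8 all install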


On Nov 5, 2014, at 4:45 PM, Steven Eliuk <s.el...@samsung.com> wrote:

Open MPI 1.8.1 with CUDA RDMA…

Thanks, sir, and sorry for the late response.

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Ralph Castain <rhc.open...@gmail.com>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Monday, November 3, 2014 at 10:02 AM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

Which version of OMPI were you testing?

On Nov 3, 2014, at 9:14 AM, Steven Eliuk <s.el...@samsung.com> wrote:

Hello,

We were using Open MPI for some testing. Everything works fine, but randomly 
MPI_Ibcast() takes a long time to finish. We have a standalone program just to 
test it. The following are the profiling results of the simple test program on 
our cluster:

Ibcast 604 MB takes 103 ms
Ibcast 608 MB takes 106 ms
Ibcast 612 MB takes 105 ms
Ibcast 616 MB takes 105 ms
Ibcast 620 MB takes 107 ms
Ibcast 624 MB takes 107 ms
Ibcast 628 MB takes 108 ms
Ibcast 632 MB takes 110 ms
Ibcast 636 MB takes 110 ms
Ibcast 640 MB takes 7437 ms
Ibcast 644 MB takes 115 ms
Ibcast 648 MB takes 111 ms
Ibcast 652 MB takes 112 ms
Ibcast 656 MB takes 112 ms
Ibcast 660 MB takes 114 ms
Ibcast 664 MB takes 114 ms
Ibcast 668 MB takes 115 ms
Ibcast 672 MB takes 116 ms
Ibcast 676 MB takes 116 ms
Ibcast 680 MB takes 116 ms
Ibcast 684 MB takes 122 ms
Ibcast 688 MB takes 7385 ms
Ibcast 692 MB takes 8729 ms
Ibcast 696 MB takes 120 ms
Ibcast 700 MB takes 124 ms
Ibcast 704 MB takes 121 ms
Ibcast 708 MB takes 8240 ms
Ibcast 712 MB takes 122 ms
Ibcast 716 MB takes 123 ms
Ibcast 720 MB takes 123 ms
Ibcast 724 MB takes 124 ms
Ibcast 728 MB takes 125 ms
Ibcast 732 MB takes 125 ms
Ibcast 736 MB takes 126 ms

As you can see, Ibcast occasionally takes a very long time to finish, and the 
slow calls occur at random sizes. The same program was compiled and tested with 
MVAPICH2-GDR and ran smoothly. Both tests ran exclusively on our four-node 
cluster, without contention. Likewise, it doesn't matter whether I enable CUDA 
support or not. The following is the configuration of our servers:

We have four nodes in this test, each with one K40 GPU, connected with 
Mellanox IB.

Please find attached config details and some sample code…
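
In case the attachment does not come through, the test is essentially a loop of 
the following form (a minimal sketch only; the attached Ibcast_SampleCode.cpp 
is the authoritative version, and the sizes and buffer type here are 
illustrative):

    #include <mpi.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const size_t MB = 1024 * 1024;
        /* Broadcast buffers of increasing size and time each MPI_Ibcast +
         * MPI_Wait pair; the attached code may use device buffers instead
         * of host buffers. */
        for (size_t size = 604 * MB; size <= 736 * MB; size += 4 * MB) {
            char *buf = (char *)malloc(size);
            MPI_Barrier(MPI_COMM_WORLD);   /* line the ranks up first */

            double t0 = MPI_Wtime();
            MPI_Request req;
            MPI_Ibcast(buf, (int)size, MPI_CHAR, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            double t1 = MPI_Wtime();

            if (rank == 0)
                printf("Ibcast %zu MB takes %.0f ms\n",
                       size / MB, (t1 - t0) * 1000.0);
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }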

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.

<Ibcast_config_details.txt.zip><Ibcast_SampleCode.cpp>

