Hello,

We have been using OpenMPI for some testing. Everything works fine, except
that MPI_Ibcast() randomly takes a long time to finish. We have a standalone
program that tests just this call. The following are the profiling results
of that test program on our cluster:

Ibcast 604 mb takes 103 ms
Ibcast 608 mb takes 106 ms
Ibcast 612 mb takes 105 ms
Ibcast 616 mb takes 105 ms
Ibcast 620 mb takes 107 ms
Ibcast 624 mb takes 107 ms
Ibcast 628 mb takes 108 ms
Ibcast 632 mb takes 110 ms
Ibcast 636 mb takes 110 ms
Ibcast 640 mb takes 7437 ms
Ibcast 644 mb takes 115 ms
Ibcast 648 mb takes 111 ms
Ibcast 652 mb takes 112 ms
Ibcast 656 mb takes 112 ms
Ibcast 660 mb takes 114 ms
Ibcast 664 mb takes 114 ms
Ibcast 668 mb takes 115 ms
Ibcast 672 mb takes 116 ms
Ibcast 676 mb takes 116 ms
Ibcast 680 mb takes 116 ms
Ibcast 684 mb takes 122 ms
Ibcast 688 mb takes 7385 ms
Ibcast 692 mb takes 8729 ms
Ibcast 696 mb takes 120 ms
Ibcast 700 mb takes 124 ms
Ibcast 704 mb takes 121 ms
Ibcast 708 mb takes 8240 ms
Ibcast 712 mb takes 122 ms
Ibcast 716 mb takes 123 ms
Ibcast 720 mb takes 123 ms
Ibcast 724 mb takes 124 ms
Ibcast 728 mb takes 125 ms
Ibcast 732 mb takes 125 ms
Ibcast 736 mb takes 126 ms
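
For anyone who wants to pick the slow iterations out of a longer log like the
one above, here is a minimal sketch. It is not part of the attached sample
code; the function name slow_iterations and the 10x-median threshold are
assumptions chosen purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the attached sample code): returns the
// indices of timings that exceed `factor` times the median timing.
// The default factor of 10 is an arbitrary illustrative threshold.
std::vector<std::size_t> slow_iterations(const std::vector<double>& ms,
                                         double factor = 10.0) {
    std::vector<std::size_t> slow;
    if (ms.empty()) return slow;

    // Find the median on a copy so the caller's ordering is preserved.
    // For an even number of samples this picks the upper median.
    std::vector<double> sorted(ms);
    auto mid = sorted.begin() + sorted.size() / 2;
    std::nth_element(sorted.begin(), mid, sorted.end());
    const double median = *mid;

    // Flag every iteration that is far slower than the typical one.
    for (std::size_t i = 0; i < ms.size(); ++i) {
        if (ms[i] > factor * median) slow.push_back(i);
    }
    return slow;
}
```

Fed the four timings for 636 to 648 mb above (110, 7437, 115, 111), it flags
only index 1, the 7437 ms iteration.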

As you can see, Ibcast occasionally takes a long time to finish (roughly 7 to
9 seconds instead of 100 to 125 ms), and the slow iterations appear at random.
The same program was compiled and tested with MVAPICH2-gdr, and it ran
smoothly. Both tests ran exclusively on our four-node cluster without
contention. Likewise, it doesn't matter whether CUDA support is enabled or
not. The following is the configuration of our servers:

We have four nodes in this test, each with one K40 GPU, connected via
Mellanox IB.

Please find attached config details and some sample code…

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.

<<attachment: Ibcast_config_details.txt.zip>>

Attachment: Ibcast_SampleCode.cpp
Description: Ibcast_SampleCode.cpp
