Hello,

We were using OpenMPI for some testing. Everything works fine, but MPI_Ibcast() randomly takes a long time to finish. We wrote a standalone program just to test this. The following are the profiling results of the simple test program on our cluster:
Ibcast 604 mb takes 103 ms
Ibcast 608 mb takes 106 ms
Ibcast 612 mb takes 105 ms
Ibcast 616 mb takes 105 ms
Ibcast 620 mb takes 107 ms
Ibcast 624 mb takes 107 ms
Ibcast 628 mb takes 108 ms
Ibcast 632 mb takes 110 ms
Ibcast 636 mb takes 110 ms
Ibcast 640 mb takes 7437 ms
Ibcast 644 mb takes 115 ms
Ibcast 648 mb takes 111 ms
Ibcast 652 mb takes 112 ms
Ibcast 656 mb takes 112 ms
Ibcast 660 mb takes 114 ms
Ibcast 664 mb takes 114 ms
Ibcast 668 mb takes 115 ms
Ibcast 672 mb takes 116 ms
Ibcast 676 mb takes 116 ms
Ibcast 680 mb takes 116 ms
Ibcast 684 mb takes 122 ms
Ibcast 688 mb takes 7385 ms
Ibcast 692 mb takes 8729 ms
Ibcast 696 mb takes 120 ms
Ibcast 700 mb takes 124 ms
Ibcast 704 mb takes 121 ms
Ibcast 708 mb takes 8240 ms
Ibcast 712 mb takes 122 ms
Ibcast 716 mb takes 123 ms
Ibcast 720 mb takes 123 ms
Ibcast 724 mb takes 124 ms
Ibcast 728 mb takes 125 ms
Ibcast 732 mb takes 125 ms
Ibcast 736 mb takes 126 ms

As you can see, Ibcast occasionally takes a long time to finish, and which message sizes are affected appears to be random. The same program was compiled and tested with MVAPICH2-GDR and ran smoothly. Both tests ran exclusively on our four-node cluster without contention. It also doesn't matter whether CUDA support is enabled or not.

Our cluster configuration for this test: four nodes, each with one K40 GPU, connected via Mellanox IB. Please find attached config details and some sample code…

Kindest Regards,

—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV, Samsung Electronics,
1732 North First Street, San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.
<<attachment: Ibcast_config_details.txt.zip>>
<<attachment: Ibcast_SampleCode.cpp>>
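In case the .cpp attachment gets stripped by the list, the core of the test is roughly the sketch below: a minimal timing loop that broadcasts host buffers of increasing size and waits on each MPI_Ibcast before reporting the elapsed time. Buffer allocation, the size range, and the timing details are assumptions here; the attached Ibcast_SampleCode.cpp is the authoritative version.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t MB = 1024 * 1024;
    // Broadcast buffers from 604 MB to 736 MB in 4 MB steps, timing each call.
    for (size_t mb = 604; mb <= 736; mb += 4) {
        std::vector<char> buf(mb * MB, rank == 0 ? 1 : 0);

        MPI_Barrier(MPI_COMM_WORLD);       // line up all ranks before timing
        double t0 = MPI_Wtime();

        MPI_Request req;
        MPI_Ibcast(buf.data(), static_cast<int>(mb * MB), MPI_CHAR,
                   0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE); // measured time includes completion

        double t1 = MPI_Wtime();
        if (rank == 0)
            std::printf("Ibcast %zu mb takes %.0f ms\n", mb, (t1 - t0) * 1e3);
    }

    MPI_Finalize();
    return 0;
}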