On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom <c...@nsc.liu.se> wrote:
> On Tuesday 19 May 2009, Roman Martonak wrote:
> ...
>> openmpi-1.3.2                           time per one MD step is 3.66 s
>>    ELAPSED TIME :    0 HOURS  1 MINUTES 25.90 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM             7.802  MB/S          55.200 SEC  =
> ...
>> mvapich-1.1.0                            time per one MD step is 2.55 s
>>    ELAPSED TIME :    0 HOURS  1 MINUTES  0.65 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM            14.815  MB/S          29.070 SEC  =
> ...
>> Intel MPI 3.2.1.009                 time per one MD step is 1.58 s
>>    ELAPSED TIME :    0 HOURS  0 MINUTES 38.16 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM            38.696  MB/S          11.130 SEC  =
> ...
>> Clearly the whole difference is basically in the ALL TO ALL COMM time.
>> Running on 1 blade (8 cores), all three MPI implementations have a very
>> similar time per step of about 8.6 s.
>
> My guess is that what you see is the difference in MPI_Alltoall performance
> for the different MPI-implementations (running in your env. on your hw.).
>
> You could write a trivial loop like this and try on the three MPIs:
>
>  MPI_Init
>  for i in 1 to 4221
>   MPI_Alltoall(size=102033, ...)
>  MPI_Finalize
>
> And time it to confirm this.
>
>> For CPMD I found that using the keyword TASKGROUP
>> which introduces a different way of parallelization it is possible to
>> improve on the openmpi time substantially and lower the time from 3.66
>> s to 1.67 s, almost to the value found with Intel MPI.
>
> I guess this changed what kind of communication is done, so you no longer
> have to do the 4221 alltoall calls of ~100 kB each that seem to hurt
> OpenMPI so much.

With TASKGROUP=2 the summary looks as follows:

       CPU TIME :    0 HOURS  0 MINUTES 42.09 SECONDS
   ELAPSED TIME :    0 HOURS  0 MINUTES 44.01 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS   73532/ 322740 kBYTES ***

 PROGRAM CPMD ENDED AT:   Tue May 19 11:16:18 2009

 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE                8585. BYTES              48447.  =
 = BROADCAST                  19063. BYTES                396.  =
 = GLOBAL SUMMATION          103463. BYTES                372.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM           231821. BYTES               4221.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE              193.459  MB/S           2.150 SEC  =
 = BROADCAST                  10.785  MB/S           0.700 SEC  =
 = GLOBAL SUMMATION          339.605  MB/S           0.680 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM            82.716  MB/S          11.830 SEC  =
 = SYNCHRONISATION                                   2.360 SEC  =
 ================================================================
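
The PERFORMANCE column appears to be consistent with average message length times number of calls divided by total time, with 1 MB taken as 10^6 bytes. That relation is inferred from the numbers here, not taken from the CPMD documentation, and the function name below is only illustrative. A short Python sketch that recomputes the ALL TO ALL COMM rate for the four summaries in this thread:

```python
def alltoall_rate_mb_s(avg_bytes, calls, total_seconds):
    """Rate in MB/s, assuming 1 MB = 10**6 bytes (an inferred convention)."""
    return avg_bytes * calls / total_seconds / 1e6


# (average message length in bytes, number of calls, total time in s,
#  rate reported in the summary in MB/s)
summaries = {
    "openmpi-1.3.2": (102033, 4221, 55.200, 7.802),
    "mvapich-1.1.0": (102033, 4221, 29.070, 14.815),
    "Intel MPI 3.2.1.009": (102033, 4221, 11.130, 38.696),
    "TASKGROUP=2": (231821, 4221, 11.830, 82.716),
}

computed = {}
for name, (avg_bytes, calls, seconds, reported) in summaries.items():
    computed[name] = alltoall_rate_mb_s(avg_bytes, calls, seconds)
    print(f"{name:22s} computed {computed[name]:7.3f} MB/s, reported {reported:7.3f} MB/s")
```

The recomputed values agree with the reported ones to within rounding of the printed inputs. Note that with TASKGROUP=2 the average alltoall message is more than twice as long (231821 vs. 102033 bytes) for the same 4221 calls, yet the total alltoall time is 11.83 s.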

Roman
