Hello,

I'm troubleshooting a weird benchmark situation in which having the sm btl
enabled gives me worse large-message results than disabling it.

For example, here is a run on a single compute node with 2x Xeon 5420, 8 GB of
RAM and a ConnectX gen2 IB card, running OFED 1.3 and OpenMPI 1.2.6:

[cvsupport@extern src]$ mpirun -np 8 --mca btl self,sm,openib -hostfile \
hostfile ./IMB-MPI1.openmpi -npmin 8 PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.87         0.00
            1         1000         0.98         0.97
            2         1000         0.97         1.96
            4         1000         0.99         3.87
            8         1000         0.98         7.78
           16         1000         1.15        13.33
           32         1000         1.13        26.93
           64         1000         1.12        54.42
          128         1000         1.27        96.31
          256         1000         1.55       157.01
          512         1000         2.04       239.00
         1024         1000         2.75       355.62
         2048         1000         4.58       426.40
         4096         1000         7.12       548.93
         8192         1000        11.29       692.14
        16384         1000        18.83       829.75
        32768         1000        34.57       904.08
        65536          640        60.73      1029.22
       131072          320       112.06      1115.43
       262144          160       215.48      1160.21
       524288           80       423.34      1181.09
      1048576           40       858.18      1165.26
      2097152           20      1744.15      1146.69
      4194304           10      4055.60       986.29

With the sm btl disabled, the results are:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.08         0.00
            1         1000         1.42         0.67
            2         1000         1.19         1.60
            4         1000         1.21         3.14
            8         1000         1.61         4.75
           16         1000         1.30        11.70
           32         1000         1.32        23.13
           64         1000         1.61        37.97
          128         1000         2.80        43.53
          256         1000         3.21        76.05
          512         1000         4.06       120.15
         1024         1000         5.03       194.21
         2048         1000         7.15       273.05
         4096         1000        10.05       388.55
         8192         1000        16.02       487.76
        16384         1000        29.63       527.41
        32768         1000        51.23       610.03
        65536          640        92.26       677.43
       131072          320       141.03       886.36
       262144          160       233.62      1070.14
       524288           80       434.56      1150.60
      1048576           40       818.84      1221.24
      2097152           20      1403.75      1424.76
      4194304           10      2523.40      1585.16
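
To see where the two runs cross over, here is a minimal Python sketch. It
assumes nothing beyond the t[usec] columns copied by hand from the two tables
above; the function name faster_without_sm is hypothetical, chosen only for
illustration.

```python
"""Compare the two IMB PingPong runs (sm btl enabled vs. disabled)."""

# Message sizes in bytes, from the #bytes column of both tables.
SIZES = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096,
         8192, 16384, 32768, 65536, 131072, 262144, 524288,
         1048576, 2097152, 4194304]

# t[usec] with --mca btl self,sm,openib (first table).
T_SM = [0.87, 0.98, 0.97, 0.99, 0.98, 1.15, 1.13, 1.12, 1.27, 1.55,
        2.04, 2.75, 4.58, 7.12, 11.29, 18.83, 34.57, 60.73, 112.06,
        215.48, 423.34, 858.18, 1744.15, 4055.60]

# t[usec] with the sm btl disabled (second table).
T_NO_SM = [1.08, 1.42, 1.19, 1.21, 1.61, 1.30, 1.32, 1.61, 2.80, 3.21,
           4.06, 5.03, 7.15, 10.05, 16.02, 29.63, 51.23, 92.26, 141.03,
           233.62, 434.56, 818.84, 1403.75, 2523.40]


def faster_without_sm(sizes, t_sm, t_no_sm):
    """Return the message sizes where disabling sm gave the lower time."""
    return [s for s, a, b in zip(sizes, t_sm, t_no_sm) if b < a]


if __name__ == "__main__":
    # Print a side-by-side comparison; ratio > 1 means sm was slower.
    for size, a, b in zip(SIZES, T_SM, T_NO_SM):
        print(f"{size:>8} bytes: sm {a:9.2f} us, "
              f"no sm {b:9.2f} us, ratio {a / b:5.2f}")
    print("disabling sm wins at:", faster_without_sm(SIZES, T_SM, T_NO_SM))
```

On these numbers sm has the lower latency up to 512 KB, while the run
without sm is faster from 1 MB upward.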


Now, I do have fast InfiniBand, but I can't believe that the openib btl is
supposed to be faster than the sm btl. Does anyone know whether
something can be tuned here?

Best regards,

Daniël Mantione
