Hello,

We have recently enhanced our network with InfiniBand modules on a six-node
cluster.

We have installed all the OFED drivers related to our hardware.
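
In case it is useful, the installation can be checked on each node with the
standard OFED tools (we have not pasted the output here):

    ibv_devinfo    # lists the HCAs; port state should be PORT_ACTIVE
    ibstat         # per-port state and link rate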

We have set the network IP addresses as follows:
- eth: 192.168.1.0 / 255.255.255.0
- ib:  192.168.70.0 / 255.255.255.0
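
On each node the addresses can be checked with, for example (interface names
and host numbers are just examples for our setup):

    ifconfig eth0   # 192.168.1.x  netmask 255.255.255.0
    ifconfig ib0    # 192.168.70.x netmask 255.255.255.0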

After the first tests everything seemed good: the IB interfaces ping each
other, and ssh and other kinds of exchanges over IB work well.
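
For example, from one node to a neighbour over the IB subnet (the host
address is just an example):

    ping 192.168.70.2
    ssh 192.168.70.2 hostname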

Then we started to run our jobs through Open MPI (built with the
--with-openib option) and our first results were very bad.
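
For reference, Open MPI was configured roughly as follows (the install prefix
and the OFED location are examples, not our exact values):

    ./configure --prefix=/opt/openmpi --with-openib=/usr
    make all install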

After investigation, our system has the following behaviour:
- the job starts over the IB network (a few packets are sent)
- the job then switches to the eth network (all subsequent packets are sent
to those interfaces; one way to watch this is sketched below)
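
One way to watch this (device and interface names below are only examples for
our nodes) is to compare the traffic counters while the job runs:

    # Ethernet side: byte counters keep growing during the job
    grep eth0 /proc/net/dev
    # InfiniBand side: HCA port counters stay almost flat
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data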

We never specified the IP addresses of our eth interfaces.

We tried to launch our jobs with the following options:
- mpirun -hostfile hostfile.list -mca btl openib,self
/common_gfs2/script-test.sh
- mpirun -hostfile hostfile.list -mca btl openib,sm,self
/common_gfs2/script-test.sh
- mpirun -hostfile hostfile.list -mca btl openib,self -mca
btl_tcp_if_exclude lo,eth0,eth1,eth2 /common_gfs2/script-test.sh
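
For what it's worth, we can also check that the openib component was built in
and run a more verbose variant (the verbosity level is only an example) so
that Open MPI reports which BTL components it actually selects:

    ompi_info | grep openib
    mpirun -hostfile hostfile.list -mca btl openib,self \
        -mca btl_base_verbose 30 /common_gfs2/script-test.sh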

The final behaviour remains the same: the job is initiated over IB but runs
over eth.

We ran the performance tests (osu_bw and osu_latency) and got fairly good
results (see the attached files).
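
A typical invocation of these benchmarks on two nodes would be something like
(the binary locations are examples):

    mpirun -np 2 -hostfile hostfile.list ./osu_latency
    mpirun -np 2 -hostfile hostfile.list ./osu_bw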

We have tried plenty of different things but we are stuck: we don't get any
error messages...

Thanks in advance for your help.

Thierry.
# OSU MPI Latency Test (Version 2.0)
# Size          Latency (us) 
0               9.39
1               8.98
2               6.92
4               6.94
8               6.94
16              6.99
32              7.09
64              7.30
128             7.56
256             7.70
512             8.27
1024            9.38
2048            12.14
4096            14.51
8192            19.79
16384           43.00
32768           64.82
65536           104.82
131072          164.28
262144          293.86
524288          536.71
1048576         1049.46
2097152         2213.57
4194304         3686.72
# OSU MPI Bandwidth Test (Version 2.0)
# Size          Bandwidth (MB/s) 
1               0.180975
2               0.365537
4               0.730864
8               1.461231
16              2.920952
32              5.793988
64              11.254934
128             27.403607
256             55.811413
512             109.614427
1024            210.083847
2048            329.558204
4096            506.783138
8192            749.913297
16384           570.730147
32768           794.796561
65536           968.103658
131072          990.723946
262144          1009.216695
524288          1032.053241
1048576         1063.046034
2097152         1209.998818
4194304         1346.575306
                            Summary of Results
       (HSN codes: TCP=GigE, GM/MX=Myrinet, IBV/VAPI/UDAPL/PSM=InfiniBand)
 ------------------------------------------------------------------------------
   Maximum Performance
   -------------------
      GigE :   57 usec     HSN-PSM  :    2 usec
      GigE :  102 MB/s     HSN-PSM  : 1134 MB/s

   Average Performance
   -------------------
      GigE :   57 usec     HSN-PSM  :    2 usec
      GigE :  101 MB/s     HSN-PSM  : 1124 MB/s

   Minimum Performance
   -------------------
      GigE :   57 usec     HSN-PSM  :    2 usec
      GigE :  100 MB/s     HSN-PSM  : 1115 MB/s
