George Bosilca wrote:

> Btw, can you run the Netpipe benchmark on this configuration please ?
> Once compiled with MPI support and once with TCP. This might give us
> more equitable details (same benchmark).

NPmpi and NPtcp both belong to NetPIPE, but that doesn't mean they measure the same thing ;-). Anyway, here are the results:

mpirun --hostfile my_hostfile_netpipe -mca btl_tcp_if_include eth2 -np 2 -nolocal NPmpi -l 20971520 -u 209715200
0: chel02
1: chel03
Now starting the main loop
  0: 20971517 bytes      3 times -->   6630.84 Mbps in   24129.67 usec
  1: 20971520 bytes      3 times -->   6631.67 Mbps in   24126.65 usec
  2: 20971523 bytes      3 times -->   6649.08 Mbps in   24063.47 usec
  3: 31457277 bytes      3 times -->   6680.40 Mbps in   35925.98 usec
  4: 31457280 bytes      3 times -->   6667.34 Mbps in   35996.36 usec
  5: 31457283 bytes      3 times -->   6667.53 Mbps in   35995.32 usec
  6: 41943037 bytes      3 times -->   6708.97 Mbps in   47697.35 usec
  7: 41943040 bytes      3 times -->   6700.24 Mbps in   47759.49 usec
  8: 41943043 bytes      3 times -->   6698.70 Mbps in   47770.50 usec
  9: 62914557 bytes      3 times -->   6724.57 Mbps in   71380.02 usec
 10: 62914560 bytes      3 times -->   6726.09 Mbps in   71363.85 usec
 11: 62914563 bytes      3 times -->   6728.26 Mbps in   71340.84 usec
 12: 83886077 bytes      3 times -->   6736.77 Mbps in   95000.98 usec
 13: 83886080 bytes      3 times -->   6741.62 Mbps in   94932.68 usec
 14: 83886083 bytes      3 times -->   6743.01 Mbps in   94913.16 usec
 15: 125829117 bytes      3 times -->   6765.21 Mbps in  141902.49 usec
 16: 125829120 bytes      3 times -->   6764.36 Mbps in  141920.33 usec
 17: 125829123 bytes      3 times -->   6765.18 Mbps in  141903.00 usec
 18: 167772157 bytes      3 times -->   6767.28 Mbps in  189145.53 usec
 19: 167772160 bytes      3 times -->   6775.90 Mbps in  188904.84 usec
 20: 167772163 bytes      3 times -->   6768.77 Mbps in  189103.80 usec

NPtcp -h 192.168.10.2 -l 20971520 -u 209715200
Send and receive buffers are 107374182 and 107374182 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0: 20971517 bytes      3 times -->   8565.70 Mbps in   18679.14 usec
  1: 20971520 bytes      3 times -->   8561.11 Mbps in   18689.16 usec
  2: 20971523 bytes      3 times -->   8570.28 Mbps in   18669.17 usec
  3: 31457277 bytes      3 times -->   8554.48 Mbps in   28055.47 usec
  4: 31457280 bytes      3 times -->   8556.30 Mbps in   28049.51 usec
  5: 31457283 bytes      3 times -->   8566.58 Mbps in   28015.85 usec
  6: 41943037 bytes      3 times -->   8560.95 Mbps in   37379.03 usec
  7: 41943040 bytes      3 times -->   8554.36 Mbps in   37407.84 usec
  8: 41943043 bytes      3 times -->   8558.02 Mbps in   37391.82 usec
  9: 62914557 bytes      3 times -->   8546.83 Mbps in   56161.17 usec
 10: 62914560 bytes      3 times -->   8549.41 Mbps in   56144.20 usec
 11: 62914563 bytes      3 times -->   8561.18 Mbps in   56067.03 usec
 12: 83886077 bytes      3 times -->   8565.96 Mbps in   74714.34 usec
 13: 83886080 bytes      3 times -->   8563.17 Mbps in   74738.66 usec
 14: 83886083 bytes      3 times -->   8549.71 Mbps in   74856.32 usec
 15: 125829117 bytes      3 times -->   8580.90 Mbps in  111876.33 usec
 16: 125829120 bytes      3 times -->   8574.20 Mbps in  111963.83 usec
 17: 125829123 bytes      3 times -->   8572.41 Mbps in  111987.19 usec
 18: 167772157 bytes      3 times -->   8601.10 Mbps in  148818.17 usec
 19: 167772160 bytes      3 times -->   8602.33 Mbps in  148796.84 usec
 20: 167772163 bytes      3 times -->   8595.99 Mbps in  148906.67 usec

Let us be optimistic and say ~1075 MB/s with NPtcp and ~850 MB/s with NPmpi. Roughly the same difference as before, just at lower absolute values.
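For reference, NetPIPE reports throughput in Mbps (megabits per second), so those MB/s figures are simply the plateau values above divided by 8:

   ~8600 Mbps / 8 ≈ 1075 MB/s   (NPtcp)
   ~6770 Mbps / 8 ≈  846 MB/s   (NPmpi)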

George Bosilca wrote:

> There are two parameters hat can slightly improve the
> behavior: btl_tcp_rdma_pipeline_send_length and
> btl_tcp_min_rdma_pipeline_size.

These two parameters don't exist here. Here is the output of ompi_info --param btl tcp:

MCA btl: parameter "btl_base_debug" (current value: "0")
        If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: <none>)
Default selection set of components for the btl framework (<none> means "use all components that can be found")
MCA btl: parameter "btl_base_verbose" (current value: "0")
        Verbosity level for the btl framework (0 = no verbosity)
MCA btl: parameter "btl_tcp_if_include" (current value: <none>)
MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
MCA btl: parameter "btl_tcp_sndbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_rcvbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
MCA btl: parameter "btl_tcp_flags" (current value: "122")
MCA btl: parameter "btl_tcp_priority" (current value: "0")
MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
        This parameter is used to turn on warning messages when certain NICs are not used

The buffer sizes are already "tuned". By the way, what is optimal for the send values ("btl_tcp_min_send_size" and "btl_tcp_max_send_size")? A high value (only a few segmentations at the MPI level) or a low value? We changed them in both directions, but it had nearly no effect.
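For concreteness, a sketch of such an override on the mpirun command line, analogous to the NPmpi run above (the values 262144 and 1048576 are only illustrative, not necessarily what we used):

mpirun --hostfile my_hostfile_netpipe -mca btl_tcp_if_include eth2 \
       -mca btl_tcp_min_send_size 262144 -mca btl_tcp_max_send_size 1048576 \
       -np 2 -nolocal NPmpi -l 20971520 -u 209715200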


Jon Mason wrote:

> That script should optimally tune your NIC.  If you are still not
> satisfied with the performance, Chelsio should have people available to
> help.  Since the TOE module is not opensource, there is not much anyone
> else can do.  You can try tweaking any module parms that are exposed.
> Checkout the modinfo output for that module.

The NIC is well tuned, as the results at the TCP level show. There seems to be a bottleneck in Open MPI.


Jon Mason wrote:

> You can also try the new iWARP support in OMPI 1.3.  The perf for that
> should be much better.

Yes, I will try it, but I can't offer an unstable version of Open MPI on a production system. As long as 1.3 is not officially released, the users have to work with 1.2.6.


Steve Wise wrote:

> So OMPI experts, what is the overhead you see on other TCP links for
> OMPI BW tests vs native sockets TCP BW tests?

This is exactly what I need to know :)

Thanks a lot for the interest and the hints so far.

Andy

--


Dresden University of Technology
Center for Information Services
and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Phone:    (+49) 351/463-38783
Fax:      (+49) 351/463-38245

e-mail: andy.geo...@zih.tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
