George Bosilca wrote:
> Btw, can you run the Netpipe benchmark on this configuration please ?
> Once compiled with MPI support and once with TCP. This might give us
> more equitable details (same benchmark).
NPmpi and NPtcp both belong to NetPIPE, but that doesn't mean they do the same thing ;-). Anyway,
here are the results:
mpirun --hostfile my_hostfile_netpipe -mca btl_tcp_if_include eth2 -np 2 -nolocal NPmpi -l 20971520
-u 209715200
0: chel02
1: chel03
Now starting the main loop
0: 20971517 bytes 3 times --> 6630.84 Mbps in 24129.67 usec
1: 20971520 bytes 3 times --> 6631.67 Mbps in 24126.65 usec
2: 20971523 bytes 3 times --> 6649.08 Mbps in 24063.47 usec
3: 31457277 bytes 3 times --> 6680.40 Mbps in 35925.98 usec
4: 31457280 bytes 3 times --> 6667.34 Mbps in 35996.36 usec
5: 31457283 bytes 3 times --> 6667.53 Mbps in 35995.32 usec
6: 41943037 bytes 3 times --> 6708.97 Mbps in 47697.35 usec
7: 41943040 bytes 3 times --> 6700.24 Mbps in 47759.49 usec
8: 41943043 bytes 3 times --> 6698.70 Mbps in 47770.50 usec
9: 62914557 bytes 3 times --> 6724.57 Mbps in 71380.02 usec
10: 62914560 bytes 3 times --> 6726.09 Mbps in 71363.85 usec
11: 62914563 bytes 3 times --> 6728.26 Mbps in 71340.84 usec
12: 83886077 bytes 3 times --> 6736.77 Mbps in 95000.98 usec
13: 83886080 bytes 3 times --> 6741.62 Mbps in 94932.68 usec
14: 83886083 bytes 3 times --> 6743.01 Mbps in 94913.16 usec
15: 125829117 bytes 3 times --> 6765.21 Mbps in 141902.49 usec
16: 125829120 bytes 3 times --> 6764.36 Mbps in 141920.33 usec
17: 125829123 bytes 3 times --> 6765.18 Mbps in 141903.00 usec
18: 167772157 bytes 3 times --> 6767.28 Mbps in 189145.53 usec
19: 167772160 bytes 3 times --> 6775.90 Mbps in 188904.84 usec
20: 167772163 bytes 3 times --> 6768.77 Mbps in 189103.80 usec
NPtcp -h 192.168.10.2 -l 20971520 -u 209715200
Send and receive buffers are 107374182 and 107374182 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
0: 20971517 bytes 3 times --> 8565.70 Mbps in 18679.14 usec
1: 20971520 bytes 3 times --> 8561.11 Mbps in 18689.16 usec
2: 20971523 bytes 3 times --> 8570.28 Mbps in 18669.17 usec
3: 31457277 bytes 3 times --> 8554.48 Mbps in 28055.47 usec
4: 31457280 bytes 3 times --> 8556.30 Mbps in 28049.51 usec
5: 31457283 bytes 3 times --> 8566.58 Mbps in 28015.85 usec
6: 41943037 bytes 3 times --> 8560.95 Mbps in 37379.03 usec
7: 41943040 bytes 3 times --> 8554.36 Mbps in 37407.84 usec
8: 41943043 bytes 3 times --> 8558.02 Mbps in 37391.82 usec
9: 62914557 bytes 3 times --> 8546.83 Mbps in 56161.17 usec
10: 62914560 bytes 3 times --> 8549.41 Mbps in 56144.20 usec
11: 62914563 bytes 3 times --> 8561.18 Mbps in 56067.03 usec
12: 83886077 bytes 3 times --> 8565.96 Mbps in 74714.34 usec
13: 83886080 bytes 3 times --> 8563.17 Mbps in 74738.66 usec
14: 83886083 bytes 3 times --> 8549.71 Mbps in 74856.32 usec
15: 125829117 bytes 3 times --> 8580.90 Mbps in 111876.33 usec
16: 125829120 bytes 3 times --> 8574.20 Mbps in 111963.83 usec
17: 125829123 bytes 3 times --> 8572.41 Mbps in 111987.19 usec
18: 167772157 bytes 3 times --> 8601.10 Mbps in 148818.17 usec
19: 167772160 bytes 3 times --> 8602.33 Mbps in 148796.84 usec
20: 167772163 bytes 3 times --> 8595.99 Mbps in 148906.67 usec
Let us be optimistic and say ~1075 MB/s with NPtcp and ~850 MB/s with NPmpi. Roughly the same
relative difference as before, just at lower absolute values.
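(NetPIPE reports throughput in Mbit/s; dividing by 8 gives the MB/s figures above: ~8600 / 8 ≈ 1075 MB/s
for NPtcp and ~6780 / 8 ≈ 850 MB/s for NPmpi.)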
George Bosilca wrote:
> There are two parameters that can slightly improve the
> behavior: btl_tcp_rdma_pipeline_send_length and
> btl_tcp_min_rdma_pipeline_size.
These two parameters don't exist. Here is the output of ompi_info --param btl tcp:
MCA btl: parameter "btl_base_debug" (current value: "0")
If btl_base_debug is 1 standard debug is output, if > 1 verbose debug
is output
MCA btl: parameter "btl" (current value: <none>)
Default selection set of components for the btl framework (<none> means "use all components that
can be found")
MCA btl: parameter "btl_base_verbose" (current value: "0")
Verbosity level for the btl framework (0 = no verbosity)
MCA btl: parameter "btl_tcp_if_include" (current value: <none>)
MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
MCA btl: parameter "btl_tcp_sndbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_rcvbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
MCA btl: parameter "btl_tcp_flags" (current value: "122")
MCA btl: parameter "btl_tcp_priority" (current value: "0")
MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
This parameter is used to turn on warning messages when certain NICs
are not used
The buffer sizes are already "tuned". Btw, what is optimal for the send sizes
("btl_tcp_min_send_size" and "btl_tcp_max_send_size")? A high value (only a few segments at the MPI
level) or a low value? We changed them both ways, but it had nearly no effect.
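For reference, we changed them on the command line roughly like this (the sizes are just example
values from our tests, not a recommendation):
mpirun --hostfile my_hostfile_netpipe -mca btl_tcp_if_include eth2 \
  -mca btl_tcp_min_send_size 262144 -mca btl_tcp_max_send_size 1048576 \
  -np 2 -nolocal NPmpi -l 20971520 -u 209715200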
Jon Mason wrote:
> That script should optimally tune your NIC. If you are still not
> satisfied with the performance, Chelsio should have people available to
> help. Since the TOE module is not opensource, there is not much anyone
> else can do. You can try tweaking any module parms that are exposed.
> Checkout the modinfo output for that module.
The NIC is well tuned, as the TCP-level results show. There seems to be a
bottleneck in Open MPI.
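(Regarding the module parameters: something like "modinfo cxgb3" lists what is exposed, but I'm not
sure of the exact name of the Chelsio TOE module on this system, so take that module name as a guess.)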
Jon Mason wrote:
> You can also try the new iWARP support in OMPI 1.3. The perf for that
> should be much better.
Yes, I will try it, but I can't offer an unstable version of Open MPI on a production system. So as
long as it's not officially released, the users will have to work with 1.2.6.
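(As far as I understand, iWARP in 1.3 is handled by the openib BTL, so a test would presumably be
started with something like "mpirun -mca btl openib,sm,self ..." -- I will verify that once 1.3 is
installed here.)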
Steve Wise wrote:
> So OMPI experts, what is the overhead you see on other TCP links for
> OMPI BW tests vs native sockets TCP BW tests?
This is exactly what I need to know :)
Thanks a lot for the interest and the hints so far.
Andy
--
Dresden University of Technology
Center for Information Services
and High Performance Computing (ZIH)
D-01062 Dresden
Germany
Phone: (+49) 351/463-38783
Fax: (+49) 351/463-38245
e-mail: andy.geo...@zih.tu-dresden.de
WWW: http://www.tu-dresden.de/zih