Ralph is right: OMPI aggressively uses all Ethernet interfaces by default.  

This short FAQ has links to 2 other FAQs that provide detailed information 
about reachability:

    http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network

The usNIC BTL uses UDP for its wire transport and does a much more 
standards-conformant peer reachability determination (i.e., it actually checks 
the routing tables to see whether it can reach a given peer, which brings 
caching benefits, kernel controls if you want them, etc.).  We haven't 
back-ported this to the TCP BTL because a) most people who use TCP for MPI 
still use a single L2 address space, and b) no one has asked for it.  :-)
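To make "checks the routing tables" concrete, here is a minimal Python sketch of a longest-prefix-match lookup, assuming a simplified routing table of (prefix, interface) pairs.  It is not the usNIC BTL's actual code; the prefixes, the interface assignments, and the route_for() / reachable_peers() helpers are all hypothetical.  A peer with no matching route is treated as unreachable, and when two routes match, the more specific one wins.

```python
import ipaddress

# Hypothetical, simplified routing table: (destination prefix, outgoing interface).
# A real kernel table also carries default routes, gateways, metrics, and policy rules.
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "eth0"),
    (ipaddress.ip_network("10.2.0.0/16"), "ib0"),
    (ipaddress.ip_network("10.2.5.0/24"), "eoib0"),
]


def route_for(peer, routes=ROUTES):
    """Return the interface of the longest matching prefix, or None if no route exists."""
    addr = ipaddress.ip_address(peer)
    matches = [(net, ifname) for net, ifname in routes if addr in net]
    if not matches:
        return None  # no route: treat the peer as unreachable
    # Longest-prefix match: the most specific route wins.
    _, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname


def reachable_peers(peers, routes=ROUTES):
    """Map each peer address to the interface the table would use (None = unreachable)."""
    return {p: route_for(p, routes) for p in peers}


if __name__ == "__main__":
    for peer, ifname in reachable_peers(
        ["10.1.3.4", "10.2.5.9", "10.2.7.1", "192.168.0.1"]
    ).items():
        print(peer, "->", ifname)
```

With the sample table, 10.2.5.9 matches both 10.2.0.0/16 and 10.2.5.0/24, so the /24 route via eoib0 wins, while 192.168.0.1 has no route at all.  A real implementation would ask the kernel rather than consult a hard-coded list.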

As for the round-robin scheduling, the Linux TCP stack gives no indication of 
the bandwidth of a given IP interface.  So unless you set the 
btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., btl_tcp_bandwidth_eth0) MCA 
params, OMPI will round-robin across them equally.
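Here is a toy Python model of why the interfaces end up sharing traffic equally.  It assumes (hypothetically) that a large message is divided across interfaces in proportion to a per-interface bandwidth weight, and that every interface without an explicit value gets the same default weight.  It is not Open MPI's actual scheduler; split_message() and DEFAULT_BANDWIDTH are made-up names.

```python
# Hypothetical default: every interface without an explicit bandwidth gets this weight.
DEFAULT_BANDWIDTH = 1


def split_message(nbytes, interfaces, bandwidth=None):
    """Return {ifname: bytes}, splitting nbytes in proportion to bandwidth weights.

    interfaces: ordered list of interface names.
    bandwidth:  optional {ifname: weight}; names missing from it use DEFAULT_BANDWIDTH,
                so with no weights at all every interface gets an equal share.
    """
    bandwidth = bandwidth or {}
    weights = [bandwidth.get(name, DEFAULT_BANDWIDTH) for name in interfaces]
    total = sum(weights)
    shares = {name: nbytes * w // total for name, w in zip(interfaces, weights)}
    # Integer division drops fewer than len(interfaces) bytes; hand them out round-robin.
    leftover = nbytes - sum(shares.values())
    for k in range(leftover):
        shares[interfaces[k % len(interfaces)]] += 1
    return shares


if __name__ == "__main__":
    ifaces = ["eth0", "ib0", "eoib0"]
    print(split_message(3000, ifaces))  # no weights: equal split
    print(split_message(3000, ifaces, {"eth0": 1000, "ib0": 10000, "eoib0": 10000}))
```

With no weights set, 3000 bytes split evenly across eth0, ib0, and eoib0; with eth0 weighted at one tenth of each IB interface, it gets roughly 1/21 of the bytes.  In a real run the weights would come from the btl_tcp_bandwidth_<IP_INTERFACE_NAME> MCA params, passed like any other MCA param (e.g., mpirun --mca btl_tcp_bandwidth_eth0 <value> ...).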

If you have multiple IP interfaces sharing a single physical link, there will 
likely be no benefit from having Open MPI use more than one of them.  You 
should probably use btl_tcp_if_include / btl_tcp_if_exclude to select just one.
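As a rough illustration of the selection step, here is a hypothetical Python helper that filters an interface list the way an include or exclude list would, plus a check for interfaces that share one physical device (like ib0 and eoib0 in Brock's setup below).  The device names (nic0, hca0) and both functions are assumptions for illustration, not Open MPI code; the sketch treats include and exclude as mutually exclusive, on the assumption that btl_tcp_if_include and btl_tcp_if_exclude are meant to be used one at a time.

```python
def select_interfaces(available, include=None, exclude=None):
    """Hypothetical model of include/exclude filtering over interface names."""
    if include is not None and exclude is not None:
        raise ValueError("use include or exclude, not both")
    if include is not None:
        return [name for name in available if name in include]
    if exclude is not None:
        return [name for name in available if name not in exclude]
    return list(available)  # default: use everything


def shared_links(selected, physical):
    """Return {device: [interfaces]} for physical devices backing >1 selected interface."""
    groups = {}
    for name in selected:
        groups.setdefault(physical[name], []).append(name)
    return {dev: names for dev, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    available = ["eth0", "ib0", "eoib0"]
    # Hypothetical mapping: ib0 and eoib0 ride on the same IB HCA.
    physical = {"eth0": "nic0", "ib0": "hca0", "eoib0": "hca0"}
    print(shared_links(select_interfaces(available), physical))
    print(shared_links(select_interfaces(available, exclude=["eoib0"]), physical))
```

Excluding eoib0 leaves eth0 and ib0, each on its own physical device, so shared_links() reports nothing.  On the mpirun command line the equivalent selection would be something like --mca btl_tcp_if_include eth0,ib0.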




On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:

> I was doing a test on our IB-based cluster, where I was disabling IB
> 
> --mca btl ^openib --mca mtl ^mxm
> 
> I was sending very large messages (>1GB) and I was surprised by the speed.
> 
> I noticed then that of all our ethernet interfaces
> 
> eth0  (1gig-e)
> ib0  (ip over ib, for lustre configuration at vendor request)
> eoib0  (ethernet over IB interface for IB -> Ethernet gateway for some 
> external storage support at >1Gig speed)
> 
> I saw all three were getting traffic.
> 
> We use torque for our Resource Manager and use TM support, the hostnames 
> given by torque match the eth0 interfaces.
> 
> How does OMPI figure out that it can also talk over the others?  How does it 
> choose to load balance?
> 
> BTW that is fine, but we will use if_exclude on one of the IB ones as ib0 and 
> eoib0  are the same physical device and may screw with load balancing if 
> anyone ever falls back to TCP.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25709.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
