Does a "hello world" MPI app (with no MPI_SEND/MPI_RECV in it) work without 
those params, while an MPI app with MPI_SEND/MPI_RECV hangs?

If so, that's a little disappointing -- OMPI's MPI layer should be able to tell 
the difference between the different networks and should be able to figure out 
routability between them.

But if hello world doesn't even run, then try running with "mpirun --mca 
oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's 
suggestion.  If *that* doesn't work, add "--mca btl_tcp_if_include ..." as 
well.
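
As a rough sketch of those two steps, something like the following could be used. The interface name (eth0), the host names (node1, node2), and the program (./hello) are placeholders I made up, not values from your setup; check "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax your version accepts.

```shell
# Assumption: eth0 is a placeholder interface name, and ./hello, node1 and
# node2 are hypothetical. Substitute the values for your own cluster.
IFACE="eth0"

# Step 1: restrict only the out-of-band (launch/control) TCP traffic.
OOB_ONLY="mpirun --mca oob_tcp_if_include $IFACE -np 2 --host node1,node2 ./hello"

# Step 2: if that still fails, restrict the MPI point-to-point TCP traffic too.
OOB_AND_BTL="mpirun --mca oob_tcp_if_include $IFACE --mca btl_tcp_if_include $IFACE -np 2 --host node1,node2 ./hello"

# Print the commands for review; paste one into the shell to run it.
echo "$OOB_ONLY"
echo "$OOB_AND_BTL"
```

The snippet only prints the command lines, so it is safe to run anywhere; nothing is launched until you run one of the printed commands yourself.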


On Oct 4, 2011, at 7:54 PM, Ralph Castain wrote:

> OMPI always tries to use the lowest numbered address first - just a natural 
> ordering. You need to tell it to use just the public ones for this topology. 
> Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param oob 
> tcp" and "ompi_info --param btl tcp" for the exact syntax.
> 
> 
> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> 
>> We are constructing a set of computers with Open MPI and there's a small 
>> problem with mixing public and private IPs.
>> 
>> We aren't sure about what's causing the problem or how to solve it.
>> 
>> The files are shared over NFS. We have a couple of computers with both 
>> private and public IPs, and we want them to send MPI work to some 
>> machines that have public IPs.
>> 
>> I'm going to try to describe with example IPs.
>> 
>> Computer 1 sees itself as eth0:  172...2  but has a public IP assigned:  
>> 210...2
>> Computer 2 sees itself as eth0:  172...3  but has a public IP assigned:  
>> 210...3
>> Computers outside the subnet directly have public IPs assigned:  210...100+
>> 
>> The computers outside see Computer 1 and 2 only with 210... they can't see 
>> the 172... internal IPs.
>> 
>> If an outside computer launches mpirun to Computer 1, it works ok.
>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also 
>> works ok (not with 210... because they don't know that that's their public 
>> IP, but that's not an issue).
>> 
>> The problem comes when Computer 1 or 2 tries to launch mpirun to outside 
>> computers.
>> 
>> We tried to check out what was happening and installed wireshark on an 
>> outside computer. The ssh part seems to work ok (the ssh talk 
>> between 210...2 and 210...101 is ok), but after that the outside computer 
>> tries to send a TCP SYN packet to 172...2 instead of 210...2, and all 
>> subsequent packets go to the same address.
>> 
>> Is there a way to solve this problem?
>> 
>> I've read this ( 
>> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm 
>> not really sure what he's doing there.
>> 
>> We have the option of plugging Computer 1 and Computer 2 directly to the 
>> switch that the outside computers are on, but we'd rather not because we'd 
>> prefer the computers to stay on the private network, but if there's no other 
>> way, I guess we can.
>> 
>> Can it be done without having to change the network topology?
>> 
>> Thanks in advance.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

