Did you have both of the ethernet ports on the same subnet, or were they on 
different subnets?
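
For anyone wanting to check this on their own nodes, here is a minimal Python sketch that tests whether two interface addresses fall in the same subnet. The /24 prefix length is only an assumption for illustration (the netmasks were not given in the original post), and `same_subnet` is a hypothetical helper name, not part of Open MPI.

```python
import ipaddress

def same_subnet(addr_a, addr_b, prefix_len):
    """Return True if both addresses belong to the same network
    for the given prefix length (e.g. 24 for 255.255.255.0)."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b

if __name__ == "__main__":
    # Addresses taken from the quoted post; the /24 prefix is assumed.
    print(same_subnet("10.3.1.1", "10.3.1.2", 24))
    print(same_subnet("10.3.1.3", "10.3.1.4", 24))
```

The actual prefix length of each interface can be read from `ip addr show` on the node and substituted for the assumed value.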


On Feb 17, 2012, at 5:36 AM, Richard Bardwell wrote:

> I had exactly the same problem.
> Trying to run mpi between 2 separate machines, with each machine having
> 2 ethernet ports, causes really weird behaviour on the most basic code.
> I had to disable one of the ethernet ports on each of the machines
> and it worked just fine after that. No idea why though !
>  
> ----- Original Message -----
> From: Jingcha Joba
> To: us...@open-mpi.org
> Sent: Thursday, February 16, 2012 8:43 PM
> Subject: [OMPI users] Problem running an mpi application on nodes with more 
> than one interface
> 
> Hello Everyone,
> This is my 1st post in open-mpi forum.
> I am trying to run a simple program which does Sendrecv between two nodes 
> having 2 interface cards on each of two nodes.
> Both the nodes are running RHEL6, with open-mpi 1.4.4 on a 8 core Xeon 
> processor.
> What I noticed was that when using two or more interfaces on both the nodes, 
> mpi "hangs" while attempting to connect.
> These details might help,
> Node 1 - Denver has a single port "A" card (eth21 - 25.192.xx.xx - which I 
> use to ssh to that machine), and a double port "B" card (eth23 - 10.3.1.1 & 
> eth24 - 10.3.1.2).
> Node 2 - Chicago also has the same single port A card (eth19 - 25.192.xx.xx - 
> again used for ssh) and a double port B card (eth29 - 10.3.1.3 & eth30 - 
> 10.3.1.4).
> My /etc/hosts looks like
> 25.192.xx.xx denver.xxx.com denver
> 10.3.1.1 denver.xxx.com denver
> 10.3.1.2 denver.xxx.com denver
> 25.192.xx.xx chicago.xxx.com chicago
> 10.3.1.3 chicago.xxx.com chicago
> 10.3.1.4 chicago.xxx.com chicago
> ...
> ...
> ...
> This is how I run,
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude 
> eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> I get a bunch of output from both chicago and denver, which says it has found 
> components like tcp, sm, self and so on, and then hangs at
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3 
> on port 4
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4 
> on port 4
> However, if I run the same program by excluding eth29 or eth30, then it works 
> fine. Something like this:
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude 
> eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> My hostfile looks like this
> [sshuser@denver Sendrecv]$ cat host1
> denver slots=2
> chicago slots=2
> I am not sure if I have to provide something else. If I have to, 
> please feel free to ask me.
> thanks,
> --
> Joba
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

