Yes, they were on the same subnet. I guess that is the problem.
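[Editor's note: given that answer, a minimal sketch of one possible fix, renumbering one of the two same-subnet ports with iproute2. The interface names and addresses are taken from the thread below and are assumptions about this particular cluster; adjust for your own hosts.]

```shell
# On denver: move eth24 out of 10.3.1.0/24 so each port sits on its
# own subnet and TCP routing between the nodes is unambiguous.
ip addr del 10.3.1.2/24 dev eth24
ip addr add 10.3.2.2/24 dev eth24

# Repeat on chicago for the matching port, e.g. eth30 -> 10.3.2.4/24,
# and update /etc/hosts on both nodes to match the new addresses.
```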
----- Original Message -----
From: "Jeff Squyres" <jsquy...@cisco.com>
To: "Open MPI Users" <us...@open-mpi.org>
Sent: Friday, February 17, 2012 4:20 PM
Subject: Re: [OMPI users] Problem running an mpi application on nodes with more than one interface
Did you have both of the ethernet ports on the same subnet, or were they on
different subnets?
On Feb 17, 2012, at 5:36 AM, Richard Bardwell wrote:
I had exactly the same problem.
Trying to run MPI between two separate machines, each with two Ethernet ports, caused really weird behaviour in even the most basic code.
I had to disable one of the Ethernet ports on each machine, and it worked just fine after that. No idea why, though!
----- Original Message -----
From: Jingcha Joba
To: us...@open-mpi.org
Sent: Thursday, February 16, 2012 8:43 PM
Subject: [OMPI users] Problem running an mpi application on nodes with more than one interface
Hello Everyone,
This is my first post to the Open MPI users list.
I am trying to run a simple program that does an MPI_Sendrecv between two nodes, each of which has two interface cards.
Both nodes are running RHEL 6 with Open MPI 1.4.4 on 8-core Xeon processors.
What I noticed is that when two or more interfaces are in use on both nodes, MPI "hangs" while attempting to connect.
These details might help:
Node 1 - Denver has a single-port "A" card (eth21 - 25.192.xx.xx - which I use to ssh to that machine) and a dual-port "B" card (eth23 - 10.3.1.1 & eth24 - 10.3.1.2).
Node 2 - Chicago has the same single-port A card (eth19 - 25.192.xx.xx - again used for ssh) and a dual-port B card (eth29 - 10.3.1.3 & eth30 - 10.3.1.4).
My /etc/hosts looks like:
25.192.xx.xx denver.xxx.com denver
10.3.1.1 denver.xxx.com denver
10.3.1.2 denver.xxx.com denver
25.192.xx.xx chicago.xxx.com chicago
10.3.1.3 chicago.xxx.com chicago
10.3.1.4 chicago.xxx.com chicago
...
...
...
This is how I run it:
mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
I get a bunch of output from both chicago and denver saying that it has found components like tcp, sm, and self, and then it hangs at:
[denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3 on port 4
[denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4 on port 4
However, if I run the same program and also exclude eth29 (or eth30), it works fine. Something like this:
mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
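[Editor's note: a hedged alternative sketch. Instead of listing every interface to exclude, the TCP BTL can be told which interface to use via the btl_tcp_if_include MCA parameter (include and exclude are mutually exclusive). Note that the named interface must exist on every node, which this cluster's asymmetric naming (eth23/eth24 vs eth29/eth30) would complicate; the command below is illustrative, not tested on this setup.]

```shell
# Sketch: restrict the TCP BTL to one named interface per node.
# "eth23" is taken from the thread and would only work if all nodes
# used that name; adjust (or use a newer Open MPI that accepts CIDR
# subnets for this parameter) as needed.
mpirun --hostfile host1 --mca btl tcp,sm,self \
       --mca btl_tcp_if_include eth23 \
       --mca btl_base_verbose 30 -np 4 ./Sendrecv
```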
My hostfile looks like this:
[sshuser@denver Sendrecv]$ cat host1
denver slots=2
chicago slots=2
I am not sure if I need to provide anything else; if I do, please feel free to ask.
thanks,
--
Joba
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/