Did you have both of the ethernet ports on the same subnet, or were they on different subnets?
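Whether two addresses are "on the same subnet" depends only on their network prefix: mask each address with the netmask and compare the results. Below is a minimal bash sketch of that check. `ip_to_int` and `same_subnet` are hypothetical helper names, not part of Open MPI. The /24 netmask is an assumption, because the thread gives the 10.3.1.x addresses but never their netmask.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Open MPI): decide whether two IPv4
# addresses are in the same subnet for a given prefix length.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# 10# forces base 10 so octets such as "08" are not read as octal.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# same_subnet ADDR1 ADDR2 PREFIX_LEN: prints "same" or "different".
same_subnet() {
    local prefix=$3 mask n1 n2
    # Build the netmask, e.g. /24 -> 0xFFFFFF00 (bash arithmetic is 64-bit,
    # so the shifted value is trimmed back to 32 bits).
    mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    n1=$(( $(ip_to_int "$1") & mask ))
    n2=$(( $(ip_to_int "$2") & mask ))
    if (( n1 == n2 )); then echo same; else echo different; fi
}

# Assuming a /24 netmask (not stated in the thread):
same_subnet 10.3.1.1 10.3.1.2 24   # prints "same" (Denver eth23 vs eth24)
same_subnet 10.3.1.3 10.3.1.4 24   # prints "same" (Chicago eth29 vs eth30)
# With a hypothetical /30 split, Denver's eth23 and Chicago's eth30 differ:
same_subnet 10.3.1.1 10.3.1.4 30   # prints "different"
```

Under the /24 assumption, all four 10.3.1.x ports on both nodes fall into a single subnet.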
On Feb 17, 2012, at 5:36 AM, Richard Bardwell wrote:

> I had exactly the same problem.
> Trying to run mpi between 2 separate machines, with each machine having
> 2 ethernet ports, causes really weird behaviour on the most basic code.
> I had to disable one of the ethernet ports on each of the machines
> and it worked just fine after that. No idea why though!
>
> ----- Original Message -----
> From: Jingcha Joba
> To: us...@open-mpi.org
> Sent: Thursday, February 16, 2012 8:43 PM
> Subject: [OMPI users] Problem running an mpi application on nodes with more
> than one interface
>
> Hello Everyone,
> This is my 1st post in the open-mpi forum.
> I am trying to run a simple program which does Sendrecv between two nodes,
> each of which has 2 interface cards.
> Both the nodes are running RHEL6, with open-mpi 1.4.4 on an 8 core Xeon
> processor.
> What I noticed was that when using two or more interfaces on both the nodes,
> mpi "hangs" attempting to connect.
> These details might help:
> Node 1 - Denver has a single port "A" card (eth21 - 25.192.xx.xx - which I
> use to ssh to that machine), and a double port "B" card (eth23 - 10.3.1.1 &
> eth24 - 10.3.1.2).
> Node 2 - Chicago also has the same single port A card (eth19 - 25.192.xx.xx -
> again used for ssh) and a double port B card (eth29 - 10.3.1.3 & eth30 -
> 10.3.1.4).
> My /etc/hosts looks like
> 25.192.xx.xx denver.xxx.com denver
> 10.3.1.1 denver.xxx.com denver
> 10.3.1.2 denver.xxx.com denver
> 25.192.xx.xx chicago.xxx.com chicago
> 10.3.1.3 chicago.xxx.com chicago
> 10.3.1.4 chicago.xxx.com chicago
> ...
> ...
> ...
> This is how I run it:
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude
> eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> I get a bunch of output from both chicago and denver, which says it has found
> components like tcp, sm, self and so on, and then it hangs at
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3
> on port 4
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4
> on port 4
> However, if I run the same program excluding eth29 or eth30, then it works
> fine. Something like this:
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude
> eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> My hostfile looks like this:
> [sshuser@denver Sendrecv]$ cat host1
> denver slots=2
> chicago slots=2
> I am not sure if I have to provide something else. If I do,
> please feel free to ask me.
> thanks,
> --
> Joba
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
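To check a node for the situation asked about above (two interfaces in one subnet), the same prefix comparison can be run over the output of iproute2's `ip -o -4 addr show`, which prints one address per line. The following is a minimal bash 4 sketch. `shared_subnets` and its helpers are hypothetical names. The sample input is invented from Denver's eth23/eth24 addresses with an assumed /24 netmask, not captured from the machines in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# every IPv4 subnet that two or more interfaces on this node share.
# Requires bash 4+ for associative arrays.

# Convert a dotted-quad IPv4 address to a 32-bit integer (base 10 forced).
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

shared_subnets() {
    declare -A members=()   # "network/prefix" -> space-separated interface names
    local idx ifname family cidr rest addr prefix mask net key
    # Each line looks like: "4: eth23    inet 10.3.1.1/24 brd ... scope global eth23"
    while read -r idx ifname family cidr rest; do
        [[ $family == inet ]] || continue          # skip anything that is not IPv4
        addr=${cidr%/*}
        prefix=${cidr#*/}
        mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
        net=$(( $(ip_to_int "$addr") & mask ))
        key="$(int_to_ip "$net")/$prefix"
        members[$key]+="${members[$key]:+ }$ifname"
    done
    for key in "${!members[@]}"; do
        # A space in the member list means more than one interface is in it.
        if [[ ${members[$key]} == *" "* ]]; then
            echo "$key: ${members[$key]}"
        fi
    done
}

# On a real node:  ip -o -4 addr show | shared_subnets
# Invented sample input (assumed /24 netmask):
printf '%s\n' \
    '1: lo    inet 127.0.0.1/8 scope host lo' \
    '4: eth23    inet 10.3.1.1/24 brd 10.3.1.255 scope global eth23' \
    '5: eth24    inet 10.3.1.2/24 brd 10.3.1.255 scope global eth24' |
    shared_subnets   # prints "10.3.1.0/24: eth23 eth24"
```

An empty result means every interface on that node sits in its own subnet.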