ssh may be allowed but other random TCP ports may not. iptables is the typical firewall software that most Linux installations use; it may have been enabled by default.
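One way to test the "ssh works but other TCP ports do not" idea is to attempt a plain TCP connection to a non-ssh port and look at how it fails. The sketch below is only an illustration, not part of Open MPI: `probe_tcp` is a hypothetical helper, and the port number in the usage comment is arbitrary. It assumes Linux, where errno 113 is EHOSTUNREACH ("No route to host"). A firewall rule that rejects with an ICMP "host prohibited" message commonly produces that error, while a machine with no firewall and nothing listening usually answers "connection refused".

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and return a short label for the outcome.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and opens the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all; a firewall may be silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing listens on this port.
        return "refused"
    except socket.gaierror:
        # The host name could not be resolved.
        return "unresolved"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same error as in the mpirun log: "No route to host (113)".
            return "no-route"
        return "error: %s" % exc


# Example call (the port 50000 is an arbitrary assumption):
#   print(probe_tcp("192.168.0.31", 50000))
```

If this returns "refused" from yethiraj30 for a high port on 192.168.0.31 while ssh on port 22 works, the port is probably not being filtered; "no-route" or "timeout" on the high port would point toward a firewall.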
I'm a little doubtful that this is your problem, though, because you're apparently able to *launch* your application, which means that OMPI's out-of-band communication system was able to make some sockets. So it's a little weird that the MPI layer's TCP sockets were borked. But let's check for firewall software, first...

On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:

> Hi Jeff,
> I was wondering how I can check whether there is any firewall software.
> In fact I can use ssh to go from one machine to another. But, only with
> mpirun, it does not work. I was wondering whether it is possible that even
> in presence of firewall ssh may work but mpirun may not.
> Jagannath
>
> On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres)
> <jsquy...@cisco.com> wrote:
> Are you running any firewall software?
>
> Sent from my phone. No type good.
>
> On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <jagannath.mon...@gmail.com>
> wrote:
>
>> Hi,
>> I am having a problem in running mpirun over multiple nodes.
>> To run a job over two 8-core processors, I generated a hostfile as follows:
>> yethiraj30 slots=8 max_slots=8
>> yethiraj31 slots=8 max_slots=8
>>
>> These two machines are intra-connected and I have installed openmpi 1.3.3.
>> Then If I try to run the replica exchange simulation using the following
>> command:
>> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
>>
>> But I find following error and job does not proceed at all:
>> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>>
>> Here is the full details:
>>
>> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
>> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
>> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>>
>> I am not sure how to resolve this issue. In general, I can go from one
>> machine to another without any problem using ssh. But, when I am trying to
>> run openmpi over both the machines, I get this error.
>> Any help will be appreciated.
>>
>> Jagannath
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/