I would try explicitly restricting Open MPI to a single network
interface (eth0, eth1, etc.) on each node.  Reconfigure the nodes if
needed so that the interface has the same eth number on every node.

I had this same issue, and I think it was because some of my nodes
had two networks: one for the ompi communications and another to the
internet.  I told my nodes to use only eth0 (the ompi network) and the
problem went away.

The command option to do this is in the FAQ*.  Alternatively, you can
search the mailing list for my last name - a different poster here was
kind enough to spell out exactly what to do.

* here is a link to the FAQ section:
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

A quick way to check whether the problem is solved is to run the
hostname command through mpirun.  I ran it from each node to make sure
everything was fixed on my system.
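
As a rough sketch of what I mean (not a definitive recipe): the option
name btl_tcp_if_include is the one described in the FAQ section linked
above.  The interface name eth0 is an assumption about your setup, and
the hostfile name and process count are simply carried over from the
command quoted below.  Adjust all three to match your nodes.

```shell
#!/bin/sh
# Assumption: eth0 is the interface on the Open MPI network on every node.
IFACE="eth0"
# Assumption: "hosts" is the hostfile used earlier in this thread.
HOSTFILE="hosts"

# btl_tcp_if_include limits Open MPI's TCP transport to the listed interface(s).
CMD="mpirun --mca btl_tcp_if_include $IFACE -np 4 --hostfile $HOSTFILE hostname"

# Show the command so it can be checked before it is run.
echo "$CMD"

# Run it only where Open MPI and the hostfile are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -f "$HOSTFILE" ]; then
    $CMD
fi
```

If this prints one hostname per process and returns to the prompt
instead of hanging, the interface selection is probably what was
wrong.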

Good luck!

> Message: 8
> Date: Wed, 09 Apr 2008 22:17:59 +0200
> From: Danesh Daroui <danes...@bredband.net>
> Subject: Re: [OMPI users] submitted job stops
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <47fd2477.1010...@bredband.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Mark Kosmowski skrev:
> > Danesh:
> >
> > Have you tried "mpirun -np 4 --hostfile hosts hostname" to verify that
> > ompi is working?
> >
>
> When I run "mpirun -np 4 --hostfile hosts hostname" same thing happens
> and it just hangs. Can it be a clue?
>
> > Can you remote access from each node to each other node?
> >
> Yes all nodes can have access to each other via SSH and can login
> without being prompted for password.
>
> > If any node has more than 1 network device, are you using the ompi
> > options to specify which device to use?
> >
>
> Each node has one network interface which works properly.
>
> Regards,
>
> Danesh
>
>
> > Good luck,
> >
> > Mark
> >
