It depends on which network fails. If you lose all TCP connectivity, Open MPI should abort the job, because the out-of-band system will detect the loss of connection. If you lose only the MPI connection (whether TCP or some other interconnect), then I believe the system will eventually generate an error after it has retried sending the message a specified number of times, though it may not abort the job.
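If you would rather not depend on the transport's retry limit, one option is to bound the wait in the application itself. Below is a rough sketch in plain Python, not anything Open MPI provides: the helper name `wait_with_timeout` and its parameters are made up for illustration, and `poll` stands in for whatever non-blocking completion check you have (for example, a wrapper around `MPI_Test` on a non-blocking receive of the slave's reply). Waiting on the reply rather than on the send is deliberate, since a small send can complete locally even when the peer is unreachable.

```python
import time


def wait_with_timeout(poll, timeout_s, interval_s=0.05,
                      clock=time.monotonic, sleep=time.sleep):
    """Call poll() repeatedly until it returns True or timeout_s elapses.

    poll is any zero-argument callable that returns True once the pending
    operation has completed (hypothetical stand-in for an MPI_Test wrapper).
    Returns True if the operation completed, False if the deadline passed.
    """
    deadline = clock() + timeout_s
    while True:
        if poll():
            return True           # the reply arrived in time
        if clock() >= deadline:
            return False          # treat the peer as unreachable for now
        sleep(interval_s)         # back off briefly before polling again
```

When this returns False, "masterprocess" can mark "slaveprocess" as unreachable, stop sending requests, and probe again later instead of blocking indefinitely. A timeout can only suggest that the peer is unreachable; a slow slave looks the same, so the timeout value has to be chosen with the expected service time in mind.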

On Jul 22, 2009, at 10:55 PM, vipin kumar wrote:

Are you asking to find out this information before issuing "mpirun"? Open MPI does assume that the nodes you are trying to use are reachable.


No.

The scenario is this: a pair of processes is running, one on the "master" node (call it "masterprocess") and one on the "slave" node (call it "slaveprocess"). When "masterprocess" needs a service from "slaveprocess", it sends a message to "slaveprocess", which then serves the request. If the network fails (by any means), "masterprocess" will keep trying to send messages to "slaveprocess" without knowing that it is unreachable. So how should "masterprocess" find out that "slaveprocess" cannot be reached, and stop trying to send messages until the connection is back up?


Thanks & Regards,
--
Vipin K.
Research Engineer,
C-DOTB, India
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
