It depends on which network fails. If you lose all TCP connectivity,
Open MPI should abort the job, because the out-of-band system will
detect the lost connection. If you lose only the MPI connection
(whether TCP or some other interconnect), then I believe the system
will eventually generate an error after it has retried sending the
message a specified number of times, though it may not abort the job.
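
As a rough illustration of the "retry a specified number of times, then
report an error" pattern, here is a minimal application-level sketch in
Python. It is not Open MPI's internal code or API: `send_with_retries`,
`PeerUnreachable`, and the `send` callable are hypothetical names used
only for this example, and it assumes the underlying transport signals a
failed send by raising `OSError`.

```python
import time
from typing import Callable


class PeerUnreachable(Exception):
    """Raised when every send attempt to the peer has failed."""


def send_with_retries(send: Callable[[bytes], None], message: bytes,
                      max_attempts: int = 3, delay_s: float = 0.0) -> int:
    """Call send(message) up to max_attempts times.

    Returns the number of attempts used on success and raises
    PeerUnreachable once the attempt limit is exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)  # hypothetical transport call supplied by the caller
            return attempt
        except OSError as exc:  # assumption: a failed send raises OSError
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)  # wait before the next attempt
    raise PeerUnreachable(
        f"gave up after {max_attempts} attempts") from last_error
```

A master process using something like this would catch `PeerUnreachable`,
mark the slave as down, and stop sending until it has some other
indication that the link is back.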
On Jul 22, 2009, at 10:55 PM, vipin kumar wrote:
Are you asking to find out this information before issuing
"mpirun"? Open MPI does assume that the nodes you are trying to use
are reachable.
No.
The scenario is a pair of processes: one, say "masterprocess", runs on
the "master" node, and the other, say "slaveprocess", runs on the
"slave" node. When "masterprocess" needs a service from
"slaveprocess", it sends a message to "slaveprocess", which serves the
request. If the network fails (for any reason), "masterprocess" will
keep trying to send messages to "slaveprocess" without knowing that it
is unreachable. So how should "masterprocess" find out that
"slaveprocess" cannot be reached, and stop trying to send messages
until the connection is back up?
Thanks & Regards,
--
Vipin K.
Research Engineer,
C-DOTB, India