Open MPI should detect the failure and abort the job. What version are you using?
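
If mpirun does abort with a non-zero exit status when a node disappears, the
"remove the node from the machine file and restart" step could be handled by a
small wrapper around mpirun. The following is only a rough sketch under
assumptions that are not from this thread: the helper names (is_reachable,
prune_hostfile, run_with_restart) are hypothetical, a single Linux-style ping
is assumed to be an adequate liveness check, and the hostfile is assumed to
have one "hostname [slots=N]" entry per line.

```python
import subprocess
from typing import Callable, Sequence


def is_reachable(host: str, timeout_s: int = 2) -> bool:
    """Assumption: one ICMP ping (Linux `ping -c/-W` flags) is a good enough check."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def prune_hostfile(path: str, is_alive: Callable[[str], bool] = is_reachable) -> int:
    """Rewrite the hostfile without unreachable hosts; return how many hosts remain."""
    with open(path) as f:
        lines = f.read().splitlines()
    kept = []
    remaining = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)  # keep blank lines and comments untouched
            continue
        host = stripped.split()[0]  # first field is the hostname
        if is_alive(host):
            kept.append(line)
            remaining += 1
    with open(path, "w") as f:
        f.write("\n".join(kept) + "\n")
    return remaining


def run_with_restart(
    hostfile: str,
    app_cmd: Sequence[str],
    max_restarts: int = 3,
    is_alive: Callable[[str], bool] = is_reachable,
    launch: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run the job; on a non-zero exit, drop dead hosts and relaunch."""
    rc = 1
    for _ in range(max_restarts + 1):
        rc = launch(["mpirun", "--hostfile", hostfile, *app_cmd])
        if rc == 0:
            return 0
        if prune_hostfile(hostfile, is_alive) == 0:
            break  # no hosts left to run on
    return rc
```

It would be called as, for example, run_with_restart("hosts.txt", ["./my_app"])
from the master node. Note that each relaunch starts the application from the
beginning, so this only helps if the application can afford to redo its work
or saves its own intermediate results.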

On Jun 20, 2013, at 2:02 PM, Claire Williams <clairewilliams1...@yahoo.com> wrote:

> Hi all,
> 
> I was wondering if Open-MPI had any way to detect that a node has crashed, 
> rebooted, etc. I am currently trying to integrate my MPI application with 
> Amazon EC2 spot instances, and since spot instances can be terminated at any 
> time, I would like to try to make it so that my application can detect this 
> node failure, maybe remove the node from the machine file, and restart the 
> application automatically. Right now, when one of the worker nodes is 
> rebooted or terminated, the master that is waiting on the results of that 
> node will just hang, waiting for results that will never come. 
> 
> Thanks,
> 
> Claire  
