Hi, what support does Open MPI have for letting a job keep running when one or more of its processes die unexpectedly? Is there a special mpirun flag for this, or some other way to do it?
It seems obvious that collective operations will fail once a process dies, but would it be possible, given a list of the dead ranks, to create a new group that excludes those processes and then turn that group into a working communicator?

Thanks,
Kirk