On 27.01.2011 at 16:10, Joshua Hursey wrote:

>
> On Jan 27, 2011, at 9:47 AM, Reuti wrote:
>
>> On 27.01.2011 at 15:23, Joshua Hursey wrote:
>>
>>> The current version of Open MPI does not support continued operation of an
>>> MPI application after process failure within a job. If a process dies, so
>>> will the MPI job. Note that this is true of many MPI implementations out
>>> there at the moment.
>>>
>>> At Oak Ridge National Laboratory, we are working on a version of Open MPI
>>> that will be able to run through process failure, if the application wishes
>>> to do so. The semantics and interfaces needed to support this functionality
>>> are being actively developed by the MPI Forum's Fault Tolerance Working
>>> Group, and can be found at the wiki page below:
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>>
>> I had a look at this document, but what is really covered - does the
>> application have to react to the notification of a failed rank and act
>> appropriately on its own?
>
> Yes. This is to support application based fault tolerance (ABFT). Libraries
> could be developed on top of these semantics to hide some of the fault
> handling. The purpose is to enable fault tolerant MPI applications and
> libraries to be built on top of MPI.
>
> This document only covers run-through stabilization, not process recovery, at
> the moment. So the application will have well defined semantics to allow it
> to continue processing without the failed process. Recovering the failed
> process is not specified in this document. That is the subject of a
> supplemental document in preparation - the two proposals are meant to be
> complementary and build upon one another.
>
>>
>> Having a true ability to survive a dying process (i.e. rank) which might
>> have been computing for hours already would mean having some kind of
>> "rank RAID" or "rank Parchive". E.g. start 12 ranks when you need 10 -
>> whichever 2 ranks fail, your job will still finish in time.
>
> Yes, that is one possible technique. So once a process failure occurs, the
> application is notified via the existing error handling mechanisms. The
> application is then responsible for determining how best to recover from that
> process failure. This could include using MPI_Comm_spawn to create new
> processes (useful in manager/worker applications), recovering the state from
> an in-memory checksum, using spare processes in the communicator, rolling
> back some/all ranks to an application level checkpoint, ignoring the failure
> and allowing the residual error to increase, aborting the job or a single
> sub-communicator, ... the list goes on. But the purpose of the proposal is to
> allow an application or library to start building such techniques based on
> portable semantics and well defined interfaces.
>
> Does that help clarify?

Yes - thx.

-- Reuti

> If you would like to discuss the developing proposals further or have input
> on how to make them better, I would suggest moving the discussion to the
> MPI3-ft mailing list so other groups can participate that do not normally
> follow the Open MPI lists. The mailing list information is below:
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>
>
> -- Josh
>
>>
>> -- Reuti
>>
>>
>>> This work is on-going, but once we have a stable prototype we will assess
>>> how to bring it back to the mainline Open MPI trunk. For the moment, there
>>> is no public release of this branch, but once there is we will be sure to
>>> announce it on the appropriate Open MPI mailing list for folks to start
>>> playing around with it.
>>>
>>> -- Josh
>>>
>>> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
>>>
>>>> Hi,
>>>>
>>>> I was wondering what support Open MPI has for allowing a job to
>>>> continue running when one or more processes in the job die
>>>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>>>>
>>>> It seems obvious that collectives will fail once a process dies, but
>>>> would it be possible to create a new group (if you knew which ranks
>>>> are dead) that excludes the dead processes - then turn this group into
>>>> a working communicator?
>>>>
>>>> Thanks,
>>>> Kirk
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>
>>> ------------------------------------
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
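
Editorial illustration (not part of the original thread): Kirk's idea of
building a group that excludes the dead ranks comes down to a renumbering of
the survivors. The sketch below models only that bookkeeping in plain Python,
with no MPI calls. It assumes the ordering rule that the MPI standard gives
for MPI_Group_excl (survivors keep their relative order and are renumbered
from 0). The function name exclude_ranks and the example rank numbers are
hypothetical. As Josh notes above, the current Open MPI does not continue
after a process failure, so this shows what the application would have to
track, not something that works on a failed job today.

```python
def exclude_ranks(group_size, dead_ranks):
    """Model the rank renumbering of a group that drops the dead ranks.

    group_size -- number of ranks in the original group (0 .. group_size-1)
    dead_ranks -- iterable of original rank ids known to have failed

    Returns (survivors, old_to_new):
      survivors  -- original rank ids that remain, in their original order
      old_to_new -- dict mapping each surviving original rank to its new rank
    """
    dead = set(dead_ranks)
    for rank in dead:
        if not 0 <= rank < group_size:
            raise ValueError(f"rank {rank} is not in the original group")

    # Survivors keep their relative order, as MPI_Group_excl specifies.
    survivors = [rank for rank in range(group_size) if rank not in dead]

    # New ranks are simply the positions in the surviving list.
    old_to_new = {old: new for new, old in enumerate(survivors)}
    return survivors, old_to_new


if __name__ == "__main__":
    # Reuti's example: 12 ranks started, 10 needed, 2 assumed to have died.
    survivors, old_to_new = exclude_ranks(12, [3, 7])
    print("surviving original ranks:", survivors)
    print("old rank -> new rank:", old_to_new)
```

With 12 ranks started and 2 lost, 10 survivors remain, which is the "start 12
ranks when you need 10" case Reuti describes. Application data indexed by the
old rank would need to be re-indexed through old_to_new.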