Re: [OMPI users] allow job to survive process death

Reuti Thu, 27 Jan 2011 10:25:02 -0500

On 27.01.2011, at 16:10, Joshua Hursey wrote:

> 
> On Jan 27, 2011, at 9:47 AM, Reuti wrote:
> 
>> On 27.01.2011, at 15:23, Joshua Hursey wrote:
>> 
>>> The current version of Open MPI does not support continued operation of an 
>>> MPI application after process failure within a job. If a process dies, so 
>>> will the MPI job. Note that this is true of many MPI implementations out 
>>> there at the moment.
>>> 
>>> At Oak Ridge National Laboratory, we are working on a version of Open MPI 
>>> that will be able to run through process failure, if the application wishes 
>>> to do so. The semantics and interfaces needed to support this functionality 
>>> are being actively developed by the MPI Forum's Fault Tolerance Working 
>>> Group, and can be found at the wiki page below:
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>> 
>> I had a look at this document, but what does it really cover? Does the 
>> application have to react to the notification of a failed rank and act 
>> appropriately on its own?
> 
> Yes. This is to support application-based fault tolerance (ABFT). Libraries 
> could be developed on top of these semantics to hide some of the fault 
> handling. The purpose is to enable fault-tolerant MPI applications and 
> libraries to be built on top of MPI.
> 
> This document only covers run-through stabilization, not process recovery, at 
> the moment. So the application will have well-defined semantics to allow it 
> to continue processing without the failed process. Recovering the failed 
> process is not specified in this document. That is the subject of a 
> supplemental document in preparation; the two proposals are meant to be 
> complementary and build upon one another.
> 
>> 
>> Truly surviving a dying process (i.e. rank) that might already have been 
>> computing for hours would mean having some kind of "rank RAID" or 
>> "rank Parchive". E.g. start 12 ranks when you need 10: whichever 2 ranks 
>> fail, your job will still finish in time.
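
A minimal sketch of the bookkeeping behind the "start 12 when you need 10" idea, assuming the application already knows which ranks are alive. The function name assign_slots and its parameters are hypothetical and not part of MPI or Open MPI. It only decides which surviving rank takes which logical work slot; it does not recover any state the dead ranks held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI call): map each of `needed` logical
 * work slots onto a surviving rank, in ascending rank order.
 * alive[r] is true if rank r is still running.
 * Returns true if enough ranks survive; slot_to_rank[s] then holds the
 * rank that should carry out slot s. */
bool assign_slots(const bool *alive, size_t nranks,
                  size_t needed, size_t *slot_to_rank)
{
    assert(alive != NULL && slot_to_rank != NULL);

    size_t slot = 0;
    for (size_t r = 0; r < nranks && slot < needed; ++r) {
        if (alive[r]) {
            slot_to_rank[slot++] = r;  /* next free slot goes to rank r */
        }
    }
    return slot == needed;             /* false: too many ranks were lost */
}
```

With 12 ranks, 10 slots, and ranks 3 and 7 marked dead, every slot still finds a rank; a third failure makes the function return false.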
> 
> Yes, that is one possible technique. So once a process failure occurs, the 
> application is notified via the existing error handling mechanisms. The 
> application is then responsible for determining how best to recover from that 
> process failure. This could include using MPI_Comm_spawn to create new 
> processes (useful in manager/worker applications), recovering the state from 
> an in-memory checksum, using spare processes in the communicator, rolling 
> back some/all ranks to an application-level checkpoint, ignoring the failure 
> and allowing the residual error to increase, aborting the job or a single 
> sub-communicator, ... the list goes on. But the purpose of the proposal is to 
> allow an application or library to start building such techniques based on 
> portable semantics and well-defined interfaces.
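
To make one item on that list concrete, here is a minimal single-process sketch of recovering state from an in-memory checksum. It assumes each rank's data is a block of doubles and that a separate checksum block holds the element-wise sum of all blocks. The names build_checksum and recover_block are hypothetical; in a real MPI program the blocks would live on different ranks and the sum would be built with a reduction. The scheme tolerates one lost block, and the recovered values are exact only up to floating-point rounding.

```c
#include <assert.h>
#include <stddef.h>

/* data is a row-major nranks x len array: data[r * len + i] is element i
 * of the block owned by rank r.
 * Fill checksum[i] with the sum of element i over all ranks. */
void build_checksum(const double *data, size_t nranks, size_t len,
                    double *checksum)
{
    assert(data != NULL && checksum != NULL);
    for (size_t i = 0; i < len; ++i) {
        double sum = 0.0;
        for (size_t r = 0; r < nranks; ++r) {
            sum += data[r * len + i];
        }
        checksum[i] = sum;
    }
}

/* Rebuild the block of the single failed rank `lost` by subtracting the
 * surviving blocks from the checksum. */
void recover_block(double *data, size_t nranks, size_t len,
                   const double *checksum, size_t lost)
{
    assert(data != NULL && checksum != NULL && lost < nranks);
    for (size_t i = 0; i < len; ++i) {
        double acc = checksum[i];
        for (size_t r = 0; r < nranks; ++r) {
            if (r != lost) {
                acc -= data[r * len + i];
            }
        }
        data[lost * len + i] = acc;
    }
}
```

The checksum has to be refreshed whenever the blocks change, so how often to rebuild it is a trade-off between overhead and how much work is lost on a failure.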
> 
> Does that help clarify?


Yes - thx.

-- Reuti


> If you would like to discuss the developing proposals further or have input 
> on how to make them better, I would suggest moving the discussion to the 
> MPI3-ft mailing list so that other groups that do not normally follow the 
> Open MPI lists can participate. The mailing list information is below:
>  http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> 
> 
> -- Josh
> 
>> 
>> -- Reuti
>> 
>> 
>>> This work is ongoing, but once we have a stable prototype we will assess 
>>> how to bring it back to the mainline Open MPI trunk. For the moment, there 
>>> is no public release of this branch, but once there is we will be sure to 
>>> announce it on the appropriate Open MPI mailing list for folks to start 
>>> playing around with it.
>>> 
>>> -- Josh
>>> 
>>> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I was wondering what support Open MPI has for allowing a job to
>>>> continue running when one or more processes in the job die
>>>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>>>> 
>>>> It seems obvious that collectives will fail once a process dies, but
>>>> would it be possible to create a new group (if you knew which ranks
>>>> are dead) that excludes the dead processes - then turn this group into
>>>> a working communicator?
>>>> 
>>>> Thanks,
>>>> Kirk
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>> 
>>> ------------------------------------
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
