Disconnect is a -collective- operation: both the parent and the child have to call it, each on its own side of the intercommunicator.
Your child process is "hanging" because it is waiting for the parent to make the matching call.

On Dec 21, 2009, at 1:37 AM, vipin kumar wrote:

> Hello folks,
> 
> As I explained earlier, I am looking for fault tolerance in MPI 
> programs. I read in the MPI 2.1 standard document that two DISCONNECTED 
> processes do not affect each other, i.e. they can die or be killed 
> without affecting other processes.
> 
> So I tried to achieve fault tolerance by using MPI::Comm::Disconnect() 
> to disconnect the CHILD process, which was spawned by calling 
> MPI::Comm::spawn(), from the PARENT process. I call 
> MPI::Comm::Disconnect() from the CHILD process immediately after calling 
> MPI::Init(). The CHILD process never seems to return from this call. 
> 
> I tried MPI::Comm::Free() too, but it does not work either: the process 
> does not progress past that call. If I comment out these statements, 
> everything works fine. Note that I have tried this on Solaris as well as on 
> Linux (Fedora Core).
> 
> My question is whether Open MPI supports disconnecting two processes (like 
> a child from its parent), and if so, how?
> 
> 
> Thanks & Regards,
> 
> On Wed, Sep 23, 2009 at 6:41 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> Unfortunately I cannot provide a precise time frame for availability at this 
> point, but we are targeting the v1.5 release series. There are a handful of 
> core developers working on this issue at the moment. Pieces of this work 
> have already made it into the Open MPI development trunk. If you want to play 
> around with what is available, try turning on the resilient mapper:
>  -mca rmaps resilient
> 
> We will be sure to email the list once this work becomes more stable and 
> available.
> 
> -- Josh
> 
> 
> On Sep 18, 2009, at 2:56 AM, vipin kumar wrote:
> 
> Hi Josh,
> 
> It is good to hear from you that work is in progress on the resiliency of 
> Open MPI. I have been waiting for this capability in Open MPI. I have 
> almost finished my development work and am waiting for this so that I 
> can test my programs. It would help if you could tell me how long it will 
> take to make Open MPI a resilient implementation. By resiliency I mean that 
> abnormal termination or intentional killing of a process should not cause 
> any other (parent or sibling) process to be terminated, given that the 
> processes are connected.
> 
> thanks.
> 
> Regards,
> 
> On Mon, Aug 3, 2009 at 8:37 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> Task-farm or manager/worker recovery models typically depend on 
> intercommunicators (i.e., from MPI_Comm_spawn) and a resilient MPI 
> implementation. William Gropp and Ewing Lusk have a paper entitled "Fault 
> Tolerance in MPI Programs" that outlines how an application might take 
> advantage of these features in order to recover from process failure.
> 
> However, these techniques strongly depend upon resilient MPI implementations, 
> and behaviors that, some may argue, are non-standard. Unfortunately there are 
> not many MPI implementations that are sufficiently resilient in the face of 
> process failure to support failure in task-farm scenarios. Though Open MPI 
> supports the current MPI 2.1 standard, it is not as resilient to process 
> failure as it could be.
> 
> There are a number of people working on improving the resiliency of Open MPI 
> in the face of network and process failure (including myself). We have 
> started to move some of the resiliency work into the Open MPI trunk. 
> Resiliency in Open MPI has been improving over the past few months, but I 
> would not assess it as ready quite yet. Most of the work has focused on the 
> runtime level (ORTE), and there are still some MPI level (OMPI) issues that 
> need to be worked out.
> 
> With all of that being said, I would try some of the techniques presented in 
> the Gropp/Lusk paper in your application. Then test it with Open MPI and let 
> us know how it goes.
> 
> Best,
> Josh
> 
> 
> On Aug 3, 2009, at 10:30 AM, Durga Choudhury wrote:
> 
> Is that kind of approach possible within an MPI framework? Perhaps a
> grid approach would be better. More experienced people, speak up,
> please?
> (The reason I say that is that I too am interested in the solution of
> that kind of problem, where an individual blade of a blade server
> fails and correcting for that failure on the fly is better than taking
> checkpoints and restarting the whole process excluding the failed
> blade.)
> 
> Durga
> 
> On Mon, Aug 3, 2009 at 9:21 AM, jody<jody....@gmail.com> wrote:
> Hi
> 
> I guess "task-farming" could give you a certain amount of the kind of
> fault tolerance you want (i.e. a master process distributes tasks to
> idle slave processes; however, this will only work if the slave
> processes don't need to communicate with each other).
> 
> Jody
> 
> 
> On Mon, Aug 3, 2009 at 1:24 PM, vipin kumar<vipinkuma...@gmail.com> wrote:
> Hi all,
> 
> Thanks Durga for your reply.
> 
> Jeff, you once wrote code for the Mandelbrot set to demonstrate fault
> tolerance in LAM/MPI, i.e. killing any slave process doesn't affect the
> others. That is exactly the behaviour I am looking for in Open MPI. I
> attempted it, but with no luck. Can you please tell me how to write such
> programs in Open MPI?
> 
> Thanks in advance.
> 
> Regards,
> On Thu, Jul 9, 2009 at 8:30 PM, Durga Choudhury <dpcho...@gmail.com> wrote:
> 
> Although I have perhaps the least experience on the topic on this
> list, I will take a shot; more experienced people, please correct me:
> 
> MPI standards specify the communication mechanism, not fault tolerance
> at any level. You may achieve network fault tolerance at the IP level by
> implementing 'equal cost multipath' routes (which means two equally
> capable NICs connecting to the same destination, with the kernel
> routing table modified to use both; the kernel will then load-balance
> dynamically). At the MAC level, you can achieve the same effect by
> trunking multiple network cards.
> 
> You can achieve process-level fault tolerance with a checkpointing
> scheme such as BLCR, which has been tested to work with Open MPI (and
> with other processes as well).
> 
> Durga
> 
> On Thu, Jul 9, 2009 at 4:57 AM, vipin kumar<vipinkuma...@gmail.com> wrote:
> 
> Hi all,
> 
> I want to know whether Open MPI supports network and process fault
> tolerance. An example demonstrating these features would be best.
> 
> Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
