Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-15 Thread Josh Hursey
Committed in r24775. https://svn.open-mpi.org/trac/ompi/changeset/24775 Sorry for the delay on this, I got sidetracked yesterday. -- Josh On Tue, Jun 14, 2011 at 11:36 AM, Josh Hursey wrote: > Just a reminder for those not on the call that this RFC is scheduled > to go in later today. > > --

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-14 Thread Josh Hursey
Just a reminder for those not on the call that this RFC is scheduled to go in later today. -- Josh On Fri, Jun 10, 2011 at 8:53 AM, Ralph Castain wrote: > > On Jun 10, 2011, at 6:48 AM, Josh Hursey wrote: > >> Why would this patch result in zombied processes and poor cleanup? >> When ORTE receiv

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-10 Thread Ralph Castain
On Jun 10, 2011, at 6:48 AM, Josh Hursey wrote: > Why would this patch result in zombied processes and poor cleanup? > When ORTE receives notification of a process terminating/aborting then > it triggers the termination of the job (without UTK's RFC) which > should ensure a clean shutdown. This pa

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-10 Thread Josh Hursey
Why would this patch result in zombied processes and poor cleanup? When ORTE receives notification of a process terminating/aborting, it triggers the termination of the job (without UTK's RFC), which should ensure a clean shutdown. This patch just tells ORTE that a few other processes should be t

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-10 Thread Ralph Castain
I have no issue with uncommenting the code. However, I do see a future littered with lots of zombied processes and complaints over poor cleanup again. On Jun 9, 2011, at 6:08 PM, Joshua Hursey wrote: > Ah I see what you are getting at now. > > The construction of the list of connected proce

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread Joshua Hursey
Ah, I see what you are getting at now. The construction of the list of connected processes is something I intentionally did not modify from the current Open MPI code. The list is calculated based on the locally known set of local and remote process groups attached to the communicator. So this

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread George Bosilca
What I'm saying is that there is no reason to have any other type of MPI_Abort if we are not able to compute the set of connected processes. With this RFC, the processes on the communicator will abort on MPI_Abort. Then the other processes in the same MPI_COMM_WORLD (in fact jobid) will be notif

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread Josh Hursey
On Thu, Jun 9, 2011 at 4:47 PM, George Bosilca wrote: > If this changes the behavior of MPI_Abort to only abort processes on the > specified communicator, how does this not affect the default user experience > (when today it aborts everything)? Open MPI does abort everything by default - decided

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread George Bosilca
If this changes the behavior of MPI_Abort to only abort processes on the specified communicator, how does this not affect the default user experience (when today it aborts everything)? If we accept the fact that MPI_Abort will only abort the processes in the current communicator, what happens with

[OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread Josh Hursey
WHAT: Fix missing code in MPI_Abort
WHY: MPI_Abort is missing logic to ask for termination of the process group defined by the communicator
WHERE: Mostly orte/mca/errmgr
WHEN: Open MPI trunk
TIMEOUT: Tuesday, June 14, 2011 (after teleconf)
Details: --- A