Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Good to know! How should we handle this within MTT? Decrease nseconds to 570? Cheers, Gilles On Thu, May 29, 2014 at 12:03 AM, Ralph Castain wrote: > Ah, that satisfied it! > > Sorry for the chase - I'll update my test. > > > On May 28, 2014, at 7:55 AM, Gilles

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
Ah, that satisfied it! Sorry for the chase - I'll update my test. On May 28, 2014, at 7:55 AM, Gilles Gouaillardet wrote: > Ralph, > > what if ? > > the parent : > MPI_Comm_free(); > MPI_Comm_disconnect(); > > and the child > MPI_Comm_free(); >

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 7:50 AM, Gilles Gouaillardet wrote: > Ralph, > > thanks for the info > >> can you detail your full mpirun command line, the number of servers you are >> using, the btl involved and the ompi release that can be used to reproduce >> the

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 7:34 AM, George Bosilca wrote: > Calling MPI_Comm_free is not enough from MPI perspective to clean up > all knowledge about remote processes, nor to sever the links between > the local and remote groups. One MUST call MPI_Comm_disconnect in > order to

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread George Bosilca
Calling MPI_Comm_free is not enough from MPI perspective to clean up all knowledge about remote processes, nor to sever the links between the local and remote groups. One MUST call MPI_Comm_disconnect in order to achieve this. Look at the code in ompi/mpi/c and see the difference between
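
A minimal sketch of the distinction described above, not the ibm test suite code. It assumes a single binary that re-spawns itself via argv[0] (which must be resolvable by the launcher); the names `inter` and `merged` are made up for illustration. Both sides free the derived communicator with MPI_Comm_free and then call MPI_Comm_disconnect on the intercommunicator before finalizing.

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, inter, merged;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent side: spawn one child running this same binary (assumption). */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
    } else {
        /* Child side: the intercommunicator to the parent already exists. */
        inter = parent;
    }

    /* A derived communicator that also links parent and child. */
    MPI_Intercomm_merge(inter, parent != MPI_COMM_NULL, &merged);

    /* MPI_Comm_free only marks the communicator for deallocation. */
    MPI_Comm_free(&merged);

    /* MPI_Comm_disconnect is collective and waits for pending communication
     * on the communicator to complete before releasing it. */
    MPI_Comm_disconnect(&inter);

    MPI_Finalize();
    return 0;
}
```

With Open MPI this would typically be built with mpicc and launched with a single parent process (for example `mpirun -np 1 ./a.out`). Whether finalize is then clean depends on no other communicator still linking the two sides.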

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 6:41 AM, Gilles Gouaillardet wrote: > Ralph, > > > On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote: >> This is definitely what happens: only some tasks call MPI_Comm_free() > > Really? I don't see how that can

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Ralph, On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote: > This is definitely what happens: only some tasks call MPI_Comm_free() > > > Really? I don't see how that can happen in loop_spawn - every process is > clearly calling comm_free. Or are you referring to the

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 4:31 AM, Jeff Squyres (jsquyres) wrote: > On May 27, 2014, at 9:11 PM, Gilles Gouaillardet > wrote: > >> in the case of intercomm_create, the children free all the communicators and >> then MPI_Comm_disconnect() and then

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 4:45 AM, Gilles Gouaillardet wrote: > Jeff, > > On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) > > To be totally clear: MPI says it is erroneous for only some (not all) > > processes in a communicator to call MPI_COMM_FREE. So if

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
You can adjust the number of iterations so the parent reaches the end - in my case, I run it in a non-managed environment, and so there is no timeout. If you run it that way, you'll see the end result when the parent attempts to finalize. On May 27, 2014, at 11:18 PM, Gilles Gouaillardet

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Jeff, On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) > To be totally clear: MPI says it is erroneous for only some (not all) processes in a communicator to call MPI_COMM_FREE. So if that's the real problem, then the discussion about why the parent(s) is(are) trying to contact the

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Ralph, On 2014/05/28 12:10, Ralph Castain wrote: > my understanding is that there are two ways of seeing things : > a) the "R-way" : the problem is the parent should not try to communicate to > already exited processes > b) the "J-way" : the problem is the children should have waited either in

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Gilles Gouaillardet
Ralph, in the case of intercomm_create, the children free all the communicators, then call MPI_Comm_disconnect() and MPI_Finalize(), and exit. The parent only calls MPI_Comm_disconnect() without freeing all the communicators. MPI_Finalize() then tries to disconnect and communicate with already exited processes.

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Ralph Castain
Since you ignored my response, I'll reiterate and clarify it here. The problem in the case of loop_spawn is that the parent process remains "connected" to children after the child has finalized and died. Hence, when the parent attempts to finalize, it tries to "disconnect" itself from processes

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Jeff Squyres (jsquyres)
Note that MPI says that COMM_DISCONNECT simply disconnects that individual communicator. It does *not* guarantee that the processes involved will be fully disconnected. So I think that the freeing of communicators is good app behavior, but it is not required by the MPI spec. If OMPI is

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Ralph Castain
FWIW: this now appears true for *any* case where a parent connects to more than one child - i.e., if a process calls connect-accept more than once (e.g., in loop_spawn). This didn't use to be true, so something has changed in OMPI's underlying behavior. On May 26, 2014, at 11:27 PM, Gilles

[OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Gilles Gouaillardet
Folks, currently, the dynamic/intercomm_create test from the ibm test suite outputs the following message: dpm_base_disconnect_init: error -12 in isend to process 1. The root cause is that task 0 tries to send messages to already exited tasks. One way of seeing things is that this is an application