Re: [OMPI devel] opal_free_list_t annoyance

2014-05-28 Thread Nathan Hjelm
On Wed, May 28, 2014 at 03:28:59PM +, Dave Goodell (dgoodell) wrote: > On May 13, 2014, at 4:01 PM, Nathan Hjelm wrote: > > > While tracking down memory leaks in components I ran into an interesting > > issue. osc/rdma uses an opal_free_list_t (not an ompi_free_list_t) for >

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Good to know! How should we handle this within MTT? Decrease nseconds to 570? Cheers, Gilles On Thu, May 29, 2014 at 12:03 AM, Ralph Castain wrote: > Ah, that satisfied it! > > Sorry for the chase - I'll update my test. > > > On May 28, 2014, at 7:55 AM, Gilles

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
Ah, that satisfied it! Sorry for the chase - I'll update my test. On May 28, 2014, at 7:55 AM, Gilles Gouaillardet wrote: > Ralph, > > what if ? > > the parent : > MPI_Comm_free(); > MPI_Comm_disconnect(); > > and the child > MPI_Comm_free(); >

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 7:50 AM, Gilles Gouaillardet wrote: > Ralph, > > thanks for the info > >> can you detail your full mpirun command line, the number of servers you are >> using, the btl involved and the ompi release that can be used to reproduce >> the

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 7:34 AM, George Bosilca wrote: > Calling MPI_Comm_free is not enough from MPI perspective to clean up > all knowledge about remote processes, nor to sever the links between > the local and remote groups. One MUST call MPI_Comm_disconnect in > order to

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread George Bosilca
Calling MPI_Comm_free is not enough from MPI perspective to clean up all knowledge about remote processes, nor to sever the links between the local and remote groups. One MUST call MPI_Comm_disconnect in order to achieve this. Look at the code in ompi/mpi/c and see the difference between

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 6:41 AM, Gilles Gouaillardet wrote: > Ralph, > > > On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote: >> This is definitely what happens : only some tasks call MPI_Comm_free() > > Really? I don't see how that can

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Ralph, On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote: > This is definitely what happens: only some tasks call MPI_Comm_free() > > > Really? I don't see how that can happen in loop_spawn - every process is > clearly calling comm_free. Or are you referring to the

Re: [OMPI devel] Trunk (RDMA and VT) warnings

2014-05-28 Thread Ralph Castain
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4) ./configure --prefix=/home/common/openmpi/build/svn-trunk --enable-mpi-java --enable-orterun-prefix-by-default More inline below On May 27, 2014, at 9:29 PM, Gilles Gouaillardet wrote: > Ralph, > > can you please

Re: [OMPI devel] some info is not pushed into the dstore

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 1:18 AM, Gilles Gouaillardet wrote: > i finally got it :-) Hooray! Thanks for digging deeper. > > /* i previously got it "almost" right ... */ > > here is what happens on job 2 (with trunk) : > MPI_Intercomm_create calls

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 4:31 AM, Jeff Squyres (jsquyres) wrote: > On May 27, 2014, at 9:11 PM, Gilles Gouaillardet > wrote: > >> in the case of intercomm_create, the children free all the communicators and >> then MPI_Disconnect() and then

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
On May 28, 2014, at 4:45 AM, Gilles Gouaillardet wrote: > Jeff, > > On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) > > To be totally clear: MPI says it is erroneous for only some (not all) > > processes in a communicator to call MPI_COMM_FREE. So if

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Ralph Castain
You can adjust the number of iterations so the parent reaches the end - in my case, I run it in a non-managed environment, and so there is no timeout. If you run it that way, you'll see the end result when the parent attempts to finalize. On May 27, 2014, at 11:18 PM, Gilles Gouaillardet

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Jeff, On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) > To be totally clear: MPI says it is erroneous for only some (not all) processes in a communicator to call MPI_COMM_FREE. So if that's the real problem, then the discussion about why the parent(s) is(are) trying to contact the

Re: [OMPI devel] some info is not pushed into the dstore

2014-05-28 Thread Gilles Gouaillardet
I finally got it :-) /* I previously got it "almost" right ... */ Here is what happens on job 2 (with trunk): MPI_Intercomm_create calls ompi_comm_get_rprocs, which calls ompi_proc_unpack => ompi_proc_unpack stores job 3 info into opal_dstore_peer, then ompi_comm_get_rprocs calls

Re: [OMPI devel] Trunk (RDMA and VT) warnings

2014-05-28 Thread Gilles Gouaillardet
Ralph, can you please describe your environment (at least the compiler and its version, plus the configure command line)? I checked osc_rdma_data_move.c only: size_t incoming_length; is there to improve readability. It is used only in an assert clause and in OPAL_OUTPUT_VERBOSE. One way to silence the warning
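
The snippet above is cut off before it names the proposed fix, so the following is only a generic sketch of the situation it describes, not the change discussed in the thread. It assumes a local variable that is referenced solely by an assert, which disappears when the code is built with -DNDEBUG and leaves the variable unused. The function process_fragment and its parameters are hypothetical and are not taken from osc_rdma_data_move.c; the void cast is one common C idiom for this, not necessarily the one the author had in mind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fragment handler; NOT Open MPI code. */
int process_fragment(size_t header_len, size_t payload_len, size_t buffer_len)
{
    /* Named local kept only for readability; the assert below is its
     * only real use. */
    size_t incoming_length = header_len + payload_len;

    /* Expands to nothing under -DNDEBUG, which would leave
     * incoming_length (and buffer_len) unused in optimized builds. */
    assert(incoming_length <= buffer_len);

    /* Assumption: an explicit void cast counts as a use, so the compiler
     * stops reporting the variable as unused in either build mode. */
    (void) incoming_length;
    (void) buffer_len;

    return (int) payload_len;
}
```

Calling process_fragment(8, 24, 64) returns 24 in both debug and NDEBUG builds; the casts change nothing at run time and only affect compiler diagnostics.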

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread Gilles Gouaillardet
Ralph, On 2014/05/28 12:10, Ralph Castain wrote: > my understanding is that there are two ways of seeing things : > a) the "R-way" : the problem is the parent should not try to communicate to > already exited processes > b) the "J-way" : the problem is the children should have waited either in