Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Gilles Gouaillardet
Ralph, in the case of intercomm_create, the children free all the communicators, then call MPI_Disconnect() and MPI_Finalize(), and exit. The parent only calls MPI_Disconnect() without freeing all the communicators. Its MPI_Finalize() then tries to disconnect and communicate with already exited processes.

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Ralph Castain
Since you ignored my response, I'll reiterate and clarify it here. The problem in the case of loop_spawn is that the parent process remains "connected" to children after the child has finalized and died. Hence, when the parent attempts to finalize, it tries to "disconnect" itself from processes

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Jeff Squyres (jsquyres)
Note that MPI says that COMM_DISCONNECT simply disconnects that individual communicator. It does *not* guarantee that the processes involved will be fully disconnected. So I think that the freeing of communicators is good app behavior, but it is not required by the MPI spec. If OMPI is
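
A toy model of the point Jeff makes above (plain Python, not MPI code). It assumes two processes count as "connected" for as long as any communicator still spans both of them; the function name `fully_disconnected` and the communicator labels are hypothetical and used only for illustration.

```python
from typing import Dict, Set


def fully_disconnected(open_comms: Dict[str, Set[int]], a: int, b: int) -> bool:
    """Return True if no remaining communicator contains both ranks a and b."""
    return not any(a in members and b in members for members in open_comms.values())


# Hypothetical state: parent (0) and child (1) share two communicators.
comms = {"spawn_intercomm": {0, 1}, "merged_intracomm": {0, 1}}

# Disconnecting one communicator removes only that communicator.
del comms["spawn_intercomm"]
after_one = fully_disconnected(comms, 0, 1)  # merged_intracomm still spans both

# Once every shared communicator is gone, the two processes are disconnected.
del comms["merged_intracomm"]
after_all = fully_disconnected(comms, 0, 1)
```

In this sketch `after_one` is false and `after_all` is true, which mirrors why freeing every communicator before disconnecting is good application behavior even though the snippet above says the MPI spec does not require it.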

Re: [OMPI devel] OMPI Opengrok config

2014-05-27 Thread Gilles Gouaillardet
Thanks Jeff, i can only speak for myself : i use OpenGrok on a daily basis and it is a great help Cheers, Gilles On Wed, May 28, 2014 at 8:21 AM, Jeff Squyres (jsquyres) wrote: > I can ask IU to adjust the OpenGrok config. > > > On May 27, 2014, at 1:06 AM, Gilles

Re: [OMPI devel] OMPI Opengrok config

2014-05-27 Thread Jeff Squyres (jsquyres)
I can ask IU to adjust the OpenGrok config. On May 27, 2014, at 1:06 AM, Gilles Gouaillardet wrote: > Folks, > > OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns > results for : > - trunk > - v1.6 branch > - v1.5 branch > - v1.3 branch >

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
On May 27, 2014, at 2:28 PM, George Bosilca wrote: > On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote: >>> That being said, I agree with Ralph on the fact that accepting them in >>> the trunk doesn't automatically qualify it for inclusion in any >>>

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread George Bosilca
On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote: >> That being said, I agree with Ralph on the fact that accepting them in >> the trunk doesn't automatically qualify it for inclusion in any >> further stable release. However, if ORNL setup nightly builds to >> validate

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
On May 27, 2014, at 1:50 PM, George Bosilca wrote: > From a practical perspective, I don't think there is a need for a > phone call. Ralph made his point, and we all took notice of it. > However, the proposed changes are in a single independent component, > with no impact

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread George Bosilca
From a practical perspective, I don't think there is a need for a phone call. Ralph made his point, and we all took notice of it. However, the proposed changes are in a single independent component, with no impact on the rest of the code base. Therefore, there is absolutely no valid reason not to

Re: [OMPI devel] OMPI Opengrok config

2014-05-27 Thread Ralph Castain
Not sure, but I suspect Jeff set that up as a lark sometime in the past and it hasn't been maintained in years. On May 26, 2014, at 10:06 PM, Gilles Gouaillardet wrote: > Folks, > > OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns >

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Ralph Castain
FWIW: this now appears true for *any* case where a parent connects to more than one child - i.e., if a process calls connect-accept more than once (e.g., in loop_spawn). This didn't use to be true, so something has changed in OMPI's underlying behavior. On May 26, 2014, at 11:27 PM, Gilles

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton
Sure, if it's helpful I can join a call. --tjn _ Thomas Naughton naught...@ornl.gov Research Associate (865) 576-4184 On Tue, 27 May 2014, Ralph

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton
Inline comments ... way at the bottom. ;-) --tjn _ Thomas Naughton naught...@ornl.gov Research Associate (865) 576-4184 On Tue, 27 May 2014,

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Edgar Gabriel
not really, I stated my case, there is not much more to add. It's up to the group to decide, and I am fine with any decision. Edgar On 5/27/2014 2:57 PM, Ralph Castain wrote: > Forgot to add: would it help to discuss this over the phone instead? > > > On May 27, 2014, at 12:56 PM, Ralph Castain

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
Forgot to add: would it help to discuss this over the phone instead? On May 27, 2014, at 12:56 PM, Ralph Castain wrote: > > On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote: > >> >> >> On 5/27/2014 2:46 PM, Ralph Castain wrote: >>> >>> On May 27,

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote: > > > On 5/27/2014 2:46 PM, Ralph Castain wrote: >> >> On May 27, 2014, at 12:27 PM, Edgar Gabriel >> wrote: >> >>> I'll let ORNL talk about the STCI component itself (which might >>> have additional

Re: [OMPI devel] some info is not pushed into the dstore

2014-05-27 Thread Ralph Castain
Hmmm...I did some digging, and the best I can tell is that the root cause is that the second job ("b" in the test program) is never actually calling connect_accept! This looks like a change may have occurred in Intercomm_create that is causing it to not recognize the need to do so. Anyone confirm

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Edgar Gabriel
On 5/27/2014 2:46 PM, Ralph Castain wrote: > > On May 27, 2014, at 12:27 PM, Edgar Gabriel > wrote: > >> I'll let ORNL talk about the STCI component itself (which might >> have additional reasons), but keeping the code in trunk vs. an >> outside github/mercurial repository

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
On May 27, 2014, at 12:27 PM, Edgar Gabriel wrote: > I'll let ORNL talk about the STCI component itself (which might have > additional reasons), but keeping the code in trunk vs. an outside > github/mercurial repository has two advantages in my opinion: i) it > simplifies the

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Edgar Gabriel
I'll let ORNL talk about the STCI component itself (which might have additional reasons), but keeping the code in trunk vs. an outside github/mercurial repository has two advantages in my opinion: i) it simplifies the propagation of know-how between the groups, and ii) avoids having to keep a

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
I think so long as we leave these components out of any release, there is a limited potential for problems (probably most importantly, we sidestep all the issues about syncing releases!). However, that said, I'm not sure what it gains anyone to include a component that *isn't* going in a

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Edgar Gabriel
To throw in my $0.02, I would see a benefit in adding the component to the trunk. As I mentioned in the last teleconf, we are currently working on adding support for the HPX runtime environment to Open MPI, and for various reasons (that I can explain if somebody is interested), we think at the

Re: [OMPI devel] RFC: remove PMI component in OMPI/RTE framework

2014-05-27 Thread Ralph Castain
Yeah, my concern is that we just had a user who was confused by it and thought they needed to build it to use PMI under Slurm - which is totally the wrong thing to do. So I removed it from the 1.8 branch to avoid any further confusion, and don't see any reason to continue carrying it in the

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Ralph Castain
I have mixed thoughts on this request. We have a policy of only including things in the code base that are of general utility - i.e., that should be generally distributed across the community. This component is only applicable to ORNL, and it would therefore seem more sensible to have it

Re: [OMPI devel] some info is not pushed into the dstore

2014-05-27 Thread Ralph Castain
Hi Gilles I concur on the typo and fixed it - thanks for catching it. I'll have to look into the problem you reported as it has been fixed in the past, and was working last I checked it. The info required for this 3-way connect/accept is supposed to be in the modex provided by the common

[OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton
WHAT: add new component to ompi/rte framework WHY: because it will simplify our maintenance & provide an alt. reference WHEN: no rush, soon-ish? (June 12?) This is a component we currently maintain outside of the ompi tree to support using OMPI with an alternate runtime system. This will

Re: [OMPI devel] RFC: remove PMI component in OMPI/RTE framework

2014-05-27 Thread Thomas Naughton
Hi Ralph, This component does provide an alternate reference for the ompi-rte framework. But if it is unused (unmaintained), it seems less useful in practice. I'll post another RFC for a related request. --tjn _ Thomas

Re: [OMPI devel] Still problems with del_procs in trunkj

2014-05-27 Thread Nathan Hjelm
On Mon, May 26, 2014 at 12:09:38PM +0900, Gilles Gouaillardet wrote: >Rolf, > >the assert fails because the endpoint reference count is greater than one. >the root cause is the endpoint has been added to the list of >eager_rdma_buffers of the openib btl device (and hence

Re: [OMPI devel] Threshold for pinning down user-buffers

2014-05-27 Thread Nathan Hjelm
This limit is controlled by several MCA variables. Contiguous segments larger than the btl_openib_eager_limit will use the RDMA protocol (Get) if mpi_leave_pinned is set and the RDMA RNDV (Put) protocol otherwise. Both of these protocols pin the user buffer on both sides. -Nathan On Fri, May 23,
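
The rule in the message above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Open MPI's implementation: the function name, the returned labels, and the handling of segments at or below the eager limit (which the message does not describe) are all hypothetical.

```python
def choose_protocol(segment_bytes: int, eager_limit: int, leave_pinned: bool) -> str:
    """Pick a transfer protocol for a contiguous segment, following the rule above."""
    if segment_bytes <= eager_limit:
        # Assumption: the message does not cover this case; treated here as an eager send.
        return "eager"
    # Larger than btl_openib_eager_limit: per the message, both of the
    # protocols below pin the user buffer on both sides.
    return "rdma_get" if leave_pinned else "rdma_rndv_put"


# Hypothetical values, chosen only to exercise each branch.
EAGER_LIMIT = 12288
small = choose_protocol(1024, EAGER_LIMIT, leave_pinned=False)
large_pinned = choose_protocol(1 << 20, EAGER_LIMIT, leave_pinned=True)
large_unpinned = choose_protocol(1 << 20, EAGER_LIMIT, leave_pinned=False)
```

Here `EAGER_LIMIT` stands in for whatever btl_openib_eager_limit is set to on a given system; the real value should be read from the MCA configuration rather than taken from this sketch.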

[OMPI devel] some info is not pushed into the dstore

2014-05-27 Thread Gilles Gouaillardet
Folks, while debugging the dynamic/intercomm_create from the ibm test suite, i found something odd. i ran the following *without* any batch manager on a VM (one socket and four cpus): mpirun -np 1 ./dynamic/intercomm_create (it hangs by default, it works with --mca coll ^ml). basically : - task 0 spawns task 1 -

[OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-27 Thread Gilles Gouaillardet
Folks, currently, the dynamic/intercomm_create test from the ibm test suite outputs the following message : dpm_base_disconnect_init: error -12 in isend to process 1 the root cause is that task 0 tries to send messages to already exited tasks. one way of seeing things is that this is an application

[OMPI devel] OMPI Opengrok config

2014-05-27 Thread Gilles Gouaillardet
Folks, OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns results for : - trunk - v1.6 branch - v1.5 branch - v1.3 branch imho, it could/should return results for the following branches : - trunk - v1.8 branch - v1.6 branch and maybe the v1.4 branch (and the v1.9 branch when

Re: [OMPI devel] ompi_info not Giving Complete Output

2014-05-27 Thread Kevin Brown
Ah, I see. Thanks a lot guys. Kevin -- *Kevin A. Brown* *|* Tokyo Institute of Technology *|* *E-mail*: brown.k...@titech.ac.jp On Tue, May 27, 2014 at 3:06 AM, Jeff Squyres (jsquyres) wrote: > Or use --all. > > > On May 26, 2014, at 10:21 AM,