Ralph,
in the case of intercomm_create, the children free all the communicators,
then call MPI_Comm_disconnect() and MPI_Finalize(), and exit.
the parent only calls MPI_Comm_disconnect() without freeing all the communicators.
MPI_Finalize() then tries to disconnect from and communicate with already exited
processes.
Since you ignored my response, I'll reiterate and clarify it here. The problem
in the case of loop_spawn is that the parent process remains "connected" to
children after the child has finalized and died. Hence, when the parent
attempts to finalize, it tries to "disconnect" itself from processes
Note that MPI says that COMM_DISCONNECT simply disconnects that individual
communicator. It does *not* guarantee that the processes involved will be
fully disconnected.
So I think that the freeing of communicators is good app behavior, but it is
not required by the MPI spec.
If OMPI is
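The point about COMM_DISCONNECT above can be pictured with a toy model in plain C. This is only a sketch under stated assumptions: it does not call MPI at all, and the names link_t, comm_create, comm_disconnect and still_connected are hypothetical helpers invented for illustration. The one idea it encodes is that two processes stay "connected" for as long as at least one communicator still spans both of them, so disconnecting a single communicator is not enough when several exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not MPI: counts the communicators that still span
 * a given parent/child pair. All names here are hypothetical. */
typedef struct {
    int shared_comms;
} link_t;

/* A new communicator spanning both processes is created. */
void comm_create(link_t *l)
{
    l->shared_comms++;
}

/* One communicator is disconnected; the others are untouched. */
void comm_disconnect(link_t *l)
{
    assert(l->shared_comms > 0);
    l->shared_comms--;
}

/* The pair remains connected while any shared communicator is left. */
bool still_connected(const link_t *l)
{
    return l->shared_comms > 0;
}
```

In this model, a parent that created two communicators with a child and disconnected only one of them would still report still_connected() as true, which mirrors the situation described in the thread where the parent reaches finalize while still tied to an exited child.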
Thanks Jeff,
i can only speak for myself : i use OpenGrok on a daily basis and it is a
great help
Cheers,
Gilles
On Wed, May 28, 2014 at 8:21 AM, Jeff Squyres (jsquyres) wrote:
> I can ask IU to adjust the OpenGrok config.
>
>
> On May 27, 2014, at 1:06 AM, Gilles
I can ask IU to adjust the OpenGrok config.
On May 27, 2014, at 1:06 AM, Gilles Gouaillardet
wrote:
> Folks,
>
> OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
> results for :
> - trunk
> - v1.6 branch
> - v1.5 branch
> - v1.3 branch
>
On May 27, 2014, at 2:28 PM, George Bosilca wrote:
> On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote:
>>> That being said, I agree with Ralph on the fact that accepting them in
>>> the trunk doesn't automatically qualify it for inclusion in any
>>>
On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote:
>> That being said, I agree with Ralph on the fact that accepting them in
>> the trunk doesn't automatically qualify it for inclusion in any
>> further stable release. However, if ORNL setup nightly builds to
>> validate
On May 27, 2014, at 1:50 PM, George Bosilca wrote:
> From a practical perspective, I don't think there is a need for a
> phone call. Ralph made his point, and we all took notice of it.
> However, the proposed changes are in a single independent component,
> with no impact
From a practical perspective, I don't think there is a need for a
phone call. Ralph made his point, and we all took notice of it.
However, the proposed changes are in a single independent component,
with no impact on the rest of the code base. Therefore, there is
absolutely no valid reason not to
Not sure, but I suspect Jeff set that up as a lark sometime in the past and it
hasn't been maintained in years.
On May 26, 2014, at 10:06 PM, Gilles Gouaillardet
wrote:
> Folks,
>
> OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
>
FWIW: this now appears true for *any* case where a parent connects to more than
one child - i.e., if a process calls connect-accept more than once (e.g., in
loop_spawn)
This didn't use to be true, so something has changed in OMPI's underlying
behavior.
On May 26, 2014, at 11:27 PM, Gilles
Sure, if it's helpful I can join a call.
--tjn
_
Thomas Naughton naught...@ornl.gov
Research Associate (865) 576-4184
On Tue, 27 May 2014, Ralph
Inline comments ... way at the bottom. ;-)
--tjn
_
Thomas Naughton naught...@ornl.gov
Research Associate (865) 576-4184
On Tue, 27 May 2014,
not really, I stated my case, there is not much more to add. It's up to
the group to decide, and I am fine with any decision.
Edgar
On 5/27/2014 2:57 PM, Ralph Castain wrote:
> Forgot to add: would it help to discuss this over the phone instead?
>
>
> On May 27, 2014, at 12:56 PM, Ralph Castain
Forgot to add: would it help to discuss this over the phone instead?
On May 27, 2014, at 12:56 PM, Ralph Castain wrote:
>
> On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote:
>
>>
>>
>> On 5/27/2014 2:46 PM, Ralph Castain wrote:
>>>
>>> On May 27,
On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote:
>
>
> On 5/27/2014 2:46 PM, Ralph Castain wrote:
>>
>> On May 27, 2014, at 12:27 PM, Edgar Gabriel
>> wrote:
>>
>>> I'll let ORNL talk about the STCI component itself (which might
>>> have additional
Hmmm...I did some digging, and the best I can tell is that the root cause is that
the second job ("b" in the test program) is never actually calling
connect_accept! This looks like a change may have occurred in Intercomm_create
that is causing it to not recognize the need to do so.
Anyone confirm
On 5/27/2014 2:46 PM, Ralph Castain wrote:
>
> On May 27, 2014, at 12:27 PM, Edgar Gabriel
> wrote:
>
>> I'll let ORNL talk about the STCI component itself (which might
>> have additional reasons), but keeping the code in trunk vs. an
>> outside github/mercurial repository
On May 27, 2014, at 12:27 PM, Edgar Gabriel wrote:
> I'll let ORNL talk about the STCI component itself (which might have
> additional reasons), but keeping the code in trunk vs. an outside
> github/mercurial repository has two advantages in my opinion: i) it
> simplifies the
I'll let ORNL talk about the STCI component itself (which might have
additional reasons), but keeping the code in trunk vs. an outside
github/mercurial repository has two advantages in my opinion: i) it
simplifies the propagation of know-how between the groups, and ii)
avoids having to keep a
I think so long as we leave these components out of any release, there is a
limited potential for problems (probably most importantly, we sidestep all the
issues about syncing releases!).
However, that said, I'm not sure what it gains anyone to include a component
that *isn't* going in a
To throw in my $0.02, I would see a benefit in adding the component to
the trunk. As I mentioned in the last teleconf, we are currently working
on adding support for the HPX runtime environment to Open MPI, and for
various reasons (that I can explain if somebody is interested), we think
at the
Yeah, my concern is that we just had a user who was confused by it and thought
they needed to build it to use PMI under Slurm - which is totally the wrong
thing to do. So I removed it from the 1.8 branch to avoid any further
confusion, and don't see any reason to continue carrying it in the
I have mixed thoughts on this request. We have a policy of only including
things in the code base that are of general utility - i.e., that should be
generally distributed across the community. This component is only applicable
to ORNL, and it would therefore seem more sensible to have it
Hi Gilles
I concur on the typo and fixed it - thanks for catching it. I'll have to look
into the problem you reported as it has been fixed in the past, and was working
last I checked it. The info required for this 3-way connect/accept is supposed
to be in the modex provided by the common
WHAT: add new component to ompi/rte framework
WHY: because it will simplify our maintenance & provide an alt. reference
WHEN: no rush, soon-ish? (June 12?)
This is a component we currently maintain outside of the ompi tree to
support using OMPI with an alternate runtime system. This will
Hi Ralph,
This component does provide an alternate reference for the ompi-rte
framework. But if it is unused (unmaintained), it seems less useful in
practice. I'll post another RFC for a related request.
--tjn
_
Thomas
On Mon, May 26, 2014 at 12:09:38PM +0900, Gilles Gouaillardet wrote:
>Rolf,
>
>the assert fails because the endpoint reference count is greater than one.
>the root cause is the endpoint has been added to the list of
>eager_rdma_buffers of the openib btl device (and hence
This limit is controlled by several MCA variables. Contiguous segments
larger than the btl_openib_eager_limit will use the RDMA protocol (Get) if
mpi_leave_pinned is set and the RDMA RNDV (Put) protocol
otherwise. Both of these protocols pin the user buffer on both sides.
-Nathan
On Fri, May 23,
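The selection rule Nathan describes can be sketched as a small C function. This is a simplified illustration, not Open MPI code: the enum labels and the function choose_protocol are hypothetical, only two of the "several MCA variables" are modeled (the eager limit and mpi_leave_pinned), and the assumption that segments at or below the limit go out eagerly is mine rather than something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative labels only; these are not Open MPI identifiers. */
typedef enum {
    PROTO_EAGER,     /* assumed: segment does not exceed the eager limit */
    PROTO_RDMA_GET,  /* RDMA protocol (Get), when leave_pinned is set */
    PROTO_RNDV_PUT   /* RDMA RNDV (Put) protocol otherwise */
} proto_t;

/* Pick a protocol for one contiguous segment.
 * eager_limit stands in for btl_openib_eager_limit,
 * leave_pinned for mpi_leave_pinned. */
proto_t choose_protocol(size_t segment_bytes, size_t eager_limit,
                        bool leave_pinned)
{
    assert(eager_limit > 0);
    if (segment_bytes <= eager_limit) {
        return PROTO_EAGER;
    }
    return leave_pinned ? PROTO_RDMA_GET : PROTO_RNDV_PUT;
}
```

For example, with a hypothetical limit of 12288 bytes, a 65536-byte segment would map to PROTO_RDMA_GET when leave_pinned is true and to PROTO_RNDV_PUT when it is false. The real limit on a given system is whatever the MCA variables are set to.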
Folks,
while debugging the dynamic/intercomm_create from the ibm test suite, i
found something odd.
i ran *without* any batch manager on a VM (one socket and four cpus)
mpirun -np 1 ./dynamic/intercomm_create
it hangs by default
it works with --mca coll ^ml
basically :
- task 0 spawns task 1
-
Folks,
currently, the dynamic/intercomm_create test from the ibm test suite outputs
the following messages :
dpm_base_disconnect_init: error -12 in isend to process 1
the root cause is that task 0 tries to send messages to already exited tasks.
one way of seeing things is that this is an application
Folks,
OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
results for :
- trunk
- v1.6 branch
- v1.5 branch
- v1.3 branch
imho, it could/should return results for the following branches :
- trunk
- v1.8 branch
- v1.6 branch
and maybe the v1.4 branch (and the v1.9 branch when
Ah, I see.
Thanks a lot guys.
Kevin
--
*Kevin A. Brown* *|* Tokyo Institute of Technology *|* *E-mail*:
brown.k...@titech.ac.jp
On Tue, May 27, 2014 at 3:06 AM, Jeff Squyres (jsquyres) wrote:
> Or use --all.
>
>
> On May 26, 2014, at 10:21 AM,