Ralph,
On 2014/05/28 12:10, Ralph Castain wrote:
> my understanding is that there are two ways of seeing things :
> a) the "R-way" : the problem is the parent should not try to communicate to
> already exited processes
> b) the "J-way" : the problem is the children should have waited either in
On May 27, 2014, at 6:11 PM, Gilles Gouaillardet
wrote:
> Ralph,
>
> in the case of intercomm_create, the children free all the communicators and
> then MPI_Disconnect() and then MPI_Finalize() and exits.
> the parent only MPI_Disconnect() without freeing all the communicators.
> MPI_Finaliz
Ralph,
in the case of intercomm_create, the children free all the communicators
and then MPI_Disconnect() and then MPI_Finalize() and exits.
the parent only MPI_Disconnect() without freeing all the communicators.
MPI_Finalize() tries to disconnect and communicate with already exited
processes.
my
Since you ignored my response, I'll reiterate and clarify it here. The problem
in the case of loop_spawn is that the parent process remains "connected" to
children after the child has finalized and died. Hence, when the parent
attempts to finalize, it tries to "disconnect" itself from processes
Note that MPI says that COMM_DISCONNECT simply disconnects that individual
communicator. It does *not* guarantee that the processes involved will be
fully disconnected.
So I think that the freeing of communicators is good app behavior, but it is
not required by the MPI spec.
If OMPI is requir
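For what it's worth, the point above about COMM_DISCONNECT can be pictured with a toy model. This is only a sketch and not Open MPI code: the Link class and the communicator names are made up for illustration, and the only assumption is that two processes stay "connected" for as long as at least one communicator still links them.

```python
# Toy model, NOT Open MPI code: two processes count as "connected"
# as long as at least one communicator still links them.
class Link:
    def __init__(self):
        # Names of the communicators shared by the two processes.
        self.communicators = set()

    def add(self, name):
        self.communicators.add(name)

    def disconnect(self, name):
        # Models a disconnect of one individual communicator only.
        self.communicators.discard(name)

    def connected(self):
        return bool(self.communicators)


link = Link()
link.add("spawn_intercomm")    # hypothetical communicator name
link.add("merged_intracomm")   # hypothetical communicator name

# Disconnecting a single communicator leaves the processes connected.
link.disconnect("spawn_intercomm")
still_connected = link.connected()

# Only once every shared communicator is gone are they fully disconnected.
link.disconnect("merged_intracomm")
fully_disconnected = not link.connected()
```

In this picture, a parent that disconnects one communicator but never frees the others is still tied to the child when it reaches finalize.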
Thanks Jeff,
i can only speak for myself : i use OpenGrok on a daily basis and it is a
great help
Cheers,
Gilles
On Wed, May 28, 2014 at 8:21 AM, Jeff Squyres (jsquyres) wrote:
> I can ask IU to adjust the OpenGrok config.
>
>
> On May 27, 2014, at 1:06 AM, Gilles Gouaillardet <
> gilles.gou
I can ask IU to adjust the OpenGrok config.
On May 27, 2014, at 1:06 AM, Gilles Gouaillardet
wrote:
> Folks,
>
> OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
> results for :
> - trunk
> - v1.6 branch
> - v1.5 branch
> - v1.3 branch
>
> imho, it could/should return
On May 27, 2014, at 2:28 PM, George Bosilca wrote:
> On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote:
>>> That being said, I agree with Ralph on the fact that accepting them in
>>> the trunk doesn't automatically qualify it for inclusion in any
>>> further stable release. However, if ORNL
On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote:
>> That being said, I agree with Ralph on the fact that accepting them in
>> the trunk doesn't automatically qualify it for inclusion in any
>> further stable release. However, if ORNL setup nightly builds to
>> validate their module, I'm prett
On May 27, 2014, at 1:50 PM, George Bosilca wrote:
> From a practical perspective, I don't think there is a need for a
> phone call. Ralph made his point, and we all took notice of it.
> However, the proposed changes are in a single independent component,
> with no impact on the rest of the code
From a practical perspective, I don't think there is a need for a
phone call. Ralph made his point, and we all took notice of it.
However, the proposed changes are in a single independent component,
with no impact on the rest of the code base. Therefore, there is
absolutely no valid reason not to
Not sure, but I suspect Jeff set that up as a lark sometime in the past and it
hasn't been maintained in years.
On May 26, 2014, at 10:06 PM, Gilles Gouaillardet
wrote:
> Folks,
>
> OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
> results for :
> - trunk
> - v1.6 br
FWIW: this now appears true for *any* case where a parent connects to more than
one child - i.e., if a process calls connect-accept more than once (e.g., in
loop_spawn)
This didn't use to be true, so something has changed in OMPI's underlying
behavior.
On May 26, 2014, at 11:27 PM, Gilles Go
Sure, if it's helpful I can join a call.
--tjn
_
Thomas Naughton naught...@ornl.gov
Research Associate (865) 576-4184
On Tue, 27 May 2014, Ralph Ca
Inline comments ... way at the bottom. ;-)
--tjn
_
Thomas Naughton naught...@ornl.gov
Research Associate (865) 576-4184
On Tue, 27 May 2014, Ralp
not really, I stated my case, there is not much more to add. It's up to
the group to decide, and I am fine with any decision.
Edgar
On 5/27/2014 2:57 PM, Ralph Castain wrote:
> Forgot to add: would it help to discuss this over the phone instead?
>
>
> On May 27, 2014, at 12:56 PM, Ralph Castain
Forgot to add: would it help to discuss this over the phone instead?
On May 27, 2014, at 12:56 PM, Ralph Castain wrote:
>
> On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote:
>
>>
>>
>> On 5/27/2014 2:46 PM, Ralph Castain wrote:
>>>
>>> On May 27, 2014, at 12:27 PM, Edgar Gabriel
>>> wro
On May 27, 2014, at 12:50 PM, Edgar Gabriel wrote:
>
>
> On 5/27/2014 2:46 PM, Ralph Castain wrote:
>>
>> On May 27, 2014, at 12:27 PM, Edgar Gabriel
>> wrote:
>>
>>> I'll let ORNL talk about the STCI component itself (which might
>>> have additional reasons), but keeping the code in trunk
Hmmm...I did some digging, and the best I can tell is that the root cause is that
the second job ("b" in the test program) is never actually calling
connect_accept! This looks like a change may have occurred in Intercomm_create
that is causing it to not recognize the need to do so.
Anyone confirm
On 5/27/2014 2:46 PM, Ralph Castain wrote:
>
> On May 27, 2014, at 12:27 PM, Edgar Gabriel
> wrote:
>
>> I'll let ORNL talk about the STCI component itself (which might
>> have additional reasons), but keeping the code in trunk vs. an
>> outside github/mercurial repository has two advantages i
On May 27, 2014, at 12:27 PM, Edgar Gabriel wrote:
> I'll let ORNL talk about the STCI component itself (which might have
> additional reasons), but keeping the code in trunk vs. an outside
> github/mercurial repository has two advantages in my opinion: i) it
> simplifies the propagation of know
I'll let ORNL talk about the STCI component itself (which might have
additional reasons), but keeping the code in trunk vs. an outside
github/mercurial repository has two advantages in my opinion: i) it
simplifies the propagation of know-how between the groups, and ii)
avoids having to keep a separ
I think so long as we leave these components out of any release, there is a
limited potential for problems (probably most importantly, we sidestep all the
issues about syncing releases!).
However, that said, I'm not sure what it gains anyone to include a component
that *isn't* going in a releas
To throw in my $0.02, I would see a benefit in adding the component to
the trunk. As I mentioned in the last teleconf, we are currently working
on adding support for the HPX runtime environment to Open MPI, and for
various reasons (that I can explain if somebody is interested), we think
at the mo
Yeah, my concern is that we just had a user who was confused by it and thought
they needed to build it to use PMI under Slurm - which is totally the wrong
thing to do. So I removed it from the 1.8 branch to avoid any further
confusion, and don't see any reason to continue carrying it in the trun
I have mixed thoughts on this request. We have a policy of only including
things in the code base that are of general utility - i.e., that should be
generally distributed across the community. This component is only applicable
to ORNL, and it would therefore seem more sensible to have it continu
Hi Gilles
I concur on the typo and fixed it - thanks for catching it. I'll have to look
into the problem you reported as it has been fixed in the past, and was working
last I checked it. The info required for this 3-way connect/accept is supposed
to be in the modex provided by the common commun
WHAT: add new component to ompi/rte framework
WHY: because it will simplify our maintenance & provide an alt. reference
WHEN: no rush, soon-ish? (June 12?)
This is a component we currently maintain outside of the ompi tree to
support using OMPI with an alternate runtime system. This will
Hi Ralph,
This component does provide an alternate reference for the ompi-rte
framework. But if it is unused (unmaintained), it seems less useful in
practice. I'll post another RFC for related request.
--tjn
_
Thomas Na
On Mon, May 26, 2014 at 12:09:38PM +0900, Gilles Gouaillardet wrote:
>Rolf,
>
>the assert fails because the endpoint reference count is greater than one.
>the root cause is the endpoint has been added to the list of
>eager_rdma_buffers of the openib btl device (and hence OBJ_RETAIN
This limit is controlled by several MCA variables. Contiguous segments
larger than the btl_openib_eager_limit will use the RDMA protocol (Get) if
mpi_leave_pinned is set and the RDMA RNDV (Put) protocol
otherwise. Both of these protocols pin the user buffer on both sides.
-Nathan
On Fri, May 23, 2
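The selection rule described above can be written down as a small sketch. This is not the openib BTL code: choose_protocol and its return labels are hypothetical, the "eager" label for segments at or below the limit is an assumption the message does not state, and the limit used in the example is an arbitrary value rather than the real default of btl_openib_eager_limit.

```python
def choose_protocol(segment_bytes, eager_limit, leave_pinned):
    """Pick a protocol label for one contiguous segment (illustrative only)."""
    if segment_bytes <= eager_limit:
        # Assumption: the message only covers segments larger than the limit.
        return "eager"
    # Larger than the eager limit: RDMA Get if mpi_leave_pinned is set,
    # RDMA RNDV (Put) otherwise.
    return "rdma_get" if leave_pinned else "rdma_rndv_put"


EXAMPLE_EAGER_LIMIT = 4096  # arbitrary example value, not the real default

large_pinned = choose_protocol(1 << 20, EXAMPLE_EAGER_LIMIT, leave_pinned=True)
large_unpinned = choose_protocol(1 << 20, EXAMPLE_EAGER_LIMIT, leave_pinned=False)
```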
Folks,
while debugging the dynamic/intercomm_create from the ibm test suite, i
found something odd.
i ran *without* any batch manager on a VM (one socket and four cpus)
mpirun -np 1 ./dynamic/intercomm_create
it hangs by default
it works with --mca coll ^ml
basically :
- task 0 spawns task 1
-
Folks,
currently, the dynamic/intercomm_create test from the ibm test suite outputs
the following messages :
dpm_base_disconnect_init: error -12 in isend to process 1
the root cause is task 0 tries to send messages to already exited tasks.
one way of seeing things is that this is an application
Folks,
OMPI Opengrok search (http://svn.open-mpi.org/source) currently returns
results for :
- trunk
- v1.6 branch
- v1.5 branch
- v1.3 branch
imho, it could/should return results for the following branches :
- trunk
- v1.8 branch
- v1.6 branch
and maybe the v1.4 branch (and the v1.9 branch when
Ah, I see.
Thanks a lot guys.
Kevin
--
*Kevin A. Brown* *|* Tokyo Institute of Technology *|* *E-mail*:
brown.k...@titech.ac.jp
On Tue, May 27, 2014 at 3:06 AM, Jeff Squyres (jsquyres) wrote:
> Or use --all.
>
>
> On May 26, 2014, at 10:21 AM, Ralph Castain wrote: