Good to know!
How should we handle this within MTT?
Decrease nseconds to 570?
Cheers,
Gilles
On Thu, May 29, 2014 at 12:03 AM, Ralph Castain wrote:
> Ah, that satisfied it!
>
> Sorry for the chase - I'll update my test.
>
>
> On May 28, 2014, at 7:55 AM, Gilles
Ah, that satisfied it!
Sorry for the chase - I'll update my test.
On May 28, 2014, at 7:55 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> what if:
>
> the parent:
> MPI_Comm_free();
> MPI_Comm_disconnect();
>
> and the child:
> MPI_Comm_free();
>
On May 28, 2014, at 7:50 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> thanks for the info
>
>> can you detail your full mpirun command line, the number of servers you are
>> using, the btl involved and the ompi release that can be used to reproduce
>> the
On May 28, 2014, at 7:34 AM, George Bosilca wrote:
> Calling MPI_Comm_free is not enough from the MPI perspective to clean up
> all knowledge about remote processes, nor to sever the links between
> the local and remote groups. One MUST call MPI_Comm_disconnect in
> order to
Calling MPI_Comm_free is not enough from the MPI perspective to clean up
all knowledge about remote processes, nor to sever the links between
the local and remote groups. One MUST call MPI_Comm_disconnect in
order to achieve this.
Look at the code in ompi/mpi/c and see the difference between
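The distinction George describes can be sketched in a minimal spawn example (illustrative only, not the actual ibm test code): the parent spawns one child, and both sides must call MPI_Comm_disconnect() on the intercommunicator rather than just MPI_Comm_free() before finalizing.

```c
/* Minimal sketch, assuming a parent that spawns one copy of itself.
 * MPI_Comm_free() alone would only release the local handle; the
 * processes would remain "connected" across the intercommunicator. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent: spawn one child running this same binary */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        /* sever the connection, not just free the handle */
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* child: disconnect from the parent before finalizing */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
```

Run under mpirun; with the disconnect in place, MPI_Finalize() on the parent has no surviving connection to tear down.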
On May 28, 2014, at 6:41 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
>
> On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
>> This is definitely what happens: only some tasks call MPI_Comm_free()
>
> Really? I don't see how that can
Ralph,
On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
> This is definitely what happens: only some tasks call MPI_Comm_free()
>
>
> Really? I don't see how that can happen in loop_spawn - every process is
> clearly calling comm_free. Or are you referring to the
On May 28, 2014, at 4:31 AM, Jeff Squyres (jsquyres) wrote:
> On May 27, 2014, at 9:11 PM, Gilles Gouaillardet
> wrote:
>
>> in the case of intercomm_create, the children free all the communicators and
>> then call MPI_Comm_disconnect() and then
On May 28, 2014, at 4:45 AM, Gilles Gouaillardet
wrote:
> Jeff,
>
> On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> > To be totally clear: MPI says it is erroneous for only some (not all)
> > processes in a communicator to call MPI_COMM_FREE. So if
You can adjust the number of iterations so the parent reaches the end - in my
case, I run it in a non-managed environment, and so there is no timeout. If you
run it that way, you'll see the end result when the parent attempts to finalize.
On May 27, 2014, at 11:18 PM, Gilles Gouaillardet
Jeff,
On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> To be totally clear: MPI says it is erroneous for only some (not all)
processes in a communicator to call MPI_COMM_FREE. So if that's the real
problem, then the discussion about why the parent(s) is(are) trying to
contact the
Ralph,
On 2014/05/28 12:10, Ralph Castain wrote:
> my understanding is that there are two ways of seeing things:
> a) the "R-way": the problem is that the parent should not try to communicate
> with already-exited processes
> b) the "J-way": the problem is that the children should have waited either in
Ralph,
in the case of intercomm_create, the children free all the communicators,
then call MPI_Comm_disconnect() and then MPI_Finalize(), and exit.
the parent only calls MPI_Comm_disconnect() without freeing all the
communicators. MPI_Finalize() then tries to disconnect from and communicate
with already-exited processes.
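The symmetric teardown implied here can be sketched as follows (illustrative, not the actual ibm intercomm_create test): both sides free every derived communicator and only then disconnect the intercommunicator, so neither side's MPI_Finalize() has to reach a peer that has already exited.

```c
/* Sketch of a symmetric teardown, assuming a self-spawning binary.
 * The derived intracommunicator is freed on both sides before the
 * connecting intercommunicator is disconnected. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, inter, merged;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent side: spawn one child */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
    } else {
        /* child side: the parent intercommunicator is the connection */
        inter = parent;
    }

    /* derive an intracommunicator spanning parent and child */
    MPI_Intercomm_merge(inter, parent != MPI_COMM_NULL, &merged);

    MPI_Comm_free(&merged);      /* free every derived communicator ... */
    MPI_Comm_disconnect(&inter); /* ... then sever the connection itself */
    MPI_Finalize();
    return 0;
}
```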
Since you ignored my response, I'll reiterate and clarify it here. The problem
in the case of loop_spawn is that the parent process remains "connected" to
children after the child has finalized and died. Hence, when the parent
attempts to finalize, it tries to "disconnect" itself from processes
Note that MPI says that COMM_DISCONNECT simply disconnects that individual
communicator. It does *not* guarantee that the processes involved will be
fully disconnected.
So I think that the freeing of communicators is good app behavior, but it is
not required by the MPI spec.
If OMPI is
FWIW: this now appears true for *any* case where a parent connects to more than
one child - i.e., if a process calls connect-accept more than once (e.g., in
loop_spawn)
This didn't used to be true, so something has changed in OMPI's underlying
behavior.
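The loop_spawn pattern being discussed can be sketched like this (a simplified stand-in for the actual test, with a hypothetical "./child" binary): the parent calls spawn/disconnect repeatedly, so in principle no connection should survive to its MPI_Finalize().

```c
/* Sketch of a loop_spawn-style parent: each iteration spawns a child
 * and disconnects from it before the next iteration. The bug described
 * above is that, after multiple connect-accept cycles, the parent's
 * MPI_Finalize() still tries to reach children that have exited. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    int i;

    MPI_Init(&argc, &argv);
    for (i = 0; i < 10; i++) {
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child); /* per-iteration disconnect */
    }
    MPI_Finalize(); /* should have nothing left to tear down */
    return 0;
}
```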
On May 26, 2014, at 11:27 PM, Gilles
Folks,
currently, the dynamic/intercomm_create test from the ibm test suite outputs
the following message:
dpm_base_disconnect_init: error -12 in isend to process 1
the root cause is that task 0 tries to send messages to already-exited tasks.
one way of seeing things is that this is an application