Jeff and I have debated this question. We could reduce the number of
iterations, or we could up the time limit. The better solution is probably to
speed it back up again, as it *used* to complete in the current time limit. So
we thought we'd leave it alone for now as a reminder that we need to address
it.
Good to know!
How should we handle this within mtt?
Decrease nseconds to 570?
Cheers,
Gilles
On Thu, May 29, 2014 at 12:03 AM, Ralph Castain wrote:
> Ah, that satisfied it!
>
> Sorry for the chase - I'll update my test.
>
>
> On May 28, 2014, at 7:55 AM, Gilles Gouaillardet wrote:
Ah, that satisfied it!
Sorry for the chase - I'll update my test.
On May 28, 2014, at 7:55 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> what if ?
>
> the parent :
> MPI_Comm_free(&merged);
> MPI_Comm_disconnect(&comm);
>
> and the child
> MPI_Comm_free(&merged);
> MPI_Comm_disconnect(&parent);
Ralph,
what if?
the parent:
MPI_Comm_free(&merged);
MPI_Comm_disconnect(&comm);
and the child:
MPI_Comm_free(&merged);
MPI_Comm_disconnect(&parent);
Gilles
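For context, a minimal self-contained sketch (mine, not the ibm test code) of where those handles come from and the ordering Gilles suggests - free the merged intracommunicator first, then disconnect - assuming the parent spawns a copy of itself with MPI_Comm_spawn and both sides build "merged" with MPI_Intercomm_merge:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, peer, merged;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent: spawn one child and merge into an intracommunicator */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &peer, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(peer, 0, &merged);
        /* ... communicate over merged ... */
        MPI_Comm_free(&merged);       /* free the merged intracommunicator */
        MPI_Comm_disconnect(&peer);   /* then disconnect from the child    */
    } else {
        /* child: merge with the parent, then free and disconnect */
        MPI_Intercomm_merge(parent, 1, &merged);
        MPI_Comm_free(&merged);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}

Whether freeing merged first and then disconnecting the intercommunicator is enough is exactly what the rest of the thread debates.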
> Good point - however, that doesn't fix it. Changing the Comm_free calls to
> Comm_disconnect results in the same error messages when t
On May 28, 2014, at 7:50 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> thanks for the info
>
>> can you detail your full mpirun command line, the number of servers you are
>> using, the btl involved and the ompi release that can be used to reproduce
>> the issue ?
>
> Running on only one server
Ralph,
thanks for the info
> can you detail your full mpirun command line, the number of servers you are
> using, the btl involved and the ompi release that can be used to reproduce
> the issue?
>
>
> Running on only one server, using the current head of the svn repo. My
> cluster only has Ethernet
On May 28, 2014, at 7:34 AM, George Bosilca wrote:
> Calling MPI_Comm_free is not enough from MPI perspective to clean up
> all knowledge about remote processes, nor to sever the links between
> the local and remote groups. One MUST call MPI_Comm_disconnect in
> order to achieve this.
>
> Look at the code in ompi/mpi/c and see the difference between MPI_Comm_free
> and MPI_Comm_disconnect.
Calling MPI_Comm_free is not enough from MPI perspective to clean up
all knowledge about remote processes, nor to sever the links between
the local and remote groups. One MUST call MPI_Comm_disconnect in
order to achieve this.
Look at the code in ompi/mpi/c and see the difference between MPI_Comm_free
and MPI_Comm_disconnect.
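As a rough illustration of the distinction George draws (a sketch under my own assumptions, not the actual test code): on the child side of a spawn, MPI_Comm_free only marks the handle for deallocation, while MPI_Comm_disconnect also waits for pending communication on that communicator before severing the connection it carries.

#include <mpi.h>

/* child-side teardown only; the parent side would MPI_Comm_spawn this
 * program and MPI_Comm_disconnect its end of the intercommunicator */
int main(int argc, char **argv)
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent != MPI_COMM_NULL) {
        /* ... exchange messages with the parent over "parent" ... */

        /* MPI_Comm_free(&parent) would only mark the communicator for
         * deallocation; per George, it does not sever the link to the
         * parent job.  MPI_Comm_disconnect waits for pending traffic on
         * the communicator and severs the connection it carries. */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}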
On May 28, 2014, at 6:41 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
>
> On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
>> This is definitely what happens: only some tasks call MPI_Comm_free()
>
> Really? I don't see how that can happen in loop_spawn - every process is
> clearly calling comm_free. Or are you referring to the intercomm_create
> test?
Ralph,
On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
> This is definitely what happens: only some tasks call MPI_Comm_free()
>
>
> Really? I don't see how that can happen in loop_spawn - every process is
> clearly calling comm_free. Or are you referring to the intercomm_create
> test?
>
On May 28, 2014, at 4:31 AM, Jeff Squyres (jsquyres) wrote:
> On May 27, 2014, at 9:11 PM, Gilles Gouaillardet
> wrote:
>
>> in the case of intercomm_create, the children free all the communicators and
>> then MPI_Disconnect() and then MPI_Finalize() and exits.
>> the parent only MPI_Disconnect() without freeing all the communicators.
On May 28, 2014, at 4:45 AM, Gilles Gouaillardet
wrote:
> Jeff,
>
> On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) wrote:
> > To be totally clear: MPI says it is erroneous for only some (not all)
> > processes in a communicator to call MPI_COMM_FREE. So if that's the real
> > problem, t
You can adjust the number of iterations so the parent reaches the end - in my
case, I run it in a non-managed environment, and so there is no timeout. If you
run it that way, you'll see the end result when the parent attempts to finalize.
On May 27, 2014, at 11:18 PM, Gilles Gouaillardet
wrote:
Jeff,
On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) wrote:
> To be totally clear: MPI says it is erroneous for only some (not all)
> processes in a communicator to call MPI_COMM_FREE. So if that's the real
> problem, then the discussion about why the parent(s) is(are) trying to
> contact the children
On May 27, 2014, at 9:11 PM, Gilles Gouaillardet
wrote:
> in the case of intercomm_create, the children free all the communicators and
> then MPI_Disconnect() and then MPI_Finalize() and exits.
> the parent only MPI_Disconnect() without freeing all the communicators.
> MPI_Finalize() tries to disconnect and communicate with already exited
> processes.
Ralph,
I could not find anything wrong with loop_spawn, and unless I am missing
something obvious:
from mtt http://mtt.open-mpi.org/index.php?do_redir=2196
all the tests that ran this month (both trunk and v1.8) failed (timeout), and
there was no error message such as
dpm_base_disconnect_init: error -12 in isend to process 1
Ralph,
On 2014/05/28 12:10, Ralph Castain wrote:
> my understanding is that there are two ways of seeing things:
> a) the "R-way": the problem is the parent should not try to communicate to
> already exited processes
> b) the "J-way": the problem is the children should have waited either in
On May 27, 2014, at 6:11 PM, Gilles Gouaillardet
wrote:
> Ralph,
>
> in the case of intercomm_create, the children free all the communicators and
> then MPI_Disconnect() and then MPI_Finalize() and exits.
> the parent only MPI_Disconnect() without freeing all the communicators.
> MPI_Finalize() tries to disconnect and communicate with already exited
> processes.
Ralph,
in the case of intercomm_create, the children free all the communicators,
then call MPI_Disconnect(), then MPI_Finalize(), and exit.
the parent only calls MPI_Disconnect() without freeing all the communicators,
so its MPI_Finalize() tries to disconnect from and communicate with already
exited processes.
my
Since you ignored my response, I'll reiterate and clarify it here. The problem
in the case of loop_spawn is that the parent process remains "connected" to
children after the child has finalized and died. Hence, when the parent
attempts to finalize, it tries to "disconnect" itself from processes that
have already exited.
Note that MPI says that COMM_DISCONNECT simply disconnects that individual
communicator. It does *not* guarantee that the processes involved will be
fully disconnected.
So I think that the freeing of communicators is good app behavior, but it is
not required by the MPI spec.
If OMPI is requir
FWIW: this now appears true for *any* case where a parent connects to more than
one child - i.e., if a process calls connect-accept more than once (e.g., in
loop_spawn)
This didn't use to be true, so something has changed in OMPI's underlying
behavior.
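For reference, a loop_spawn-like sketch (assumptions and names are mine, not the actual test): the parent goes through spawn/merge/free/disconnect on every iteration, i.e. it connects and disconnects more than once, and per the thread the parent-side errors appear when it finally calls MPI_Finalize.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, child, merged;
    int i, iterations = 100;   /* illustrative count, not the test's value */

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent != MPI_COMM_NULL) {
        /* child: one merge/free/disconnect cycle, then exit */
        MPI_Intercomm_merge(parent, 1, &merged);
        MPI_Comm_free(&merged);
        MPI_Comm_disconnect(&parent);
    } else {
        /* parent: connect to a fresh child on every iteration */
        for (i = 0; i < iterations; i++) {
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
            MPI_Intercomm_merge(child, 0, &merged);
            MPI_Comm_free(&merged);
            MPI_Comm_disconnect(&child);
        }
    }

    /* per the thread, the parent-side errors were reported at finalize */
    MPI_Finalize();
    return 0;
}

Run with a single parent process (e.g. mpirun -np 1 ./spawn_loop, where spawn_loop is this hypothetical binary); each spawned copy takes the child branch, tears down, and exits.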
On May 26, 2014, at 11:27 PM, Gilles Gouaillardet wrote:
Folks,
currently, the dynamic/intercomm_create test from the ibm test suite outputs
the following messages:
dpm_base_disconnect_init: error -12 in isend to process 1
the root cause is that task 0 tries to send messages to already exited tasks.
one way of seeing things is that this is an application