On Wed, May 28, 2014 at 03:28:59PM +, Dave Goodell (dgoodell) wrote:
> On May 13, 2014, at 4:01 PM, Nathan Hjelm wrote:
>
> > While tracking down memory leaks in components I ran into an interesting
> > issue. osc/rdma uses an opal_free_list_t (not an ompi_free_list_t) for
>
good to know!
how should we handle this within mtt?
decrease nseconds to 570?
Cheers,
Gilles
On Thu, May 29, 2014 at 12:03 AM, Ralph Castain wrote:
> Ah, that satisfied it!
>
> Sorry for the chase - I'll update my test.
>
>
> On May 28, 2014, at 7:55 AM, Gilles
Ah, that satisfied it!
Sorry for the chase - I'll update my test.
On May 28, 2014, at 7:55 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> what if ?
>
> the parent :
> MPI_Comm_free();
> MPI_Comm_disconnect();
>
> and the child
> MPI_Comm_free();
>
On May 28, 2014, at 7:50 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
> thanks for the info
>
>> can you detail your full mpirun command line, the number of servers you are
>> using, the btl involved and the ompi release that can be used to reproduce
>> the
On May 28, 2014, at 7:34 AM, George Bosilca wrote:
> Calling MPI_Comm_free is not enough from the MPI perspective to clean up
> all knowledge about remote processes, nor to sever the links between
> the local and remote groups. One MUST call MPI_Comm_disconnect in
> order to
Calling MPI_Comm_free is not enough from the MPI perspective to clean up
all knowledge about remote processes, nor to sever the links between
the local and remote groups. One MUST call MPI_Comm_disconnect in
order to achieve this.
Look at the code in ompi/mpi/c and see the difference between
On May 28, 2014, at 6:41 AM, Gilles Gouaillardet
wrote:
> Ralph,
>
>
> On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
>> This is definitely what happens : only some tasks call MPI_Comm_free()
>
> Really? I don't see how that can
Ralph,
On Wed, May 28, 2014 at 9:33 PM, Ralph Castain wrote:
> This is definitely what happens : only some tasks call MPI_Comm_free()
>
>
> Really? I don't see how that can happen in loop_spawn - every process is
> clearly calling comm_free. Or are you referring to the
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
./configure --prefix=/home/common/openmpi/build/svn-trunk --enable-mpi-java
--enable-orterun-prefix-by-default
More inline below
On May 27, 2014, at 9:29 PM, Gilles Gouaillardet
wrote:
> Ralph,
>
> can you please
On May 28, 2014, at 1:18 AM, Gilles Gouaillardet
wrote:
> i finally got it :-)
Hooray! Thanks for digging deeper.
>
> /* i previously got it "almost" right ... */
>
> here is what happens on job 2 (with trunk) :
> MPI_Intercomm_create calls
On May 28, 2014, at 4:31 AM, Jeff Squyres (jsquyres) wrote:
> On May 27, 2014, at 9:11 PM, Gilles Gouaillardet
> wrote:
>
>> in the case of intercomm_create, the children free all the communicators and
>> then MPI_Disconnect() and then
On May 28, 2014, at 4:45 AM, Gilles Gouaillardet
wrote:
> Jeff,
>
> On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> > To be totally clear: MPI says it is erroneous for only some (not all)
> > processes in a communicator to call MPI_COMM_FREE. So if
You can adjust the number of iterations so the parent reaches the end - in my
case, I run it in a non-managed environment, and so there is no timeout. If you
run it that way, you'll see the end result when the parent attempts to finalize.
On May 27, 2014, at 11:18 PM, Gilles Gouaillardet
Jeff,
On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> To be totally clear: MPI says it is erroneous for only some (not all)
processes in a communicator to call MPI_COMM_FREE. So if that's the real
problem, then the discussion about why the parent(s) is(are) trying to
contact the
i finally got it :-)
/* i previously got it "almost" right ... */
here is what happens on job 2 (with trunk) :
MPI_Intercomm_create calls ompi_comm_get_rprocs, which calls ompi_proc_unpack
=> ompi_proc_unpack stores job 3 info into opal_dstore_peer
then ompi_comm_get_rprocs calls
Ralph,
can you please describe your environment (at least compiler (and version) +
configure command line)
i checked osc_rdma_data_move.c only :
size_t incoming_length; is used to improve readability.
it is used only in an assert clause and in OPAL_OUTPUT_VERBOSE
one way to silence the warning
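
The message above is cut off before it names its fix, so what follows is only a
guess at the kind of approach meant, not the change actually proposed. A common
C idiom for a variable that is read only inside assert() and a debug-only
logging macro is a cast to void. In this minimal sketch, DEMO_VERBOSE,
DEMO_DEBUG and demo_payload are hypothetical names standing in for
OPAL_OUTPUT_VERBOSE and the real osc/rdma code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a debug-only logging macro such as
 * OPAL_OUTPUT_VERBOSE: it expands to nothing unless DEMO_DEBUG is defined. */
#ifdef DEMO_DEBUG
#define DEMO_VERBOSE(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEMO_VERBOSE(...) ((void)0)
#endif

/* Hypothetical helper: returns a pointer to the payload that follows a
 * header of `header` bytes inside a buffer of `total` bytes. */
static const char *demo_payload(const char *buf, size_t total, size_t header)
{
    /* Named only to make the checks below easier to read. */
    size_t incoming_length = total - header;

    /* Both uses disappear in an optimized build (NDEBUG defined,
     * DEMO_DEBUG undefined), which is when the compiler complains. */
    assert(incoming_length <= total);
    DEMO_VERBOSE("payload of %zu bytes\n", incoming_length);

    /* The cast counts as a use, so the unused-variable warning goes away
     * without changing behavior. */
    (void) incoming_length;

    return buf + header;
}
```

Calling demo_payload("HDRdata", 7, 3) returns a pointer to "data". The cast
keeps the readable name; the alternative is to drop the variable and repeat the
expression inside the assert and the log call.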
Ralph,
On 2014/05/28 12:10, Ralph Castain wrote:
> my understanding is that there are two ways of seeing things :
> a) the "R-way" : the problem is the parent should not try to communicate with
> already exited processes
> b) the "J-way" : the problem is the children should have waited either in