On Mon, Jul 21, 2014 at 1:41 PM, Yossi Etigin wrote:
> Right, but:
>
> 1. IMHO the rte_barrier is in the wrong place (in the trunk)
>
In the trunk we have the rte_barrier prior to del_procs, which is what I
would have expected: quiesce the BTLs by reaching a point where everybody
agrees that [...]
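
A minimal C sketch of the ordering described above; rte_barrier() and
del_procs() are stand-in stubs for the real runtime fence and the BTL/MTL
per-peer teardown, not Open MPI's actual finalize code:

#include <stdio.h>

/* Stand-in for the out-of-band runtime fence: returns only once every
 * rank has reached this point, i.e. nobody will issue new MPI traffic. */
static void rte_barrier(void)
{
    puts("rte_barrier: all ranks quiesced");
}

/* Stand-in for the per-peer BTL/MTL teardown (disconnect, free endpoints). */
static void del_procs(void)
{
    puts("del_procs: disconnecting peers");
}

int main(void)
{
    rte_barrier();   /* quiesce first ...        */
    del_procs();     /* ... then tear down peers */
    return 0;
}
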
Right, but:
1. IMHO the rte_barrier is in the wrong place (in the trunk)
2. In addition to the rte_barrier, we also need an mpi_barrier

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Monday, July 21, 2014 8:19 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [...]

There was a long thread of discussion on why we must use an rte_barrier and
not an mpi_barrier during finalize. Basically, as long as we have
connectionless unreliable BTLs we need an external mechanism to ensure
complete tear-down of the entire infrastructure. Thus, we need to rely on
an rte_barrier [...]
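
To make the in-band/out-of-band distinction concrete, here is a small sketch
that uses only the public MPI API (no Open MPI internals); the comments
paraphrase the argument above:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* ... application traffic ... */

    /* An MPI_Barrier is itself messages carried over the BTLs.  Over a
     * connectionless, unreliable transport, the barrier returning locally
     * does not prove the peer has finished all wire-level activity
     * (acks, retransmits), which is why an external mechanism is needed. */
    MPI_Barrier(MPI_COMM_WORLD);

    /* Per the discussion above, the trunk runs an rte barrier inside
     * MPI_Finalize before del_procs; it travels through the runtime
     * (out of band), so it does not depend on the BTLs that are about
     * to be dismantled. */
    MPI_Finalize();
    return 0;
}
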
I see. But in branch v1.8, in r31869, Ralph reverted the commit which moved
del_procs after the barrier:

"Revert r31851 until we can resolve how to close these leaks without causing
the usnic BTL to fail during disconnect of intercommunicators
Refs #4643"

Also, we need an rte barrier after del_procs [...]

I should add that it is an rte barrier and not an MPI barrier for
technical reasons.
-Nathan
On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> We already have an rte barrier before del procs
>
> Sent from my iPhone
> On Jul 21, 2014, at 8:21 AM, Yossi Etigin wrote:
> [...]

We already have an rte barrier before del procs
Sent from my iPhone
> On Jul 21, 2014, at 8:21 AM, Yossi Etigin wrote:
>
> Hi,
>
> We get occasional hangs with MTL/MXM during finalize, because a global
> synchronization is needed before calling del_procs.
> E.g. rank A may call del_procs() and disconnect from rank B, while rank B
> is still working. [...]

Hi,

We get occasional hangs with MTL/MXM during finalize, because a global
synchronization is needed before calling del_procs.
E.g. rank A may call del_procs() and disconnect from rank B, while rank B is
still working.
What do you think about adding an MPI barrier on COMM_WORLD before calling
del_procs?
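
For concreteness, a sketch of the proposed ordering; mtl_del_procs() is a
hypothetical stub standing in for the MTL/MXM per-peer teardown (which in
reality runs inside MPI_Finalize, not in user code), and the MPI_Barrier is
the COMM_WORLD synchronization suggested above:

#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the MTL/MXM teardown that disconnects peers. */
static void mtl_del_procs(int rank)
{
    printf("rank %d: del_procs -> disconnecting from peers\n", rank);
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... last application sends/receives ... */

    /* Proposed synchronization: nobody starts disconnecting until every
     * rank has finished its MPI work, so rank A cannot tear down its
     * connection to rank B while B is still communicating with A. */
    MPI_Barrier(MPI_COMM_WORLD);

    mtl_del_procs(rank);

    MPI_Finalize();
    return 0;
}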