Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread George Bosilca
On Mon, Jul 21, 2014 at 1:41 PM, Yossi Etigin wrote:
> Right, but:
> 1. IMHO the rte_barrier is in the wrong place (in the trunk)

In the trunk we have the rte_barrier prior to del_procs, which is what I would have expected: quiesce the BTLs by reaching a point where everybody agrees that …
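A minimal sketch of the tear-down ordering described above. The function names below (rte_barrier_stub, del_procs_stub, close_btls_stub) are illustrative placeholders, not the real Open MPI internals; the only point is the ordering: out-of-band barrier first, then del_procs, then closing the transports.

    /* Sketch of the finalize tear-down ordering discussed in this thread.
     * All functions here are stubs for illustration, not OMPI internals. */
    #include <stdio.h>

    static void rte_barrier_stub(void) { /* out-of-band barrier provided by the runtime, not MPI */ }
    static void del_procs_stub(void)   { /* tell the PMLs/BTLs that the peer procs are going away */ }
    static void close_btls_stub(void)  { /* shut down the BTL modules themselves */ }

    static void finalize_teardown_sketch(void)
    {
        rte_barrier_stub();   /* everybody agrees the BTLs are quiesced ...         */
        del_procs_stub();     /* ... so dropping per-peer resources is now safe ... */
        close_btls_stub();    /* ... and the transports can be torn down last.      */
    }

    int main(void)
    {
        finalize_teardown_sketch();
        puts("order: rte barrier -> del_procs -> close BTLs");
        return 0;
    }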

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread Yossi Etigin
Right, but:
1. IMHO the rte_barrier is in the wrong place (in the trunk).
2. In addition to the rte_barrier, we also need an mpi_barrier.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Monday, July 21, 2014 8:19 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] barrier before calling del_procs

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread George Bosilca
There was a long thread of discussion on why we must use an rte_barrier and not an mpi_barrier during finalize. Basically, as long as we have connectionless, unreliable BTLs we need an external mechanism to ensure complete tear-down of the entire infrastructure. Thus, we need to rely on an rte_barrier …

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread Yossi Etigin
I see. But in the v1.8 branch, in r31869, Ralph reverted the commit which moved del_procs after the barrier: "Revert r31851 until we can resolve how to close these leaks without causing the usnic BTL to fail during disconnect of intercommunicators. Refs #4643". Also, we need an rte barrier after del_procs …

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread Nathan Hjelm
I should add that it is an rte barrier and not an MPI barrier for technical reasons.

-Nathan

On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> We already have an rte barrier before del procs
>
> Sent from my iPhone
> On Jul 21, 2014, at 8:21 AM, Yossi Etigin wrote:
> …

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread Ralph Castain
We already have an rte barrier before del procs

Sent from my iPhone

> On Jul 21, 2014, at 8:21 AM, Yossi Etigin wrote:
>
> Hi,
>
> We get occasional hangs with MTL/MXM during finalize, because a global
> synchronization is needed before calling del_procs.
> E.g. rank A may call del_procs() and disconnect from rank B, while rank B is still working …

[OMPI devel] barrier before calling del_procs

2014-07-21 Thread Yossi Etigin
Hi,

We get occasional hangs with MTL/MXM during finalize, because a global synchronization is needed before calling del_procs. E.g. rank A may call del_procs() and disconnect from rank B, while rank B is still working. What do you think about adding an MPI barrier on COMM_WORLD before calling del_procs?
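As an application-level illustration of the synchronization point being proposed (only a sketch of the idea, not the actual change inside MPI_Finalize/del_procs), the barrier below keeps any rank from starting its tear-down while a peer is still working:

    /* Sketch: global synchronization before tear-down begins.
     * Rank 1 stands in for "rank B, still working". */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            sleep(2);   /* rank B is still busy while rank A is already done */
        }
        printf("rank %d finished its work\n", rank);

        /* Proposed synchronization: nobody reaches MPI_Finalize (and hence the
         * internal del_procs) until every rank has quiesced. */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 2 ./barrier_before_finalize; without the barrier, rank 0 would proceed to finalize while rank 1 is still working, which is the race this thread is about.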