Ok, figured it out. There were three problems with the del_procs code:
1) ompi_mpi_finalize used ompi_proc_all to get the list of procs but
never released the references to them (ompi_proc_all calls
OBJ_RETAIN on every proc it returns). When calling del_procs at
finalize it should
On Thu, May 15, 2014 at 11:44:05AM -0600, Nathan Hjelm wrote:
> On Thu, May 15, 2014 at 01:33:31PM -0400, George Bosilca wrote:
> > The solution you propose here is definitely not OK. It is 1) ugly and 2)
> > breaks the separation barrier that we hold dear.
>
> Which is why I asked :)
The solution you propose here is definitely not OK. It is 1) ugly and 2)
breaks the separation barrier that we hold dear.
Regarding your other suggestion, I don't see any reason not to call
delete_proc on MPI_COMM_WORLD as the last action we take before tearing down
everything else.
I fixed this by reverting r31765 in r31775. Annotated the ticket with an explanation.
On May 15, 2014, at 1:20 AM, Gilles Gouaillardet wrote:
> Folks,
>
> since r31765 (opal/event: release the opal event context when closing
> the event base)
> mpirun crashes at the end of the job.
What: We never call del_procs on the procs in comm world. This leads us
to leak the bml endpoints created by r2.
The proposed solution is not ideal, but it avoids adding a call to
del_procs for comm world, something I know would require more discussion
since there is likely a reason for that. I
Nathan,
this had no effect on my environment :-(
i am not sure you can reuse mca_btl_scif_module.scif_fd with connect()
i had to use a new scif fd for that.
then i ran into another glitch: if the listen thread does not
scif_accept() the connection, the scif_connect() will take 30 seconds
Folks,
since r31765 (opal/event: release the opal event context when closing
the event base)
mpirun crashes at the end of the job.
for example:
$ mpirun --mca btl tcp,self -n 4 `pwd`/src/MPI_Allreduce_user_c
MPITEST info (0): Starting MPI_Allreduce_user() test
MPITEST_results: