Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
I found the reason for the notification and fixed that as well - should all be done now.

> On Jul 16, 2016, at 10:37 AM, Ralph Castain wrote:
>
> Kewl - thanks! I will take care of this, but to me the most pressing issue is
> why this event notification is being generated at all. It shouldn’t b

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
Kewl - thanks! I will take care of this, but to me the most pressing issue is why this event notification is being generated at all. It shouldn’t be.

> On Jul 16, 2016, at 9:11 AM, Gilles Gouaillardet wrote:
>
> I finally got it :-)
>
> in send_notification() from orted_submit.c, info is

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Gilles Gouaillardet
I finally got it :-)

In send_notification() from orted_submit.c, info uses OPAL_PMIX_EVENT_NON_DEFAULT, but pmix2x.c and pmix_ext20.c test for PMIX_EVENT_NON_DEFAULT. If I use OPAL_PMIX_EVENT_NON_DEFAULT in pmix*, that fixes the issue.

Cheers,

Gilles

On Sunday, July 17, 2016, Ralph Castain w
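To illustrate the kind of mismatch described in this message, here is a minimal sketch in C. It is not the Open MPI code: the `demo_` names, the struct layout, and both key strings are made up for illustration, and the only assumption taken from the thread is that the sender tags the event with one key while the receiver looks for a different one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key strings: the real values live in the OPAL and PMIx
 * headers and are not quoted in this thread. */
#define DEMO_OPAL_EVENT_NON_DEFAULT "opal.demo.evnondef"
#define DEMO_PMIX_EVENT_NON_DEFAULT "pmix.demo.evnondef"

/* Simplified stand-in for one key/value entry attached to an event. */
typedef struct {
    const char *key;
    bool flag;
} demo_info_t;

/* Return the flag of the first entry whose key equals wanted_key,
 * or false when no entry matches. */
static bool demo_is_nondefault(const demo_info_t *info, size_t n,
                               const char *wanted_key)
{
    assert(NULL != wanted_key);
    for (size_t i = 0; i < n; i++) {
        if (NULL != info[i].key && 0 == strcmp(info[i].key, wanted_key)) {
            return info[i].flag;
        }
    }
    return false;
}
```

If the sender stores the flag under `DEMO_OPAL_EVENT_NON_DEFAULT` and the receiver calls `demo_is_nondefault()` with `DEMO_PMIX_EVENT_NON_DEFAULT`, the lookup silently returns false and the event is treated as a default one. Using the same key on both sides restores the match.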

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
Okay, I’ll investigate why that is happening - thanks!

> On Jul 16, 2016, at 7:45 AM, Gilles Gouaillardet wrote:
>
> The parent job (e.g. the task that calls MPI_Comm_spawn) receives it.
> I cannot tell whether the child (e.g. the spawned task) receives it too or not
>
> Cheers,
>
> Gilles

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Gilles Gouaillardet
The parent job (i.e. the task that calls MPI_Comm_spawn) receives it. I cannot tell whether the child (i.e. the spawned task) receives it too.

Cheers,

Gilles

On Saturday, July 16, 2016, Ralph Castain wrote:

> I can fix the initialization. What puzzles me is that no debugger_release
> me

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
I can fix the initialization. What puzzles me is that no debugger_release message should be sent unless a debugger is attached - in which case, the event should be registered. So why is it being sent? Is it the child job that is receiving it? Or is it the parent?

> On Jul 16, 2016, at 7:19 AM

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Gilles Gouaillardet
I found some time to investigate this.

tscon should initialize nondefault to false in both pmix2x.c and pmix_ext20.c.

A better workaround is to update ompi_errhandler_callback so that it does not invoke ompi_mpi_abort if status is OPAL_ERR_DEBUGGER_RELEASE.

That still seems counterintuitive to me ...
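The workaround proposed in this message can be sketched roughly as follows. This is a standalone illustration, not the actual ompi_errhandler_callback: the `demo_` names and the numeric status values are hypothetical, and the abort routine is replaced by a counter so the sketch does not terminate the process.

```c
#include <assert.h>

/* Hypothetical status codes; the real OPAL values are not given in
 * this thread. */
enum {
    DEMO_ERR_DEBUGGER_RELEASE = -1,
    DEMO_ERR_PROC_ABORTED = -2
};

/* Counts how many times the abort path was taken. */
static int demo_abort_calls = 0;

/* Stand-in for the abort routine: records the call instead of exiting. */
static void demo_abort(int status)
{
    /* The abort path should never see a debugger release. */
    assert(DEMO_ERR_DEBUGGER_RELEASE != status);
    demo_abort_calls++;
}

/* Error handler callback that ignores the debugger release status and
 * aborts on every other status it receives. */
static void demo_errhandler_callback(int status)
{
    if (DEMO_ERR_DEBUGGER_RELEASE == status) {
        return;
    }
    demo_abort(status);
}
```

With this shape, a stray debugger release notification no longer takes the job down, but it does not explain why the notification was generated in the first place.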

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Ralph Castain
Okay, I’ll take a look - thanks!

> On Jul 15, 2016, at 7:08 AM, Gilles Gouaillardet wrote:
>
> Yep,
>
> The constructor of pmix2x_threadshift_t (tscon) does not initialize
> nondefault to false.
> I won't be able to investigate this until Monday, but so far, my guess is
> that if the co

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Gilles Gouaillardet
Yep,

The constructor of pmix2x_threadshift_t (tscon) does not initialize nondefault to false. I won't be able to investigate this until Monday, but so far, my guess is that if the constructor is fixed, then RHEL6 will fail like RHEL7 ...

FWIW, the intercomm_create used to fail in Cisco MTT becaus
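As a rough illustration of the constructor issue described here, the sketch below shows a constructor that sets every field explicitly. The struct is a simplified, hypothetical stand-in (the real pmix2x_threadshift_t has other members that are not shown in this thread), and `demo_tscon` is not the actual tscon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of a thread-shift caddy. */
typedef struct {
    int status;
    bool nondefault;
} demo_threadshift_t;

/* Constructor: give every field a defined value so that nondefault
 * never holds whatever happened to be in memory. */
static void demo_tscon(demo_threadshift_t *p)
{
    assert(NULL != p);
    p->status = 0;
    p->nondefault = false;
}
```

Without the explicit assignment, the value of the flag depends on the previous contents of the memory, which is one way the same code could behave differently from one system to another.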

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Ralph Castain
That would break debugger attach. Sounds to me like it’s just an uninitialized variable for in_event_hdlr?

> On Jul 15, 2016, at 1:20 AM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> i noticed MPI_Comm_spawn is broken on master and on RHEL7
>
> for some reason i cannot yet explain, it works just

[OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Gilles Gouaillardet
Ralph,

I noticed MPI_Comm_spawn is broken on master on RHEL7.

For some reason I cannot yet explain, it works just fine on RHEL6 (!)

mpirun -np 1 ./dynamic/intercomm_create

from the IBM test suite can be used to reproduce the issue.

I dug a bit and found OPAL_ERR_DEBUGGER_RELEASE is