Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
I may have an idea of what’s going on here - I just need to finish something else first and then I’ll take a look.
> On Jun 4, 2016, at 4:20 PM, George Bosilca wrote:
>> On Jun 5, 2016, at 07:53, Ralph Castain wrote:
>>> On Jun 4, 2016, at 1:11 PM,

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread George Bosilca
> On Jun 5, 2016, at 07:53, Ralph Castain wrote:
>> On Jun 4, 2016, at 1:11 PM, George Bosilca wrote:
>> On Sat, Jun 4, 2016 at 11:05 PM, Ralph Castain wrote:
>> He can try adding "-mca state_base_verbose 5", but if

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
> On Jun 4, 2016, at 1:11 PM, George Bosilca wrote:
> On Sat, Jun 4, 2016 at 11:05 PM, Ralph Castain wrote:
> He can try adding "-mca state_base_verbose 5", but if we are failing to catch
> sigchld, I’m not sure what debugging info is going to help resolve t

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread George Bosilca
On Sat, Jun 4, 2016 at 11:05 PM, Ralph Castain wrote:
> He can try adding "-mca state_base_verbose 5", but if we are failing to
> catch sigchld, I’m not sure what debugging info is going to help resolve
> that problem. These aren’t even fast-running apps, so there was plenty of
> time to register

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
He can try adding "-mca state_base_verbose 5", but if we are failing to catch sigchld, I’m not sure what debugging info is going to help resolve that problem. These aren’t even fast-running apps, so there was plenty of time to register for the signal prior to termination. I vaguely recollect th

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Jeff Squyres (jsquyres)
Meh. Ok. Should George run with some verbose level to get more info?
> On Jun 4, 2016, at 6:43 AM, Ralph Castain wrote:
>
> Neither of those threads has anything to do with catching the sigchld -
> threads 4-5 are listening for OOB and PMIx connection requests. It looks more
> like mpirun t

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
Neither of those threads has anything to do with catching the sigchld - threads 4-5 are listening for OOB and PMIx connection requests. It looks more like mpirun thought it had picked everything up and has begun shutting down, but I can’t really tell for certain.
> On Jun 4, 2016, at 6:29 AM,

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Jeff Squyres (jsquyres)
On Jun 3, 2016, at 11:07 PM, George Bosilca wrote:
>
> After finalize. As I said in my original email I see all the output the
> application is generating, and all processes (which are local as this happens
> on my laptop) are in zombie mode (Z+). This basically means whoever was
> supposed to

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread George Bosilca
On Fri, Jun 3, 2016 at 11:10 PM, Jeff Squyres (jsquyres) wrote:
> That's disappointing / puzzling.
>
> Threads 4 and 5 look like they're in the PMIX / ORTE progress threads,
> respectively.
>
> But I don't see any obvious signs of what threads 1, 2, 3 are for. Huh.
>
> When is this hang happenin
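
For readers unfamiliar with the zombie (Z+) state mentioned in this thread: on POSIX systems a child that has exited stays in the process table as a zombie until its parent collects the exit status with a wait call. The sketch below only illustrates that general mechanism in Python; it is not how mpirun or ORTE handle SIGCHLD, and the function name `reap_demo` and the exit status 7 are made up for the example.

```python
import os
import time


def reap_demo():
    """Fork a child that exits at once, then reap it from the parent.

    Hypothetical illustration only; assumes a POSIX system with os.fork().
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with a recognizable status.
        os._exit(7)

    # Parent: from the moment the child exits until waitpid() collects its
    # status, the child is a zombie (shown as "Z" by ps).
    reaped_pid, status = 0, 0
    while reaped_pid == 0:
        # With WNOHANG, waitpid() returns (0, 0) while the child is still running.
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == 0:
            time.sleep(0.01)

    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # A second wait fails because the process table entry is gone once reaped.
    try:
        os.waitpid(pid, 0)
        already_reaped = False
    except ChildProcessError:
        already_reaped = True

    return reaped_pid == pid, exit_code, already_reaped
```

If the parent never reaches the `os.waitpid()` call (for instance because it never notices that the child terminated), the children stay in the Z state, which matches the symptom reported above.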