Re: [OMPI devel] Master won't build

2015-11-09 Thread Ralph Castain
Yeah, I’m sure it does have an ancient lustre install on it - odin has become very long in the tooth. :-/ I think it would be wise to have that configure logic. I don’t know how common it is to have an old install out there, but a little defensive programming would be best. If you don’t have a

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Ralph Castain
Those are certainly valid suggestions, and I’ll incorporate them. However, I doubt that’s the root cause here as the thread gets started well before the first client is fork/exec’d by the daemon. > On Nov 8, 2015, at 8:53 PM, Nysal Jan K A wrote: > > In listen_thread(): > 194 while (pmix_

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Artem Polyakov
This is a very good point, Nysal! This is definitely a problem, and I can say even more: on average, 3 out of every 10 tasks were affected by this bug. Once the PR ( https://github.com/pmix/master/pull/8) was applied I was able to run 100 test tasks without any hangs. Here is some more information on my s

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Artem Polyakov
2015-11-09 22:42 GMT+06:00 Artem Polyakov : > This is the very good point, Nysal! > > This is definitely a problem and I can say even more: avg. 3 from every 10 > tasks was affected by this bug. Once the PR ( > https://github.com/pmix/master/pull/8) was applied I was able to run 100 > testing task

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread George Bosilca
Clearly Nysal has a valid point there. I launched a stress test with Nysal's suggestion in the code, and so far it's up to a few hundred iterations without deadlock. I would not claim victory yet; I launched a 10k cycle to see where we stand (btw this never passed before). I'll let you know the outcome

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Ralph Castain
Looking at it, I think I see what was happening. The thread would start, but then immediately see that the active flag was false and would exit. This left the server without any listening thread - but it wouldn’t detect this had happened. It was therefore a race between whether the thread checke
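
To illustrate the kind of race described in the message above, here is a minimal sketch in C with pthreads. It is not the PMIx source: the names listener_active, loop_entered, listen_thread, start_listener and stop_listener are hypothetical, and the actual change in the PR may differ. The sketch assumes a listener loop that runs only while an "active" flag is true, and shows one common way to avoid the early exit: set the flag before the thread is created, so the thread can never observe it as false on entry.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the PMIx code. */
static atomic_bool listener_active = false;
static atomic_int  loop_entered    = 0;

/* Listener body: loops only while the flag is true. If the flag were
 * still false when the thread first checks it, the thread would return
 * at once and nothing would be left listening. */
static void *listen_thread(void *arg)
{
    (void)arg;
    while (atomic_load(&listener_active)) {
        atomic_store(&loop_entered, 1);
        /* A real server would wait for and accept connections here. */
        sched_yield();
    }
    return NULL;
}

/* Set the flag BEFORE creating the thread, so the first check in
 * listen_thread() cannot see a stale "false". */
static int start_listener(pthread_t *tid)
{
    atomic_store(&listener_active, true);
    if (pthread_create(tid, NULL, listen_thread, NULL) != 0) {
        atomic_store(&listener_active, false); /* roll back on failure */
        return -1;
    }
    return 0;
}

/* Clear the flag and wait for the listener to leave its loop. */
static void stop_listener(pthread_t tid)
{
    atomic_store(&listener_active, false);
    pthread_join(tid, NULL);
}
```

With the opposite ordering (create the thread first, set the flag afterwards), the outcome depends on scheduling, which is consistent with the intermittent hangs reported earlier in the thread.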

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread George Bosilca
All 10k tests completed successfully. Nysal pinpointed the real problem behind the deadlocks. :+1: George. On Mon, Nov 9, 2015 at 1:13 PM, Ralph Castain wrote: > Looking at it, I think I see what was happening. The thread would start, > but then immediately see that the active flag was false

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Mark Santcroos
It seems the change suggested by Nysal also allows me to run into the next problem ;-) Mark > On 09 Nov 2015, at 20:19 , George Bosilca wrote: > > All 10k tests completed successfully. Nysal pinpointed the real problem > behind the deadlocks. :+1: > > George. > > > On Mon, Nov 9, 2015 at

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Joshua Ladd
Thanks, Nysal!! Good catch! Josh On Mon, Nov 9, 2015 at 2:27 PM, Mark Santcroos wrote: > It seems the change suggested by Nysal also allows me to run into the next > problem ;-) > > Mark > > > On 09 Nov 2015, at 20:19 , George Bosilca wrote: > > > > All 10k tests completed successfully. Nysal