Yeah, I’m sure it does have an ancient lustre install on it - odin has become
very long in the tooth. :-/
I think it would be wise to have that configure logic. I don’t know how common
it is to have an old install out there, but a little defensive programming
would be best.
If you don’t have a
Those are certainly valid suggestions, and I’ll incorporate them. However, I
doubt that’s the root cause here as the thread gets started well before the
first client is fork/exec’d by the daemon.
> On Nov 8, 2015, at 8:53 PM, Nysal Jan K A wrote:
>
> In listen_thread():
> 194 while (pmix_
This is a very good point, Nysal!
This is definitely a problem, and I can say even more: on average, 3 out of
every 10 tasks were affected by this bug. Once the PR (
https://github.com/pmix/master/pull/8) was applied I was able to run 100
testing tasks without any hangs.
Here is some more information on my s
2015-11-09 22:42 GMT+06:00 Artem Polyakov:
> This is a very good point, Nysal!
>
> This is definitely a problem, and I can say even more: on average, 3 out of
> every 10 tasks were affected by this bug. Once the PR (
> https://github.com/pmix/master/pull/8) was applied I was able to run 100
> testing task
Clearly Nysal has a valid point there. I launched a stress test with Nysal's
suggestion in the code, and so far it's up to a few hundred iterations
without deadlock. I would not claim victory yet; I launched a 10k cycle to
see where we stand (btw this never passed before).
I'll let you know the outcome
Looking at it, I think I see what was happening. The thread would start, but
then immediately see that the active flag was false and would exit. This left
the server without any listening thread - but it wouldn’t detect this had
happened. It was therefore a race between whether the thread checke
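
A minimal sketch of the race described above, in case it helps to see it in isolation. This is not the PMIx code: listener_t, run_listener(), and the iterations counter are hypothetical names made up for illustration, and setting the flag before creating the thread is only an assumed ordering fix, not necessarily what the PR does. The "racy" branch joins the thread before setting the flag purely to force the unlucky schedule, so the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the server's listener state (not PMIx code). */
typedef struct {
    atomic_bool active;     /* loop condition the thread checks */
    atomic_int  iterations; /* number of times the loop body ran */
} listener_t;

static void *listen_thread(void *arg)
{
    listener_t *l = arg;
    /* If active is still false on the first check, the loop is skipped
     * and the thread exits without anyone noticing. */
    while (atomic_load(&l->active)) {
        atomic_fetch_add(&l->iterations, 1);
        sched_yield(); /* a real listener would block in accept() here */
    }
    return NULL;
}

/* Returns how many loop iterations the listener thread performed. */
int run_listener(bool set_flag_first)
{
    listener_t l;
    pthread_t tid;
    int rc;

    atomic_init(&l.active, false);
    atomic_init(&l.iterations, 0);

    if (set_flag_first) {
        /* Assumed fix: publish the flag before the thread can read it. */
        atomic_store(&l.active, true);
        rc = pthread_create(&tid, NULL, listen_thread, &l);
        assert(rc == 0);
        /* Wait until the thread has entered its loop at least once. */
        while (atomic_load(&l.iterations) == 0) {
            sched_yield();
        }
        atomic_store(&l.active, false); /* orderly shutdown */
        pthread_join(tid, NULL);
    } else {
        rc = pthread_create(&tid, NULL, listen_thread, &l);
        assert(rc == 0);
        /* Joining here only forces the unlucky schedule so the demo is
         * deterministic: the thread runs before the flag is set. */
        pthread_join(tid, NULL);
        atomic_store(&l.active, true); /* too late, nobody is listening */
    }
    (void)rc;
    return atomic_load(&l.iterations);
}
```

With the unlucky schedule, run_listener(false) returns 0: the thread has come and gone, and nothing is left listening. run_listener(true) returns at least 1, because the flag is already true when the thread makes its first check. Build with something like `cc -pthread`.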
All 10k tests completed successfully. Nysal pinpointed the real problem
behind the deadlocks. :+1:
George.
On Mon, Nov 9, 2015 at 1:13 PM, Ralph Castain wrote:
> Looking at it, I think I see what was happening. The thread would start,
> but then immediately see that the active flag was false
It seems the change suggested by Nysal also allows me to run into the next
problem ;-)
Mark
> On 09 Nov 2015, at 20:19 , George Bosilca wrote:
>
> All 10k tests completed successfully. Nysal pinpointed the real problem
> behind the deadlocks. :+1:
>
> George.
>
>
> On Mon, Nov 9, 2015 at
Thanks, Nysal!! Good catch!
Josh
On Mon, Nov 9, 2015 at 2:27 PM, Mark Santcroos wrote:
> It seems the change suggested by Nysal also allows me to run into the next
> problem ;-)
>
> Mark
>
> > On 09 Nov 2015, at 20:19 , George Bosilca wrote:
> >
> > All 10k tests completed successfully. Nysal