Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-10 Thread Vinod Kone
The command executor was probably fixed somewhere between 0.21 and 1.3. The only reason I mentioned 1.3+ is that any releases before that are out of the support period. If you can repro the issue with 1.3+ and paste the logs here or in a JIRA, we can help debug it for you. On Wed, Jan 10, 2018 at 9

Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-10 Thread Ajay V
Thanks for getting back, Vinod. So, does that mean that even in v1.2 these race conditions (where the command executor doesn't stay up long enough) existed, and that 1.3 fixes them? I ask because I did try an upgrade to v1.2 and still found very similar issues. Regards, Aja

Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-09 Thread Vinod Kone
0.21 is really old and not supported. I highly recommend you upgrade to 1.3+. Regarding what you are seeing, we definitely had issues in the past where the command executor didn't stay up long enough to guarantee that TASK_FINISHED was delivered to the agent, so races like the one above were possible. On

Mesos rare TASK_LOST scenario v 0.21.0

2018-01-09 Thread Ajay V
Hello, I'm trying to debug a TASK_LOST that's generated on the agent, which I see on rare occasions. Following is a log that I'm trying to understand. This happens after driver.sendStatusUpdate() has been called with a task state of TASK_FINISHED from a Java executor. It looks to me like th