Re: Status acknowledgements in MesosExecutor

2016-06-06, Anand Mazumdar
Hi Evers, thanks for taking this on. Vinod has agreed to shepherd this, and I would be happy to be the initial reviewer for the patches. -anand

Re: Status acknowledgements in MesosExecutor

2016-06-01, Evers Benno
Some more context about this bug: we did some tests with a framework that does nothing but send empty tasks and a sample executor that does nothing but send TASK_FINISHED and shut itself down. Running on two virtual machines on the same host (i.e. no network involved), we see TASK_FAILED in about 3…

Re: Status acknowledgements in MesosExecutor

2016-05-03, Evers Benno
Alex, thanks, that put me on the right track. It seems the executor driver is indeed not waiting for acknowledgements before stopping, so, as observed by Yan Xu and Vinod Kone in MESOS-243:

> The right fix for this for stop() to wait for ACKs. W…
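To make the proposed fix ("stop() to wait for ACKs") concrete, here is a minimal C++ sketch. It assumes a hypothetical AckAwareDriver class, not the real Mesos executor driver: the class remembers the UUIDs of status updates that have not been acknowledged yet, and stop() blocks on a std::condition_variable until that set drains or a timeout expires. The class name, its methods, and the timeout parameter are illustrative assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for an executor driver; NOT the Mesos API.
// Tracks UUIDs of sent-but-unacknowledged status updates so that stop()
// can wait for them instead of tearing down immediately.
class AckAwareDriver {
 public:
  // Record an update as in flight (the actual network send is omitted).
  void sendStatusUpdate(const std::string& uuid) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.insert(uuid);
  }

  // Called when the agent acknowledges the update with this UUID.
  void acknowledge(const std::string& uuid) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.erase(uuid);
    }
    drained_.notify_all();  // wake a stop() that may be waiting
  }

  // Blocks until every outstanding update is acknowledged or the timeout
  // expires, then marks the driver stopped. Returns true if all were acked.
  bool stop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    bool allAcked =
        drained_.wait_for(lock, timeout, [this] { return pending_.empty(); });
    stopped_ = true;
    return allAcked;
  }

  bool stopped() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return stopped_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable drained_;
  std::set<std::string> pending_;
  bool stopped_ = false;
};
```

The timeout is a design choice of this sketch: without it, an agent that never acknowledges would keep the executor alive indefinitely.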

Re: Status acknowledgements in MesosExecutor

2016-05-03, Anand Mazumdar
Also, we would be modifying the agent to always acknowledge status updates from the executor (MESOS-5262). Once that is done, it should be sufficient for an executor to terminate itself on receiving an acknowledgment message from the agent, ins…
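A minimal sketch of the executor-side behaviour described above, assuming a hypothetical TerminatingExecutor class (not part of Mesos) in which status update UUIDs are plain strings. The executor records the UUID of its terminal update (e.g. TASK_FINISHED) and only decides to exit when an acknowledgement carrying that same UUID arrives; acknowledgements for earlier, non-terminal updates do not trigger shutdown.

```cpp
#include <cassert>
#include <string>

// Hypothetical executor-side state machine; NOT a Mesos class.
// Exit is gated on the acknowledgement of the terminal update only.
class TerminatingExecutor {
 public:
  // Remember the UUID of the terminal status update that was just sent.
  void sentTerminalUpdate(const std::string& uuid) { terminalUuid_ = uuid; }

  // Handle an acknowledgement from the agent. Returns true once the
  // executor may terminate itself.
  bool onAcknowledgement(const std::string& uuid) {
    if (!terminalUuid_.empty() && uuid == terminalUuid_) {
      shouldExit_ = true;
    }
    return shouldExit_;
  }

  bool shouldExit() const { return shouldExit_; }

 private:
  std::string terminalUuid_;  // empty until a terminal update is sent
  bool shouldExit_ = false;
};
```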

Re: Status acknowledgements in MesosExecutor

2016-05-03, Alex Rukletsov
Benno, you may be seeing MESOS-4111. Also, have a look at this comment: https://github.com/apache/mesos/blob/9f472b1eff904d0d96063d3bed535a8e81263d69/src/launcher/executor.cpp#L611-L617

Status acknowledgements in MesosExecutor

2016-05-03, Evers Benno
Hi, I was wondering about the semantics of the ExecutorDriver::sendStatusUpdate() method. It is described as:

    // Sends a status update to the framework scheduler, retrying as
    // necessary until an acknowledgement has been received or the
    // executor is terminated (in which case, a TASK_LOST…
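One way to read the "retrying as necessary until an acknowledgement has been received" contract is sketched below. This is a hypothetical RetryingSender, not the Mesos implementation: the transport is a caller-supplied function that may silently drop messages, and retries are driven by explicit retryTick() calls rather than by a real timer or backoff policy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of retry-until-acknowledged delivery; NOT Mesos code.
class RetryingSender {
 public:
  // Caller-supplied send function (UUID, task state); may drop messages.
  using Transport =
      std::function<void(const std::string& uuid, const std::string& state)>;

  explicit RetryingSender(Transport transport)
      : transport_(std::move(transport)) {}

  // Send once now and keep the update until it is acknowledged.
  void sendStatusUpdate(const std::string& uuid, const std::string& state) {
    unacked_[uuid] = state;
    transport_(uuid, state);
  }

  // Called periodically (e.g. from a timer): resend every unacked update.
  void retryTick() {
    for (const auto& entry : unacked_) {
      transport_(entry.first, entry.second);
    }
  }

  // An acknowledgement stops further retries for that update.
  void acknowledge(const std::string& uuid) { unacked_.erase(uuid); }

  std::size_t unacknowledged() const { return unacked_.size(); }

 private:
  Transport transport_;
  std::map<std::string, std::string> unacked_;  // UUID -> task state
};
```

Note that, under this reading, an executor that stops before the acknowledgement arrives simply abandons whatever is still in the unacknowledged map, which is the situation the rest of this thread is about.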