Re: Trying to debug an issue in mesos task tracking

2015-01-26 Thread Sharma Podila
I deal with Java programs running in my executor that spawn various "service/daemon threads". So, I tend to explicitly call TASK_FINISHED and call System.exit() (with a sleep to allow Mesos to communicate the task update) when I know the task is complete instead of waiting for natural exit of all t
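The pattern described above (report the terminal state, pause briefly, then force the process to exit) can be sketched as follows. This is a minimal illustration in Python rather than Java, not the poster's code: `send_update` is a hypothetical callable standing in for whatever the executor uses to report task state, and the 2-second grace period is an arbitrary assumption.

```python
import os
import time


def finish_and_exit(send_update, grace_seconds=2.0,
                    exit_fn=os._exit, sleep_fn=time.sleep):
    """Report completion, wait briefly, then terminate the process.

    send_update is a hypothetical callable that reports a task state.
    exit_fn and sleep_fn are injectable so the ordering can be tested
    without actually sleeping or killing the process.
    """
    # 1. Tell the framework that the task is done.
    send_update("TASK_FINISHED")
    # 2. Give the update time to leave the process before exiting.
    sleep_fn(grace_seconds)
    # 3. Exit immediately instead of waiting for background threads.
    exit_fn(0)
```

The default `os._exit` ends the process without waiting for non-daemon threads, which is the point of the pattern, but it also skips `atexit` handlers and does not flush buffered output, so any cleanup that matters should happen before the call.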

Re: Trying to debug an issue in mesos task tracking

2015-01-26 Thread Itamar Ostricher
Thanks Alex. I agree that it looks like it's not Mesos-related. It's probably some deadlock. On Mon, Jan 26, 2015 at 1:31 PM, Alex Rukletsov wrote: > Itamar, > > you are right, Mesos executor and containerizer cannot distinguish > between "busy" and "stuck" processes. However, since you use you

Re: Trying to debug an issue in mesos task tracking

2015-01-26 Thread Alex Rukletsov
Itamar, you are right, Mesos executor and containerizer cannot distinguish between "busy" and "stuck" processes. However, since you use your own custom executor, you may want to implement a sort of health checks. It depends on what your task processes are doing. There are hundreds of reasons why
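One way a custom executor could approximate the health check suggested above is a heartbeat: the task records progress periodically, and the executor treats a long silence as "stuck" rather than "busy". The sketch below is only an illustration under that assumption; the class name, the timeout value, and the idea that the task can call `beat()` at all are hypothetical and not part of Mesos.

```python
import time


class HeartbeatWatchdog:
    """Flags a task as stuck when no heartbeat arrives within a timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so tests can fake time
        self._last_beat = clock()    # creation counts as the first beat

    def beat(self):
        """Called by the task whenever it makes progress."""
        self._last_beat = self._clock()

    def is_stuck(self):
        """True if the last heartbeat is older than the timeout."""
        return self._clock() - self._last_beat > self._timeout
```

An executor could poll `is_stuck()` from a separate thread and decide what to report when it returns true. Picking the timeout is the hard part: too short and a legitimately busy task is flagged, too long and a deadlock goes unnoticed for hours.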

Re: Trying to debug an issue in mesos task tracking

2015-01-24 Thread Itamar Ostricher
Alex, Sharma, thanks for your input! After trying to recreate the issue on a small cluster for the last few days, I was not able to observe a scenario in which I could be sure that my executor sent the TASK_FINISHED update but the scheduler did not receive it. I did observe multiple times a scenario that a

Re: Trying to debug an issue in mesos task tracking

2015-01-23 Thread Alex Rukletsov
Itamar, beyond checking master and slave logs, could you please verify that your executor does send the TASK_FINISHED update? You may want to add some logging and then check the executor log. Mesos guarantees the delivery of status updates, so I suspect the problem is on the executor's side. On Wed, Jan 2
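The logging suggested above can be as simple as wrapping the call that sends the update, so the executor log shows both the attempt and its return. A minimal sketch, assuming a hypothetical `send_update(task_id, state)` callable in place of the executor's real status-update call; the logger name and message wording are also assumptions.

```python
import logging

log = logging.getLogger("executor")


def send_logged(send_update, task_id, state):
    """Send a status update and log before and after the call.

    send_update is a hypothetical stand-in for whatever the executor
    uses to report task state.
    """
    log.info("sending status update: task=%s state=%s", task_id, state)
    send_update(task_id, state)
    # Reaching this line only shows that the call returned, not that the
    # scheduler has received or acknowledged the update.
    log.info("status update call returned: task=%s state=%s", task_id, state)
```

If the first message appears in the executor log and the second does not, the send call itself blocked or raised. If neither appears, the executor never reached the point of reporting completion.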

Re: Trying to debug an issue in mesos task tracking

2015-01-21 Thread Sharma Podila
Have you checked the mesos-slave and mesos-master logs for that task id? There should be logs in there for task state updates, including FINISHED. There can be specific cases where sometimes the task status is not reliably sent to your scheduler (due to mesos-master restarts, leader election change
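Checking the logs for a task id, as suggested above, amounts to filtering log lines by that id. The helper below is a small sketch that makes no assumption about the log format beyond the task id appearing verbatim in the relevant lines; the function name is hypothetical.

```python
def lines_for_task(lines, task_id):
    """Return the log lines that mention task_id, in their original order."""
    return [line.rstrip("\n") for line in lines if task_id in line]
```

It accepts any iterable of strings, so an open log file can be passed directly. Running it over both the master and the slave log for the same id shows whether a state update reached one but not the other.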

Trying to debug an issue in mesos task tracking

2015-01-21 Thread Itamar Ostricher
I'm using a custom internal framework, loosely based on MesosSubmit. The phenomenon I'm seeing is something like this: 1. Task X is assigned to slave S. 2. I know this task should run for ~10 minutes. 3. On the master dashboard, I see that task X is in the "Running" state for several *hours*. 4. I S