Re: orphan executor

2017-10-27 Thread Mohit Jaggi
Here are some relevant logs. The Aurora scheduler logs show the task going through: INIT -> PENDING -> ASSIGNED -> STARTING -> RUNNING (for a long time) -> FAILED due to a health check error, OSError: Resource temporarily unavailable. (I think this refers to running out of PID space; see the thermos logs below.)
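As a side note on the error text: "Resource temporarily unavailable" is the standard message for errno EAGAIN, which fork(2) returns on Linux when a process or thread limit is hit, so it fits the PID-space guess without proving it. Below is a minimal sketch of how one could check for that errno in Python. The helper name looks_like_resource_exhaustion and the hand-built example error are hypothetical, for illustration only, and are not taken from the Aurora or thermos logs.

```python
import errno
import os


def looks_like_resource_exhaustion(exc: OSError) -> bool:
    """Return True if exc carries EAGAIN, the errno whose standard
    message is 'Resource temporarily unavailable'."""
    return exc.errno == errno.EAGAIN


# Hand-built error for illustration only; not copied from any real log.
example = OSError(errno.EAGAIN, os.strerror(errno.EAGAIN))
print(looks_like_resource_exhaustion(example))
```

EAGAIN is also used for non-blocking I/O, so a match only narrows the cause; the process limits on the host would still need to be checked.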

Re: orphan executor

2017-10-27 Thread Vinod Kone
Can you share the agent and executor logs of an example orphaned executor? That would help us diagnose the issue.

On Fri, Oct 27, 2017 at 8:19 PM, Mohit Jaggi wrote:
> Folks,
> Often I see some orphaned executors in my cluster. These are cases where
> the framework was

orphan executor

2017-10-27 Thread Mohit Jaggi
Folks, often I see some orphaned executors in my cluster. These are cases where the framework was informed of task loss, so it has forgotten about them as expected, but the container (Docker) is still around. AFAIK, the Mesos agent is the only entity that has knowledge of these containers. How do I ensure