Hi Anindya,

The problem occurred again. The following is the scheduler driver log on the Chronos side:

I0812 08:15:43.902712 96 sched.cpp:1937] Asked to abort the driver
I0812 08:15:43.902763 96 sched.cpp:981] Scheduler::statusUpdate took 1.436378441secs
I0812 08:15:43.902788 96 sched.cpp:988] Not sending status update acknowledgment message because the driver is not running!
I0812 08:15:43.902866 96 sched.cpp:919] Ignoring task status update message because the driver is not running!

However, the earlier log gives me no clue as to why the scheduler driver was aborted.

Thanks,
Zhichang Yu

________________________________
From: 志昌 余 <yuzhichang_...@hotmail.com>
Sent: 2016-08-09 18:03:31
To: user@mesos.apache.org
Subject: Re: Deactivationg framework unexpectly

Hi Anindya,

Thanks for the info. I'll enable the scheduler driver log to see what happens.

Regards,
Zhichang Yu

________________________________
From: anindya_si...@apple.com <anindya_si...@apple.com> on behalf of Anindya Sinha <anindya_si...@apple.com>
Sent: 2016-08-08 23:50:10
To: user@mesos.apache.org
Subject: Re: Deactivationg framework unexpectly

Looks like your framework (Chronos) is sending a DeactivateFrameworkMessage to the master. The scheduler driver also sends a DeactivateFramework message if it is aborted (https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L1224). Also, the master can deactivate your framework if your framework disconnects or fails over. Please check the master logs, or see whether your framework received a FrameworkErrorMessage.

Thanks
Anindya

On Aug 8, 2016, at 3:35 AM, 志昌 余 <yuzhichang_...@hotmail.com> wrote:

Hi,

I recently ran into a weird problem. I'm running Mesos + Chronos. Chronos often (once every several days) stops scheduling tasks because Mesos deactivates the framework.
The following is the log of the leading Mesos master:

# grep -iP "activat|disconnected" /var/log/mesos/mesos-master.INFO
I0806 13:40:33.143658 30 master.cpp:2551] Deactivating framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000 (chronos-2.5.0-SNAPSHOT) at scheduler-86a64d22-5201-4bb0-8a2c-70d3e97afae6@10.8.139.246:34544
I0806 13:40:33.143908 23 hierarchical.cpp:375] Deactivated framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000

The fix is to manually reboot the Chronos leader.

My environment: there are 3 physical machines, each running a containerized Mesos master and Chronos. When the issue occurred, the Mesos leader and the Chronos leader were both running on the same machine.

Software versions:
mesos-master:0.28.0-2.0.16.ubuntu1404
chronos:2.5.0-ce4469d.ubuntu1404-mesos-0.28.0-2.0.16.ubuntu1404

Can anyone give some insight into this problem?

Thanks,
Zhichang Yu
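
For anyone comparing the master log with the scheduler driver log, the grep in the original message can be extended into a small script that pulls the relevant events out of both files with their timestamps. This is only a sketch: the function name `find_events` and the pattern list are illustrative assumptions, and the line format is inferred from the glog-style lines quoted in this thread, not from any Mesos tooling.

```python
import re

# Message patterns taken from the log lines quoted in this thread
# (assumed to be the interesting ones; extend as needed).
PATTERNS = [
    re.compile(r"Asked to abort the driver"),
    re.compile(r"Deactivat(ing|ed) framework", re.IGNORECASE),
    re.compile(r"disconnected", re.IGNORECASE),
]

# glog-style prefix as seen above:
# severity letter, MMDD, HH:MM:SS.micros, thread id, "file:line]", message
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def find_events(lines):
    """Return (date, time, source, message) for lines matching any pattern.

    Hypothetical helper: lines that do not look like glog output are skipped.
    """
    events = []
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if not m:
            continue
        if any(p.search(m.group("msg")) for p in PATTERNS):
            events.append(
                (m.group("date"), m.group("time"), m.group("src"), m.group("msg"))
            )
    return events
```

Running it over the lines of both the master log and the Chronos-side driver log and sorting the result by date and time shows whether "Asked to abort the driver" precedes the master's "Deactivating framework" line.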