Hi Anindya,

    The problem occurred again. The following is the scheduler driver log on 
the Chronos side:


I0812 08:15:43.902712    96 sched.cpp:1937] Asked to abort the driver
I0812 08:15:43.902763    96 sched.cpp:981] Scheduler::statusUpdate took 1.436378441secs
I0812 08:15:43.902788    96 sched.cpp:988] Not sending status update acknowledgment message because the driver is not running!
I0812 08:15:43.902866    96 sched.cpp:919] Ignoring task status update message because the driver is not running!

    However, the earlier log gives no clue as to why the scheduler driver 
was aborted.



    Thanks,

Zhichang Yu



________________________________
From: 志昌 余 <yuzhichang_...@hotmail.com>
Sent: August 9, 2016 18:03:31
To: user@mesos.apache.org
Subject: Re: Deactivationg framework unexpectly


Hi Anindya,

    Thanks for the info. I'll enable the scheduler driver log to see what happens.

Regards,

Zhichang Yu

________________________________
From: anindya_si...@apple.com <anindya_si...@apple.com> on behalf of Anindya Sinha 
<anindya_si...@apple.com>
Sent: August 8, 2016 23:50:10
To: user@mesos.apache.org
Subject: Re: Deactivationg framework unexpectly

Looks like your framework (Chronos) is sending a DeactivateFrameworkMessage 
to the master. The scheduler driver also sends a DeactivateFrameworkMessage 
if it is aborted 
(https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L1224).

Also, master can deactivate your framework if your framework disconnects or 
fails over. Please check logs in master or see if your framework received a 
FrameworkErrorMessage.
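
A minimal sketch of the log check suggested above, assuming the logs use the glog line layout seen elsewhere in this thread (level and date, time, thread id, source file and line, then the message). The keyword list and the names `find_events` and `Event` are illustrative assumptions, not part of Mesos or Chronos.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# glog-style line, e.g.:
# I0806 13:40:33.143658    30 master.cpp:2551] Deactivating framework ...
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

# Assumed keywords of interest; adjust to whatever you are hunting for.
KEYWORDS = re.compile(r"activat|disconnect|abort|error", re.IGNORECASE)


class Event(NamedTuple):
    level: str    # I, W, E or F
    date: str     # MMDD
    time: str     # HH:MM:SS.uuuuuu
    source: str   # e.g. master.cpp
    message: str  # text after the "] " separator


def find_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield parsed log lines whose message matches one of the keywords."""
    for raw in lines:
        match = GLOG_LINE.match(raw.rstrip("\n"))
        if match and KEYWORDS.search(match.group("message")):
            yield Event(
                match.group("level"),
                match.group("date"),
                match.group("time"),
                match.group("source"),
                match.group("message"),
            )
```

Passing an open log file (master or scheduler driver) to `find_events` gives timestamped events from each side, which may make it easier to line up a deactivation in the master log with an abort in the driver log.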

Thanks
Anindya

On Aug 8, 2016, at 3:35 AM, 志昌 余 <yuzhichang_...@hotmail.com> wrote:

Hi,
    I recently ran into a weird problem. I'm running Mesos + Chronos. Chronos 
often (once every several days) stops scheduling tasks because Mesos has 
deactivated the framework.
The following is the log from the leading Mesos master:


# grep -iP "activat|disconnected" /var/log/mesos/mesos-master.INFO
I0806 13:40:33.143658    30 master.cpp:2551] Deactivating framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000 (chronos-2.5.0-SNAPSHOT) at scheduler-86a64d22-5201-4bb0-8a2c-70d3e97afae6@10.8.139.246:34544
I0806 13:40:33.143908    23 hierarchical.cpp:375] Deactivated framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000

The workaround is to manually reboot the Chronos leader.


My env:
There are 3 physical machines, each running a containerized Mesos master 
and Chronos. When the issue occurred, the Mesos leader and Chronos leader were 
both running on the same machine.

Software Version:
mesos-master:0.28.0-2.0.16.ubuntu1404

chronos:2.5.0-ce4469d.ubuntu1404-mesos-0.28.0-2.0.16.ubuntu1404

    Can anyone give some insight into this problem?
    Thanks,
Zhichang Yu
