>In re executor_shutdown_grace_period: how would this enable the task
(MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
STDOUT as captured by Mesos shows that it received signal 15 just before it
said good-bye). My naive understanding of this grace period is that it
simply delays the termination of the executor.

If you use the DockerContainerizer, Mesos uses executor_shutdown_grace_period
as the graceful-shutdown timeout for the task as well. If you use the
MesosContainerizer, it sends SIGTERM (15) to the task first; if the task is
still alive after 3 seconds, Mesos then sends SIGKILL (9).
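In either case, the task gets a chance to exit cleanly only if it reacts to
the SIGTERM before the SIGKILL arrives. mongod already performs a clean
shutdown on SIGTERM, so the main things are making sure the signal actually
reaches the mongod process and that the grace period is long enough for that
shutdown to finish. As a rough illustration, a minimal Python entrypoint
wrapper could forward the signal like this (the wrapper itself, the mongod
path, and the --dbpath value are my own assumptions, not anything Mesos
provides):

    #!/usr/bin/env python
    # Hypothetical entrypoint wrapper (illustration only, not a Mesos API).
    import signal
    import subprocess
    import sys

    # Assumed mongod invocation; adjust the binary and flags for your image.
    mongod = subprocess.Popen(["mongod", "--dbpath", "/data/db"])

    def handle_sigterm(signum, frame):
        # Forward SIGTERM so mongod can flush its journal and exit cleanly.
        mongod.terminate()

    signal.signal(signal.SIGTERM, handle_sigterm)

    # Exit with mongod's status. If mongod finishes shutting down before the
    # grace period expires, the follow-up SIGKILL is never sent.
    sys.exit(mongod.wait())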

>I'm not sure what the java task is. This took place on the mesos-master
node and none of our applications runs there. It runs master, Marathon, and
ZK. Maybe the java task is Marathon or ZK?

I'm not sure about this; maybe others on the list have seen something
similar. Were Marathon or ZooKeeper behaving abnormally at that time? Could
you also provide the mesos-master and mesos-slave logs from when the incident
happened?


On Wed, May 18, 2016 at 7:11 PM, Paul Bell <arach...@gmail.com> wrote:

> Hi Hasodent,
>
> Thanks for your reply.
>
> In re executor_shutdown_grace_period: how would this enable the task
> (MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
> STDOUT as captured by Mesos shows that it received signal 15 just before it
> said good-bye). My naive understanding of this grace period is that it
> simply delays the termination of the executor.
>
> The following snippet is from /var/log/syslog. I believe it shows the stack
> trace (largely in the kernel) that led to mesos-master being blocked for
> more than 120 seconds. Please note that immediately above (before) the
> blocked mesos-master is a blocked jbd2/dm. Immediately below (after) the
> blocked mesos-master is a blocked java task. I'm not sure what the java
> task is. This took place on the mesos-master node and none of our
> applications runs there. It runs master, Marathon, and ZK. Maybe the java
> task is Marathon or ZK?
>
> Thanks again.
>
> -Paul
>
> May 16 20:06:53 71 kernel: [193339.890848] INFO: task mesos-master:4013
> blocked for more than 120 seconds.
>
> May 16 20:06:53 71 kernel: [193339.890873]       Not tainted
> 3.13.0-32-generic #57-Ubuntu
>
> May 16 20:06:53 71 kernel: [193339.890889] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> May 16 20:06:53 71 kernel: [193339.890912] mesos-master    D
> ffff88013fd94440     0  4013      1 0x00000000
>
> May 16 20:06:53 71 kernel: [193339.890914]  ffff880137429a28
> 0000000000000002 ffff880135778000 ffff880137429fd8
>
> May 16 20:06:53 71 kernel: [193339.890916]  0000000000014440
> 0000000000014440 ffff880135778000 ffff88013fd94cd8
>
> May 16 20:06:53 71 kernel: [193339.890918]  ffff88013ffd34b0
> 0000000000000002 ffffffff81284630 ffff880137429aa0
>
> May 16 20:06:53 71 kernel: [193339.890919] Call Trace:
>
> May 16 20:06:53 71 kernel: [193339.890922]  [<ffffffff81284630>] ?
> start_this_handle+0x590/0x590
>
> May 16 20:06:53 71 kernel: [193339.890924]  [<ffffffff817203fd>]
> io_schedule+0x9d/0x140
>
> May 16 20:06:53 71 kernel: [193339.890925]  [<ffffffff8128463e>]
> sleep_on_shadow_bh+0xe/0x20
>
> May 16 20:06:53 71 kernel: [193339.890927]  [<ffffffff81720882>]
> __wait_on_bit+0x62/0x90
>
> May 16 20:06:53 71 kernel: [193339.890929]  [<ffffffff81284630>] ?
> start_this_handle+0x590/0x590
>
> May 16 20:06:53 71 kernel: [193339.890930]  [<ffffffff81720927>]
> out_of_line_wait_on_bit+0x77/0x90
>
> May 16 20:06:53 71 kernel: [193339.890932]  [<ffffffff810aaf40>] ?
> autoremove_wake_function+0x40/0x40
>
> May 16 20:06:53 71 kernel: [193339.890934]  [<ffffffff810aab15>] ?
> wake_up_bit+0x25/0x30
>
> May 16 20:06:53 71 kernel: [193339.890936]  [<ffffffff81285a3d>]
> do_get_write_access+0x2ad/0x4f0
>
> May 16 20:06:53 71 kernel: [193339.890938]  [<ffffffff811ef90d>] ?
> __getblk+0x2d/0x2e0
>
> May 16 20:06:53 71 kernel: [193339.890939]  [<ffffffff81285ca7>]
> jbd2_journal_get_write_access+0x27/0x40
>
> May 16 20:06:53 71 kernel: [193339.890942]  [<ffffffff8126c37b>]
> __ext4_journal_get_write_access+0x3b/0x80
>
> May 16 20:06:53 71 kernel: [193339.890946]  [<ffffffff81242200>]
> ext4_reserve_inode_write+0x70/0xa0
>
> May 16 20:06:53 71 kernel: [193339.890948]  [<ffffffff81245490>] ?
> ext4_dirty_inode+0x40/0x60
>
> May 16 20:06:53 71 kernel: [193339.890949]  [<ffffffff81242274>]
> ext4_mark_inode_dirty+0x44/0x1f0
>
> May 16 20:06:53 71 kernel: [193339.890951]  [<ffffffff81245490>]
> ext4_dirty_inode+0x40/0x60
>
> May 16 20:06:53 71 kernel: [193339.890953]  [<ffffffff811e65aa>]
> __mark_inode_dirty+0x10a/0x2d0
>
> May 16 20:06:53 71 kernel: [193339.890956]  [<ffffffff811d7761>]
> update_time+0x81/0xd0
>
> May 16 20:06:53 71 kernel: [193339.890957]  [<ffffffff811d7970>]
> file_update_time+0x80/0xd0
>
> May 16 20:06:53 71 kernel: [193339.890961]  [<ffffffff8114fc40>]
> __generic_file_aio_write+0x180/0x3d0
>
> May 16 20:06:53 71 kernel: [193339.890963]  [<ffffffff8114fee8>]
> generic_file_aio_write+0x58/0xa0
>
> May 16 20:06:53 71 kernel: [193339.890965]  [<ffffffff81239ea9>]
> ext4_file_write+0x99/0x400
>
> May 16 20:06:53 71 kernel: [193339.890967]  [<ffffffff8109a7f0>] ?
> wake_up_state+0x10/0x20
>
> May 16 20:06:53 71 kernel: [193339.890970]  [<ffffffff810d7c56>] ?
> wake_futex+0x66/0x90
>
> May 16 20:06:53 71 kernel: [193339.890972]  [<ffffffff810d8c01>] ?
> futex_wake+0x1b1/0x1d0
>
> May 16 20:06:53 71 kernel: [193339.890974]  [<ffffffff811bc3da>]
> do_sync_write+0x5a/0x90
>
> May 16 20:06:53 71 kernel: [193339.890976]  [<ffffffff811bcb64>]
> vfs_write+0xb4/0x1f0
>
> May 16 20:06:53 71 kernel: [193339.890978]  [<ffffffff811bd599>]
> SyS_write+0x49/0xa0
>
> May 16 20:06:53 71 kernel: [193339.890980]  [<ffffffff8172c87f>]
> tracesys+0xe1/0xe6
>
>
>
> On Wed, May 18, 2016 at 2:33 AM, haosdent <haosd...@gmail.com> wrote:
>
>> >Is there some way to be given control (a callback, or an "exit" routine)
>> so that the container about to be nuked can be given a chance to exit
>> gracefully?
>> The default value of executor_shutdown_grace_period is 5 seconds; you
>> can change it by specifying the `--executor_shutdown_grace_period` flag
>> when launching the mesos agent.
>>
>> >Are there other steps I can take to avoid this mildly calamitous
>> occurrence?
>> >mesos-slaves get shut down
>> Do you know where your mesos-master was stuck when it happened? Is there
>> any error log or related log about this? In addition, is there any log from
>> when the mesos-slave shut down?
>>
>> On Wed, May 18, 2016 at 6:12 AM, Paul Bell <arach...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I probably have the following account partly wrong, but let me present
>>> it just the same and those who know better can correct me as needed.
>>>
>>> I've an application that runs several MongoDB shards, each a Dockerized
>>> container, each on a distinct node (VM); in fact, some of the VMs are on
>>> separate ESXi hosts.
>>>
>>> I've lately seen situations where, because of very slow disks for the
>>> database, the following sequence occurs (I think):
>>>
>>>    1. Linux (Ubuntu 14.04 LTS) virtual memory manager hits thresholds
>>>    defined by vm.dirty_background_ratio and/or vm.dirty_ratio (probably 
>>> both)
>>>    2. Synchronous flushing of many, many pages occurs, writing to a
>>>    slow disk
>>>    3. (Around this time one might see in /var/log/syslog "task X
>>>    blocked for more than 120 seconds" for all kinds of tasks, including
>>>    mesos-master)
>>>    4. mesos-slaves get shut down (this is the part I'm unclear about;
>>>    but I am quite certain that on 2 nodes the executors and their in-flight
>>>    MongoDB tasks got zapped because I can see that Marathon restarted them).
>>>
>>> The consequences of this are a corrupt MongoDB database. In the case at
>>> hand, the job had run for over 50 hours, processing close to 120 million
>>> files.
>>>
>>> Steps I've taken so far to remedy include:
>>>
>>>    - tune vm.dirty_background_ratio and vm.dirty_ratio down,
>>>    respectively, to 5 and 10 (from 10 and 20). The intent here is to 
>>> tolerate
>>>    more frequent, smaller flushes and thus avoid less frequent massive 
>>> flushes
>>>    that suspend threads for very long periods.
>>>    - increase agent ping timeout to 10 minutes (every 30 seconds, 20
>>>    times)
>>>
>>> So the questions are:
>>>
>>>    - Is there some way to be given control (a callback, or an "exit"
>>>    routine) so that the container about to be nuked can be given a chance to
>>>    exit gracefully?
>>>    - Are there other steps I can take to avoid this mildly calamitous
>>>    occurrence?
>>>    - (Also, I'd be grateful for more clarity on anything in steps 1-4
>>>    above that is a bit hand-wavy!)
>>>
>>> As always, thanks.
>>>
>>> -Paul
>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang
