Hi Haosdent,

Thanks for your reply.

In re executor_shutdown_grace_period: how would this enable the task
(MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
STDOUT as captured by Mesos shows that it received signal 15 just before it
said good-bye). My naive understanding of this grace period is that it
simply delays the termination of the executor.
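
To make the question concrete: if I understand it correctly, the grace period is just the window between the SIGTERM and a later SIGKILL, so it only helps if whatever is PID 1 inside the container passes the TERM along to mongod quickly enough for it to flush and exit inside that window. Something like this untested sketch of an entrypoint (paths are placeholders):

    #!/bin/bash
    # Untested sketch: run mongod as a child and forward SIGTERM to it, so it
    # can shut down cleanly before the agent escalates to SIGKILL at the end
    # of executor_shutdown_grace_period.

    mongod --config /etc/mongod.conf &
    MONGOD_PID=$!

    term_handler() {
      # mongod exits cleanly on SIGTERM; forward the signal and wait for it.
      kill -TERM "$MONGOD_PID"
      wait "$MONGOD_PID"
      exit $?
    }
    trap term_handler TERM INT

    # Keep the entrypoint (PID 1 in the container) alive until mongod exits.
    wait "$MONGOD_PID"

If that is the right model, then the grace period buys time only when the signal actually reaches mongod promptly, which (per the STDOUT I mentioned) seems to be happening here.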

The following snippet is from /var/log/syslog. I believe it shows the stack
trace (largely in the kernel) that led to mesos-master being blocked for
more than 120 seconds. Please note that immediately above (before) the
blocked mesos-master is a blocked jbd2/dm task, and immediately below
(after) it is a blocked java task. I'm not sure what that java task is:
this took place on the mesos-master node, and none of our applications runs
there. That node runs the Mesos master, Marathon, and ZooKeeper, so maybe
the java task is Marathon or ZK?

Thanks again.

-Paul

May 16 20:06:53 71 kernel: [193339.890848] INFO: task mesos-master:4013 blocked for more than 120 seconds.
May 16 20:06:53 71 kernel: [193339.890873]       Not tainted 3.13.0-32-generic #57-Ubuntu
May 16 20:06:53 71 kernel: [193339.890889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 20:06:53 71 kernel: [193339.890912] mesos-master    D ffff88013fd94440     0  4013      1 0x00000000
May 16 20:06:53 71 kernel: [193339.890914]  ffff880137429a28 0000000000000002 ffff880135778000 ffff880137429fd8
May 16 20:06:53 71 kernel: [193339.890916]  0000000000014440 0000000000014440 ffff880135778000 ffff88013fd94cd8
May 16 20:06:53 71 kernel: [193339.890918]  ffff88013ffd34b0 0000000000000002 ffffffff81284630 ffff880137429aa0
May 16 20:06:53 71 kernel: [193339.890919] Call Trace:
May 16 20:06:53 71 kernel: [193339.890922]  [<ffffffff81284630>] ? start_this_handle+0x590/0x590
May 16 20:06:53 71 kernel: [193339.890924]  [<ffffffff817203fd>] io_schedule+0x9d/0x140
May 16 20:06:53 71 kernel: [193339.890925]  [<ffffffff8128463e>] sleep_on_shadow_bh+0xe/0x20
May 16 20:06:53 71 kernel: [193339.890927]  [<ffffffff81720882>] __wait_on_bit+0x62/0x90
May 16 20:06:53 71 kernel: [193339.890929]  [<ffffffff81284630>] ? start_this_handle+0x590/0x590
May 16 20:06:53 71 kernel: [193339.890930]  [<ffffffff81720927>] out_of_line_wait_on_bit+0x77/0x90
May 16 20:06:53 71 kernel: [193339.890932]  [<ffffffff810aaf40>] ? autoremove_wake_function+0x40/0x40
May 16 20:06:53 71 kernel: [193339.890934]  [<ffffffff810aab15>] ? wake_up_bit+0x25/0x30
May 16 20:06:53 71 kernel: [193339.890936]  [<ffffffff81285a3d>] do_get_write_access+0x2ad/0x4f0
May 16 20:06:53 71 kernel: [193339.890938]  [<ffffffff811ef90d>] ? __getblk+0x2d/0x2e0
May 16 20:06:53 71 kernel: [193339.890939]  [<ffffffff81285ca7>] jbd2_journal_get_write_access+0x27/0x40
May 16 20:06:53 71 kernel: [193339.890942]  [<ffffffff8126c37b>] __ext4_journal_get_write_access+0x3b/0x80
May 16 20:06:53 71 kernel: [193339.890946]  [<ffffffff81242200>] ext4_reserve_inode_write+0x70/0xa0
May 16 20:06:53 71 kernel: [193339.890948]  [<ffffffff81245490>] ? ext4_dirty_inode+0x40/0x60
May 16 20:06:53 71 kernel: [193339.890949]  [<ffffffff81242274>] ext4_mark_inode_dirty+0x44/0x1f0
May 16 20:06:53 71 kernel: [193339.890951]  [<ffffffff81245490>] ext4_dirty_inode+0x40/0x60
May 16 20:06:53 71 kernel: [193339.890953]  [<ffffffff811e65aa>] __mark_inode_dirty+0x10a/0x2d0
May 16 20:06:53 71 kernel: [193339.890956]  [<ffffffff811d7761>] update_time+0x81/0xd0
May 16 20:06:53 71 kernel: [193339.890957]  [<ffffffff811d7970>] file_update_time+0x80/0xd0
May 16 20:06:53 71 kernel: [193339.890961]  [<ffffffff8114fc40>] __generic_file_aio_write+0x180/0x3d0
May 16 20:06:53 71 kernel: [193339.890963]  [<ffffffff8114fee8>] generic_file_aio_write+0x58/0xa0
May 16 20:06:53 71 kernel: [193339.890965]  [<ffffffff81239ea9>] ext4_file_write+0x99/0x400
May 16 20:06:53 71 kernel: [193339.890967]  [<ffffffff8109a7f0>] ? wake_up_state+0x10/0x20
May 16 20:06:53 71 kernel: [193339.890970]  [<ffffffff810d7c56>] ? wake_futex+0x66/0x90
May 16 20:06:53 71 kernel: [193339.890972]  [<ffffffff810d8c01>] ? futex_wake+0x1b1/0x1d0
May 16 20:06:53 71 kernel: [193339.890974]  [<ffffffff811bc3da>] do_sync_write+0x5a/0x90
May 16 20:06:53 71 kernel: [193339.890976]  [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
May 16 20:06:53 71 kernel: [193339.890978]  [<ffffffff811bd599>] SyS_write+0x49/0xa0
May 16 20:06:53 71 kernel: [193339.890980]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6



On Wed, May 18, 2016 at 2:33 AM, haosdent <haosd...@gmail.com> wrote:

> >Is there some way to be given control (a callback, or an "exit" routine)
> so that the container about to be nuked can be given a chance to exit
> gracefully?
> The default value of executor_shutdown_grace_period is 5 seconds; you can
> change it by specifying the `--executor_shutdown_grace_period` flag when
> launching the Mesos agent.
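>
> For example, when starting the agent (placeholder values; check
> `mesos-slave --help` for the exact duration syntax):
>
>     mesos-slave --master=zk://<zk-host>:2181/mesos \
>       --containerizers=docker,mesos \
>       --executor_shutdown_grace_period=60secs
>
> The agent asks the executor to shut down first and only escalates to
> SIGKILL after this period, so a larger value gives the task more time to
> exit cleanly.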
>
> >Are there other steps I can take to avoid this mildly calamitous
> occurrence?
> >mesos-slaves get shutdown
> Do you know where your mesos-master was stuck when this happened? Is there
> any error log or other related log about it? In addition, is there any log
> from when the mesos-slave shut down?
>
> On Wed, May 18, 2016 at 6:12 AM, Paul Bell <arach...@gmail.com> wrote:
>
>> Hi All,
>>
>> I probably have the following account partly wrong, but let me present it
>> just the same and those who know better can correct me as needed.
>>
>> I've an application that runs several MongoDB shards, each a Dockerized
>> container, each on a distinct node (VM); in fact, some of the VMs are on
>> separate ESXi hosts.
>>
>> I've lately seen situations where, because of very slow disks for the
>> database, the following sequence occurs (I think):
>>
>>    1. The Linux (Ubuntu 14.04 LTS) virtual memory manager hits the
>>    thresholds defined by vm.dirty_background_ratio and/or vm.dirty_ratio
>>    (probably both).
>>    2. Synchronous flushing of many, many pages occurs, writing to a slow
>>    disk (a quick way to watch this is sketched just after this list).
>>    3. (Around this time one might see in /var/log/syslog "task X blocked
>>    for more than 120 seconds" for all kinds of tasks, including
>>    mesos-master.)
>>    4. mesos-slaves get shut down (this is the part I'm unclear about, but
>>    I am quite certain that on 2 nodes the executors and their in-flight
>>    MongoDB tasks got zapped, because I can see that Marathon restarted
>>    them).
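>>
>> A quick way to watch step 2 happening on a node (just standard tools,
>> nothing Mesos-specific):
>>
>>     # current thresholds
>>     sysctl vm.dirty_background_ratio vm.dirty_ratio
>>     # dirty pages waiting for writeback, sampled every 5 seconds
>>     watch -n 5 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
>>     # the hung-task warnings from step 3
>>     dmesg | grep "blocked for more than 120 seconds"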
>>
>> The consequences of this are a corrupt MongoDB database. In the case at
>> hand, the job had run for over 50 hours, processing close to 120 million
>> files.
>>
>> Steps I've taken so far to remedy include:
>>
>>    - tune vm.dirty_background_ratio and vm.dirty_ratio down to 5 and 10,
>>    respectively (from 10 and 20). The intent here is to tolerate more
>>    frequent, smaller flushes and thus avoid the less frequent, massive
>>    flushes that suspend threads for very long periods.
>>    - increase the agent ping timeout to 10 minutes (every 30 seconds, 20
>>    times); both changes are sketched below.
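>>
>> Concretely, something like the following (the master flag names are from
>> memory, so please double-check them against `mesos-master --help`):
>>
>>     # on every node: start background flushing earlier and cap dirty pages lower
>>     sysctl -w vm.dirty_background_ratio=5
>>     sysctl -w vm.dirty_ratio=10
>>     # persist across reboots
>>     echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
>>     echo 'vm.dirty_ratio = 10'           >> /etc/sysctl.conf
>>
>>     # on the master: 20 pings, 30 seconds apart, before an agent is declared lost
>>     mesos-master ... --slave_ping_timeout=30secs --max_slave_ping_timeouts=20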
>>
>> So the questions are:
>>
>>    - Is there some way to be given control (a callback, or an "exit"
>>    routine) so that the container about to be nuked can be given a chance to
>>    exit gracefully?
>>    - Are there other steps I can take to avoid this mildly calamitous
>>    occurrence?
>>    - (Also, I'd be grateful for more clarity on anything in steps 1-4
>>    above that is a bit hand-wavy!)
>>
>> As always, thanks.
>>
>> -Paul
>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
