Good catch! Sorry, the docs are right; I just had a brain fart. :)

On 17 October 2014 13:46, Nils De Moor <nils.de.m...@gmail.com> wrote:
> Hi guys,
>
> Thanks for the swift feedback. I can confirm that tweaking the
> task_launch_timeout setting in Marathon and setting it to a value bigger
> than the executor_registration_timeout setting in Mesos fixed our problem.
>
> One side note, though: the task_launch_timeout setting is in milliseconds,
> so for 5 minutes it's 300000 (vs 300 in seconds).
> Knowing that will save you some hair-pulling when you see your tasks being
> killed immediately after launch. ;)
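>
> For reference, the relevant part of our Marathon start command now looks
> something like this (just a sketch; the ZooKeeper URL is a placeholder and
> the exact wrapper depends on how Marathon is launched on your machines):
>
>   # task_launch_timeout is in MILLISECONDS: 300000 = 5 minutes,
>   # i.e. at least as big as executor_registration_timeout on the slaves
>   marathon --master zk://<zk-host>:2181/mesos \
>            --task_launch_timeout 300000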
>
> Thanks again!
>
> Kr,
> Nils
>
> On Thu, Oct 16, 2014 at 4:27 PM, Michael Babineau
> <michael.babin...@gmail.com> wrote:
>>
>> See also https://issues.apache.org/jira/browse/MESOS-1915
>>
>> On Thu, Oct 16, 2014 at 2:59 AM, Dick Davies <d...@hellooperator.net>
>> wrote:
>>>
>>> One gotcha - the marathon timeout is in seconds, so pass '300' in your
>>> case.
>>>
>>> Let us know if it works. I spotted this the other day, and anecdotally
>>> it addresses the issue for some users, so it would be good to get more
>>> feedback.
>>>
>>> On 16 October 2014 09:49, Grzegorz Graczyk <gregor...@gmail.com> wrote:
>>> > Make sure you have --task_launch_timeout in Marathon set to the same
>>> > value as executor_registration_timeout.
>>> >
>>> > https://github.com/mesosphere/marathon/blob/master/docs/docs/native-docker.md#configure-marathon
>>> >
>>> > On 16 October 2014 10:37, Nils De Moor <nils.de.m...@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Environment:
>>> >> - Clean vagrant install, 1 master, 1 slave (same behaviour on
>>> >> production
>>> >> cluster with 3 masters, 6 slaves)
>>> >> - Mesos 0.20.1
>>> >> - Marathon 0.7.3
>>> >> - Docker 1.2.0
>>> >>
>>> >> Slave config:
>>> >> - containerizers: "docker,mesos"
>>> >> - executor_registration_timeout: 5mins
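>>> >>
>>> >> (For reference, that corresponds to starting the slave with something
>>> >> like the following; just a sketch, the master URL is a placeholder and
>>> >> other flags are omitted:)
>>> >>
>>> >>   mesos-slave --master=zk://<zk-host>:2181/mesos \
>>> >>               --containerizers=docker,mesos \
>>> >>               --executor_registration_timeout=5mins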
>>> >>
>>> >> When I start Docker container tasks, the images start being pulled from
>>> >> the Hub, but after 1 minute Mesos kills the tasks.
>>> >> In the background, though, the pull keeps going, and once everything is
>>> >> pulled the Docker container is started, without Mesos knowing about it.
>>> >> When I start the same task in Mesos again (after I know the image pull
>>> >> is done), it runs normally.
>>> >>
>>> >> So this leaves the slaves with 'dirty' Docker containers that Mesos has
>>> >> no knowledge of.
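>>> >>
>>> >> (You can spot those orphaned containers on the slave with plain Docker
>>> >> commands; the "mesos-" name prefix below is an assumption based on how
>>> >> the Docker containerizer seems to name its containers:)
>>> >>
>>> >>   # containers Docker is still running that Mesos no longer tracks
>>> >>   docker ps | grep mesos-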
>>> >>
>>> >> From the logs I get this:
>>> >> ---
>>> >> I1009 15:30:02.990291  1414 slave.cpp:1002] Got assigned task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
>>> >> I1009 15:30:02.990979  1414 slave.cpp:1112] Launching task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
>>> >> I1009 15:30:02.993341  1414 slave.cpp:1222] Queuing task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework '20140904-160348-185204746-5050-27588-0000
>>> >> I1009 15:30:02.995818  1409 docker.cpp:743] Starting container '25ac3310-71e4-4d10-8a4b-38add4537308' for task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework '20140904-160348-185204746-5050-27588-0000'
>>> >>
>>> >> I1009 15:31:07.033287  1413 slave.cpp:1278] Asked to kill task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>>> >> I1009 15:31:07.034742  1413 slave.cpp:2088] Handling status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
>>> >> W1009 15:31:07.034881  1413 slave.cpp:1354] Killing the unregistered executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework 20140904-160348-185204746-5050-27588-0000 because it has no tasks
>>> >> E1009 15:31:07.034945  1413 slave.cpp:2205] Failed to update resources for container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor test-app.23755452-4fc9-11e4-839b-080027c4337a running task test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal task, destroying container: No container found
>>> >> I1009 15:31:07.035133  1413 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>>> >> I1009 15:31:07.035210  1413 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 to master@10.0.10.11:5050
>>> >> I1009 15:31:07.046167  1408 status_update_manager.cpp:398] Received status update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>>> >>
>>> >> I1009 15:35:02.993736  1414 slave.cpp:3010] Terminating executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 because it did not register within 5mins
>>> >> ---
>>> >>
>>> >> I already posted my question on the Marathon board, as I first thought
>>> >> it was an issue on Marathon's end:
>>> >> https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY
>>> >>
>>> >>
>>> >> Kind regards,
>>> >> Nils
>>> >>
>>> >
>>
>>
>
