One gotcha - the marathon timeout is in seconds, so pass '300' in your case.
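
As a rough illustration of matching the two settings, the sketch below converts a slave-style duration such as "5mins" into a plain number. The helper name parse_duration_seconds and the handful of unit suffixes it accepts are assumptions made for this example, not part of Mesos or Marathon. It prints the value in both seconds and milliseconds, since the unit Marathon expects is worth confirming against the documentation for the version in use.

```python
import re

# Seconds per unit for a small, assumed subset of duration suffixes.
_SECONDS_PER_UNIT = {"secs": 1, "mins": 60, "hrs": 3600}


def parse_duration_seconds(text):
    """Convert a duration string such as '5mins' into whole seconds."""
    match = re.fullmatch(r"(\d+)\s*(secs|mins|hrs)", text.strip())
    if match is None:
        raise ValueError("unsupported duration: %r" % text)
    amount, unit = match.groups()
    return int(amount) * _SECONDS_PER_UNIT[unit]


# executor_registration_timeout from the slave config quoted below.
seconds = parse_duration_seconds("5mins")
print(seconds)         # value expressed in seconds
print(seconds * 1000)  # the same duration expressed in milliseconds
```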

Let us know if it works. I spotted this the other day, and anecdotally it
addresses the issue for some users; it would be good to get more feedback.

On 16 October 2014 09:49, Grzegorz Graczyk <gregor...@gmail.com> wrote:
> Make sure you have --task_launch_timeout in marathon set to same value as
> executor_registration_timeout.
> https://github.com/mesosphere/marathon/blob/master/docs/docs/native-docker.md#configure-marathon
>
> On 16 October 2014 10:37, Nils De Moor <nils.de.m...@gmail.com> wrote:
>>
>> Hi,
>>
>> Environment:
>> - Clean vagrant install, 1 master, 1 slave (same behaviour on production
>> cluster with 3 masters, 6 slaves)
>> - Mesos 0.20.1
>> - Marathon 0.7.3
>> - Docker 1.2.0
>>
>> Slave config:
>> - containerizers: "docker,mesos"
>> - executor_registration_timeout: 5mins
>>
>> When I start Docker container tasks, they start being pulled from the
>> HUB, but after 1 minute mesos kills them.
>> In the background though the pull is still finishing and when everything
>> is pulled in the docker container is started, without mesos knowing about
>> it.
>> When I start the same task in mesos again (after I know the pull of the
>> image is done), they run normally.
>>
>> So this leaves slaves with 'dirty' docker containers, as mesos has no
>> knowledge about them.
>>
>> From the logs I get this:
>> ---
>> I1009 15:30:02.990291  1414 slave.cpp:1002] Got assigned task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
>> 20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.990979  1414 slave.cpp:1112] Launching task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
>> 20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.993341  1414 slave.cpp:1222] Queuing task
>> 'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> '20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.995818  1409 docker.cpp:743] Starting container
>> '25ac3310-71e4-4d10-8a4b-38add4537308' for task
>> 'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor
>> 'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework
>> '20140904-160348-185204746-5050-27588-0000'
>>
>> I1009 15:31:07.033287  1413 slave.cpp:1278] Asked to kill task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000
>> I1009 15:31:07.034742  1413 slave.cpp:2088] Handling status update
>> TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
>> W1009 15:31:07.034881  1413 slave.cpp:1354] Killing the unregistered
>> executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework
>> 20140904-160348-185204746-5050-27588-0000 because it has no tasks
>> E1009 15:31:07.034945  1413 slave.cpp:2205] Failed to update resources for
>> container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor
>> test-app.23755452-4fc9-11e4-839b-080027c4337a running task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal
>> task, destroying container: No container found
>> I1009 15:31:07.035133  1413 status_update_manager.cpp:320] Received status
>> update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000
>> I1009 15:31:07.035210  1413 status_update_manager.cpp:373] Forwarding
>> status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for
>> task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000 to master@10.0.10.11:5050
>> I1009 15:31:07.046167  1408 status_update_manager.cpp:398] Received status
>> update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000
>>
>> I1009 15:35:02.993736  1414 slave.cpp:3010] Terminating executor
>> test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
>> 20140904-160348-185204746-5050-27588-0000 because it did not register within
>> 5mins
>> ---
>>
>> I already posted my question on the marathon board, as I first thought it
>> was an issue on marathon's end:
>> https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY
>>
>>
>> Kind regards,
>> Nils
>>
>
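
For anyone comparing their own logs against the excerpt quoted above, here is a minimal sketch that measures the gap between log timestamps. The helper name elapsed_seconds is made up for this example, and it assumes all timestamps fall on the same day. The three values are copied from the "Launching task", "Asked to kill task" and "Terminating executor" lines in the quoted log.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


launch = "15:30:02.990979"     # slave.cpp:1112 Launching task
kill = "15:31:07.033287"       # slave.cpp:1278 Asked to kill task
terminate = "15:35:02.993736"  # slave.cpp:3010 Terminating executor

# Time from launch until the kill request arrived at the slave.
print(round(elapsed_seconds(launch, kill), 1))
# Time from launch until the slave gave up on executor registration.
print(round(elapsed_seconds(launch, terminate), 1))
```

The first gap is a little over a minute, matching the "after 1 minute mesos kills them" observation, and the second lines up with the 5mins executor_registration_timeout.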
