One gotcha: the Marathon timeout (--task_launch_timeout) is in milliseconds, so pass '300000' in your case.
Let us know if it works. I spotted this the other day, and anecdotally it addresses the issue for some users; it would be good to get more feedback.

On 16 October 2014 09:49, Grzegorz Graczyk <gregor...@gmail.com> wrote:
> Make sure you have --task_launch_timeout in Marathon set to the same value as
> executor_registration_timeout.
> https://github.com/mesosphere/marathon/blob/master/docs/docs/native-docker.md#configure-marathon
>
> On 16 October 2014 10:37, Nils De Moor <nils.de.m...@gmail.com> wrote:
>>
>> Hi,
>>
>> Environment:
>> - Clean vagrant install, 1 master, 1 slave (same behaviour on production
>>   cluster with 3 masters, 6 slaves)
>> - Mesos 0.20.1
>> - Marathon 0.7.3
>> - Docker 1.2.0
>>
>> Slave config:
>> - containerizers: "docker,mesos"
>> - executor_registration_timeout: 5mins
>>
>> When I start Docker container tasks, the images start being pulled from the
>> Hub, but after 1 minute Mesos kills the tasks.
>> In the background, though, the pull keeps running, and once everything is
>> pulled the Docker container is started without Mesos knowing about it.
>> When I start the same tasks in Mesos again (after I know the image pull is
>> done), they run normally.
>>
>> So this leaves slaves with 'dirty' Docker containers, as Mesos has no
>> knowledge of them.
>>
>> From the logs I get this:
>> ---
>> I1009 15:30:02.990291 1414 slave.cpp:1002] Got assigned task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.990979 1414 slave.cpp:1112] Launching task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.993341 1414 slave.cpp:1222] Queuing task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework '20140904-160348-185204746-5050-27588-0000
>> I1009 15:30:02.995818 1409 docker.cpp:743] Starting container '25ac3310-71e4-4d10-8a4b-38add4537308' for task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework '20140904-160348-185204746-5050-27588-0000'
>>
>> I1009 15:31:07.033287 1413 slave.cpp:1278] Asked to kill task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>> I1009 15:31:07.034742 1413 slave.cpp:2088] Handling status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
>> W1009 15:31:07.034881 1413 slave.cpp:1354] Killing the unregistered executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework 20140904-160348-185204746-5050-27588-0000 because it has no tasks
>> E1009 15:31:07.034945 1413 slave.cpp:2205] Failed to update resources for container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor test-app.23755452-4fc9-11e4-839b-080027c4337a running task test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal task, destroying container: No container found
>> I1009 15:31:07.035133 1413 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>> I1009 15:31:07.035210 1413 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 to master@10.0.10.11:5050
>> I1009 15:31:07.046167 1408 status_update_manager.cpp:398] Received status update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
>>
>> I1009 15:35:02.993736 1414 slave.cpp:3010] Terminating executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 because it did not register within 5mins
>> ---
>>
>> I already posted my question on the Marathon board, as I first thought it
>> was an issue on Marathon's end:
>> https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY
>>
>>
>> Kind regards,
>> Nils
>>
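
For anyone wanting to check which timer actually fired, here is a rough sketch that measures the gaps between three entries of the slave log quoted above. The helper names (`glog_time`, `elapsed_seconds`) are made up for illustration, the log lines are trimmed to their prefixes, and the year is an assumption because the log timestamps do not carry one.

```python
from datetime import datetime

# Prefixes of three entries from the quoted slave log (rest of each line trimmed).
LAUNCH = "I1009 15:30:02.990979 1414 slave.cpp:1112] Launching task"
KILL = "I1009 15:31:07.033287 1413 slave.cpp:1278] Asked to kill task"
EXECUTOR_TIMEOUT = "I1009 15:35:02.993736 1414 slave.cpp:3010] Terminating executor"


def glog_time(line, year=2014):
    """Parse the 'Lmmdd hh:mm:ss.uuuuuu' prefix of a log line.

    The year is not in the log; 2014 is assumed from the date of the thread.
    """
    severity_and_date, clock = line.split()[:2]
    month_day = severity_and_date[1:]  # drop the leading severity letter (I/W/E)
    return datetime.strptime(f"{year} {month_day} {clock}", "%Y %m%d %H:%M:%S.%f")


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two log lines."""
    return (glog_time(end_line) - glog_time(start_line)).total_seconds()


if __name__ == "__main__":
    # Gap between launching the task and the kill request (about 64 s).
    print(f"launch -> kill request:     {elapsed_seconds(LAUNCH, KILL):.1f}s")
    # Gap between launching the task and the executor timeout (about 300 s).
    print(f"launch -> executor timeout: {elapsed_seconds(LAUNCH, EXECUTOR_TIMEOUT):.1f}s")
```

The kill request arrives roughly a minute after launch, well before the 5-minute executor_registration_timeout on the slave expires, which fits the suggestion that the kill comes from a shorter launch timeout on the Marathon side rather than from the slave setting.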