Hi all.

Mesos version = 0.23.0-1.0.ubuntu1404 (mesosphere APT repo)
Marathon version = 0.10.1 (mesosphere APT repo)

Hopefully this is a simple one for someone to answer, though I couldn't
find anything immediately obvious in the documentation. We're trialling
Mesos in a cloud (EC2/GCE) environment and the one thing that continues
to bite us in the ass is this: continued task failures until the Docker
image is fully downloaded! Why is this!? Some of our images are small
(say 200MB), some much larger (2GB), due to the nature of the software
packages we're containerising. Regardless of size, they fail the first
dozen (or more) times until one of the slaves has pulled the image. Why
is there an apparent hard time-out and how can I avoid it? I don't want
the task to register as a failure - it hasn't even had a chance to run
yet! Up until now we've just been tolerating the bouncing around of
these tasks but it's now reached a point where it's darn annoying ;)

I've tried setting executor_registration_timeout to '5mins' on the
slaves, but this made no apparent difference (the task is still killed
every minute). I should note that these tasks are launched using the
Marathon framework, and I've tried setting 'task_launch_timeout' to
'3000' there; again, it makes no difference.
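
To make the two values above easier to compare, here is a minimal
sketch that converts a Mesos-style duration string into milliseconds.
It assumes (please check the Marathon docs for your version) that
Marathon's task_launch_timeout is a plain number of milliseconds, while
the Mesos flag takes a number plus a unit suffix. The helper name
to_millis and the unit table are illustrative only, not part of either
project.

```python
import re

# Unit suffixes used by Mesos duration flags, mapped to milliseconds.
# Only the suffixes relevant here are listed; this is not the full set.
_UNIT_MS = {
    "ms": 1,
    "secs": 1000,
    "mins": 60 * 1000,
    "hrs": 60 * 60 * 1000,
}


def to_millis(duration):
    """Convert a duration string such as '5mins' to whole milliseconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", duration.strip())
    if match is None or match.group(2) not in _UNIT_MS:
        raise ValueError("unrecognised duration: %r" % duration)
    return int(float(match.group(1)) * _UNIT_MS[match.group(2)])


if __name__ == "__main__":
    # The value I set on the slaves, expressed in milliseconds.
    print(to_millis("5mins"))  # 300000
    # The one-minute kill interval I'm seeing, for comparison.
    print(to_millis("1mins"))  # 60000
```

If the millisecond assumption holds, the '3000' I gave Marathon would be
only 3 seconds, well short of the 300000 that would match '5mins'.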

Based on a brief glance at a Mesos slave log file, it seems the master
instructs the slave to kill the task off after 1 minute.

Please advise.

Cheers,

Jim

--
Senior Code Pig
Industrial Light & Magic
