I managed to take out a Mesos slave today with a typo while launching
a Marathon app, and wondered whether there are throttles or limits that can be
applied to repeated launches to reduce the risk of such mistakes in the future.

I started a thread on the Marathon list
(https://groups.google.com/forum/?hl=en#!topic/marathon-framework/4iWLqTYTvgM)

[ TL;DR: Marathon keeps throwing an app that will never deploy correctly at
slaves until the disk fills with debris and the slave dies ]

but I suppose this could be something available in Mesos itself.

I can't find a lot of advice about operational aspects of Mesos admin;
could others here provide some good advice about their experience in
preventing failed task deploys from causing trouble on their clusters?

Thanks!
