In our environment, we run a lot of batch jobs, some of which have tight
timelines. If any task in a job runs longer than X hours, there is no point
in continuing to run it.

For instance, a team might submit a job that builds a weekly index and
repeats every Monday. If the job does not finish before the next Monday for
whatever reason, there is no point in keeping any of its tasks running.

We believe that tracking deadlines in a distributed fashion across our
cluster makes more sense, as it makes the system more scalable and also
keeps our centralized state machine simpler.

One idea I have right now is to add an *optional* *TimeInfo deadline* field
to TaskInfo, so that all default executors in Mesos can simply terminate the
task and send a proper *StatusUpdate* once the deadline has passed.
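To make the executor side concrete, here is a rough sketch of the behavior I have in mind. It is not Mesos code: run_with_deadline is a hypothetical helper, the deadline is assumed to be an absolute wall-clock time in nanoseconds (the same shape as TimeInfo), and instead of sending a real StatusUpdate it just returns a dict. The "deadline_exceeded" reason string is made up for illustration; TASK_FINISHED and TASK_FAILED are the existing task states.

```python
import subprocess
import time


def run_with_deadline(command, deadline_ns):
    """Hypothetical executor logic: run `command`, kill it at `deadline_ns`.

    `deadline_ns` is assumed to be absolute wall-clock nanoseconds since the
    epoch (like TimeInfo). Returns a dict standing in for a StatusUpdate.
    """
    remaining_s = (deadline_ns - time.time_ns()) / 1e9
    if remaining_s <= 0:
        # Deadline already passed before launch: don't start the task at all.
        return {"state": "TASK_FAILED", "reason": "deadline_exceeded"}

    proc = subprocess.Popen(command)
    try:
        returncode = proc.wait(timeout=remaining_s)
    except subprocess.TimeoutExpired:
        # Deadline hit while running: ask nicely first, then force-kill.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        return {"state": "TASK_FAILED", "reason": "deadline_exceeded"}

    # Task finished on its own before the deadline.
    state = "TASK_FINISHED" if returncode == 0 else "TASK_FAILED"
    return {"state": state, "reason": None}
```

For example, run_with_deadline(["sleep", "10"], time.time_ns() + 1_000_000_000) would kill the sleep after about one second and report TASK_FAILED. Because each executor only needs its own task's deadline and a local clock, nothing here has to go through the master.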

I summarized the above idea in MESOS-8725
<https://issues.apache.org/jira/browse/MESOS-8725>.

Please let me know what you think. Thanks!

-- 
Cheers,

Zhitao Li
