Hi Tim,

Thanks again. Inline.


From: Tim Chen [mailto:t...@mesosphere.io]
Sent: 08 August 2015 23:23
To: user@mesos.apache.org
Subject: Re: Custom docker executor

Hi Kapil,

Thanks, this information is very useful.

About failure callbacks in Chronos, I agree it's indeed something that Chronos 
should just provide as it's a common feature for job dependency managers. We 
can have more discussions on your github issue.
[kmalik] : Agreed. I am also thinking of raising a PR with Marathon-like
functionality. This might be simplest for now.

Marathon doesn't have a failure callback, as it simply keeps retrying a long
running service. Do you need a callback for every failed attempt? And if the
event stream carries too much information, perhaps Marathon could provide a
filter so you can subscribe only to the events you care about. Feel free to
open an issue on github and we can discuss it there as well.
[kmalik] : Hmm, right, actually I realize that in my case it’s more of a health
check. A filter would be good nevertheless; sure, I will open an issue.
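
Until such a filter exists, a subscriber could discard unwanted events on its
own side. The following is a minimal sketch, assuming each event arrives as a
JSON object with an eventType field (and a taskStatus field on status updates),
as described in the Marathon event bus documentation. The selection of event
types and the helper names is_relevant and filter_events are hypothetical.

```python
import json

# Event types this subscriber cares about (an assumption for illustration).
WANTED_EVENT_TYPES = {"status_update_event", "failed_health_check_event"}

# Task states treated as failures (also an assumption; adjust as needed).
FAILED_TASK_STATES = {"TASK_FAILED", "TASK_LOST", "TASK_KILLED"}


def is_relevant(event):
    """Return True if the parsed event should be forwarded."""
    event_type = event.get("eventType")
    if event_type not in WANTED_EVENT_TYPES:
        return False
    if event_type == "status_update_event":
        # Keep only status updates that report a failed task.
        return event.get("taskStatus") in FAILED_TASK_STATES
    return True


def filter_events(raw_events):
    """Parse JSON payloads and keep only the relevant events."""
    kept = []
    for raw in raw_events:
        try:
            event = json.loads(raw)
        except ValueError:
            continue  # skip payloads that are not valid JSON
        if isinstance(event, dict) and is_relevant(event):
            kept.append(event)
    return kept
```

The callback endpoint registered with Marathon would pass each received payload
through filter_events and act only on what remains.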

About custom health checks, we've already added the ability in Mesos to take a
command, and you can specify the time interval and grace period that the health
check can tolerate; frameworks just need to integrate with this feature.
Marathon already does, using a command to health check a job. However, this
feature is not yet supported for Docker containers, and I will be looking into
this soon. Would providing a command that runs in the docker container for
health checks be sufficient for you?
[kmalik] : True, Mesos has good options for implementing health checks. But
Marathon / Chronos integrate with them only in a limited capacity today. Since
I don’t want to write a new framework, I thought of achieving it via a custom
executor at least.
Yes, custom health check support via docker exec will be nice. Something on the
lines of kubernetes health probes (being greedy ☺ ).
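
For reference, a command-based health check of the kind described above might
look like this in a Marathon app definition. This is only a sketch: the field
names follow Marathon's documented health check format and should be checked
against the Marathon version in use, while the app id, the commands, and all
numeric values are made up for illustration. As noted above, this was not yet
supported for Docker containers at the time.

```python
import json


def command_health_check(command, grace_period=300, interval=60,
                         timeout=20, max_failures=3):
    """Build a COMMAND health check entry for a Marathon app definition."""
    return {
        "protocol": "COMMAND",
        "command": {"value": command},
        "gracePeriodSeconds": grace_period,   # startup time before failures count
        "intervalSeconds": interval,          # time between two checks
        "timeoutSeconds": timeout,            # time allowed for one check
        "maxConsecutiveFailures": max_failures,
    }


# Hypothetical app: the id, cmd, and health check command are placeholders.
app = {
    "id": "/example-service",
    "cmd": "./run-service.sh",
    "cpus": 0.5,
    "mem": 128,
    "instances": 1,
    "healthChecks": [command_health_check("./check-health.sh")],
}

# The resulting JSON is what would be submitted to Marathon.
print(json.dumps(app, indent=2))
```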

About spark jobs, have you considered using the new Cluster mode on Spark Mesos 
instead (in Spark 1.4)? I've worked on that feature and it launches
all your Spark jobs as Mesos tasks similar to Marathon and Chronos, but it's 
built into Spark and can speak Spark submit protocol natively.
[kmalik] : I see, sure will go through this. Thanks
Anyhow, can you specify more exactly which orphaned Spark workers you are
referring to? I believe if you launch Spark jobs that talk to Mesos directly
there shouldn't be any need for Spark workers, since those are for Standalone
mode.
[kmalik] : Pardon my use of incorrect terminology. I meant that the chronos job
(running the spark driver inside a docker container) was killed, but the ‘spark
framework’ which it registered with mesos was still alive. Since my API server
deals only with chronos / marathon, it had no way to identify and clean up the
spark framework registered by the docker container.
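
One way an API server could find and remove such a leftover framework is to
look it up in the Mesos master state and ask the master to tear it down. The
sketch below rests on several assumptions: that the driver registers under a
known, unique framework name (for example one set through spark.app.name), that
the master state is available as JSON with a frameworks list, and that the
master accepts a POST to /master/teardown with a frameworkId form field.
Endpoint names have varied across Mesos versions, so they should be verified
first. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def find_framework_ids(master_state, framework_name):
    """Return the ids of all registered frameworks with the given name."""
    return [
        framework["id"]
        for framework in master_state.get("frameworks", [])
        if framework.get("name") == framework_name
    ]


def build_teardown_request(master_url, framework_id):
    """Build (but do not send) a teardown request for one framework.

    The /master/teardown path is an assumption; check it against the
    Mesos version in use.
    """
    body = urllib.parse.urlencode({"frameworkId": framework_id}).encode("utf-8")
    url = master_url.rstrip("/") + "/master/teardown"
    return urllib.request.Request(url, data=body, method="POST")
```

The request would be sent with urllib.request.urlopen(request) once the failed
Chronos job has been matched to its framework name.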

Tim







On Sat, Aug 8, 2015 at 3:45 AM, Kapil Malik <kma...@adobe.com> wrote:
Hi Tim,

Thank you for the quick reply. As I mentioned, we need to run short lived 
(using Chronos currently) and long lived (using Marathon currently) jobs.
While I cannot provide elaborate details, objectively, our requirements include 
the following –


1. Failure callback for jobs scheduled on chronos / marathon

   a. Chronos doesn’t provide decent callback hooks today (please correct me if
      I am wrong): https://github.com/mesos/chronos/issues/473 . Even for a
      successfully completed job, I need to have a dependent job which makes a
      call to my service.

   b. Marathon has the option of subscribing to the event bus
      (https://mesosphere.github.io/marathon/docs/event-bus.html), but we are
      afraid it might result in information overload, sending all sorts of
      events.

2. Custom health checks
This is not a pre/post hook per se. But for a long running (finite = chronos /
infinite = marathon) job, we need some periodic health checks. Marathon has a
basic HTTP health check, which is good for a start, but we may need slightly
more elaborate health checks. Again, Chronos doesn’t have them at all.

3. Managing spark jobs
Users of our API can submit docker images which run a spark job on mesos.
Thus, the spark driver runs inside the user docker on Chronos / marathon, and
registers another mesos framework for spark, running on other mesos slaves.
Now, in the real world with huge amounts of data, it often happens that a Spark
job fails for one reason or another. This leaves some orphaned spark workers
and needs manual clean up. With a custom executor, we may pass a hint that it’s
a spark job docker, so that the executor can ensure appropriate cleanup in case
of failure.

So when you mentioned “hooks that can be performed pre and post container 
launch”, can you provide some examples? Are they available as plugins / 
extensions on mesos or docker?

@Mike Michel, thank you for the powerstrip link
(https://github.com/ClusterHQ/powerstrip). Looks quite useful, will go through
it in detail and see whether it can serve some of our requirements.

Thanks and regards,

Kapil Malik | kma...@adobe.com | 33430 / 8800836581

From: Tim Chen [mailto:t...@mesosphere.io]
Sent: 08 August 2015 13:42
To: user@mesos.apache.org
Subject: Re: Custom docker executor

Hi Kapil,

What kind of pre/post actions would you like to perform?

The community has been contributing hooks that can be performed pre and post
container launch, so I'd like to see what your use cases are. Perhaps the new
hooks can satisfy your need, or maybe there is even some other way that can
already do what you'd like to achieve.

Tim

On Sat, Aug 8, 2015 at 1:01 AM, Kapil Malik <kma...@adobe.com> wrote:
… posting in a fresh thread
Hi,

We have a use case to run multi-user workloads on mesos. Users provide docker
images encapsulating application logic, which we (we = say some “Central API”)
schedule on Chronos / Marathon. However, we need to run some standard pre /
post steps for every docker submitted by users. We have the following options –


1. Ask every user to embed their logic inside a pre-defined docker template
   which will perform pre/post steps.
   ==> This is error prone, makes us dependent on whether the users followed the
   template, and is not very popular with users either.

2. Extend every user docker (FROM <>) and find a way to add pre-post steps in
   our docker. Refer to this docker when scheduling on chronos / marathon.
   ==> Building new dockers does not scale as users and applications grow.

3. Write a custom executor which will perform the pre-post steps and manage the
   user docker lifetime.
   ==> Deals with user docker lifetime and is obviously complex.

Is there a standard / openly available DockerExecutor which manages the docker 
lifetime and which I can extend to build my custom executor?
For instance, do you suggest extending 
https://github.com/apache/mesos/blob/master/src/docker/executor.cpp as a 
starting point? Can I access it in Java?

This way I will be concerned only with my custom logic (pre/post steps) and 
still get benefits of a standard way to manage docker containers.
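
To make the third option concrete, the ordering of pre steps, container run,
and post steps can be sketched as a small wrapper. This is not a Mesos
executor: a real one would also have to register with the slave and report
task status. The function name run_with_hooks, its parameters, and the choice
to skip the container when a pre step fails are assumptions for illustration.
Only the docker run --rm invocation is a real CLI call.

```python
import subprocess


def run_with_hooks(image, pre_steps, post_steps, runner=subprocess.call):
    """Run pre steps, then the user's container, then post steps.

    Each step is a command given as a list of arguments. `runner` executes
    one command and returns its exit code; it is injectable so the ordering
    can be exercised without docker installed.
    """
    for step in pre_steps:
        code = runner(step)
        if code != 0:
            # A failed pre step aborts the launch (a design assumption).
            return code

    try:
        # Run the user's image in the foreground and wait for it to exit.
        exit_code = runner(["docker", "run", "--rm", image])
    finally:
        # Post steps run whether the container succeeded, failed, or raised.
        for step in post_steps:
            runner(step)

    return exit_code
```

A call such as run_with_hooks("user/image", [["./pre.sh"]], [["./post.sh"]])
would then return the container's exit code, with the script names standing in
for whatever the pre/post steps turn out to be.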


Thanks and regards,

Kapil Malik | kma...@adobe.com | 33430 / 8800836581


