Thanks Sharma and Bill! This is exactly the input I was looking for.

We will start by using an existing service scheduler and see where this
leads us. 
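
For illustration only, here is a minimal sketch of how the external
entity from option B might derive the number of worker processes per
application. It assumes max-min fairness over a fixed pool of equally
sized worker slots. The function name, the demand figures and the slot
count are hypothetical, and pushing the resulting counts to the service
scheduler is left out entirely:

```python
def fair_share(demands, capacity):
    """Max-min fair split of `capacity` worker slots across applications.

    `demands` maps an application name to the number of workers it wants.
    All names and numbers are illustrative assumptions.
    """
    allocation = {app: 0 for app in demands}
    unsatisfied = {app for app, want in demands.items() if want > 0}
    remaining = capacity
    while unsatisfied and remaining > 0:
        share = remaining // len(unsatisfied)
        if share == 0:
            # Fewer slots than applications left: hand out one slot each,
            # in name order, until the pool is empty.
            for app in sorted(unsatisfied)[:remaining]:
                allocation[app] += 1
            break
        for app in sorted(unsatisfied):
            # Never grant more than the application still asks for.
            grant = min(share, demands[app] - allocation[app])
            allocation[app] += grant
            remaining -= grant
            if allocation[app] == demands[app]:
                unsatisfied.discard(app)
    return allocation


print(fair_share({"a": 2, "b": 10, "c": 10}, capacity=12))
# {'a': 2, 'b': 5, 'c': 5}
```

Application "a" gets everything it asks for, and the slots it leaves
unused are split evenly between "b" and "c". With heterogeneous task
sizes, as Sharma points out, a simple slot count like this would no
longer be enough.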

Best Regards,
Stephan

On Tue, 2014-09-02 at 10:14 -0700, Bill Farner wrote:
> I'll echo Sharma's points.  While it seems simple enough to see which
> moving parts you need to implement here, the long-term effort is
> large.  I've been working on Aurora for 4.5 years, and still know of a
> lot of work we need to do.  If your use case can fit into an existing
> framework (perhaps modulo a feature request/contribution here and there),
> you'll free up a lot of time to focus on the problem you're actually
> trying to solve.
> 
> -=Bill
> 
> 
> On Mon, Sep 1, 2014 at 10:45 AM, Sharma Podila <spod...@netflix.com>
> wrote:
>         I am tempted to say that the short answer is, if your option B
>         works, why bother writing your own scheduler/framework?
>         
>         
>         Writing a Mesos framework can be easy. However, writing a
>         fault tolerant Mesos framework that has good scalability, is
>         performant, and is highly available can be relatively hard.
>         Here are a few things, off the top of my head, that helped us
>         make the decision to write our own:
>               * There must be a good long term reason to write your
>                 own framework. The scheduling/preemption/allocation
>                 model you spoke of may be a good reason. For us, it
>                 was specific scheduling optimizations that are not
>                 generic and are absent in other frameworks.
>               * Fault tolerance is a combination of a few things.
>                 Here are a few to consider:
>                       * Task reconciliation with Mesos master
>                         currently will involve more than just using
>                         the reconcile feature. We augment it with
>                         heartbeats from tasks, Aurora runs a GC task,
>                         etc. I believe it will take another Mesos
>                         release (or two?) before we can rely solely on
>                         Mesos task reconciliation.
>                       * Framework itself must be highly available, for
>                         example, using ZooKeeper leader election among
>                         multiple framework instances. 
>                       * Fault tolerant persistence of task states. For
>                         example, when Mesos calls your framework with
>                         a status update of a task, that state must be
>                         reliably persisted.
>               * It sounds like achieving fair share allocation via
>                 preemptions is important to you. That "external
>                 entity" you refer to may be non-trivial in the long
>                 run. If you were to embark on writing your own
>                 framework, another model to consider is to just have
>                 one framework scheduler instance for all users. Then,
>                 put the preemptions and fair share logic inside it.
>                 There could be complexities. For example, with a
>                 heterogeneous mix of task and slave resource sizes,
>                 scaling down an arbitrary number of user A's tasks
>                 doesn't imply that user B will benefit. The scheduler
>                 can handle this better than an external entity by
>                 preempting only the right tasks, etc.
>                       * That said, for simpler use cases, it may work
>                         just fine to have an external entity.
>               * Scheduling itself is a hard problem, and it can slow
>                 down quickly once you go beyond first-fit style by
>                 adding a few constraints and SLAs. Preemptions, for
>                 example, can slow down the scheduler in figuring out
>                 the right tasks to preempt to honor the fair share
>                 SLAs. That is, assuming you have more than a few
>                 hundred tasks. 
>               * There were a few talks at MesosCon, ten days ago, on
>                 this topic including one from us. The video/slides
>                 from the conference should be available from MesosCon
>                 sometime soon. 
>         
>         
>         
>         
>         
>         
>         On Sun, Aug 31, 2014 at 7:51 AM, Stephan Erb
>         <step...@dev.static-void.de> wrote:
>                 Hi everybody,
>                 
>                 I would like to assess the effort required to write a
>                 custom framework.
>                 
>                 Background: We have an application where we can start
>                 a flexible number of long-running worker processes
>                 performing number-crunching. The more processes the
>                 better. However, we have multiple users, each running
>                 an instance of the application and therefore competing
>                 for resources (as each tries to run as many worker
>                 processes as possible).
>                 
>                 For various reasons, we would like to run our
>                 application instances on top of Mesos. There seem to
>                 be two ways to achieve this:
>                 
>                      A. Write a custom framework for our application
>                         that spawns the worker processes on demand.
>                         Each user gets to run one framework instance.
>                         We also need preemption of workers to achieve
>                         equality among frameworks. We could achieve
>                         this using an external entity monitoring all
>                         frameworks and telling the worst offenders to
>                         scale down a little.
>                      B. Instead of writing a framework, use a service
>                         scheduler like Marathon, Aurora or Singularity
>                         to spawn the worker processes. Instead of just
>                         performing the scale-down, the external entity
>                         would dictate the number of worker processes
>                         for each application depending on its demand.
>                 
>                 
>                 The first choice seems to be the natural fit for
>                 Mesos. However, existing frameworks like Aurora seem
>                 to be battle-tested with regard to high availability,
>                 race conditions and issues like state reconciliation
>                 where the world views of scheduler and slaves drift
>                 apart.
>                 
>                 So this question boils down to: When considering
>                 writing a custom framework, which pitfalls do I have
>                 to be aware of? Can I get away with blindly
>                 implementing the scheduler API? Or do I always have to
>                 implement stuff like custom state reconciliation in
>                 order to prevent orphaned tasks on slaves (for
>                 example, when my framework scheduler crashes or is
>                 temporarily unavailable)?
>                 
>                 Thanks for your input!
>                 
>                 Best Regards,
>                 Stephan
>                 
>                 
>                 
>                 
>         
>         
> 
> 
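
Sharma's point about fault tolerant persistence of task states can be
sketched roughly as "write it down durably first, acknowledge second".
The class below is a hypothetical illustration: it uses a local
append-only file as a stand-in for whatever replicated store a highly
available framework would really need, and it does not show any Mesos
API calls. The class name, the file format and the example task IDs are
assumptions:

```python
import json
import os


class TaskStateLog:
    """Append-only, fsync'ed log of task status updates (illustrative)."""

    def __init__(self, path):
        self.path = path

    def record(self, task_id, state):
        # Write and fsync before returning, so the caller only
        # acknowledges an update that is already on disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"task_id": task_id, "state": state}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def replay(self):
        # Rebuild the latest known state per task after a restart.
        states = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    states[entry["task_id"]] = entry["state"]
        return states
```

A scheduler would call `record()` from its status update handler and
`replay()` once on startup, before reconciling with the master. A single
local file does not survive the loss of the machine, so this only
illustrates the ordering, not the durability a production framework
needs.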

