I'll echo Sharma's points.  While it seems simple enough to see which
moving parts you need to implement here, the long-term effort is large.
I've been working on Aurora for 4.5 years, and I still know of a lot of work
we need to do.  If your use case can fit into an existing framework
(perhaps modulo a feature request or contribution here and there), you'll free up
a lot of time to focus on the problem you're actually trying to solve.

-=Bill


On Mon, Sep 1, 2014 at 10:45 AM, Sharma Podila <spod...@netflix.com> wrote:

> I am tempted to say that the short answer is, if your option B works, why
> bother writing your own scheduler/framework?
>
> Writing a Mesos framework can be easy. However, writing a fault tolerant
> Mesos framework that has good scalability, is performant, and is highly
> available can be relatively hard. Here are a few things, off the top of my
> head, that helped us make the decision to write our own:
>
>    - There must be a good long term reason to write your own framework.
>    The scheduling/preemption/allocation model you spoke of may be a good
>    reason. For us, it was specific scheduling optimizations that are not
>    generic and are absent in other frameworks.
>    - Fault tolerance is a combination of a few things. Here are a few to
>    consider:
>       - Task reconciliation with the Mesos master currently involves more
>       than just using the reconcile feature. We augment it with heartbeats
>       from tasks, Aurora does GC tasks, etc. I believe it will take another
>       Mesos release (or two?) before we can rely solely on Mesos task
>       reconciliation.
>       - The framework itself must be highly available, for example, by using
>       ZooKeeper leader election among multiple framework instances.
>       - Fault-tolerant persistence of task states. For example, when
>       Mesos calls your framework with a status update of a task, that state
>       must be reliably persisted.
>    - It sounds like achieving fair share allocation via preemptions is
>    important to you. That "external entity" you refer to may be non-trivial in
>    the long run. If you were to embark on writing your own framework, another
>    model to consider is to just have one framework scheduler instance for all
>    users. Then, put the preemptions and fair share logic inside it. There
>    could be complexities. For example, with a heterogeneous mix of task and
>    slave resource sizes, scaling down an arbitrary number of tasks from user
>    A doesn't imply that user B will benefit. The scheduler can handle this
>    better than an external entity, by preempting only the right ones, etc.
>       - That said, for simpler use cases, it may work just fine to have
>       an external entity.
>    - Scheduling itself is a hard problem, and it can slow down quickly when
>    doing anything more than first-fit style, such as adding a few
>    constraints and SLAs. Preemptions, for example, can slow down the
>    scheduler as it figures out the right tasks to preempt to honor the fair
>    share SLAs. That is, assuming you have more than a few hundred tasks.
>    - There were a few talks at MesosCon, ten days ago, on this topic
>    including one from us. The video/slides from the conference should be
>    available from MesosCon sometime soon.
>
>
>
>
>
> On Sun, Aug 31, 2014 at 7:51 AM, Stephan Erb <step...@dev.static-void.de>
> wrote:
>
>> Hi everybody,
>>
>> I would like to assess the effort required to write a custom framework.
>>
>> Background: We have an application where we can start a flexible number
>> of long-running worker processes performing number-crunching. The more
>> processes the better. However, we have multiple users, each running an
>> instance of the application and therefore competing for resources (as
>> each tries to run as many worker processes as possible).
>>
>> For various reasons, we would like to run our application instances on
>> top of Mesos. There seem to be two ways to achieve this:
>>
>>      A. Write a custom framework for our application that spawns the
>>         worker processes on demand. Each user gets to run one framework
>>         instance. We also need preemption of workers to achieve equality
>>         among frameworks. We could achieve this using an external entity
>>         monitoring all frameworks and telling the worst offenders to
>>         scale down a little.
>>      B. Instead of writing a framework, use a Service-Scheduler like
>>         Marathon, Aurora or Singularity to spawn the worker processes.
>>         Instead of just performing the scale-down, the external entity
>>         would dictate the number of worker processes for each
>>         application depending on its demand.
>>
>>
>> The first choice seems to be the natural fit for Mesos. However,
>> existing frameworks like Aurora seem to be battle-tested in regard to
>> high availability, race conditions and issues like state reconciliation
>> where the world views of scheduler and slaves drift apart.
>>
>> So this question boils down to: When considering writing a custom
>> framework, which pitfalls do I have to be aware of? Can I get away with
>> blindly implementing the scheduler API? Or do I always have to implement
>> stuff like custom state-reconciliation in order to prevent orphaned
>> tasks on slaves (for example, when my framework scheduler crashes or is
>> temporarily unavailable)?
>>
>> Thanks for your input!
>>
>> Best Regards,
>> Stephan
>>
>>
>>
>>
>>
>
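
Sharma's point about preempting only "the right ones" can be illustrated
with a rough sketch. This is not Mesos, Aurora, or Netflix code: the `Task`
type, the `pick_preemptions` function, and the CPU-only resource model are
all assumptions made up for this example. The idea is to consider victims
only from users above their fair share, and only on a single slave where
freeing them actually makes room for the pending task.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; only CPU is modelled to keep the sketch small.
    task_id: str
    user: str
    slave: str
    cpus: float


def usage_by_user(tasks: List[Task]) -> Dict[str, float]:
    """Sum the CPUs currently held by each user."""
    usage: Dict[str, float] = {}
    for t in tasks:
        usage[t.user] = usage.get(t.user, 0.0) + t.cpus
    return usage


def pick_preemptions(
    tasks: List[Task],
    free_cpus: Dict[str, float],
    fair_share: float,
    needed_cpus: float,
) -> Optional[List[Task]]:
    """Pick tasks to preempt on ONE slave so that `needed_cpus` fits there.

    A task is eligible only if killing it leaves its owner at or above
    `fair_share`. Returns None when no single slave can be freed up.
    """
    usage = usage_by_user(tasks)
    best: Optional[List[Task]] = None
    for slave, free in sorted(free_cpus.items()):
        remaining = dict(usage)
        victims: List[Task] = []
        freed = free
        # Try the largest tasks first so that fewer tasks get killed.
        candidates = sorted(
            (t for t in tasks if t.slave == slave), key=lambda t: -t.cpus
        )
        for t in candidates:
            if freed >= needed_cpus:
                break
            if remaining[t.user] - t.cpus >= fair_share:
                victims.append(t)
                remaining[t.user] -= t.cpus
                freed += t.cpus
        if freed >= needed_cpus and (best is None or len(victims) < len(best)):
            best = victims
    return best
```

With user A holding two 1-CPU tasks on one slave and a 4-CPU plus a 2-CPU
task on another, a pending 4-CPU task for user B is only helped by
preempting the 4-CPU task. Killing the two small tasks frees the same
user's share but no usable slot, which is the heterogeneity problem
described above. An external entity that only says "scale down by N" cannot
make that distinction.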

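
To make the fault-tolerance bullets in the thread a bit more concrete, here
is a minimal sketch of two of them: persisting each status update before
acting on it, and flagging tasks whose heartbeats have gone quiet. Everything
here is hypothetical (`TaskStateStore`, `stale_tasks`, the JSON file
layout); only the state names `TASK_RUNNING` and `TASK_FINISHED` come from
Mesos. A local file is used purely for illustration. A highly available
framework would need a replicated store instead.

```python
import json
import os
import tempfile
from typing import Dict, List


class TaskStateStore:
    """Keeps task states in a JSON file, written via write-then-rename."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.states: Dict[str, str] = {}
        if os.path.exists(path):
            # Recover the last persisted view after a scheduler restart.
            with open(path) as f:
                self.states = json.load(f)

    def record(self, task_id: str, state: str) -> None:
        """Persist a status update before the caller acts on it."""
        self.states[task_id] = state
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.states, f)
            f.flush()
            os.fsync(f.fileno())
        # Swap the new file into place so readers never see a partial write.
        os.replace(tmp, self.path)


def stale_tasks(
    states: Dict[str, str],
    last_heartbeat: Dict[str, float],
    now: float,
    timeout: float,
) -> List[str]:
    """Tasks believed to be running that have not sent a heartbeat recently.

    Tasks with no recorded heartbeat at all are treated as stale.
    """
    return sorted(
        task_id
        for task_id, state in states.items()
        if state == "TASK_RUNNING"
        and now - last_heartbeat.get(task_id, float("-inf")) > timeout
    )
```

The stale list is only a set of candidates. What to do with them (ask the
master to reconcile, kill, or reschedule) is a policy decision the sketch
leaves open.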