Hi Neil,

I really like the idea of incorporating the concept of fault domains into
Mesos; however, I feel the proposed implementation is a bit too narrow to
be useful for most users.

I feel we could make the fault domain definition more generic. As an
example, in our setup we would like something like Region > Building >
Cage > Pod > Rack. Fault domains would be arranged hierarchically,
meaning a domain at a lower level can belong to only one domain at the
level above.

As a concrete example, the Mesos masters could be made aware of the
fault domain hierarchy (via a config map, for example), and slaves would
only need to declare their lowest-level domain (e.g. their rack id).
Frameworks could then use this domain hierarchy at will. If they need to
"spread" their tasks for a highly available setup, they could first
spread across the highest-level fault domain (such as the region), then,
if they still have tasks to launch, spread within each sub-domain
recursively until they run out of tasks. We do not need to artificially
limit the number of levels in the hierarchy or the names of the domains;
schedulers do not need to know the names either, just the hierarchy.
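To make the recursive spreading concrete, here is a minimal sketch (names
and helper structure are my own, not part of the proposal): each slave
reports its domain as an ordered list of ids, tasks are dealt round-robin
across sibling domains at the current level, and then we recurse one level
down inside each domain.

```python
from collections import defaultdict

def spread(tasks, agents, level=0):
    """Recursively spread tasks across fault domains, widest level first.

    `agents` maps agent id -> domain path, e.g.
    {"a1": ["US-WEST-1", "Building 2", "Rack 3"], ...}.
    Returns a dict mapping agent id -> list of assigned tasks.
    """
    if not tasks:
        return {}
    # Group agents by their domain id at the current hierarchy level.
    groups = defaultdict(dict)
    for agent_id, path in agents.items():
        key = path[level] if level < len(path) else agent_id
        groups[key][agent_id] = path
    # Deal tasks round-robin across the sibling domains at this level...
    keys = sorted(groups)
    buckets = {k: [] for k in keys}
    for i, task in enumerate(tasks):
        buckets[keys[i % len(keys)]].append(task)
    # ...then recurse one level down inside each domain.
    assignment = {}
    for key, sub_tasks in buckets.items():
        sub_agents = groups[key]
        at_bottom = all(level + 1 >= len(p) for p in sub_agents.values())
        if len(sub_agents) == 1 or at_bottom:
            # Leaf: spread directly over the individual agents.
            ids = sorted(sub_agents)
            for i, task in enumerate(sub_tasks):
                assignment.setdefault(ids[i % len(ids)], []).append(task)
        else:
            assignment.update(spread(sub_tasks, sub_agents, level + 1))
    return assignment
```

Note that the scheduler never needs the level names here, only the ordered
paths, which is exactly why the hierarchy can stay arbitrary.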

Then, to provide the other feature you describe -- "remote" slaves -- we
could configure the Mesos master to only send offers from a "default"
local fault domain, and frameworks would need to advertise a capability
to receive offers from other, remote fault domains.
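In pseudocode, the master-side filtering could look something like this
(the REMOTE_DOMAINS capability name and the function are hypothetical,
just to illustrate the idea):

```python
def offerable_agents(agents, local_domain, framework_capabilities):
    """Filter the agents a framework may receive offers from.

    By default, only agents whose domain path falls under the master's
    configured "local" domain are offered; frameworks advertising the
    (hypothetical) REMOTE_DOMAINS capability see all agents.
    """
    if "REMOTE_DOMAINS" in framework_capabilities:
        return dict(agents)
    return {agent_id: path for agent_id, path in agents.items()
            if path[:len(local_domain)] == local_domain}
```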

I feel we could implement this by identifying a fault domain with a
simple list of ids, like ["US-WEST-1", "Building 2", "Cage 3", "POD 12",
"Rack 3"] or ["US-EAST-2", "Building 1"]. Slaves would advertise their
lowest-level fault domain, and schedulers could treat this as a
hierarchical list however they see fit.
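With that representation, the basic operations schedulers would need fall
out of simple prefix logic. A sketch (function names are mine):

```python
def in_domain(agent_path, domain):
    """True if the agent's domain path falls under `domain`.

    Both are ordered id lists, widest level first, e.g.
    ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"].
    """
    return agent_path[:len(domain)] == domain

def common_domain(path_a, path_b):
    """Deepest fault domain shared by two agents (their common prefix)."""
    common = []
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common.append(a)
    return common
```

Two agents that share only a region prefix are "far apart", while agents
sharing everything down to the rack are "close", which gives frameworks a
simple distance notion for spreading.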

Thanks,
Maxime

On Mon, Apr 17, 2017 at 6:45 PM Neil Conway <neil.con...@gmail.com> wrote:

> Folks,
>
> I'd like to enhance Mesos to support a first-class notion of "fault
> domains" -- i.e., identifying the "rack" and "region" (DC) where a
> Mesos agent or master is located. The goal is to enable two main
> features:
>
> (1) To make it easier to write "rack-aware" Mesos frameworks that are
> portable to different Mesos clusters.
>
> (2) To improve the experience of configuring Mesos with a set of
> masters and agents in one DC, and another pool of "remote" agents in a
> different DC.
>
> For more information, please see the design doc:
>
>
> https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8
>
> I'd love any feedback, either directly on the Google doc or via email.
>
> Thanks,
> Neil
>
