Hi Neil,

I really like the idea of incorporating the concept of fault domains into Mesos; however, I feel the proposed implementation is a bit too narrow to be useful for most users.
I think we could make the fault-domain definition more generic. As an example, in our setup we would like to have something like Region > Building > Cage > Pod > Rack. Fault domains would be arranged hierarchically, meaning a domain at a lower level can belong to exactly one domain at the level above. Concretely, the Mesos masters could be made aware of the fault-domain hierarchy (via a config map, for example), and slaves would only need to declare their lowest-level domain (for example, their rack ID).

Frameworks could then use this domain hierarchy at will. If they need to "spread" their tasks for a very highly available setup, they could first spread across the highest-level fault domain (like the region), then, if they have enough tasks to launch, spread within each sub-domain recursively until they run out of tasks. We would not need to artificially limit the number of levels or the names of the fault domains; schedulers would not need to know the names either, just the hierarchy.

Then, to provide the other feature of "remote" slaves that you describe, we could configure the Mesos master to only send offers from a "default" local fault domain, and frameworks would need to advertise a certain capability to receive offers from other, remote fault domains.

I think we could implement this by identifying a fault domain with a simple list of IDs, like ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"] or ["US-EAST-2", "Building 1"]. Slaves would advertise their lowest-level fault domain, and schedulers could use this as an arbitrary hierarchical list.

Thanks,
Maxime

On Mon, Apr 17, 2017 at 6:45 PM Neil Conway <neil.con...@gmail.com> wrote:
> Folks,
>
> I'd like to enhance Mesos to support a first-class notion of "fault
> domains" -- i.e., identifying the "rack" and "region" (DC) where a
> Mesos agent or master is located. The goal is to enable two main
> features:
>
> (1) To make it easier to write "rack-aware" Mesos frameworks that are
> portable to different Mesos clusters.
>
> (2) To improve the experience of configuring Mesos with a set of
> masters and agents in one DC, and another pool of "remote" agents in a
> different DC.
>
> For more information, please see the design doc:
>
> https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8
>
> I'd love any feedback, either directly on the Google doc or via email.
>
> Thanks,
> Neil
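To make the recursive spreading idea concrete, here is a rough sketch (plain Python, not Mesos code) of how a scheduler could spread tasks over agents that each advertise a hierarchical list of domain IDs. The domain names and the `spread` helper are purely illustrative, not part of any proposed API:

```python
# Sketch: spread tasks across a fault-domain hierarchy.
# Each agent is identified by a list of domain IDs, highest level
# first, e.g. ["US-WEST-1", "Building 2", "Rack 3"].
# All names here are illustrative.

from collections import defaultdict
from itertools import cycle


def spread(tasks, agents, level=0):
    """Assign tasks to agents, spreading across the highest fault
    domain first, then recursively within each sub-domain."""
    # Group agents by their domain ID at this hierarchy level.
    groups = defaultdict(list)
    for agent in agents:
        key = agent[level] if level < len(agent) else None
        groups[key].append(agent)

    if len(groups) == 1:
        # Only one domain at this level: descend a level, or
        # round-robin if we've hit the bottom of the hierarchy.
        (key, members), = groups.items()
        if key is None:
            return list(zip(tasks, cycle(members)))
        return spread(tasks, members, level + 1)

    # Deal tasks round-robin over the domains at this level, then
    # recurse into each domain with its share of the tasks.
    shares = defaultdict(list)
    for task, key in zip(tasks, cycle(sorted(groups))):
        shares[key].append(task)
    placements = []
    for key, share in shares.items():
        placements.extend(spread(share, groups[key], level + 1))
    return placements


agents = [
    ["US-WEST-1", "Building 2", "Rack 3"],
    ["US-WEST-1", "Building 2", "Rack 7"],
    ["US-EAST-2", "Building 1", "Rack 1"],
]
for task, agent in spread(["t1", "t2", "t3", "t4"], agents):
    print(task, "->", "/".join(agent))
```

With this scheme the scheduler never needs to know what the levels are called ("Cage", "Pod", ...); it only walks the list positions, which is exactly the "hierarchy without names" property argued for above.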