Bernerd,

You should really check out Marathon: https://github.com/mesosphere/marathon

It fits closely with what you've described ;)

On Wed, Sep 18, 2013 at 4:36 AM, Bernerd Schaefer <bern...@soundcloud.com> wrote:
> I'm curious to learn what's been going on in Mesos (and the general
> ecosystem) around service scheduling. In particular, I'm curious about how
> Mesos might work in a cluster where service tasks are more common than
> batch tasks, e.g., a cluster with a single framework for running stateless
> tasks and many frameworks for running stateful tasks.
>
> I haven't been able to find much information about how exactly service
> scheduling fits with Mesos -- the dialogue is certainly skewed towards
> ephemeral / batch scheduling at the moment. With that in mind, I've tried
> to outline some topics I've been thinking about recently. What I'm really
> curious to know is:
>
> 1. Am I way off track?
> 2. For a service scheduler built today, how much is Mesos responsible for
>    and how much the framework? What about going forward?
> 3. Are there already some patterns/idioms for these kinds of things in
>    existing frameworks?
>
> # Balancing tasks within a framework
>
> For this, imagine a framework that schedules long-lived (service),
> stateless tasks.
>
> - If asked to schedule a task with comparatively large resource
>   requirements, the task may never get scheduled if it waits for a
>   sufficiently large resource offer. Instead, it should attempt to
>   reschedule existing tasks to "make room" for it. How might that work?
>
> - If asked to schedule multiple copies of a task across different machines,
>   some copies may never get scheduled if it waits for a sufficiently
>   diverse set of resource offers. Instead, it should reschedule existing
>   tasks to meet the availability requirements of the task. What might that
>   look like?
>
> Maybe both of these could be accomplished by using some combination of:
>
> - using `requestResources` when large tasks are requested to try and get
>   bigger offers.
>
> - using saved offers to relaunch existing tasks, and then hoarding the
>   freed resources for scheduling new tasks.
>
> # Resource contention / balancing tasks across frameworks
>
> For this, imagine there are two frameworks, one like above, running
> stateless service tasks, the other responsible for a single stateful task.
> Again, the cluster is relatively full.
>
> - If the stateful scheduler wants to run its task on a particular machine,
>   but that machine's resources are currently consumed by the other
>   framework, what happens?
>
> - If the stateful scheduler can run its task on any machine, but there
>   exists no single offer sufficiently large to run the task, what does it
>   do?
>
> Some possible ways to approach this:
>
> - The ability to request that other frameworks release their saved offers,
>   as the resources may actually be available, but currently hoarded. I
>   think `requestResources` on the scheduler might do this?
>
> - The ability to request that other frameworks reschedule existing tasks.
>   This could be a "user-land" feature? If I have a particular slave in mind
>   to run my task and there is a way to find frameworks with tasks on that
>   slave, I could randomly send some kind of "reschedule" message to one of
>   the frameworks. This message might include the slave, my requested
>   resources, and a priority understood by all of my frameworks. The other
>   framework could then compare its priority with the message, and decide
>   whether it should reschedule.
>
> Cheers,
>
> Bernerd
> Engineer @ SoundCloud
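
For what it's worth, the user-land "reschedule" message described in the last bullet could be sketched roughly as below. This is only an illustration of the idea, not anything Mesos or Marathon provides: `RescheduleRequest`, `RunningTask`, and `tasks_to_reschedule` are hypothetical names, the integer priority scheme is assumed, and the sketch ignores any resources that are already free on the slave.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RescheduleRequest:
    """Hypothetical message: the slave, the requested resources, a priority."""
    slave_id: str
    cpus: float
    mem_mb: int
    priority: int


@dataclass(frozen=True)
class RunningTask:
    """Hypothetical view of a task the receiving framework is running."""
    task_id: str
    slave_id: str
    cpus: float
    mem_mb: int
    priority: int


def tasks_to_reschedule(request: RescheduleRequest,
                        running: List[RunningTask]) -> List[RunningTask]:
    """Pick tasks on the requested slave to move elsewhere.

    Only tasks with a strictly lower priority than the request are
    candidates; the lowest-priority ones are given up first. Returns an
    empty list if moving every candidate still would not free enough.
    """
    candidates = sorted(
        (t for t in running
         if t.slave_id == request.slave_id and t.priority < request.priority),
        key=lambda t: t.priority,
    )
    chosen: List[RunningTask] = []
    freed_cpus, freed_mem = 0.0, 0
    for task in candidates:
        if freed_cpus >= request.cpus and freed_mem >= request.mem_mb:
            break
        chosen.append(task)
        freed_cpus += task.cpus
        freed_mem += task.mem_mb
    if freed_cpus >= request.cpus and freed_mem >= request.mem_mb:
        return chosen
    return []  # decline: not enough lower-priority work on that slave
```

The receiving framework would then kill and relaunch the returned tasks on other offers, or simply ignore the request when the list is empty. How the message gets delivered between frameworks is left open here, as it is in the original question.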