Thanks for the replies! @Sharma - yes, I was talking about multiple tasks on multiple slaves, with each task assigned to a single slave. Indeed, our framework has a thread that puts lots of (mostly) small tasks into a queue, so the thread that handles resourceOffers can pop tasks from the queue to fill up the offers. I agree that this is not limiting. It was just surprising to get all these TASK_LOST statuses, given how little the docstring says.
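Given the clarification in this thread that a single launchTasks() call must only mix offers and tasks from the same slave, our queue-draining thread now buckets its assignments per slave before dispatching. Here's a minimal sketch of that grouping step (plain Python with made-up offer/slave/task identifiers; the actual per-slave dispatch would be a driver.launchTasks(offer_ids, tasks) call for each bucket):

```python
from collections import defaultdict

def group_launches_by_slave(assignments):
    """Group (offer_id, slave_id, task) assignments so that each
    launchTasks() call only uses offers and tasks from one slave.

    Returns {slave_id: (offer_ids, tasks)}.
    """
    offers_by_slave = defaultdict(list)
    tasks_by_slave = defaultdict(list)
    for offer_id, slave_id, task in assignments:
        # One offer from a slave may carry several of our small tasks;
        # record each offer id only once per slave.
        if offer_id not in offers_by_slave[slave_id]:
            offers_by_slave[slave_id].append(offer_id)
        tasks_by_slave[slave_id].append(task)
    return {s: (offers_by_slave[s], tasks_by_slave[s])
            for s in offers_by_slave}

# In resourceOffers, the scheduler would then issue one launchTasks
# per slave (hypothetical driver usage, not shown here):
#   for slave_id, (offer_ids, tasks) in group_launches_by_slave(popped).items():
#       driver.launchTasks(offer_ids, tasks)
```

This keeps one launchTasks call per slave, which (per Adam's point about libprocess) should all go out together when resourceOffers returns anyway.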
@Michael - I did my best following the "contributing code" (or documentation, in this case) guidelines; I hope it's OK. I opened a JIRA issue <https://issues.apache.org/jira/browse/MESOS-2525> and a review request <https://reviews.apache.org/r/32306/>.

@Adam - wow, that's fascinating! I would love to get some validation on this point - can anyone say how the actual messages from scheduler to master are handled by Mesos when launchTasks is called multiple times within a single resourceOffers invocation? Can I simply not think about optimizing for network when calling launchTasks?

@All - so, if I understand correctly now, the multiple-offers feature is meant to allow a scheduler to hold on to offers (across resourceOffers calls), as long as they are not rescinded, and eventually launch a "big task" that uses the sum of all collected offers from a specific slave? If this is the case: 1. cool :-) 2. I would imagine there could be a "cleaner" way (from a framework author's perspective) to do that, by setting a policy or a filter or something that communicates to the master that the scheduler would like to receive only offers that meet some criteria (e.g., min_cpu, min_mem, etc.), effectively moving the complexity of holding on to offers from the framework to the master. Is such a thing possible in Mesos? Was it an explicit design decision to keep such logic at the framework level?

Thanks, - Itamar.

On Fri, Mar 20, 2015 at 10:24 AM, Adam Bordelon <a...@mesosphere.io> wrote: > Keep in mind that you can call launchTasks() multiple times (once per > slave) within the same resourceOffers callback in your scheduler, and due > to the actor nature of libprocess, they will all be sent at the same time > when resourceOffers returns to the SchedulerDriver. 
I'm not familiar enough > with the internals of libprocess to know if/how it batches all of those > messages together when transferring them to the master's libprocess actor, > but it appears they are split again by the time they reach the master's > mailbox, since the master will get one launchTasks callback per original > launchTasks call. > > On Thu, Mar 19, 2015 at 12:03 PM, Michael Park <mcyp...@gmail.com> wrote: > >> Hi Itamar, >> >> Wow, thanks for bringing this up! >> >> The intended behavior is for *launchTasks* to take a set of tasks to be >> launched on a *single *slave. This means that the multiple offers passed >> to *launchTasks* must be from the *same *slave. The Python documentation >> absolutely should state this explicitly as it does for acceptOffers >> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L204-L213> >> as >> well as C++ *launchTasks >> <https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/include/mesos/scheduler.hpp#L226-L237>*. >> If you would create a review request for this that would be awesome! >> >> Now, if this *is* the intended behavior, it raises the question - why >>> does launchTasks() support a set of tasks? doesn't mesos already aggregate >>> resources from the same slave to a single offer? >> >> >> The primary use case of this feature is to allow frameworks to hold onto >> offers and use them in conjunction with other offers from the same slave >> later on. >> >> MPark. >> >> On 19 March 2015 at 14:28, Sharma Podila <spod...@netflix.com> wrote: >> >>> I will assume that you are not talking of the case that a task actually >>> is being launched on multiple slaves, since a task can only be launched on >>> one slave with existing concepts. >>> >>> Yes, that call is for one or more tasks on a single slave. 
That call >>> (since 0.18, I believe) also takes multiple offers of the same slave, which >>> can happen due to tasks finishing at different times on the host. >>> >>> I have seen discussion on batching status updates/acks. But, not on >>> batching launching of tasks across multiple slaves. From a user >>> perspective, I'd imagine that this should be possible. It would be useful >>> for frameworks with a high rate of task dispatching. >>> >>> I suspect (purely my opinion) that this model may have come up in the >>> beginning when most frameworks were scheduling one task at a time before >>> moving to the next pending task. My framework, for example, runs a >>> scheduling loop/iteration and comes up with schedules for multiple tasks >>> across one or more slaves. I would find it useful as well to batch up task >>> launches across multiple hosts. >>> >>> That said, I haven't found the existing method to be limiting in >>> performance/latency for our needs at this time. >>> >>> >>> >>> On Thu, Mar 19, 2015 at 8:19 AM, Itamar Ostricher <ita...@yowza3d.com> >>> wrote: >>> >>>> Hi, >>>> >>>> According to the Python interface docstring >>>> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L184-L193>, >>>> launchTasks() may be called with a set of tasks. >>>> >>>> In our framework, we thought this is used to issue a single RPC for >>>> launching many tasks onto many offers (potentially from many slaves), as an >>>> optimization (e.g., less communication overhead). >>>> >>>> But, when running with multiple slaves, we saw that tasks are lost when >>>> they are assigned to different slaves with the same launchTasks() call. >>>> >>>> Reading the docstring of launchTasks carefully, I still couldn't figure >>>> out that this is the intended behavior, so I'm here to verify that. >>>> If that's by design, it should be stated clearly in the docstring (I'd >>>> be happy to provide a documentation pull request for this). 
>>>> >>>> Now, if this *is* the intended behavior, it raises the question - why >>>> does launchTasks() support a set of tasks? doesn't mesos already aggregate >>>> resources from the same slave to a single offer? >>>> >>>> Thanks, >>>> - Itamar. >>>> >>> >>> >> >