Thanks for the replies!

@Sharma - yes, I was talking about multiple tasks on multiple slaves,
with each task assigned to a single slave. Indeed, our framework has a
thread that puts lots of (mostly) small tasks into a queue, so the
thread that handles resourceOffers can pop tasks from the queue to fill
up the offers. I agree that this is not limiting; it was just
surprising to get all those TASK_LOST statuses, given that the
docstring says nothing about the single-slave restriction.
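
To make this concrete, here's a minimal sketch of the pattern (against
the old-style Python bindings; the class and helper names are mine, the
producer thread is omitted, and error handling is trimmed):

    import Queue  # stdlib FIFO queue (Python 2, matching the bindings)

    from mesos.interface import Scheduler, mesos_pb2

    def scalar(offer, name):
        # Sum a named scalar resource ("cpus", "mem") across an offer.
        return sum(r.scalar.value for r in offer.resources if r.name == name)

    def make_task(offer, name, cpus, mem, cmd):
        # Build a TaskInfo bound to the offer's slave.
        task = mesos_pb2.TaskInfo()
        task.task_id.value = name
        task.slave_id.value = offer.slave_id.value
        task.name = name
        task.command.value = cmd
        for rname, rval in (("cpus", cpus), ("mem", mem)):
            r = task.resources.add()
            r.name = rname
            r.type = mesos_pb2.Value.SCALAR
            r.scalar.value = rval
        return task

    class QueueFedScheduler(Scheduler):
        def __init__(self):
            # A producer thread keeps pushing
            # (name, cpus, mem, shell_command) tuples into this queue.
            self.pending = Queue.Queue()

        def resourceOffers(self, driver, offers):
            for offer in offers:
                cpus, mem = scalar(offer, "cpus"), scalar(offer, "mem")
                tasks = []
                # Pop queued tasks while they still fit in this offer.
                while not self.pending.empty():
                    item = self.pending.get()
                    name, t_cpus, t_mem, cmd = item
                    if t_cpus > cpus or t_mem > mem:
                        self.pending.put(item)  # doesn't fit; stop filling
                        break
                    tasks.append(make_task(offer, name, t_cpus, t_mem, cmd))
                    cpus -= t_cpus
                    mem -= t_mem
                # Every task targets offer.slave_id, so this is a valid
                # single-slave launchTasks call.
                driver.launchTasks(offer.id, tasks)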

@Michael - I did my best to follow the "contributing code" (or, in this
case, documentation) guidelines; hope it's OK. I opened a JIRA issue
<https://issues.apache.org/jira/browse/MESOS-2525> and a review request
<https://reviews.apache.org/r/32306/>.

@Adam - wow, that's fascinating! I would love to get some validation on
this point - can anyone say how the actual messages from scheduler to
master are handled by Mesos when launchTasks is called multiple times
within a single resourceOffers invocation? Can I simply stop worrying
about network overhead when calling launchTasks?

@All
So, if I understand correctly now, the multiple-offers feature is meant
to allow a scheduler to hold on to offers (across resourceOffers
calls), as long as they are not rescinded, and eventually launch a "big
task" that uses the sum of all collected offers from a specific slave?
If this is the case:
1. cool :-)
2. I would imagine there could be a "cleaner" way (from a framework
author's perspective) to do this: setting a policy or a filter that
tells the master the scheduler would like to receive only offers that
meet some criteria (e.g. min_cpu, min_mem, etc.), effectively moving
the complexity of holding on to offers from the framework to the
master.
Is such a thing possible in Mesos? Was it an explicit design decision
to keep such logic at the framework level?
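
In the meantime, my mental model of the hoarding approach is roughly
the following sketch (reusing the scalar helper and imports from the
sketch above; BIG_TASK_CPUS and make_big_task are hypothetical, and
rescind handling is minimal):

    from collections import defaultdict

    class HoardingScheduler(Scheduler):
        def __init__(self):
            # Offers held across resourceOffers calls, keyed by slave id.
            self.held = defaultdict(list)

        def resourceOffers(self, driver, offers):
            for offer in offers:
                slave = offer.slave_id.value
                self.held[slave].append(offer)
                total = sum(scalar(o, "cpus") for o in self.held[slave])
                if total >= BIG_TASK_CPUS:
                    # Combine all held offers from this one slave in a
                    # single launchTasks call (multiple offers per call,
                    # same slave only).
                    ids = [o.id for o in self.held.pop(slave)]
                    driver.launchTasks(ids, [make_big_task(slave, total)])

        def offerRescinded(self, driver, offer_id):
            # Held offers may be rescinded; drop stale ones.
            for slave in self.held:
                self.held[slave] = [o for o in self.held[slave]
                                    if o.id.value != offer_id.value]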

Thanks,
- Itamar.

On Fri, Mar 20, 2015 at 10:24 AM, Adam Bordelon <a...@mesosphere.io> wrote:

> Keep in mind that you can call launchTasks() multiple times (once per
> slave) within the same resourceOffers callback in your scheduler, and due
> to the actor nature of libprocess, they will all be sent at the same time
> when resourceOffers returns to the SchedulerDriver. I'm not familiar enough
> with the internals of libprocess to know if/how it batches all of those
> messages together when transferring them to the master's libprocess actor,
> but it appears they are split again by the time they reach the master's
> mailbox, since the master will get one launchTasks callback per original
> launchTasks call.
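>
> For illustration, the shape of what I mean (a rough sketch; the
> self.plan() helper that maps the offers to per-slave (offer, tasks)
> assignments is hypothetical):
>
>     def resourceOffers(self, driver, offers):
>         # One launchTasks per slave, all within this callback;
>         # libprocess sends the resulting messages once the callback
>         # returns to the SchedulerDriver.
>         for offer, tasks in self.plan(offers):
>             driver.launchTasks(offer.id, tasks)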
>
> On Thu, Mar 19, 2015 at 12:03 PM, Michael Park <mcyp...@gmail.com> wrote:
>
>> Hi Itamar,
>>
>> Wow, thanks for bringing this up!
>>
>> The intended behavior is for *launchTasks* to take a set of tasks to
>> be launched on a *single* slave. This means that the multiple offers
>> passed to *launchTasks* must all be from the *same* slave. The Python
>> documentation absolutely should state this explicitly, as it does for
>> acceptOffers
>> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L204-L213>
>> as well as the C++ *launchTasks*
>> <https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/include/mesos/scheduler.hpp#L226-L237>.
>> If you could create a review request for this, that would be awesome!
>>
>>> Now, if this *is* the intended behavior, it raises the question -
>>> why does launchTasks() support a set of tasks? Doesn't Mesos already
>>> aggregate resources from the same slave into a single offer?
>>
>>
>> The primary use case of this feature is to allow frameworks to hold onto
>> offers and use them in conjunction with other offers from the same slave
>> later on.
>>
>> MPark.
>>
>> On 19 March 2015 at 14:28, Sharma Podila <spod...@netflix.com> wrote:
>>
>>> I will assume that you are not talking about the case where a single
>>> task is launched on multiple slaves, since a task can only be
>>> launched on one slave under the existing model.
>>>
>>> Yes, that call is for one or more tasks on a single slave. That call
>>> (since 0.18, I believe) also takes multiple offers from the same
>>> slave, which can accumulate as tasks finish at different times on
>>> that host.
>>>
>>> I have seen discussion on batching status updates/acks, but not on
>>> batching task launches across multiple slaves. From a user
>>> perspective, I'd imagine that this should be possible. It would be
>>> useful for frameworks with a high rate of task dispatching.
>>>
>>> I suspect (purely my opinion) that this model may have come up in
>>> the beginning, when most frameworks scheduled one task at a time
>>> before moving on to the next pending task. My framework, for
>>> example, runs a scheduling loop/iteration and comes up with
>>> schedules for multiple tasks across one or more slaves. I would find
>>> it useful as well to batch up task launches across multiple hosts.
>>>
>>> That said, I haven't found the existing method to be limiting in
>>> performance/latency for our needs at this time.
>>>
>>>
>>>
>>> On Thu, Mar 19, 2015 at 8:19 AM, Itamar Ostricher <ita...@yowza3d.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> According to the Python interface docstring
>>>> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L184-L193>,
>>>> launchTasks() may be called with a set of tasks.
>>>>
>>>> In our framework, we assumed this could be used to issue a single
>>>> RPC that launches many tasks onto many offers (potentially from
>>>> many slaves), as an optimization (e.g., less communication
>>>> overhead).
>>>>
>>>> But, when running with multiple slaves, we saw that tasks are lost when
>>>> they are assigned to different slaves with the same launchTasks() call.
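>>>>
>>>> Roughly, we were doing something like this (illustrative sketch):
>>>>
>>>>     # offers here come from several different slaves:
>>>>     offer_ids = [offer.id for offer in offers]
>>>>     driver.launchTasks(offer_ids, tasks)  # -> TASK_LOST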
>>>>
>>>> Reading the docstring of launchTasks carefully, I still couldn't
>>>> tell that this is the intended behavior, so I'm here to verify it.
>>>> If it is by design, it should be stated clearly in the docstring
>>>> (I'd be happy to provide a documentation pull request for this).
>>>>
>>>> Now, if this *is* the intended behavior, it raises the question -
>>>> why does launchTasks() support a set of tasks? Doesn't Mesos
>>>> already aggregate resources from the same slave into a single
>>>> offer?
>>>>
>>>> Thanks,
>>>> - Itamar.
>>>>
>>>
>>>
>>
>
