Hi Meng,

What would be the recommendation for framework authors on when to use
UNSUPPRESS vs CLEAR_FILTER?

Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?

On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <m...@mesosphere.com> wrote:

> Hi:
>
> tl;dr: We are proposing to add two new V1 scheduler APIs, unsuppress and
> clear_filter, in order to decouple the dual semantics of the current revive
> call.
>
> As pointed out in the Mesos framework scalability guide
> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> utilizing the suppress
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> call is the key to get your cluster to a large number of frameworks
> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> In short, when a framework is idle with no intention of launching tasks,
> it should suppress to tell Mesos to stop sending it offers. The framework
> should then revive
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> when new work arrives. While a framework is suppressed, the allocator skips
> it when performing resource allocations. As a result, thorny issues such as
> offer starvation and resource fragmentation are greatly mitigated.
>
> That being said, the suppress/revive calls are currently a little
> unwieldy due to MESOS-9028
> <https://issues.apache.org/jira/browse/MESOS-9028>:
>
> The revive call has two semantics: it unsuppresses the framework AND
> clears all the existing filters. The latter makes the revive call
> non-idempotent. Users may also want to keep the existing filters when
> reviving, which is not currently possible.
>
> To decouple the semantics, as suggested in the ticket, we propose to add
> two new V1 scheduler calls:
>
> (1) The `UNSUPPRESS` call asks Mesos to resume sending offers;
> (2) The `CLEAR_FILTER` call explicitly clears all the existing filters.
>
> To make life easier, both calls will return 200 OK (as opposed to 202
> returned by most existing scheduler calls, including `SUPPRESS` and
> `REVIVE`).
>
> We will keep the revive call and its semantics (i.e. unsuppress AND clear
> filters) for backward compatibility.
>
> Note that the changes are proposed for the V1 API only. Thus, once the
> changes land, framework developers are encouraged to move to the V1 API to
> take advantage of the new calls (among many other benefits).
>
> Any feedback/comments are welcome.
>
> -Meng
>
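
To make the question concrete, here is how I read the proposed split, as a rough sketch rather than anything taken from the Mesos code. The `FrameworkState` class, its method names, and the modeling of filters as a set of agent IDs are all hypothetical; they only illustrate that UNSUPPRESS and CLEAR_FILTER would each touch one piece of state, while REVIVE touches both.

```python
from dataclasses import dataclass, field


@dataclass
class FrameworkState:
    """Toy model (hypothetical) of per-framework allocator state."""

    suppressed: bool = False
    # Assumption: filters are modeled as a set of agent IDs whose offers
    # the framework declined; real filters carry more detail.
    filters: set = field(default_factory=set)

    def suppress(self) -> None:
        self.suppressed = True

    def unsuppress(self) -> None:
        # Proposed UNSUPPRESS: resume offers, leave filters untouched.
        self.suppressed = False

    def clear_filter(self) -> None:
        # Proposed CLEAR_FILTER: drop all filters, leave suppression untouched.
        self.filters.clear()

    def revive(self) -> None:
        # Existing REVIVE: both effects in a single call.
        self.unsuppress()
        self.clear_filter()


# Unsuppressing twice leaves the same state as unsuppressing once,
# and the existing filters survive.
state = FrameworkState(suppressed=True, filters={"agent-1", "agent-2"})
state.unsuppress()
state.unsuppress()
assert state.suppressed is False
assert state.filters == {"agent-1", "agent-2"}

# Clearing filters does not wake up a suppressed framework.
idle = FrameworkState(suppressed=True, filters={"agent-1"})
idle.clear_filter()
assert idle.suppressed is True
assert idle.filters == set()
```

If that reading is right, a framework that wants to keep its filters when new work arrives would send only UNSUPPRESS, and one that wants a clean slate would send both calls (or keep using REVIVE).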
