Hi Meng,

What would be the recommendation for framework authors on when to use UNSUPPRESS vs CLEAR_FILTER?
Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?

On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <m...@mesosphere.com> wrote:

> Hi:
>
> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> clear_filter in order to decouple the dual semantics of the current revive
> call.
>
> As pointed out in the Mesos framework scalability guide
> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> utilizing the suppress
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> call is the key to get your cluster to a large number of frameworks
> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> In short, when a framework is idling with no intention to launch any tasks,
> it should suppress to inform Mesos to stop sending any more offers. And
> the framework should revive
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> when new work arrives. This way, the allocator will skip the framework when
> performing resource allocations. As a result, thorny issues such as offer
> starvation and resource fragmentation would be greatly mitigated.
>
> That being said, the suppress/revive calls currently are a little bit
> unwieldy due to MESOS-9028
> <https://issues.apache.org/jira/browse/MESOS-9028>:
>
> The revive call has two semantics. It unsuppresses the framework AND
> clears all the existing filters. The latter makes the revive call
> non-idempotent. And sometimes users may want to keep the existing filters
> when reviving, which is not possible atm.
>
> To decouple the semantics, as suggested in the ticket, we propose to add
> two new V1 scheduler calls:
>
> (1) `UNSUPPRESS` call requests Mesos to resume sending offers;
> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
>
> To make life easier, both calls will return 200 OK (as opposed to 202
> returned by most existing scheduler calls, including `SUPPRESS` and
> `REVIVE`).
>
> We will keep the revive call and its semantics (i.e. unsuppress AND clear
> filters) for backward compatibility.
>
> Note, the changes are proposed for V1 API only. Thus, once the changes are
> landed, framework developers are encouraged to move to V1 API to take
> advantage of the new calls (among many other benefits).
>
> Any feedback/comments are welcome.
>
> -Meng
>
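
To make the UNSUPPRESS vs CLEAR_FILTER question concrete, here is a rough toy model of the semantics as described in the proposal above. It is only a sketch: `FrameworkState`, `decline`, and `would_offer` are hypothetical names invented for illustration, filters are simplified to a set of agent IDs, and none of this is actual Mesos allocator code or the final API.

```python
class FrameworkState:
    """Toy stand-in for per-framework allocator state (hypothetical, not Mesos code)."""

    def __init__(self):
        self.suppressed = False
        self.filters = set()  # simplified: agent IDs whose offers were declined

    def suppress(self):
        # Framework is idle: stop sending it offers.
        self.suppressed = True

    def decline(self, agent_id):
        # Declining an offer installs a filter for that agent.
        self.filters.add(agent_id)

    def unsuppress(self):
        # Proposed UNSUPPRESS: resume offers, leave existing filters alone.
        self.suppressed = False

    def clear_filter(self):
        # Proposed CLEAR_FILTER: drop all existing filters, nothing else.
        self.filters.clear()

    def revive(self):
        # Existing REVIVE as described in the proposal: both effects at once.
        self.unsuppress()
        self.clear_filter()

    def would_offer(self, agent_id):
        # An offer is possible only if not suppressed and not filtered.
        return not self.suppressed and agent_id not in self.filters


fw = FrameworkState()
fw.decline("agent-1")
fw.suppress()

fw.unsuppress()
fw.unsuppress()  # repeating the call leaves the state unchanged
print(fw.would_offer("agent-1"))  # False: the filter survived UNSUPPRESS
print(fw.would_offer("agent-2"))  # True

fw.revive()
print(fw.would_offer("agent-1"))  # True: REVIVE also cleared the filter
```

Under this reading, a framework that only wants offers to resume would call UNSUPPRESS, and one that also wants to reconsider previously declined agents would add CLEAR_FILTER (or keep using REVIVE).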