[ https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wangda Tan updated YARN-4189:
-----------------------------
    Attachment: YARN-4189 design v1.pdf

> Capacity Scheduler : Improve location preference waiting mechanism
> ------------------------------------------------------------------
>
>                 Key: YARN-4189
>                 URL: https://issues.apache.org/jira/browse/YARN-4189
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4189 design v1.pdf
>
>
> There are some issues with the current Capacity Scheduler implementation of delay
> scheduling:
>
> *1) Waiting time to allocate each container highly depends on cluster availability*
> Currently, an app can only increase its missed-opportunity count when a node has
> available resources AND the scheduler traverses the app on that node. There are many
> situations in which an app is not traversed by the scheduler, so its actual waiting
> time varies widely. For example:
> A cluster has 2 racks (rack1/2), and each rack has 40 nodes.
> Node-locality-delay=40. An application prefers rack1.
> Node-heartbeat-interval=1s.
> If there are 2 nodes available on rack1, the delay to allocate one container = 20 sec.
> If there are 20 nodes available on rack1, the delay to allocate one container = 2 sec.
>
> *2) It could violate scheduling policies (Fifo/Priority/Fair)*
> Assume a cluster is highly utilized. An app (app1) has higher priority and wants
> locality. Another app (app2) has lower priority but doesn't care about locality.
> When a node heartbeats with available resources, app1 decides to wait, so app2 gets
> the available slot. This should be considered a bug that we need to fix.
> The same problem could happen when we use FIFO/Fair queue policies.
> A similar problem is related to preemption: suppose the preemption policy preempts
> some resources from queue-A for queue-B (queue-A is over-satisfied and queue-B is
> under-satisfied), but queue-B is waiting for the node-locality-delay, so queue-A
> gets the resources back.
> In the next round, the preemption policy could preempt these resources from queue-A
> again.
>
> This JIRA aims to solve these problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
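As a rough illustration of issue 1, the Java sketch below computes how long an app waits before its missed-opportunity count reaches node-locality-delay. It is a simplified model, not CapacityScheduler code: it assumes every available node heartbeats once per interval and that each such heartbeat gives the app exactly one scheduling opportunity. The class and method names (`LocalityDelayModel`, `secondsToRelaxLocality`) are hypothetical, and the real scheduler applies further rules that are not modeled here.

```java
/**
 * Hypothetical, simplified model of opportunity-counted delay scheduling.
 * Assumption (not taken from the CapacityScheduler source): each available
 * node heartbeats once per interval, and every such heartbeat gives the app
 * exactly one scheduling opportunity.
 */
public final class LocalityDelayModel {

    /**
     * Seconds until the app's missed-opportunity count reaches
     * nodeLocalityDelay, under the assumption above.
     */
    static long secondsToRelaxLocality(int nodeLocalityDelay,
                                       int availableNodes,
                                       int heartbeatIntervalSec) {
        if (availableNodes <= 0) {
            // With no available nodes the app never gets an opportunity.
            throw new IllegalArgumentException("availableNodes must be > 0");
        }
        // Each heartbeat round adds availableNodes opportunities; round up
        // to the number of whole rounds needed to reach the delay threshold.
        long rounds = (nodeLocalityDelay + availableNodes - 1) / availableNodes;
        return rounds * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        int nodeLocalityDelay = 40;  // node-locality-delay from the example
        int heartbeatSec = 1;        // node heartbeat interval from the example
        for (int nodes : new int[] {2, 20}) {
            System.out.printf("%d available nodes -> %d sec%n",
                    nodes, secondsToRelaxLocality(nodeLocalityDelay, nodes, heartbeatSec));
        }
    }
}
```

Under these assumptions, 20 available nodes give 2 sec and 2 available nodes give 20 sec. The wait scales inversely with the number of available nodes instead of being a fixed amount of time, which is the dependency on cluster availability that issue 1 describes.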