[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054629#comment-14054629 ]
Sandy Ryza commented on YARN-2026:
----------------------------------

I had a conversation with [~kkambatl] about this, and he convinced me that we should turn this on in all cases - i.e. modify FairSharePolicy and DominantResourceFairnessPolicy instead of creating additional policies. Sorry to vacillate on this.

Some additional comments on the code:
{code}
+ return this.getNumRunnableApps() > 0;
{code}
{code}
+ || (sched instanceof FSQueue && ((FSQueue) sched).isActive())) {
{code}
Instead of using instanceof, can we add an isActive method to Schedulable, and always return true for it in AppSchedulable?

{code}
+ out.println(" <queue name=\"childA1\" />");
+ out.println(" <queue name=\"childA2\" />");
+ out.println(" <queue name=\"childA3\" />");
+ out.println(" <queue name=\"childA4\" />");
+ out.println(" <queue name=\"childA5\" />");
+ out.println(" <queue name=\"childA6\" />");
+ out.println(" <queue name=\"childA7\" />");
+ out.println(" <queue name=\"childA8\" />");
{code}
Do we need this many children?

{code}
+ out.println("</queue>");
+
+ out.println("</allocations>");
{code}
Unnecessary newline

{code}
+ public void testFairShareActiveOnly_ShareResetsToZeroWhenAppsComplete()
{code}
Take out underscore.

{code}
+ private void setupCluster(int mem, int vCores) throws IOException {
{code}
Give this method a name that's more descriptive of the kind of configuration it's setting up.

{code}
+ private void setupCluster(int nodeMem) throws IOException {
{code}
Can this call the setupCluster that takes two arguments?

To help with the fight against TestFairScheduler becoming a monstrosity, the tests should go into a new test file. TestFairSchedulerPreemption is a good example of how to do this.
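
The isActive suggestion above could look roughly like the following. This is only a minimal sketch under stated assumptions: the types below are simplified stand-ins that borrow the names Schedulable, AppSchedulable and FSQueue, not the real Hadoop classes, and activeNames is a hypothetical caller added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the fair scheduler types; NOT the real Hadoop classes.
class IsActiveSketch {

  /** Hypothetical slice of Schedulable carrying the proposed method. */
  interface Schedulable {
    String getName();

    /** Whether this schedulable should take part in fair share computation. */
    boolean isActive();
  }

  /** An app is always considered active, as the review comment suggests. */
  static final class AppSchedulable implements Schedulable {
    private final String name;

    AppSchedulable(String name) {
      this.name = name;
    }

    @Override
    public String getName() {
      return name;
    }

    @Override
    public boolean isActive() {
      return true;
    }
  }

  /** A queue is active only while it has runnable apps. */
  static final class FSQueue implements Schedulable {
    private final String name;
    private final int numRunnableApps;

    FSQueue(String name, int numRunnableApps) {
      this.name = name;
      this.numRunnableApps = numRunnableApps;
    }

    @Override
    public String getName() {
      return name;
    }

    @Override
    public boolean isActive() {
      return numRunnableApps > 0;
    }
  }

  /** Hypothetical caller: no instanceof check or cast is needed any more. */
  static List<String> activeNames(List<? extends Schedulable> schedulables) {
    List<String> names = new ArrayList<>();
    for (Schedulable sched : schedulables) {
      if (sched.isActive()) {
        names.add(sched.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Schedulable> all = new ArrayList<>();
    all.add(new FSQueue("childA1", 0));
    all.add(new FSQueue("childA2", 2));
    all.add(new AppSchedulable("app1"));
    System.out.println(activeNames(all));
  }
}
```

With the method on the interface, the policy code can treat queues and apps uniformly and the cast to FSQueue goes away.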

{code}
+ int nodeVcores = 10;
{code}
Nit: "nodeVCores"

> Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-2026
>                 URL: https://issues.apache.org/jira/browse/YARN-2026
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Ashwin Shankar
>            Assignee: Ashwin Shankar
>              Labels: scheduler
>         Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt
>
>
> Problem1 - While using hierarchical queues in the fair scheduler, there are a
> few scenarios where we have seen a leaf queue with the least fair share take
> the majority of the cluster and starve a sibling parent queue that has a
> greater weight/fair share, and preemption doesn’t kick in to reclaim
> resources.
> The root cause seems to be that the fair share of a parent queue is
> distributed to all its children, irrespective of whether a child is an active
> or an inactive (no apps running) queue. Preemption based on fair share kicks
> in only if the usage of a queue is less than 50% of its fair share and it has
> demand greater than that. When there are many queues under a parent queue
> (with a high fair share), each child queue’s fair share becomes really low.
> As a result, when only a few of these child queues have apps running, they
> reach their *tiny* fair share quickly and preemption doesn’t happen even if
> other (non-sibling) leaf queues are hogging the cluster.
> This can be solved by dividing the fair share of a parent queue only among
> its active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2
> root.HighPriorityQueue is a parent queue with weight 8
> root.HighPriorityQueue has 10 child leaf queues:
> root.HighPriorityQueue.childQ(1..10)
> The above config results in root.HighPriorityQueue having an 80% fair share,
> and each of its ten child queues would have an 8% fair share.
> Preemption would happen only if a child queue’s usage is below 4%
> (0.5 * 8 = 4).
> Let’s say at the moment no apps are running in any of the
> root.HighPriorityQueue.childQ(1..10) queues and a few apps are running in
> root.lowPriorityQueue, which is taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let’s say root.HighPriorityQueue.childQ1 gets a big job that requires
> 30% of the cluster. It would get only the available 5% of the cluster, and
> preemption wouldn’t kick in since it is above 4% (half its fair share). This
> is bad considering childQ1 is under a high-priority parent queue that has an
> *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see the
> following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1 = 5%*
> This can be solved by distributing a parent’s fair share only to active
> queues.
> So in the example above, since childQ1 is the only active queue under
> root.HighPriorityQueue, it would get all of its parent’s fair share, i.e.
> 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from
> root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Problem2 - Also note that a similar situation can happen between
> root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2
> hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be
> stuck at 5% until childQ2 starts relinquishing containers. We would like each
> of childQ1 and childQ2 to get half of root.HighPriorityQueue’s fair share,
> i.e. 40%, which would ensure childQ1 gets up to 40% of the resources if
> needed through preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
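
The arithmetic in the quoted description can be checked with a small sketch. It assumes equal weights for the child queues and reads the preemption rule as "usage below half the fair share, with demand above current usage"; that reading, the class name and both helper methods are illustrative assumptions, not the scheduler's actual code.

```java
// Worked example for the numbers in the YARN-2026 description.
// All names here are hypothetical; shares are percentages of the cluster.
class ActiveFairShareSketch {

  /** Share handed to one child: parent share split in proportion to weight. */
  static double weightedShare(double parentShare, double weight, double totalWeight) {
    return parentShare * weight / totalWeight;
  }

  /**
   * Assumed reading of the rule in the description: a queue is eligible for
   * fair share preemption when it uses less than half of its fair share and
   * still wants more than it currently has.
   */
  static boolean eligibleForPreemption(double usage, double demand, double fairShare) {
    return usage < 0.5 * fairShare && demand > usage;
  }

  public static void main(String[] args) {
    // root.HighPriorityQueue has weight 8 next to root.lowPriorityQueue with weight 2.
    double highPriority = weightedShare(100.0, 8, 2 + 8);

    // Current behavior: the parent's share is split across all 10 children.
    double shareAcrossAll = weightedShare(highPriority, 1, 10);

    // Proposed behavior: only active children count; childQ1 is the only one.
    double shareActiveOnly = weightedShare(highPriority, 1, 1);

    // childQ1 holds the free 5% of the cluster and wants 30%.
    double usage = 5.0;
    double demand = 30.0;

    System.out.println("parent share: " + highPriority);
    System.out.println("split across all children: " + shareAcrossAll
        + ", preemption eligible: " + eligibleForPreemption(usage, demand, shareAcrossAll));
    System.out.println("split across active children: " + shareActiveOnly
        + ", preemption eligible: " + eligibleForPreemption(usage, demand, shareActiveOnly));
  }
}
```

For Problem2, the same helper with two active children, weightedShare(80.0, 1, 2), gives the 40% share mentioned in the description.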