[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608947#comment-14608947 ]
Wangda Tan commented on YARN-3849:
----------------------------------

Thanks for working on this, [~sunilg]!

The fix to Proportion..Policy looks good to me. Some comments about the test changes:
- The string syntax to define resources looks great! :)
- Instead of changing all test cases in TestProportional..Policy, could you add another overloaded method that takes String[][]? This would avoid lots of changes to the test cases.
- Initialization of ResourceCalculator should be a part of buildPolicy; for example, add a "boolean useDominateResourceCalculator" parameter to buildPolicy.
- Could you change TestPro..PolicyForNodePartitions to accept CPU when doing queue/application mocking as well?

> Too much of preemption activity causing continuos killing of containers
> across queues
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3849
>                 URL: https://issues.apache.org/jira/browse/YARN-3849
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Critical
>         Attachments: 0001-YARN-3849.patch
>
>
> Two queues are used. Each queue has been given a capacity of 0.5. The Dominant
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes
> preemption in QueueA.
> 3. Instead of only the excess over the 0.5 guaranteed capacity being killed, we
> observed that all containers other than the AM are killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free
> space. But there is some updated demand from the app in QueueA, which lost
> its containers earlier, and preemption now kicks in for QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop. Thus none of the
> apps are completing.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)