[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated YARN-3849:
--------------------------
    Attachment: 0003-YARN-3849.patch

Thank you [~leftnoteasy] for the comments. Uploading a patch addressing the issues.

Regarding one comment,
bq. testPreemptionWithVCoreResource seems not correct, root.used != A.used + b.used

{noformat}
"root(=[100:200 100:200 100:200 100:200],x=[100:200 100:200 100:200 100:200]);"
"-a(=[50:100 100:200 20:40 50:100],x=[50:100 100:200 80:160 50:100]);" + // a
"-b(=[50:100 100:200 80:160 50:100],x=[50:100 100:200 20:40 50:100])";
{noformat}

With this change, root.used = a.used + b.used. Please help to check.

> Too much of preemption activity causing continuos killing of containers
> across queues
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3849
>                 URL: https://issues.apache.org/jira/browse/YARN-3849
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Critical
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes
> preemption in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we
> observed that all containers other than the AM are getting killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free
> space. But there is some updated demand from the app in QueueA, which lost
> its containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop. Thus none of the
> apps are completing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
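For reference, the root.used = a.used + b.used check on the test configuration quoted in the comment can be sketched as a small standalone program. This is only an illustration, not part of the patch: the class name QueueUsedCheck and its methods are hypothetical, and it assumes each bracket lists "guaranteed max used pending" as memory:vcores pairs, with an empty label meaning the default partition and every non-root entry being a direct child of root.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: checks root.used against the sum of its children.
public class QueueUsedCheck {
    // Matches one partition entry such as "x=[50:100 100:200 80:160 50:100]".
    private static final Pattern PARTITION = Pattern.compile("(\\w*)=\\[([^\\]]*)\\]");

    /** Returns partition label -> {memory, vcores} read from the third field (assumed to be "used"). */
    public static Map<String, long[]> usedOf(String queueEntry) {
        Map<String, long[]> used = new LinkedHashMap<>();
        Matcher m = PARTITION.matcher(queueEntry);
        while (m.find()) {
            // Assumed field order: guaranteed, max, used, pending.
            String[] fields = m.group(2).trim().split("\\s+");
            String[] memAndVcores = fields[2].split(":");
            used.put(m.group(1), new long[] {
                Long.parseLong(memAndVcores[0]), Long.parseLong(memAndVcores[1])});
        }
        return used;
    }

    /** True if, for every partition of the first entry (root), used equals the sum over the other entries. */
    public static boolean usedAddsUp(String config) {
        String[] entries = config.split(";");
        Map<String, long[]> root = usedOf(entries[0]);
        Map<String, long[]> sum = new LinkedHashMap<>();
        // Assumes a flat hierarchy: every entry after the first is a direct child of root.
        for (int i = 1; i < entries.length; i++) {
            for (Map.Entry<String, long[]> e : usedOf(entries[i]).entrySet()) {
                long[] acc = sum.computeIfAbsent(e.getKey(), k -> new long[2]);
                acc[0] += e.getValue()[0];
                acc[1] += e.getValue()[1];
            }
        }
        for (Map.Entry<String, long[]> e : root.entrySet()) {
            long[] s = sum.getOrDefault(e.getKey(), new long[2]);
            if (s[0] != e.getValue()[0] || s[1] != e.getValue()[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String config =
            "root(=[100:200 100:200 100:200 100:200],x=[100:200 100:200 100:200 100:200]);"
            + "-a(=[50:100 100:200 20:40 50:100],x=[50:100 100:200 80:160 50:100]);"
            + "-b(=[50:100 100:200 80:160 50:100],x=[50:100 100:200 20:40 50:100])";
        System.out.println(usedAddsUp(config));
    }
}
```

Under the stated field-order assumption, the default partition gives 20:40 + 80:160 = 100:200 and partition x gives 80:160 + 20:40 = 100:200, both matching root.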