[ https://issues.apache.org/jira/browse/YARN-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252643#comment-16252643 ]
Eric Payne commented on YARN-7496:
----------------------------------

Cluster Configuration:
- Cluster Memory: 20GB
- Queue1 capacity and max capacity: 50% : 100%
- Queue2 capacity and max capacity: 50% : 100%
- Queue1: Intra-queue preemption: enabled
- Default container size: 0.5GB

Use Case:
- User1 submits App1 in Queue1 and consumes 12.5GB
- User2 submits App2 in Queue1 and consumes 7.5GB
- User3 submits App3 in Queue1
- Preemption monitor calculates user limit to be
  {{((total used resources in Queue1) / (number of all users)) + (1 container) = normalizeup((20GB/3),0.5GB) + 0.5GB = 7GB + 0.5GB = 7.5GB}}
- Preemption monitor sees that App1 is the only one holding resources above that limit, so it tries to preempt containers from {{App1}} down to 7.5GB.
- The problem comes here: Capacity Scheduler calculates user limit to be
  {{((total used resources in Queue1) / (number of active users)) + (1 container) = normalizeup((20GB/2),0.5GB) + 0.5GB = 10GB + 0.5GB = 10.5GB}}
- Therefore, once {{App1}} gets to 10.5GB, the preemption monitor will try to preempt 2.5GB more resources from {{App1}}, but the Capacity Scheduler gives them back. This creates oscillation.

> CS Intra-queue preemption user-limit calculations are not in line with
> LeafQueue user-limit calculations
> --------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7496
>                 URL: https://issues.apache.org/jira/browse/YARN-7496
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.2
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> Only a problem in 2.8.
> Preemption could oscillate due to the difference in how user limit is
> calculated between 2.8 and later releases.
> Basically (ignoring ULF, MULP, and maybe others), the calculation for user
> limit on the Capacity Scheduler side in 2.8 is {{total used resources /
> number of active users}} while the calculation in later releases is {{total
> active resources / number of active users}}.
> When intra-queue preemption was backported to 2.8, its calculations for
> user limit were more aligned with the latter algorithm, which is in 2.9
> and later releases.
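
The two user-limit figures in the comment above can be reproduced with a small sketch. This is only an illustration of the arithmetic as written in the comment, not the actual Capacity Scheduler or preemption monitor code: the class name {{UserLimitSketch}} and the helpers {{normalizeUp}} and {{userLimit}} are hypothetical, and ULF, MULP, and other factors are ignored, as in the description.

```java
public class UserLimitSketch {

    // Hypothetical stand-in for the "normalizeup" step in the comment:
    // rounds a value up to the next multiple of the container size.
    static double normalizeUp(double valueGb, double stepGb) {
        return Math.ceil(valueGb / stepGb) * stepGb;
    }

    // Simplified user limit from the comment:
    // normalizeup(used / users, container) + 1 container.
    static double userLimit(double usedGb, int users, double containerGb) {
        return normalizeUp(usedGb / users, containerGb) + containerGb;
    }

    public static void main(String[] args) {
        double usedInQueue1Gb = 20.0; // total used resources in Queue1
        double containerGb = 0.5;     // default container size

        // Preemption monitor divides by all users (3 in the use case).
        double preemptionLimit = userLimit(usedInQueue1Gb, 3, containerGb);

        // Capacity Scheduler in 2.8 divides by active users (2 in the use case).
        double schedulerLimit = userLimit(usedInQueue1Gb, 2, containerGb);

        System.out.println("preemption monitor user limit: " + preemptionLimit + "GB");
        System.out.println("capacity scheduler user limit: " + schedulerLimit + "GB");
    }
}
```

With the numbers from the use case, the sketch yields 7.5GB for the preemption monitor and 10.5GB for the Capacity Scheduler. Any allocation for {{App1}} between those two values is one that the monitor wants to take away and the scheduler is willing to hand back, which is the oscillation described.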