[ https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086662#comment-16086662 ]
Jonathan Hung commented on YARN-6818:
-------------------------------------

Hi, [~Naganarasimha], attached a patch for branch-2.7.

> User limit per partition is not honored in branch-2.7 >=
> --------------------------------------------------------
>
>                 Key: YARN-6818
>                 URL: https://issues.apache.org/jira/browse/YARN-6818
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-6818-branch-2.7.001.patch
>
>
> We are seeing an issue where the user limit factor does not cap the amount of
> resources a user can consume in a queue in a partition. Suppose you have a
> queue with access to partition X, used resources in the default partition are 0,
> and used resources in partition X are at the partition's user limit. This is
> the problematic code as far as I can tell (in LeafQueue.java): {noformat}
>     if (Resources
>         .greaterThan(resourceCalculator, clusterResource,
>             user.getUsed(label),
>             limit)) {
>       // if enabled, check to see if could we potentially use this node instead
>       // of a reserved node if the application has reserved containers
>       if (this.reservationsContinueLooking) {
>         if (Resources.lessThanOrEqual(
>             resourceCalculator,
>             clusterResource,
>             Resources.subtract(user.getUsed(),
>                 application.getCurrentReservation()),
>             limit)) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("User " + userName + " in queue " + getQueueName()
>                 + " will exceed limit based on reservations - " + " consumed: "
>                 + user.getUsed() + " reserved: "
>                 + application.getCurrentReservation() + " limit: " + limit);
>           }
>           Resource amountNeededToUnreserve =
>               Resources.subtract(user.getUsed(label), limit);
>           // we can only acquire a new container if we unreserve first since we ignored the
>           // user limit. Choose the max of user limit or what was previously set by max
>           // capacity.
>           currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>               clusterResource,
>               currentResoureLimits.getAmountNeededUnreserve(),
>               amountNeededToUnreserve));
>           return true;
>         }
>       }
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("User " + userName + " in queue " + getQueueName()
>             + " will exceed limit - " + " consumed: "
>             + user.getUsed() + " limit: " + limit);
>       }
>       return false;
>     }
> {noformat}
> First it sees that the used resources in partition X are greater than the
> partition's user limit. Then the reservation check also succeeds, because it
> checks {{user.getUsed() - application.getCurrentReservation() <= limit}},
> where {{user.getUsed()}} is the usage in the default partition (0 here), and
> so the method returns true.
> One fix is to change {{Resources.subtract(user.getUsed(),
> application.getCurrentReservation())}} to
> {{Resources.subtract(user.getUsed(label),
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher, since YARN-3356
> introduces this check: {noformat}
>       if (this.reservationsContinueLooking
>           && checkReservations
>           && label.equals(CommonNodeLabelsManager.NO_LABEL)) {
> {noformat}
> so in that case getting the used resources in the default partition seems to be
> correct.
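
For illustration only, the control flow described in the issue can be reduced to a small, self-contained sketch. It is not the LeafQueue code: the class UserLimitSketch, its nested User class, the canAssign* method names, and the use of plain long values (memory in MB) in place of Resource objects are all assumptions made for this example. The three methods mirror the branch-2.7 check, the proposed one-line fix, and the shape of the branch-2.8 guard quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the per-partition user-limit check.
// Resources are plain longs (MB); none of these names come from Hadoop.
final class UserLimitSketch {
    static final String NO_LABEL = "";

    // Stand-in for per-user usage tracked per partition (node label).
    static class User {
        private final Map<String, Long> usedByLabel = new HashMap<>();

        void setUsed(String label, long mb) {
            usedByLabel.put(label, mb);
        }

        long getUsed(String label) {
            return usedByLabel.getOrDefault(label, 0L);
        }

        // No-argument form reads the default partition, as the issue assumes.
        long getUsed() {
            return getUsed(NO_LABEL);
        }
    }

    // Shape of the branch-2.7 check: the reservation test reads the
    // default-partition usage even when 'label' names another partition.
    static boolean canAssignBuggy(User user, String label, long limit,
                                  long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && user.getUsed() - reserved <= limit) {
                return true; // limit bypassed for labeled partitions
            }
            return false;
        }
        return true;
    }

    // Proposed fix: the reservation test reads the same partition's usage.
    static boolean canAssignFixed(User user, String label, long limit,
                                  long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && user.getUsed(label) - reserved <= limit) {
                return true;
            }
            return false;
        }
        return true;
    }

    // Shape of the branch-2.8 guard: the reservation path is only taken
    // for the default partition, so reading getUsed() there is consistent.
    static boolean canAssignGuarded(User user, String label, long limit,
                                    long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && label.equals(NO_LABEL)
                    && user.getUsed() - reserved <= limit) {
                return true;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        User user = new User();
        user.setUsed(NO_LABEL, 0L); // nothing used in the default partition
        user.setUsed("X", 3072L);   // partition X usage above the limit
        long limit = 2048L;

        System.out.println("buggy:   " + canAssignBuggy(user, "X", limit, 0L, true));
        System.out.println("fixed:   " + canAssignFixed(user, "X", limit, 0L, true));
        System.out.println("guarded: " + canAssignGuarded(user, "X", limit, 0L, true));
    }
}
```

With the default partition at 0 and partition X above its limit, the first variant lets the user through while the other two refuse, which matches the behavior the issue describes. The sketch deliberately leaves out the amountNeededToUnreserve bookkeeping and the checkReservations flag, since neither changes which usage value is compared against the limit.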