Jonathan Hung created YARN-6818:
-----------------------------------
Summary: User limit per partition is not honored in branch-2.7 >=
Key: YARN-6818
URL: https://issues.apache.org/jira/browse/YARN-6818
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jonathan Hung
Assignee: Jonathan Hung
We are seeing an issue where the user limit factor does not cap the amount of
resources a user can consume in a queue in a partition. Suppose a queue has
access to partition X, the user's used resources in the default partition are 0,
and the user's used resources in partition X are at the partition's user limit.
This is the
problematic code as far as I can tell (in LeafQueue.java):
{noformat}
if (Resources.greaterThan(resourceCalculator, clusterResource,
    user.getUsed(label), limit)) {
  // if enabled, check to see if could we potentially use this node instead
  // of a reserved node if the application has reserved containers
  if (this.reservationsContinueLooking) {
    if (Resources.lessThanOrEqual(
        resourceCalculator,
        clusterResource,
        Resources.subtract(user.getUsed(),
            application.getCurrentReservation()),
        limit)) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("User " + userName + " in queue " + getQueueName()
            + " will exceed limit based on reservations - " + " consumed: "
            + user.getUsed() + " reserved: "
            + application.getCurrentReservation() + " limit: " + limit);
      }
      Resource amountNeededToUnreserve =
          Resources.subtract(user.getUsed(label), limit);
      // we can only acquire a new container if we unreserve first since we
      // ignored the user limit. Choose the max of user limit or what was
      // previously set by max capacity.
      currentResoureLimits.setAmountNeededUnreserve(
          Resources.max(resourceCalculator, clusterResource,
              currentResoureLimits.getAmountNeededUnreserve(),
              amountNeededToUnreserve));
      return true;
    }
  }
  if (LOG.isDebugEnabled()) {
    LOG.debug("User " + userName + " in queue " + getQueueName()
        + " will exceed limit - " + " consumed: "
        + user.getUsed() + " limit: " + limit);
  }
  return false;
}
{noformat}
First it sees that the used resources in partition X are greater than the
partition's user limit. Then the reservation check also succeeds, because it
checks {{user.getUsed() - application.getCurrentReservation() <= limit}}:
{{user.getUsed()}} is the usage in the default partition (0 here), not in
partition X, so the method returns true.
One fix is to replace {{Resources.subtract(user.getUsed(),
application.getCurrentReservation())}} with
{{Resources.subtract(user.getUsed(label),
application.getCurrentReservation())}}.
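
To make the scenario concrete, here is a minimal standalone sketch of the two variants of the check. It is not the actual LeafQueue code: the class {{UserLimitSketch}}, the simplified {{User}} type, the use of plain memory values (MB) instead of {{Resource}} objects, and the sample numbers are all assumptions made for illustration. Returning true when the limit is not exceeded is also an assumption, since the quoted snippet ends before that point.

```java
import java.util.HashMap;
import java.util.Map;

public class UserLimitSketch {
    // Hypothetical stand-in for the default (no-label) partition.
    static final String NO_LABEL = "";

    /** Simplified per-user usage, tracked per partition as memory in MB. */
    static final class User {
        private final Map<String, Long> usedByLabel = new HashMap<>();

        void setUsed(String label, long mb) {
            usedByLabel.put(label, mb);
        }

        long getUsed(String label) {
            return usedByLabel.getOrDefault(label, 0L);
        }

        // Like the no-arg getter discussed above: default partition only.
        long getUsed() {
            return getUsed(NO_LABEL);
        }
    }

    /** Mirrors the quoted logic: the reservation check reads default-partition usage. */
    static boolean canAssignBuggy(User user, String label, long reserved,
                                  long limit, boolean reservationsContinueLooking) {
        if (user.getUsed(label) > limit) {
            if (reservationsContinueLooking
                    && user.getUsed() - reserved <= limit) {
                return true; // wrongly passes when default-partition usage is low
            }
            return false;
        }
        return true; // assumption: under the limit means the check passes
    }

    /** Proposed fix: the reservation check reads usage for the same partition. */
    static boolean canAssignFixed(User user, String label, long reserved,
                                  long limit, boolean reservationsContinueLooking) {
        if (user.getUsed(label) > limit) {
            if (reservationsContinueLooking
                    && user.getUsed(label) - reserved <= limit) {
                return true;
            }
            return false;
        }
        return true; // assumption: under the limit means the check passes
    }

    public static void main(String[] args) {
        User user = new User();
        user.setUsed(NO_LABEL, 0L); // nothing used in the default partition
        user.setUsed("X", 9000L);   // over the 8192 MB user limit in partition X

        System.out.println("buggy: " + canAssignBuggy(user, "X", 0L, 8192L, true));
        System.out.println("fixed: " + canAssignFixed(user, "X", 0L, 8192L, true));
    }
}
```

With these sample values the first variant lets the user through even though usage in partition X exceeds the limit, while the second rejects the assignment.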
This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356
introduces this check:
{noformat}
if (this.reservationsContinueLooking && checkReservations
    && label.equals(CommonNodeLabelsManager.NO_LABEL)) {
{noformat}
so in this case getting the used resources in default partition seems to be
correct.