[ https://issues.apache.org/jira/browse/YARN-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231126#comment-15231126 ]
Wangda Tan commented on YARN-4865:
----------------------------------

[~sunilg],

It seems this patch needs one more fix:

{code}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
index 9a74c22..df57787 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
@@ -1322,14 +1322,6 @@ public void completedContainer(Resource clusterResource,
 
       // Book-keeping
       if (removed) {
-
-        // track reserved resource for metrics, for normal container
-        // getReservedResource will be null.
-        Resource reservedRes = rmContainer.getReservedResource();
-        if (reservedRes != null && !reservedRes.equals(Resources.none())) {
-          decReservedResource(node.getPartition(), reservedRes);
-        }
-
         // Inform the ordering policy
         orderingPolicy.containerReleased(application, rmContainer);
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
index cf1b3e0..558fc53 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
@@ -247,6 +247,8 @@ public synchronized boolean unreserve(Priority priority,
       // Update reserved metrics
       queue.getMetrics().unreserveResource(getUser(),
           rmContainer.getReservedResource());
+
+      queue.decReservedResource(node.getPartition(), rmContainer.getReservedResource());
       return true;
     }
     return false;
{code}

We need the above change to make sure that allocating from a reserved container correctly deducts the reserved resource. [~sunilg], could you also add a few tests?

There are some other cases I have in mind that we need to consider:
- Nodes lost / disconnected: we need to deduct the reserved resources on such nodes. (I think this should be covered by the completedContainer code path.)

The above can be addressed in a separate JIRA.
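To illustrate why the deduction belongs in unreserve, here is a minimal sketch with simplified stand-ins for the queue and the scheduler application. All class and method names below (SketchQueue, SketchApp, allocateFromReservation) are hypothetical, resources are modeled as plain megabyte counts, and this is not the actual CapacityScheduler code. The point is only that when the decrement lives in unreserve, both an explicit unreserve and an allocation that converts a reservation go through the same book-keeping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of reserved-resource book-keeping.
// Not the real LeafQueue / FiCaSchedulerApp classes.
public class ReservedTrackingSketch {

  /** Tracks reserved memory (MB) per node partition. */
  static class SketchQueue {
    private final Map<String, Long> reservedMb = new HashMap<>();

    void incReserved(String partition, long mb) {
      reservedMb.merge(partition, mb, Long::sum);
    }

    void decReserved(String partition, long mb) {
      reservedMb.merge(partition, -mb, Long::sum);
    }

    long getReserved(String partition) {
      return reservedMb.getOrDefault(partition, 0L);
    }
  }

  /** Holds at most one reservation per node and reports changes to its queue. */
  static class SketchApp {
    private final SketchQueue queue;
    private final Map<String, Long> reservations = new HashMap<>(); // nodeId -> MB

    SketchApp(SketchQueue queue) {
      this.queue = queue;
    }

    void reserve(String nodeId, String partition, long mb) {
      Long previous = reservations.put(nodeId, mb);
      if (previous != null) {
        // Replace an existing reservation instead of double counting it.
        queue.decReserved(partition, previous);
      }
      queue.incReserved(partition, mb);
    }

    /** Single place where the queue's reserved amount is deducted. */
    boolean unreserve(String nodeId, String partition) {
      Long mb = reservations.remove(nodeId);
      if (mb == null) {
        return false; // nothing was reserved on this node
      }
      queue.decReserved(partition, mb);
      return true;
    }

    /** Allocating from a reservation first releases it via unreserve. */
    boolean allocateFromReservation(String nodeId, String partition) {
      return unreserve(nodeId, partition);
    }
  }

  public static void main(String[] args) {
    SketchQueue queue = new SketchQueue();
    SketchApp app = new SketchApp(queue);

    app.reserve("node1", "", 2048L);
    System.out.println("reserved after reserve: " + queue.getReserved(""));

    app.allocateFromReservation("node1", "");
    System.out.println("reserved after allocation: " + queue.getReserved(""));
  }
}
```

In this sketch a second unreserve on the same node returns false and leaves the queue untouched, so the reserved amount cannot be deducted twice for one reservation.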
> Track Reserved resources in ResourceUsage and QueueCapacities
> --------------------------------------------------------------
>
>                 Key: YARN-4865
>                 URL: https://issues.apache.org/jira/browse/YARN-4865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Sunil G
>            Assignee: Sunil G
>             Fix For: 2.9.0
>
>         Attachments: 0001-YARN-4865.patch, 0002-YARN-4865.patch, 0003-YARN-4865.patch
>
>
> As discussed in YARN-4678, capture reserved capacity separately in
> QueueCapacities for better tracking.