[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970078#comment-14970078 ]
Wangda Tan commented on YARN-3223:
----------------------------------

[~brookz],

Thanks for working on this JIRA. I took a look at the patch; some comments:

I think the general approach is fine: while a node is in the decommissioning state, SchedulerNode keeps updating its total resource to equal its used resource, so the available resource stays at 0.

But I think there are some other places that need care:
- Suggest using {{CapacityScheduler#updateNodeAndQueueResource}} to update resources; we need to update the queue's resource and the cluster metrics as well.
- When async scheduling is enabled, we need to make sure the decommissioning node's total resource is updated so that no new containers are allocated on these nodes.

And after this patch, I think we need to add the total resource of decommissioning nodes to the cluster metrics (equal to sum(decommissioning-node.used-resource)).

> Resource update during NM graceful decommission
> -----------------------------------------------
>
>                 Key: YARN-3223
>                 URL: https://issues.apache.org/jira/browse/YARN-3223
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Junping Du
>            Assignee: Brook Zhou
>         Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch
>
>
> During NM graceful decommission, we should handle resource updates properly,
> including: make RMNode keep track of the old resource for possible rollback,
> keep available resource at 0, and update used resource when a container
> finishes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
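
The accounting described in the comment above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the YARN-3223 patch: {{DecommissionSketch}}, {{Node}}, and all method names are hypothetical stand-ins (they are not the real SchedulerNode or RMNode APIs), and only memory is tracked. It assumes that a decommissioning node's total is clamped to its used amount on every change, and that the cluster metric is the sum of used resource over decommissioning nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; NOT the real YARN SchedulerNode/RMNode classes.
class DecommissionSketch {

    /** Simplified node that tracks memory only (in MB). */
    static class Node {
        private final int originalTotalMb; // remembered for a possible rollback
        private int totalMb;
        private int usedMb;
        private boolean decommissioning;

        Node(int totalMb) {
            this.originalTotalMb = totalMb;
            this.totalMb = totalMb;
        }

        /** Allocates a container; fails when the node has no room (always, once decommissioning). */
        void allocate(int mb) {
            if (mb <= 0 || mb > availableMb()) {
                throw new IllegalStateException("cannot allocate " + mb + " MB");
            }
            usedMb += mb;
        }

        /** Releases a finished container and re-clamps total if decommissioning. */
        void release(int mb) {
            if (mb <= 0 || mb > usedMb) {
                throw new IllegalArgumentException("cannot release " + mb + " MB");
            }
            usedMb -= mb;
            clampTotalIfDecommissioning();
        }

        void startDecommission() {
            decommissioning = true;
            clampTotalIfDecommissioning();
        }

        /** Rolls back to the resource the node had before decommissioning started. */
        void cancelDecommission() {
            decommissioning = false;
            totalMb = originalTotalMb;
        }

        // Keep total == used so that available stays at 0 while decommissioning.
        private void clampTotalIfDecommissioning() {
            if (decommissioning) {
                totalMb = usedMb;
            }
        }

        int availableMb() { return totalMb - usedMb; }
        int totalMb() { return totalMb; }
        int usedMb() { return usedMb; }
        boolean isDecommissioning() { return decommissioning; }
    }

    /** Cluster metric: sum of used resource over all decommissioning nodes. */
    static int decommissioningMb(List<Node> nodes) {
        int sum = 0;
        for (Node n : nodes) {
            if (n.isDecommissioning()) {
                sum += n.usedMb();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        Node a = new Node(8192);
        Node b = new Node(8192);
        nodes.add(a);
        nodes.add(b);

        a.allocate(2048);
        b.allocate(1024);
        a.startDecommission();
        System.out.println("available on a: " + a.availableMb());
        System.out.println("decommissioning total: " + decommissioningMb(nodes));

        a.release(1024); // a container finishes; total shrinks along with used
        System.out.println("total on a after release: " + a.totalMb());
    }
}
```

In this sketch the scheduler never needs a separate "is decommissioning" check on the allocation path, since a node with zero available resource simply cannot satisfy a request. Whether that holds under async scheduling in the real CapacityScheduler is exactly the concern raised in the comment, and the sketch does not model it.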