[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257315#comment-15257315 ]
Daniel Zhi commented on YARN-4676:
----------------------------------

1. For client-side timeout tracking, I assume you are referring to the "private int refreshNodes(long timeout)" method in RMAdminCLI.java, where the code polls every second, waiting for all decommissioning nodes to become decommissioned. For any nodes still DECOMMISSIONING when the timeout expires, the client sends a FORCEFUL decommission request. The code remains mostly the same, except that since the RM also receives the timeout (through RefreshNodesRequest()) and enforces it, the client normally will not need to explicitly invoke FORCEFUL decommission, as the nodes will already be DECOMMISSIONED by then. (Should the server for some reason fail to turn a node into DECOMMISSIONED, the client will force it.) Does the combined behavior look fine to you?

2. I am not very familiar with YARN internals when yarn.resourcemanager.recovery.enabled is true. My understanding of the current (pre-YARN-4676) behavior is: when the RM restarts, NodesListManager creates a pseudo RMNodeImpl for each excluded node and DECOMMISSIONs the node right away. Further, any invalid node is rejected and told to SHUTDOWN inside registerNodeManager(). So when recovery is enabled and the RM restarts during DECOMMISSIONING, although applications and containers are likely resumed, DECOMMISSIONING nodes will be DECOMMISSIONED right away. The RM state store does not appear to serialize and restore RMNode; instead, after RM restart, RESYNC is replied in nodeHeartbeat() and a new RMNode is created in the following registerNodeManager(), so the decommissioning start time is lost. To resume DECOMMISSIONING nodes, the decommissioning start time, and possibly the DECOMMISSIONING state itself, would need to be stored and restored. I am not very familiar with the RM state store, but non-trivial work appears to be involved; the cost-benefit justification would also depend on how essential it is to resume DECOMMISSIONING nodes after RM restart.
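For reference, the combined behavior described in point 1 can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual RMAdminCLI.refreshNodes code: the class DecommissionWait, the method waitForDecommission, and the Supplier standing in for an RM node-status query are all made up for illustration.

```java
import java.util.Set;
import java.util.function.Supplier;

/**
 * Illustrative sketch of the combined client/server timeout behavior:
 * the client polls (every second in the real CLI) until no nodes remain
 * DECOMMISSIONING or the timeout expires; anything left over is what the
 * client would then FORCEFULLY decommission. The Supplier is a stand-in
 * for querying the RM, which also enforces the same timeout on its side.
 */
public class DecommissionWait {

  /**
   * Returns the nodes still DECOMMISSIONING when the wait ended. An empty
   * set means the RM finished the job within the timeout, so no explicit
   * FORCEFUL request is needed; a non-empty set is the fallback case.
   */
  public static Set<String> waitForDecommission(
      Supplier<Set<String>> decommissioningNodes,  // stand-in for an RM query
      long timeoutMs,
      long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    Set<String> remaining = decommissioningNodes.get();
    while (!remaining.isEmpty() && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();  // stop waiting if interrupted
        break;
      }
      remaining = decommissioningNodes.get();
    }
    // Non-empty result => client issues a FORCEFUL decommission for these.
    return remaining;
  }
}
```

With the RM enforcing the timeout as well, the normal outcome is that this loop observes an empty set before the deadline, and the FORCEFUL branch is purely a safety net.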
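To make point 2 concrete, here is a minimal sketch of the per-node information the RM state store would need to persist and restore for DECOMMISSIONING to survive a restart. Nothing like this exists today; the class and field names are hypothetical.

```java
/**
 * Hypothetical record of what would need to survive an RM restart so a
 * DECOMMISSIONING node can resume tracking instead of being decommissioned
 * immediately. Today the start time is lost when a new RMNode is created
 * after RESYNC and re-registration.
 */
public class DecommissioningNodeState {
  public final String nodeId;      // node being gracefully decommissioned
  public final long startTimeMs;   // when graceful decommission began
  public final long timeoutMs;     // graceful window granted by the client

  public DecommissioningNodeState(String nodeId, long startTimeMs, long timeoutMs) {
    this.nodeId = nodeId;
    this.startTimeMs = startTimeMs;
    this.timeoutMs = timeoutMs;
  }

  /**
   * Graceful time left at {@code nowMs} after a restart; a value <= 0
   * means the node should be decommissioned forcefully right away.
   */
  public long remainingMs(long nowMs) {
    return timeoutMs - (nowMs - startTimeMs);
  }
}
```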
And if so, whether it's better to create and handle that work in a separate task/JIRA.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch,
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks
> DECOMMISSIONING nodes' status automatically and asynchronously after the
> client/admin makes the graceful decommission request. It tracks the status
> of DECOMMISSIONING nodes to decide when, after all running containers on
> the node have completed, the node will be transitioned into the
> DECOMMISSIONED state. NodesListManager detects and handles include- and
> exclude-list changes to kick off decommission or recommission as necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)