[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257315#comment-15257315 ]

Daniel Zhi commented on YARN-4676:
----------------------------------

1. For client-side timeout tracking, I assume you are referring to the 
"private int refreshNodes(long timeout)" method in RMAdminCLI.java, where the 
code checks every second and waits for all decommissioning nodes to become 
decommissioned. For any nodes still decommissioning when the timeout expires, 
the client sends a FORCEFUL decommission request. That code remains mostly 
the same; however, since the RM now also receives the timeout (through 
RefreshNodesRequest()) and enforces it, the client normally will not need to 
explicitly invoke FORCEFUL decommission, as the nodes will already be 
DECOMMISSIONED by then. (Should the server for some reason fail to turn a 
node into DECOMMISSIONED, the client will force it.) Does the combined 
behavior look fine to you?
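
To make the combined behavior concrete, below is a rough sketch of the 
client-side loop. The AdminClient interface and its method names are 
illustrative stand-ins, not the actual RMAdminCLI or admin-protocol APIs.

    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: AdminClient is a stand-in for the real
    // RMAdminCLI / admin protocol calls, not an actual YARN API.
    interface AdminClient {
      Set<String> getDecommissioningNodes();        // nodes still DECOMMISSIONING
      void forcefulDecommission(Set<String> nodes); // FORCEFUL decommission request
    }

    public class GracefulDecommissionPoller {
      /**
       * Waits up to timeoutSeconds for all DECOMMISSIONING nodes to become
       * DECOMMISSIONED, checking once per second. Nodes still DECOMMISSIONING
       * after the timeout are forcefully decommissioned by the client. Since
       * the RM also receives the timeout via RefreshNodesRequest and enforces
       * it server-side, this forceful step is normally just a safety net.
       */
      public static void waitForDecommission(AdminClient client, long timeoutSeconds)
          throws InterruptedException {
        long deadline =
            System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(timeoutSeconds);
        Set<String> remaining = client.getDecommissioningNodes();
        while (!remaining.isEmpty() && System.currentTimeMillis() < deadline) {
          TimeUnit.SECONDS.sleep(1);
          remaining = client.getDecommissioningNodes();
        }
        if (!remaining.isEmpty()) {
          client.forcefulDecommission(remaining); // fallback if RM did not finish
        }
      }
    }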

2. I am not very familiar with YARN internals when 
yarn.resourcemanager.recovery.enabled is true. My understanding of the 
current (pre-YARN-4676) behavior is: when the RM restarts, NodesListManager 
creates a pseudo RMNodeImpl for each excluded node and DECOMMISSIONs it right 
away. Furthermore, any invalid node is rejected and told to SHUTDOWN inside 
registerNodeManager(). So when recovery is enabled and the RM restarts during 
DECOMMISSIONING, although applications and containers are likely resumed, 
DECOMMISSIONING nodes will be DECOMMISSIONED right away.
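
The sketch below summarizes that pre-YARN-4676 restart behavior as I 
understand it; the types and methods are simplified stand-ins, not the 
actual NodesListManager / ResourceTrackerService code.

    // Illustrative only: simplified stand-ins for the behavior described
    // above, not the actual NodesListManager / ResourceTrackerService code.
    enum NodeAction { NORMAL, SHUTDOWN }

    class RestartBehaviorSketch {
      interface NodesList {
        boolean isValidNode(String host); // include/exclude list check
        boolean isExcluded(String host);
      }

      /** NodesListManager-like handling: an excluded node gets a pseudo
       *  RMNode and is DECOMMISSIONED right away, with no graceful draining. */
      void handleExcludedNode(String host, NodesList nodesList) {
        if (nodesList.isExcluded(host)) {
          decommissionImmediately(host);
        }
      }

      /** registerNodeManager-like handling: an invalid node is rejected and
       *  told to SHUTDOWN when it re-registers after the RM restart. */
      NodeAction registerNodeManager(String host, NodesList nodesList) {
        return nodesList.isValidNode(host) ? NodeAction.NORMAL : NodeAction.SHUTDOWN;
      }

      void decommissionImmediately(String host) {
        // placeholder for dispatching a DECOMMISSION event to the node
      }
    }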

The RM state store does not appear to serialize and restore RMNode; instead, 
after the RM restarts, RESYNC is replied in nodeHeartbeat() and a new RMNode 
is created in the subsequent registerNodeManager(), so the decommissioning 
start time is lost. To resume DECOMMISSIONING nodes, the decommissioning 
start time, and possibly the DECOMMISSIONING state itself, would need to be 
stored and restored. I am not very familiar with the RM state store, but it 
appears to involve non-trivial work, and the cost-benefit justification would 
depend on how essential it is to resume DECOMMISSIONING nodes after an RM 
restart, and, if it is essential, on whether it would be better to create and 
handle it in a separate task/JIRA.
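
For illustration only, something like the hypothetical record below is 
roughly what would have to be persisted per node to resume DECOMMISSIONING 
after an RM restart; the RM state store has no such structure today, and the 
names here are made up.

    import java.io.Serializable;

    // Hypothetical sketch of the per-node state that resuming DECOMMISSIONING
    // across an RM restart would require; not an existing YARN structure.
    class DecommissioningNodeState implements Serializable {
      final String nodeId;                 // e.g. "host:port"
      final long decommissioningStartTime; // millis when DECOMMISSIONING began
      final int timeoutSeconds;            // graceful decommission timeout

      DecommissioningNodeState(String nodeId, long startTime, int timeoutSeconds) {
        this.nodeId = nodeId;
        this.decommissioningStartTime = startTime;
        this.timeoutSeconds = timeoutSeconds;
      }

      /** Remaining graceful time after restart; <= 0 means decommission now. */
      long remainingMillis(long now) {
        return decommissioningStartTime + timeoutSeconds * 1000L - now;
      }
    }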

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackerService tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks the status 
> of DECOMMISSIONING nodes to decide when, after all running containers on 
> the node have completed, they will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list 
> changes to kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)