[ https://issues.apache.org/jira/browse/YARN-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032969#comment-18032969 ]
ASF GitHub Bot commented on YARN-10896:
---------------------------------------
github-actions[bot] closed pull request #5436: YARN-10896 RM fail over is not
reporting the nodes DECOMMISSIONED
URL: https://github.com/apache/hadoop/pull/5436
> RM fail over is not reporting the nodes DECOMMISSIONED
> -------------------------------------------------------
>
> Key: YARN-10896
> URL: https://issues.apache.org/jira/browse/YARN-10896
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Sushil Ks
> Assignee: Sushil Ks
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-10896.001.patch
>
>
> When we add host entries to the exclude file in order to DECOMMISSION a
> NodeManager, we issue the *yarn rmadmin -refreshNodes* command to transition
> the nodes from RUNNING to DECOMMISSIONED state. However, if a failover to the
> standby ResourceManager happens while the exclude file still lists the
> disallowed hosts, these nodes never appear in the Cluster Metrics on the new
> active ResourceManager.
> The host entries present in the exclude file are listed in the Cluster
> Metrics whenever the ResourceManager is restarted, i.e., as part of the
> service init of *NodeListManager*; during failover, however, this information
> is lost. Hence this patch tries to set the *DECOMMISSIONED* nodes inside the
> RM context so that they are available through Cluster Metrics whenever we
> issue the *yarn rmadmin -refreshNodes* command.
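
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the patch or from Hadoop: ToyRmContext, ToyNodeListManager, recordDecommissioned and decommissionedCount are hypothetical names invented for this example. It shows the idea of recording excluded hosts as DECOMMISSIONED in a shared context on refresh as well as at service init, so that a newly active ResourceManager can report them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DecommissionSketch {

    enum NodeState { RUNNING, DECOMMISSIONED }

    /** Hypothetical stand-in for the RM context; not the Hadoop class. */
    static class ToyRmContext {
        // Hosts that are no longer active, keyed by host name.
        final Map<String, NodeState> inactiveNodes = new ConcurrentHashMap<>();
    }

    /** Hypothetical stand-in for the component that reads the exclude file. */
    static class ToyNodeListManager {
        private final ToyRmContext context;

        ToyNodeListManager(ToyRmContext context) {
            this.context = context;
        }

        /** Restart path: excluded hosts are recorded during service init. */
        void serviceInit(Set<String> excludedHosts) {
            recordDecommissioned(excludedHosts);
        }

        /** Refresh path: the behaviour the issue asks for after a failover. */
        void refreshNodes(Set<String> excludedHosts) {
            recordDecommissioned(excludedHosts);
        }

        private void recordDecommissioned(Set<String> excludedHosts) {
            for (String host : excludedHosts) {
                context.inactiveNodes.put(host, NodeState.DECOMMISSIONED);
            }
        }
    }

    /** Counts the hosts that a metrics view would report as DECOMMISSIONED. */
    static int decommissionedCount(ToyRmContext context) {
        int count = 0;
        for (NodeState state : context.inactiveNodes.values()) {
            if (state == NodeState.DECOMMISSIONED) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A newly active RM starts with an empty context after failover.
        ToyRmContext context = new ToyRmContext();
        ToyNodeListManager manager = new ToyNodeListManager(context);

        // Example host names are made up for illustration.
        Set<String> excluded =
            new HashSet<>(Arrays.asList("nm1.example.com", "nm2.example.com"));
        manager.refreshNodes(excluded);

        System.out.println("DECOMMISSIONED nodes: " + decommissionedCount(context));
    }
}
```

In this toy model both the restart path and the refresh path go through the same helper, so the count seen by the metrics view does not depend on which path populated the context.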