[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-2641:
----------------------------
    Description: 
Improve node decommission latency in RM.
Currently, a node is decommissioned only after the RM receives a nodeHeartbeat
from the NodeManager. The node heartbeat interval is configurable
(yarn.resourcemanager.nodemanagers.heartbeat-interval-ms); the default value
is 1 second.
It would be better to do the decommission during RM refresh (NodesListManager)
instead of on nodeHeartbeat (ResourceTrackerService).
There is also a much more serious issue:
if the NM to be decommissioned is killed after the RM is refreshed
(refreshNodes) but before the NM sends its next heartbeat to the RM, the
RMNode will never be decommissioned in the RM. The RMNode will only expire in
the RM after "yarn.nm.liveness-monitor.expiry-interval-ms" (default value
10 minutes).
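A minimal Java sketch of the two orderings, assuming a toy in-memory model of the RM's node bookkeeping. DecommissionSketch, register, refreshNodes, nodeHeartbeat, expire and the decommissionOnRefresh flag are hypothetical names chosen to mirror the operations described above; they are not Hadoop's NodesListManager or ResourceTrackerService APIs. With decommissionOnRefresh=false (heartbeat-driven, the current behavior) an excluded NM that dies right after the refresh only ever reaches LOST through expiry; with decommissionOnRefresh=true (the proposal) it is DECOMMISSIONED as soon as the refresh runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of RM node bookkeeping; NOT Hadoop code.
class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONED, LOST }

    private final Map<String, NodeState> nodes = new HashMap<>();
    private final Set<String> excluded = new HashSet<>();
    // false = decommission only on heartbeat, true = also on refresh
    private final boolean decommissionOnRefresh;

    DecommissionSketch(boolean decommissionOnRefresh) {
        this.decommissionOnRefresh = decommissionOnRefresh;
    }

    void register(String nodeId) {
        nodes.put(nodeId, NodeState.RUNNING);
    }

    // Stand-in for refreshNodes: reload the exclude list.
    void refreshNodes(Set<String> newExcluded) {
        excluded.clear();
        excluded.addAll(newExcluded);
        if (decommissionOnRefresh) {
            // Proposed behavior: decommission excluded nodes immediately,
            // without waiting for their next heartbeat.
            for (String id : excluded) {
                nodes.computeIfPresent(id, (k, v) -> NodeState.DECOMMISSIONED);
            }
        }
    }

    // Stand-in for nodeHeartbeat: the current behavior acts on the
    // exclude list only here.
    void nodeHeartbeat(String nodeId) {
        if (excluded.contains(nodeId)) {
            nodes.computeIfPresent(nodeId, (k, v) -> NodeState.DECOMMISSIONED);
        }
    }

    // Stand-in for the liveness monitor firing after the expiry interval:
    // a node still RUNNING that stopped heartbeating becomes LOST.
    void expire(String nodeId) {
        nodes.computeIfPresent(nodeId,
                (k, v) -> v == NodeState.RUNNING ? NodeState.LOST : v);
    }

    NodeState state(String nodeId) {
        return nodes.get(nodeId);
    }

    public static void main(String[] args) {
        for (boolean onRefresh : new boolean[] {false, true}) {
            DecommissionSketch rm = new DecommissionSketch(onRefresh);
            rm.register("nm1");
            rm.refreshNodes(Set.of("nm1"));
            // nm1 is killed here, so no further heartbeat ever arrives.
            rm.expire("nm1");
            System.out.println("decommissionOnRefresh=" + onRefresh
                    + " -> " + rm.state("nm1"));
        }
    }
}
```

The expire() call stands in for the liveness monitor firing after yarn.nm.liveness-monitor.expiry-interval-ms; the sketch does not model time, only the order in which the events are handled.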

  was:
Improve node decommission latency in RM.
Currently, a node is decommissioned only after the RM receives a nodeHeartbeat
from the NodeManager. The node heartbeat interval is configurable
(yarn.resourcemanager.nodemanagers.heartbeat-interval-ms); the default value
is 1 second.
It would be better to do the decommission during RM refresh (NodesListManager)
instead of on nodeHeartbeat (ResourceTrackerService).


> improve node decommission latency in RM.
> ----------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)