[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

zhihai xu (JIRA) Fri, 10 Oct 2014 21:08:52 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167985#comment-14167985
 ]


zhihai xu commented on YARN-2641:
---------------------------------

Sorry, I didn't describe clearly the second scenario: We need first kill the NM 
process then call refreshNodes CLI to put the node in the blacklist. To make 
the refreshNodes CLI work correctly, we need create a file for example 
"exclude_host.txt" which should have the node name to decommission, then we 
should set the "yarn.resourcemanager.nodes.exclude-path" to the file 
"exclude_host.txt".
See the code in TestResourceTrackerService.java for the configuration and node 
list file:
{code}
    conf.set(YarnConfiguration.RM_NODES_EXCLUDE_FILE_PATH, hostFile
        .getAbsolutePath());
  private void writeToHostsFile(String... hosts) throws IOException {
    if (!hostFile.exists()) {
      TEMP_DIR.mkdirs();
      hostFile.createNewFile();
    }
    FileOutputStream fStream = null;
    try {
      fStream = new FileOutputStream(hostFile);
      for (int i = 0; i < hosts.length; i++) {
        fStream.write(hosts[i].getBytes());
        fStream.write("\n".getBytes());
      }
    } finally {
      if (fStream != null) {
        IOUtils.closeStream(fStream);
        fStream = null;
      }
    }
  }
{code}

> improve node decommission latency in RM.
> ----------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently the node decommission only happened after RM received nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable. The 
> default value is 1 second.
> It will be better to do the decommission during RM Refresh(NodesListManager) 
> instead of nodeHeartbeat(ResourceTrackerService).
> This will be a much more serious issue:
> After RM is refreshed (refreshNodes), If the NM to be decommissioned is 
> killed before NM sent heartbeat to RM. The RMNode will never be 
> decommissioned in RM. The RMNode will only expire in RM after  
> "yarn.nm.liveness-monitor.expiry-interval-ms"(default value 10 minutes) time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

Reply via email to