[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno Alexandre Rosa updated YARN-2299:
---------------------------------------
    Affects Version/s: 2.5.0
                       2.5.2

> inconsistency at identifying node
> ---------------------------------
>
>                 Key: YARN-2299
>                 URL: https://issues.apache.org/jira/browse/YARN-2299
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0, 2.5.2
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Critical
>
> If the port of "yarn.nodemanager.address" is not specified on the NM, the NM
> chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS
> restart) and is restarted within "yarn.nm.liveness-monitor.expiry-interval-ms",
> both "host:port1" and "host:port2" are listed under "Active Nodes" in the
> WebUI for a while. After host:port1 expires, host:port1 is listed under "Lost
> Nodes" and host:port2 under "Active Nodes". If the NM dies ungracefully
> again, only host:port1 is listed under "Lost Nodes": "host:port2" appears in
> neither "Active Nodes" nor "Lost Nodes".
> In another case, two NMs run on the same host (MiniYARNCluster or other
> testing purposes). If both of them are lost, only one entry appears under
> "Lost Nodes" in the WebUI.
> In both cases, the sum of "Active Nodes" and "Lost Nodes" does not equal the
> number of nodes we expect.
> The root cause is an inconsistency in how we decide whether two nodes are
> identical.
> When we manage active nodes (RMContextImpl.nodes), we key them by NodeId,
> which contains the port, so two nodes with the same host but different ports
> are treated as different nodes.
> But when we manage inactive nodes (RMContextImpl.inactiveNodes), we key them
> by host only, so two nodes with the same host but different ports are treated
> as identical.
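The following is a minimal Python sketch of the mismatch under a simplified model. `NodeId`, `RMNodeBook`, `register`, `expire` and `listed` are hypothetical stand-ins for the RM's bookkeeping, not the real `RMContextImpl` API, and the port numbers are arbitrary. The sketch also makes one assumption about the inactive map: it keeps the first lost entry for a host, which matches the reported behaviour. Whether the real code keeps or overwrites that entry, a host-keyed map can hold only one lost node per host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    """Simplified stand-in for YARN's NodeId: equality uses host AND port."""
    host: str
    port: int

    def __str__(self):
        return f"{self.host}:{self.port}"


class RMNodeBook:
    """Hypothetical model of the RM's two node collections.

    active   ~ RMContextImpl.nodes         (keyed by NodeId: host + port)
    inactive ~ RMContextImpl.inactiveNodes (keyed by host only)
    """

    def __init__(self):
        self.active = {}    # NodeId -> NodeId
        self.inactive = {}  # host -> NodeId

    def register(self, node):
        self.active[node] = node

    def expire(self, node):
        # Liveness expiry moves the node from the active to the inactive map.
        self.active.pop(node, None)
        # Assumption: the first lost entry for a host is kept, matching the
        # report above. Overwriting instead would still keep one entry per host.
        self.inactive.setdefault(node.host, node)

    def listed(self):
        """Names shown under either "Active Nodes" or "Lost Nodes"."""
        return {str(n) for n in self.active} | {str(n) for n in self.inactive.values()}


# Case 1: the NM restarts on a new random port, then dies again.
rm = RMNodeBook()
port1, port2 = NodeId("host", 40001), NodeId("host", 40002)
rm.register(port1)
rm.register(port2)  # restarted before port1 expired: both are active
rm.expire(port1)    # host:port1 becomes lost
rm.expire(port2)    # dies again, but the host slot is already taken
print(sorted(rm.listed()))  # → ['host:40001']: host:port2 has vanished

# Case 2: two NMs intentionally share one host and are both lost.
rm2 = RMNodeBook()
a, b = NodeId("host", 45454), NodeId("host", 45455)
for node in (a, b):
    rm2.register(node)
for node in (a, b):
    rm2.expire(node)
print(len(rm2.active) + len(rm2.inactive))  # → 1, although two NMs ran
```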
> To fix the inconsistency, we should distinguish the following two cases and
> handle both of them consistently:
>  - multiple NMs intentionally running on one host
>  - NM instances started one after another on the same host
> Two possible solutions:
> 1) Introduce a boolean config such as "one-node-per-host" (default "true"),
> and, when it is true, identify nodes on the RM by host only.
> 2) Make a valid port in "yarn.nodemanager.address" mandatory. In this
> situation, NM instances started one after another on the same host will have
> the same NodeId, while NMs intentionally co-located on one host will have
> different NodeIds.
> Personally I prefer option 1 because it is easier for users.
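Option 1 can be sketched as a single identity function that both the active and the inactive collections use. This is only an illustration under assumptions. `one_node_per_host` mirrors the proposed (not existing) "one-node-per-host" config, and `node_key` and `ConsistentRMNodeBook` are hypothetical names. The sketch also assumes that an expiry only applies if the expiring node is still the one registered under its key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    """Simplified stand-in for YARN's NodeId."""
    host: str
    port: int


def node_key(node, one_node_per_host=True):
    """Hypothetical identity used for BOTH active and inactive bookkeeping.

    one_node_per_host=True  -> host only: a restarted NM replaces its old entry.
    one_node_per_host=False -> host + port: co-located NMs stay distinct.
    """
    return node.host if one_node_per_host else (node.host, node.port)


class ConsistentRMNodeBook:
    def __init__(self, one_node_per_host=True):
        self.one_node_per_host = one_node_per_host
        self.active = {}    # key -> NodeId
        self.inactive = {}  # key -> NodeId (same key type as active)

    def _key(self, node):
        return node_key(node, self.one_node_per_host)

    def register(self, node):
        key = self._key(node)
        self.inactive.pop(key, None)  # a rejoining node is no longer lost
        self.active[key] = node       # a restart replaces the stale entry

    def expire(self, node):
        key = self._key(node)
        # Assumption: ignore expiry of a node that was already replaced.
        if self.active.get(key) == node:
            del self.active[key]
            self.inactive[key] = node


# Flag on: one NM restarted on a new random port, then lost again.
rm_on = ConsistentRMNodeBook(one_node_per_host=True)
old, new = NodeId("host", 40001), NodeId("host", 40002)
rm_on.register(old)
rm_on.register(new)  # replaces the stale host entry
rm_on.expire(old)    # no-op: old is no longer the registered node
rm_on.expire(new)
print(len(rm_on.active) + len(rm_on.inactive))  # → 1

# Flag off: two NMs intentionally on one host, both lost.
rm_off = ConsistentRMNodeBook(one_node_per_host=False)
a, b = NodeId("host", 45454), NodeId("host", 45455)
for node in (a, b):
    rm_off.register(node)
for node in (a, b):
    rm_off.expire(node)
print(len(rm_off.inactive))  # → 2
```

With the flag off, an NM that restarts on a random port still gets a second identity. Option 2 avoids that case by requiring a fixed port.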



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
