[jira] [Updated] (YARN-2299) inconsistency at identifying node

2014-11-27 Thread Bruno Alexandre Rosa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Alexandre Rosa updated YARN-2299:
---
Affects Version/s: 2.5.0
   2.5.2

> inconsistency at identifying node
> -
>
> Key: YARN-2299
> URL: https://issues.apache.org/jira/browse/YARN-2299
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.5.0, 2.5.2
> Reporter: Hong Zhiguo
> Assignee: Hong Zhiguo
> Priority: Critical
>
> If the port of "yarn.nodemanager.address" is not specified at the NM, the NM
> chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS
> restart) and is then restarted within
> "yarn.nm.liveness-monitor.expiry-interval-ms", "host:port1" and "host:port2"
> are both present in "Active Nodes" on the WebUI for a while. After host:port1
> expires, we get host:port1 in "Lost Nodes" and host:port2 in "Active Nodes".
> If the NM dies ungracefully again, we get only host:port1 in "Lost Nodes".
> "host:port2" is neither in "Active Nodes" nor in "Lost Nodes".
> In another case, two NMs are running on the same host (miniYarnCluster or
> other test purposes). If both of them are lost, we get only one entry in
> "Lost Nodes" on the WebUI.
> In both cases, the sum of "Active Nodes" and "Lost Nodes" is not the number
> of nodes we expected.
> The root cause is an inconsistency in how we decide whether two nodes are
> identical.
> When we manage active nodes (RMContextImpl.nodes), we use NodeId, which
> contains the port. Two nodes with the same host but different ports are
> treated as different nodes.
> But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only
> the host. Two nodes with the same host but different ports are treated as
> identical.
> To fix the inconsistency, we should differentiate the 2 cases below and be
> consistent for both of them:
>  - intentionally multiple NMs per host
>  - NM instances one after another on the same host
> Two possible solutions:
> 1) Introduce a boolean config like "one-node-per-host" (default "true"),
> and use the host to differentiate nodes on the RM if it's true.
> 2) Make it mandatory to have a valid port in the "yarn.nodemanager.address"
> config. In this situation, NM instances one after another on the same host
> will have the same NodeId, while intentionally multiple NMs per host will
> have different NodeIds.
> Personally I prefer option 1 because it's easier for users.
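
The key mismatch described in the issue can be illustrated with a small standalone sketch. It is not the ResourceManager code: the class NodeTrackingSketch and its methods are hypothetical, plain strings stand in for NodeId, and it assumes the lost-node map is updated with an overwriting put. It only shows how keying one collection by "host:port" and the other by host alone makes the counts disagree.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the two RM collections named in the issue.
public class NodeTrackingSketch {
    // Active nodes are identified by "host:port" (the role NodeId plays).
    private final Set<String> activeNodes = new HashSet<>();
    // Lost nodes are keyed by host only; the value is the full "host:port".
    private final Map<String, String> lostNodes = new HashMap<>();

    public void register(String host, int port) {
        activeNodes.add(host + ":" + port);
    }

    public void markLost(String host, int port) {
        String nodeId = host + ":" + port;
        if (activeNodes.remove(nodeId)) {
            // Assumption: an overwriting put, so a later lost node on the
            // same host replaces the earlier entry.
            lostNodes.put(host, nodeId);
        }
    }

    public int activeCount() { return activeNodes.size(); }

    public int lostCount() { return lostNodes.size(); }

    public static void main(String[] args) {
        NodeTrackingSketch rm = new NodeTrackingSketch();
        rm.register("host", 1001);   // first NM on the host
        rm.register("host", 1002);   // second NM on the same host
        rm.markLost("host", 1001);
        rm.markLost("host", 1002);
        // Two nodes were registered, but only one lost entry remains.
        System.out.println("active=" + rm.activeCount() + " lost=" + rm.lostCount());
        // prints "active=0 lost=1"
    }
}
```

With a consistent key for both collections (host only, or host plus port), the two counts would add up to the number of registered nodes in this sketch.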



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-11-25 Thread Bruno Alexandre Rosa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225312#comment-14225312
 ] 

Bruno Alexandre Rosa commented on YARN-2299:


I tried to reproduce the first case on version 2.5.2 and the bug is still 
present. However, instead of host:port1 showing in Lost Nodes, I got 
host:port2. In the same fashion, I lost track of host:port1. The sum of Lost 
Nodes remains inconsistent.



[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-11-19 Thread Bruno Alexandre Rosa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218207#comment-14218207
 ] 

Bruno Alexandre Rosa commented on YARN-2299:


Which*



[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-11-19 Thread Bruno Alexandre Rosa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218195#comment-14218195
 ] 

Bruno Alexandre Rosa commented on YARN-2299:


What are the affected versions?
