[jira] [Commented] (YARN-4119) Expose the NM bind address as an env, so that AM can make use of it for exposing tracking URL

2015-09-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734333#comment-14734333
 ] 

Naganarasimha G R commented on YARN-4119:
-

Synced up offline with [~varun_saxena]. When I modified the description of 
MAPREDUCE-5938, I was not able to find the JIRA for this issue (MAPREDUCE-6402). 
Anyway, as I have started working on it, I will continue to finish it. Thanks 
[~varun_saxena].

>  Expose the NM bind address as an env, so that AM can make use of it for 
> exposing tracking URL
> --
>
> Key: YARN-4119
> URL: https://issues.apache.org/jira/browse/YARN-4119
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> As described in MAPREDUCE-5938, many security scanning tools advise against 
> binding on all network addresses; it would be good to bind only on the desired 
> address. As AMs can run on any of the nodes, it would be better for the NM to 
> share its bind address with the container as part of the environment 
> variables.
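For illustration only, here is a minimal sketch of how an AM might consume such a 
variable when building its tracking URL. The environment key name 
"NM_BIND_ADDRESS" is an assumption for this sketch, not a variable defined by 
this JIRA.
{code}
// Hypothetical sketch: the environment key "NM_BIND_ADDRESS" is an assumption,
// not necessarily the variable this JIRA will introduce.
public class TrackingUrlHelper {
  public static String buildTrackingUrl(int httpPort) {
    // Read the bind address the NM would export to its containers.
    String bindAddress = System.getenv("NM_BIND_ADDRESS");
    if (bindAddress == null || bindAddress.isEmpty()) {
      // Fall back to the wildcard address if the variable is absent.
      bindAddress = "0.0.0.0";
    }
    return "http://" + bindAddress + ":" + httpPort;
  }
}
{code}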



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734514#comment-14734514
 ] 

Hudson commented on YARN-4121:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1092 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1092/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* hadoop-yarn-project/CHANGES.txt


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734369#comment-14734369
 ] 

Hudson commented on YARN-4121:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #354 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/354/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* hadoop-yarn-project/CHANGES.txt


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Fix Version/s: (was: 2.7.1)

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can 
> still see the queue's information block in the webpage (/cluster/scheduler), 
> though the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a queue that 
> never existed causes an exception.
> My expectation is that the deleted queue will no longer be displayed in the 
> webpage, and that submitting an application to the deleted queue will behave 
> as if the queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Affects Version/s: (was: 2.7.1)

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can 
> still see the queue's information block in the webpage (/cluster/scheduler), 
> though the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a queue that 
> never existed causes an exception.
> My expectation is that the deleted queue will no longer be displayed in the 
> webpage, and that submitting an application to the deleted queue will behave 
> as if the queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-09-08 Thread Jian He (JIRA)
Jian He created YARN-4127:
-

 Summary: RM fail with noAuth error if switched from non-failover 
mode to failover mode 
 Key: YARN-4127
 URL: https://issues.apache.org/jira/browse/YARN-4127
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


The scenario is that RM failover was initially enabled, so the zkRootNodeAcl is 
by default set with the *RM ID* in the ACL string.

If RM failover is then switched to be disabled, the RM cannot load data from ZK 
and fails with a noAuth error. After I reset the root node ACL, it can access 
ZK again.
{code}
15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover 
state
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
  at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
  at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
  at 
org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
  at 
org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
  at 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
  at 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
  at 
org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
  at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
{code}
The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* to 
connect to ZK and thus fails with a noAuth error.

We should be able to switch failover on and off with no interruption to the 
user.
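As an illustration of the manual workaround mentioned above ("reset the root 
node ACL"), here is a minimal Curator sketch. The connect string and the 
/rmstore/ZKRMStateRoot path are assumptions and must match the cluster's actual 
ZK quorum and yarn.resourcemanager.zk-state-store.parent-path; the connection 
may also need the digest auth protecting the node.
{code}
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.data.ACL;

public class ResetRmStateStoreAcl {
  public static void main(String[] args) throws Exception {
    // Assumed values; replace with the cluster's ZK quorum and state-store path.
    String connectString = "zk1:2181,zk2:2181,zk3:2181";
    String rootPath = "/rmstore/ZKRMStateRoot";

    CuratorFramework client = CuratorFrameworkFactory.newClient(
        connectString, new ExponentialBackoffRetry(1000, 3));
    client.start();
    try {
      // Show the current ACL (expected to carry the RM-ID-based entry).
      List<ACL> current = client.getACL().forPath(rootPath);
      System.out.println("Current ACL on " + rootPath + ": " + current);

      // Reset to open access so a non-failover RM can load its state again.
      client.setACL().withACL(ZooDefs.Ids.OPEN_ACL_UNSAFE).forPath(rootPath);
    } finally {
      client.close();
    }
  }
}
{code}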



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class

2015-09-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4081:

Attachment: YARN-4081-YARN-3926.007.patch

Fixed the whitespace issue and addressed some checkstyle issues.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, 
> YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, 
> YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.
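Purely to illustrate the direction (not the API in the attached patches), here 
is a minimal sketch of a Resource-like holder generalized beyond memory and 
vcores; all names in it are assumptions.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a multi-resource-type container; names are
// illustrative and do not reflect the actual YARN-3926/YARN-4081 patches.
public class MultiTypeResource {
  private final Map<String, Long> resources = new HashMap<>();

  public void setResourceValue(String resourceName, long value) {
    resources.put(resourceName, value);
  }

  public long getResourceValue(String resourceName) {
    Long value = resources.get(resourceName);
    return value == null ? 0L : value;
  }

  public Map<String, Long> getResources() {
    // Exposes every registered resource type, e.g. "memory-mb", "vcores", "gpu".
    return resources;
  }
}
{code}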



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue

2015-09-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena resolved YARN-4128.

Resolution: Not A Problem

I did not have the latest code. This has been fixed by YARN-3970, which was 
recently committed, so I am closing this.

> Correct logs in capacity scheduler while printing priority is acceptable for 
> a queue
> 
>
> Key: YARN-4128
> URL: https://issues.apache.org/jira/browse/YARN-4128
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Priority: Trivial
>
> Spaces are missing between the queue name and "for", and between the 
> application id and "for".
> {noformat}
> [IPC Server handler 0 on 33140]: INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Priority '0' is acceptable in queue :varunqfor 
> application:application_1441653547287_0003for the user: varun
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734299#comment-14734299
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java


> Retrospect on decision of making RM crashed if any exception throw in 
> ZKRMStateStore
> 
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Jian He
>Priority: Critical
>  Labels: ha
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal 
> exception that crashes the RM. As shown in YARN-1924, the cause could be an 
> internal bug in RM HA itself rather than a truly fatal condition. We should 
> revisit some of the decisions here, as the HA feature is designed to protect 
> the key component, not to disturb it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734298#comment-14734298
 ] 

Hudson commented on YARN-2884:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/])
YARN-2884. Added a proxy service in NM to proxy the communication between 
AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 
6f72f1e6003ab11679bebeb96f27f1f62b3b3e02)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java


> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Fix For: 2.8.0
>
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs
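A rough sketch of the interceptor-chain idea behind such a proxy, using a 
simplified hypothetical interface rather than the RequestInterceptor API defined 
in the committed patch:
{code}
// Hypothetical, simplified sketch of chaining request interceptors in an
// AM-RM proxy; the real AMRMProxyService/RequestInterceptor classes in the
// committed patch have richer signatures (tokens, contexts, YARN protos).
interface AmRmInterceptor {
  void setNext(AmRmInterceptor next);
  String allocate(String request);  // stand-in for AllocateRequest/Response
}

class ThrottlingInterceptor implements AmRmInterceptor {
  private AmRmInterceptor next;
  public void setNext(AmRmInterceptor next) { this.next = next; }
  public String allocate(String request) {
    // A place to throttle misbehaving AMs before the request reaches the RM.
    return next.allocate(request);
  }
}

class RmForwardingInterceptor implements AmRmInterceptor {
  public void setNext(AmRmInterceptor next) { /* terminal element of the chain */ }
  public String allocate(String request) {
    // Terminal interceptor: would forward to the central RM (or a federation).
    return "response-for:" + request;
  }
}
{code}
The proxy builds such a chain per application and points the AM (via tokens and 
configuration) at the NM-local endpoint instead of the central RM.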



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Followup fixes after YARN-2019 regarding RM behavior when state-store error occurs

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734300#comment-14734300
 ] 

Hudson commented on YARN-4087:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java


> Followup fixes after YARN-2019 regarding RM behavior when state-store error 
> occurs
> --
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-4087-branch-2.6.patch, YARN-4087.1.patch, 
> YARN-4087.2.patch, YARN-4087.3.patch, YARN-4087.5.patch, YARN-4087.6.patch, 
> YARN-4087.7.patch
>
>
> Several fixes:
> 1. Set YARN_FAIL_FAST to false by default, since this makes more sense in a 
> production environment.
> 2. If HA is enabled and there is any state-store error, then after the retry 
> operation fails we always transition the RM to standby state. Otherwise, we 
> may see two active RMs running; YARN-4107 is one example.
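A hedged sketch of the behavior described above; the class, method, and flag 
names are assumptions for illustration, not the actual RMStateStore 
implementation.
{code}
// Hypothetical sketch of the failure-handling policy described in this issue;
// names like haEnabled/failFast/transitionToStandby are illustrative.
class StateStoreFailurePolicy {
  private final boolean haEnabled;
  private final boolean failFast;  // corresponds to YARN_FAIL_FAST, default false

  StateStoreFailurePolicy(boolean haEnabled, boolean failFast) {
    this.haEnabled = haEnabled;
    this.failFast = failFast;
  }

  void onStoreOperationFailed(Exception cause) {
    if (haEnabled) {
      // Always step down so two active RMs can never coexist (see YARN-4107).
      transitionToStandby();
    } else if (failFast) {
      // Non-HA, fail-fast explicitly enabled: crash rather than limp along.
      throw new RuntimeException("Fatal state-store error", cause);
    } else {
      // Default: log and keep the RM running.
      System.err.println("State-store error ignored (fail-fast off): " + cause);
    }
  }

  private void transitionToStandby() { /* placeholder for the real transition */ }
}
{code}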



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734343#comment-14734343
 ] 

Hudson commented on YARN-4121:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8413 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8413/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734399#comment-14734399
 ] 

Hadoop QA commented on YARN-4126:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 52s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  52m 57s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 38s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754582/0002-YARN-4126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f72f1e |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9028/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9028/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9028/console |


This message was automatically generated.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if we are in insecure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0001-YARN-4126.patch

Uploading a patch for the same. If you have started working on it, please do 
reassign.


> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if we are in insecure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0002-YARN-4126.patch

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if we are in insecure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734437#comment-14734437
 ] 

Hudson commented on YARN-4121:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #361 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/361/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734292#comment-14734292
 ] 

Naganarasimha G R commented on YARN-4126:
-

Yes [~bibinchundatt], you are right. The check is there, but the else case 
should return false. I missed seeing this!
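For illustration, a minimal sketch of the kind of guard being discussed, built 
around UserGroupInformation.isSecurityEnabled(); this is an assumption-level 
sketch with a hypothetical helper name, not the exact ClientRMService code.
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical sketch; the real check lives in ClientRMService and its helpers.
final class DelegationTokenGuard {
  private DelegationTokenGuard() { }

  static boolean isDelegationTokenOpAllowed() throws IOException {
    if (UserGroupInformation.isSecurityEnabled()) {
      UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
      // In secure mode, only Kerberos-authenticated callers should be able to
      // request a delegation token (the real code also handles proxy users).
      return ugi.getAuthenticationMethod() ==
          UserGroupInformation.AuthenticationMethod.KERBEROS;
    }
    // The point of this JIRA: in insecure mode the else branch should refuse
    // to issue a token instead of silently handing one out.
    return false;
  }
}
{code}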

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if we are in insecure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-4126:
--

Assignee: Bibin A Chundatt

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token if we are in insecure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.000.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
> and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used 
> to check both when disks become full and when disks become good. It would be 
> better to use two configurations: one used when disks go from not-full to full 
> and the other used when disks go from full to not-full, so we can avoid 
> oscillating frequently.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.
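To make the hysteresis concrete, here is a minimal sketch with two thresholds; 
the class and threshold names are illustrative assumptions, not the 
configuration keys proposed in the attached patch.
{code}
// Hypothetical sketch of full/not-full hysteresis for a disk checker; the
// threshold names are illustrative, not actual YARN configuration keys.
class DiskFullnessTracker {
  private final float fullThresholdPercent;     // e.g. 90.0f to mark a disk full
  private final float notFullThresholdPercent;  // e.g. 80.0f to mark it good again
  private boolean full;

  DiskFullnessTracker(float fullThresholdPercent, float notFullThresholdPercent) {
    this.fullThresholdPercent = fullThresholdPercent;
    this.notFullThresholdPercent = notFullThresholdPercent;
  }

  // Returns true if the disk is currently considered full.
  boolean update(float utilizationPercent) {
    if (!full && utilizationPercent > fullThresholdPercent) {
      full = true;            // crossed the upper watermark: mark full
    } else if (full && utilizationPercent < notFullThresholdPercent) {
      full = false;           // dropped below the lower watermark: mark good
    }
    // Between the two thresholds the previous state is kept, which is what
    // prevents the disk from oscillating between full and not-full.
    return full;
  }
}
{code}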



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-09-08 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4129:
---

 Summary: Refactor the SystemMetricPublisher in RM to better 
support newer events
 Key: YARN-4129
 URL: https://issues.apache.org/jira/browse/YARN-4129
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R


Currently, to add a new timeline event/entity on the RM side, one has to add a 
method in the publisher and a method in the handler and create a new event 
class, which looks cumbersome and redundant. Further, not all events may need 
to be published in both V1 & V2. So we adopt an approach similar to the one 
adopted in YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue

2015-09-08 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4128:
--

 Summary: Correct logs in capacity scheduler while printing 
priority is acceptable for a queue
 Key: YARN-4128
 URL: https://issues.apache.org/jira/browse/YARN-4128
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Saxena
Priority: Trivial


Spaces are missing between the queue name and "for", and between the 
application id and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Priority '0' is acceptable in queue :varunqfor 
application:application_1441653547287_0003for the user: varun
{noformat}

Relevant log in CapacityScheduler#checkAndGetApplicationPriority
{code}
LOG.info("Priority '" + appPriority.getPriority()
+ "' is acceptable in queue : " + queueName + " for application: "
+ applicationId + " for the user: " + user);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue

2015-09-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4128:
---
Description: 
Spaces are missing between the queue name and "for", and between the 
application id and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Priority '0' is acceptable in queue :varunqfor 
application:application_1441653547287_0003for the user: varun
{noformat}


  was:
Spaces are missing between the queue name and "for", and between the 
application id and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Priority '0' is acceptable in queue :varunqfor 
application:application_1441653547287_0003for the user: varun
{noformat}

Relevant log in CapacityScheduler#checkAndGetApplicationPriority
{code}
LOG.info("Priority '" + appPriority.getPriority()
+ "' is acceptable in queue : " + queueName + " for application: "
+ applicationId + " for the user: " + user);
{code}


> Correct logs in capacity scheduler while printing priority is acceptable for 
> a queue
> 
>
> Key: YARN-4128
> URL: https://issues.apache.org/jira/browse/YARN-4128
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Priority: Trivial
>
> Spaces are missing between the queue name and "for", and between the 
> application id and "for".
> {noformat}
> [IPC Server handler 0 on 33140]: INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Priority '0' is acceptable in queue :varunqfor 
> application:application_1441653547287_0003for the user: varun
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-09-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4129:

Attachment: YARN-4129.YARN-2928.001.patch

Hi [~djp] & [~sjlee0],
Similar to the approach followed on the NM side (YARN-3045), I have modified the 
RM side too.
The modified approach is as follows (a rough sketch follows below):
* Extract all the public methods of SystemMetricsPublisher into an interface and 
keep the interface name SystemMetricsPublisher (it can be changed to another 
name if suggested).
* Create two implementations of the interface, one for V1 and one for V2; if 
some events need not be handled, the particular version's implementation can 
simply ignore them and return.
* In a specific implementation, if a timeline event/entity needs to be 
published, it can be created, wrapped in a common async event, and handed to 
the AsyncDispatcher to dispatch.
* Specific handlers are created for V1 & V2 to publish the events.

I am attaching a patch for this issue. Please review.
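A rough sketch of that split, with hypothetical simplified signatures (the 
actual patch works with RM app/attempt objects, timestamps, and an 
AsyncDispatcher):
{code}
// Hypothetical, simplified sketch of the V1/V2 split described above; the
// actual SystemMetricsPublisher methods and event classes are richer.
interface SystemMetricsPublisherSketch {
  void appCreated(String appId, long createdTime);
  void appFinished(String appId, long finishedTime);
}

class TimelineV1PublisherSketch implements SystemMetricsPublisherSketch {
  public void appCreated(String appId, long createdTime) {
    // Build a V1 timeline entity and hand it to the async dispatcher.
  }
  public void appFinished(String appId, long finishedTime) {
    // V1 publishes this event as well.
  }
}

class TimelineV2PublisherSketch implements SystemMetricsPublisherSketch {
  public void appCreated(String appId, long createdTime) {
    // Build a V2 timeline entity instead.
  }
  public void appFinished(String appId, long finishedTime) {
    // A version that does not need a given event can simply return.
  }
}
{code}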

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add 
> a method in the publisher and a method in the handler and create a new event 
> class, which looks cumbersome and redundant. Further, not all events may need 
> to be published in both V1 & V2. So we adopt an approach similar to the one 
> adopted in YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734407#comment-14734407
 ] 

Hudson commented on YARN-2884:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2303 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2303/])
YARN-2884. Added a proxy service in NM to proxy the communication between 
AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 
6f72f1e6003ab11679bebeb96f27f1f62b3b3e02)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java


> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Fix For: 2.8.0
>
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Labels:   (was: YARN patch)

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can 
> still see the queue's information block in the webpage (/cluster/scheduler), 
> though the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a queue that 
> never existed causes an exception.
> My expectation is that the deleted queue will no longer be displayed in the 
> webpage, and that submitting an application to the deleted queue will behave 
> as if the queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734303#comment-14734303
 ] 

Hudson commented on YARN-2884:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #353 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/353/])
YARN-2884. Added a proxy service in NM to proxy the communication between 
AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 
6f72f1e6003ab11679bebeb96f27f1f62b3b3e02)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java


> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Fix For: 2.8.0
>
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-09-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4127:
--

Assignee: Varun Saxena

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Varun Saxena
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched to be disabled, the RM cannot load data from 
> ZK and fails with a noAuth error. After I reset the root node ACL, it can 
> access ZK again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* 
> to connect to ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Attachment: YARN-4022.001.patch

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
> Attachments: YARN-4022.001.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can 
> still see the queue's information block in the webpage (/cluster/scheduler), 
> though the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a queue that 
> never existed causes an exception.
> My expectation is that the deleted queue will no longer be displayed in the 
> webpage, and that submitting an application to the deleted queue will behave 
> as if the queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734477#comment-14734477
 ] 

Hadoop QA commented on YARN-4081:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 13s | Pre-patch YARN-3926 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 56s | The applied patch generated  1 
new checkstyle issues (total was 10, now 3). |
| {color:green}+1{color} | whitespace |   0m 20s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  54m 49s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 104m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754588/YARN-4081-YARN-3926.007.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-3926 / 1dbd8e3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9030/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9030/console |


This message was automatically generated.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, 
> YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, 
> YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734377#comment-14734377
 ] 

Hadoop QA commented on YARN-3943:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 52s | The applied patch generated  2  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 46s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 22s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 43s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  56m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754583/YARN-3943.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f72f1e |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/diffJavadocWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9029/console |


This message was automatically generated.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when disks go from not-full to 
> full and the other used when disks go from full to not-full, so that we can 
> avoid frequent oscillation.
> For example, we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
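
As a purely illustrative yarn-site.xml sketch of the proposed hysteresis (only the first key below is an existing setting; the second property name is hypothetical, invented here just to show the idea):

{code}
<!-- Existing key: a disk is marked bad once utilization exceeds this. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
<!-- Hypothetical key for this proposal: the disk is marked good again only
     after utilization drops below this lower watermark, avoiding oscillation. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage</name>
  <value>90.0</value>
</property>
{code}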



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734529#comment-14734529
 ] 

Bibin A Chundatt commented on YARN-4126:


Hi [~jianhe]
Is this change in line with what you expected?
Is any change required other than the above?
It looks like the test cases need a lot of corrections.


> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 
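
A minimal sketch of the kind of guard being discussed (illustrative only, not the actual patch; the wrapper class below is made up, though UserGroupInformation.isSecurityEnabled() is the standard Hadoop check):

{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch only: reject delegation-token requests when security is off.
public class DelegationTokenGuard {
  public static void checkSecurityEnabled() throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      throw new IOException(
          "Delegation tokens are only supported when security is enabled");
    }
  }
}
{code}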



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734597#comment-14734597
 ] 

Hudson commented on YARN-4121:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #342 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/342/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734625#comment-14734625
 ] 

Steve Loughran commented on YARN-3337:
--

# For slider we have what we need: the API calls are in the AM and its config 
files
# What I do need is something generic for other apps, with Spark the one I'm 
currently looking at
# Robert's SSH-in strategy is OK for local-VM systems where I have the SSH key 
and can automate it; I remember doing something similar to test HA NNs in 
Hadoop 1.x. What SSH does well is that you can then issue a {{kill -19}} to 
suspend a process, and so test liveness monitoring.

What I can't do with his code is 
# run tests against clusters that I don't have SSH keys for (possibly including 
the jenkins builds)
# test on windows
# have some re-usable tests which I can get into ASF code for anyone to use.

API-wise, force-kill-container would be enough; while my JUnit tests wouldn't 
need a CLI, test runners in different languages might.

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or have to 
> implement Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & with some probability is the core 
> activity here, one that could be handled by a CLI app/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations
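
A rough sketch of the startup-delay / polling / probability logic listed above (all class, field, and callback names are invented for illustration; this is not an existing YARN API):

{code}
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a chaos loop: startup delay, periodic wakeup,
// and probabilistic AM / container kills via caller-supplied actions.
public class ChaosLoop implements Runnable {
  private final long startupDelayMs;
  private final long intervalMs;
  private final int amKillPercent;        // 0-100
  private final int containerKillPercent; // 0-100
  private final Runnable killAm;          // hypothetical callbacks
  private final Runnable killContainer;
  private final Random random = new Random();

  public ChaosLoop(long startupDelayMs, long intervalMs, int amKillPercent,
      int containerKillPercent, Runnable killAm, Runnable killContainer) {
    this.startupDelayMs = startupDelayMs;
    this.intervalMs = intervalMs;
    this.amKillPercent = amKillPercent;
    this.containerKillPercent = containerKillPercent;
    this.killAm = killAm;
    this.killContainer = killContainer;
  }

  @Override
  public void run() {
    try {
      // startup delay before any chaos action
      TimeUnit.MILLISECONDS.sleep(startupDelayMs);
      while (!Thread.currentThread().isInterrupted()) {
        if (random.nextInt(100) < amKillPercent) {
          killAm.run();
        }
        if (random.nextInt(100) < containerKillPercent) {
          killContainer.run();
        }
        // chaos wakeup/polling frequency
        TimeUnit.MILLISECONDS.sleep(intervalMs);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}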



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Attachment: YARN-4022.002.patch

Fix test bug & whitespace.

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block on the web page (/cluster/scheduler), though 
> the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will not be displayed on the web page 
> and submitting an application to the deleted queue will act just as if the 
> queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734751#comment-14734751
 ] 

Hadoop QA commented on YARN-4022:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 59s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  
11 new checkstyle issues (total was 85, now 94). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m  7s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754618/YARN-4022.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/console |


This message was automatically generated.

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block on the web page (/cluster/scheduler), though 
> the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will not be displayed on the web page 
> and submitting an application to the deleted queue will act just as if the 
> queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Labels: scheduler  (was: )

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block on the web page (/cluster/scheduler), though 
> the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will not be displayed on the web page 
> and submitting an application to the deleted queue will act just as if the 
> queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734560#comment-14734560
 ] 

Hadoop QA commented on YARN-4129:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  6s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 21s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 28s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 27s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 35s | The patch appears to introduce 2 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 12s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m  6s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754593/YARN-4129.YARN-2928.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / e6afe26 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9031/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9031/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9031/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9031/console |


This message was automatically generated.

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to add a 
> method in the publisher and a method in the handler and create a new event 
> class, which looks cumbersome and redundant. Further, not all events might be 
> required to be published in V1 & V2. So we are adopting an approach similar to 
> the one adopted in YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-09-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734602#comment-14734602
 ] 

nijel commented on YARN-3771:
-

Hi all,
any comments on this change?

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> is not honoring the final qualifier. The string array contents can be 
> re-assigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
>     System.out.println(12 < 10);
>     String[] t1 = { "u" };
>     // t = t1; // this will show a compilation error
>     t[1] = t1[0]; // But this works
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts?
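
A small sketch of the Collections.unmodifiableList option mentioned above (illustrative only; the two classpath entries shown are just sample values):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathDefaults {
  // An unmodifiable view: attempts to set an element throw
  // UnsupportedOperationException instead of silently mutating the default.
  public static final List<String> DEFAULT_YARN_APPLICATION_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList(
          "$HADOOP_CONF_DIR",
          "$HADOOP_COMMON_HOME/share/hadoop/common/*"));

  public static void main(String[] args) {
    // Throws UnsupportedOperationException:
    DEFAULT_YARN_APPLICATION_CLASSPATH.set(0, "something-else");
  }
}
{code}

Note that Arrays.asList alone would not be enough, since its set() still writes through to the backing array; the unmodifiable wrapper is what blocks the re-assignment.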



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734610#comment-14734610
 ] 

Hudson commented on YARN-4121:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2281 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2281/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* hadoop-yarn-project/CHANGES.txt


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager

2015-09-08 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated YARN-4130:
-
Attachment: YARN-4130.00.patch

> Duplicate declaration of ApplicationId in RMAppManager
> --
>
> Key: YARN-4130
> URL: https://issues.apache.org/jira/browse/YARN-4130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
>  Labels: resourcemanager
> Attachments: YARN-4130.00.patch
>
>
> ApplicationId is declared double in {{RMAppManager}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734580#comment-14734580
 ] 

Hadoop QA commented on YARN-4022:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 28s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  52m 30s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  90m  8s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754601/YARN-4022.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9032/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9032/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9032/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9032/console |


This message was automatically generated.

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block on the web page (/cluster/scheduler), though 
> the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will not be displayed on the web page 
> and submitting an application to the deleted queue will act just as if the 
> queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734647#comment-14734647
 ] 

Junping Du commented on YARN-3337:
--

OK. Let me create a sub-task focusing on adding this API and CLI.

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or have to 
> implement Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & with some probability is the core 
> activity here, one that could be handled by a CLI app/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734611#comment-14734611
 ] 

Hudson commented on YARN-4121:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2304 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2304/])
YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai 
Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* hadoop-yarn-project/CHANGES.txt


> Typos in capacity scheduler documentation.
> --
>
> Key: YARN-4121
> URL: https://issues.apache.org/jira/browse/YARN-4121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4121.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager

2015-09-08 Thread Kai Sasaki (JIRA)
Kai Sasaki created YARN-4130:


 Summary: Duplicate declaration of ApplicationId in RMAppManager
 Key: YARN-4130
 URL: https://issues.apache.org/jira/browse/YARN-4130
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Kai Sasaki
Assignee: Kai Sasaki
Priority: Trivial


ApplicationId is declared double in {{RMAppManager}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734717#comment-14734717
 ] 

Hadoop QA commented on YARN-4110:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 59s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 11s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 10s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m 47s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754166/YARN-4110_1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9034/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9034/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9034/console |


This message was automatically generated.

> RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
> 
>
> Key: YARN-4110
> URL: https://issues.apache.org/jira/browse/YARN-4110
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4110_1.patch
>
>
> It is observed that RMAppImpl and RMAppAttemptImpl do not have hashcode() 
> and equals() implementations. These state objects should override these 
> methods.
> # For RMAppImpl, we can make use of ApplicationId#hashcode and 
> ApplicationId#equals.
> # Similarly, for RMAppAttemptImpl, ApplicationAttemptId#hashcode and 
> ApplicationAttemptId#equals.
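
A minimal sketch of the proposed delegation (the class below is a stand-in for illustration, not the real RMAppImpl):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative only: delegate equals()/hashCode() to the application id,
// mirroring what the description proposes for RMAppImpl.
public class AppStateHolder {
  private final ApplicationId applicationId;

  public AppStateHolder(ApplicationId applicationId) {
    this.applicationId = applicationId;
  }

  @Override
  public int hashCode() {
    return applicationId.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof AppStateHolder)) {
      return false;
    }
    return applicationId.equals(((AppStateHolder) obj).applicationId);
  }
}
{code}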



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)
Chang Li created YARN-4132:
--

 Summary: Nodemanagers should try harder to connect to the RM
 Key: YARN-4132
 URL: https://issues.apache.org/jira/browse/YARN-4132
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li


Being part of the cluster, nodemanagers should try very hard (and possibly 
never give up) to connect to a resourcemanager. At a minimum we should have a 
separate config to set how aggressively a nodemanager will connect to the RM, 
separate from what clients do.
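
An illustrative yarn-site.xml sketch of the idea (the first key is an existing client-facing setting; the second, NM-specific key is hypothetical, shown only to convey the proposal):

{code}
<!-- Existing setting: how long clients keep retrying connections to the RM. -->
<property>
  <name>yarn.resourcemanager.connect.max-wait.ms</name>
  <value>900000</value>
</property>
<!-- Hypothetical NM-specific override as proposed here: let nodemanagers
     retry much longer than clients (a negative value could mean never give up). -->
<property>
  <name>yarn.nodemanager.resourcemanager.connect.max-wait.ms</name>
  <value>-1</value>
</property>
{code}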



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735452#comment-14735452
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~sjlee0] for the review!

I will correct the variable ordering for static and private members, as well as 
make the variables final.

bq. l.210: Strictly speaking, GenericObjectMapper will return an integer if the 
value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with Number instead of long
Comparisons are not allowed for Number datatype. 
{code} 
The operator < is undefined for the argument type(s) java.lang.Number, 
java.lang.Number
{code} 

So I would have to do something like {code} Number d = a.longValue() + 
b.longValue(); {code}  Do you think this is better? 
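
For reference, a small illustrative helper along those lines (plain Java, hypothetical class name):

{code}
// Illustrative helper: compare two Number cell values via their long form,
// which sidesteps the "operator < is undefined for java.lang.Number" issue.
public final class NumberCompare {
  private NumberCompare() {
  }

  public static Number min(Number a, Number b) {
    return a.longValue() <= b.longValue() ? a : b;
  }

  public static Number max(Number a, Number b) {
    return a.longValue() >= b.longValue() ? a : b;
  }
}
{code}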

bq. l.52: Is the TimestampGenerator class going to be used outside 
FlowRunCoprocessor? If not, I would argue that we should make it an inner class 
of FlowRunCoprocessor. At least we should make it non-public to keep it within 
the package. If it would see general use outside this class, then it might be 
better to make it a true public class in the common package. I suspect a 
non-public class might be what we want here.
I am thinking I will need this when the flush/compaction scanner is added in. 
If you'd like, I can move it in as a non-public class for now and then move it 
out if needed. 

bq. It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.
I actually needed this in the unit test while checking the FlowActivityTable 
contents; if you want, I can take it out and you can add that test case when 
you add in the RowKey changes.

bq. l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.
Yeah, I was thinking about that too. Right now, metrics will get their own 
timestamps. For other columns, we'd be using the nanoseconds. I am trying to 
see if we can just use milliseconds.

bq. it might be good to have short comments on what each method is testing
I did try to make the unit test names themselves descriptive like 
testFlowActivityTable or testWriteFlowRunMinMaxToHBase or 
testWriteFlowRunMetricsOneFlow or testWriteFlowActivityOneFlow but I agree some 
more comments in the unit test will surely help. 

Will upload a new patch shortly, thanks! 


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the 

[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735471#comment-14735471
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  
14 new checkstyle issues (total was 236, now 242). |
| {color:red}-1{color} | whitespace |   0m  3s | The patch has 15  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 13s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754296/YARN-3635.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 970daaa |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/console |


This message was automatically generated.

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch, YARN-3635.7.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which 
> means the scheduler can change the queue of an application after it is 
> submitted to the scheduler.
> One issue with doing this in a specific scheduler is: if the queue after 
> mapping has a different maximum_allocation/default-node-label-expression from 
> the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose to make the queue mapping a common interface of the scheduler, and 
> have RMAppManager set the queue after mapping before doing validations.
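
A rough sketch of what such a common queue-mapping hook could look like (the interface and method names are invented for illustration):

{code}
// Illustrative only: a scheduler-agnostic queue-mapping hook that
// RMAppManager could consult before validating the resource request.
public interface QueueMappingProvider {

  /**
   * @param requestedQueue the queue named in the submission context
   * @param user           the submitting user
   * @return the queue the application will actually be placed in
   */
  String getMappedQueue(String requestedQueue, String user);
}
{code}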



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735384#comment-14735384
 ] 

Sangjin Lee commented on YARN-3901:
---

Thanks for the updated patch [~vrushalic]! I went over the new patch, and the 
following is the quick feedback. I'll also apply it with YARN-4074, and test it 
a little more.

(HBaseTimelineWriterImpl.java)
- l.141-155: the whole thing could be inside {{if (isApplication)...}}
- l.264: this null check is not needed

(FlowRunCoprocessor.java)
- l.52: Is the {{TimestampGenerator}} class going to be used outside 
{{FlowRunCoprocessor}}? If not, I would argue that we should make it an inner 
class of {{FlowRunCoprocessor}}. At least we should make it non-public to keep 
it within the package. If it would see general use outside this class, then it 
might be better to make it a true public class in the common package. I suspect 
a non-public class might be what we want here.
- l.52: let's make it final
- l.54: style nit: I think the common style is to place the static variables 
before instance variables
- Also, overall it seems we're using both the diamond operator (<>) and the old 
style generic declaration. It might be good to stick with one style (in which 
case the diamond operator might be better).
- l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.

(FlowScanner.java)
- variable ordering
- l.210: Strictly speaking, {{GenericObjectMapper}} will return an integer if 
the value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with {{Number}} instead of 
long.

(TimestampGenerator.java)
- l.29: make it final
- variable ordering
- see above for the public/non-public comment

(FlowActivityRowKey.java)
- It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.

(TestHBaseTimelineWriterImplFlowRun.java)
- it might be good to have short comments on what each method is testing


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
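
To make the min_start_time collapse described above concrete, a plain-Java sketch of the idea (no HBase API involved; names and values are illustrative):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: given the tagged per-application start-time cells,
// keep only the overall minimum, which is what the coprocessor would write
// back as the single untagged cell on flush/compaction.
public class MinStartTimeCollapse {
  public static long collapseToMin(List<Long> taggedStartTimes) {
    if (taggedStartTimes.isEmpty()) {
      throw new IllegalArgumentException("no start times to collapse");
    }
    return Collections.min(taggedStartTimes);
  }

  public static void main(String[] args) {
    List<Long> cells = new ArrayList<>();
    cells.add(1441700000000L); // app_1
    cells.add(1441690000000L); // app_2 started earlier
    cells.add(1441710000000L); // app_3
    System.out.println(collapseToMin(cells)); // prints 1441690000000
  }
}
{code}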



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735545#comment-14735545
 ] 

Sangjin Lee commented on YARN-4074:
---

It'd be great if you could take a look at the latest patch and let me know your 
feedback. Thanks!

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735448#comment-14735448
 ] 

zhihai xu commented on YARN-4096:
-

+1. Committing it in.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735497#comment-14735497
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8416 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8416/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735541#comment-14735541
 ] 

Sangjin Lee commented on YARN-4075:
---

Sorry [~varun_saxena], it took me a while to review this. The patch looks good 
for the most part.

FYI, I incorporated the XmlElement annotation for flow runs in 
{{FlowActivityEntity}} in YARN-4074. This change will be in the next patch 
(once I rebase with Vrushali's latest for YARN-3091). I also implemented the 
full {{compareTo()}} method already in the current patch for YARN-4074.


> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.POC.1.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0003-YARN-4126.patch

Hi [~jianhe]

Attaching a patch after updating the test cases.
{{TestRMWebServicesDelegationTokens}} hasn't been corrected yet.
In non-secure mode, what should the behaviour be for 
{{RMWebServicesDelegationTokens}}?

Currently it will be {{500 Internal Error}}.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4096:

Hadoop Flags: Reviewed

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735537#comment-14735537
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1095 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1095/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735533#comment-14735533
 ] 

Hadoop QA commented on YARN-4132:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 23s | The applied patch generated  3 
new checkstyle issues (total was 211, now 213). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 48s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   0m 22s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 52s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  49m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
|   | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754710/YARN-4132.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 970daaa |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/console |


This message was automatically generated.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.patch

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735490#comment-14735490
 ] 

zhihai xu commented on YARN-4096:
-

Thanks, Jason, for the contribution! Committed it to branch-2.7.2, branch-2, and trunk.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.2.patch

Fixed the broken test in TestYarnConfigurationFields. The other broken tests are not related to my changes (they appear to be caused by a network problem on the testing platform). Those tests all pass with the .2 patch on my local machine.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735683#comment-14735683
 ] 

Chang Li commented on YARN-4132:


[~jlowe] please help review the latest patch. Thanks!

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-2410:
--
Attachment: YARN-2410-v7.patch

Modified ShuffleHandler to not use channel attachments. Moved MockNetty code to 
a helper method.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, 
> YARN-2410-v6.patch, YARN-2410-v7.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors, when it runs out it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file(). This will 
> theoretically
> happen 6000*40=240,000 times, which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3901:
-
Attachment: YARN-3901-YARN-2928.5.patch


Uploading patch v5 that incorporates Sangjin's review suggestions. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-4.YARN-1197.patch

Thanks for the comments, [~mding]!

bq. I only mention this because pullNewlyAllocatedContainers() has a check for 
null for the same logic, so I think we may want to make it consistent?
Yes, you're correct; I updated the code, thanks.

bq. So, based on my understanding, if an application has reserved some resource 
for a container resource increase request on a node, that amount of resource 
should never be unreserved in order for the application to allocate a regular 
container on some other node. But that doesn't seem to be the case right now? 
Can you confirm?
Done. I added a check to {{getNodeIdToUnreserve}}; it now verifies whether a container is an increase reservation before cancelling it.

bq. I think it will be desirable to implement a pendingDecrease set in 
SchedulerApplicationAttempt, and corresponding logic, just like 
SchedulerApplicationAttempt.pendingRelease. This is to guard against the 
situation when decrease requests are received while RM is in the middle of 
recovery, and has not received all container statuses from NM yet.
I agree with the general idea, and we should do something similar. However, I'm not sure caching in the RM is a good idea: a malicious AM could potentially send millions of unknown to-be-decreased containers to the RM right after it starts. It may be better to cache on the AMRMClient side. I think we can do this in a separate JIRA? Could you file a new JIRA for this if you agree?

bq. Some nits...
Addressed.

Uploaded ver.4 patch.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735677#comment-14735677
 ] 

Hadoop QA commented on YARN-4132:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 52s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 20s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 55s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  56m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754730/YARN-4132.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/console |


This message was automatically generated.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735704#comment-14735704
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2307 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2307/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735676#comment-14735676
 ] 

MENG DING commented on YARN-1651:
-

Hi, [~leftnoteasy]
bq. I agree with the general idea, and we should do something similar. However, I'm not sure caching in the RM is a good idea: a malicious AM could potentially send millions of unknown to-be-decreased containers to the RM right after it starts. It may be better to cache on the AMRMClient side. I think we can do this in a separate JIRA? Could you file a new JIRA for this if you agree?

Your proposal makes sense. I will file a JIRA for this.

Thanks for addressing my comments. I don't have any more comments for now. As per our discussion, I will come up with an end-to-end test based on distributedshell, and post it to this JIRA for review.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735672#comment-14735672
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #357 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/357/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Component/s: (was: fairscheduler)
 (was: capacityscheduler)

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735714#comment-14735714
 ] 

Hadoop QA commented on YARN-4126:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m  2s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |  53m 35s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 120m 41s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754713/0003-YARN-4126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 16b9037 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/console |


This message was automatically generated.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734800#comment-14734800
 ] 

Junping Du commented on YARN-3337:
--

I think there is one difficulty here: it looks like we do not keep finished-container info in the RM scheduler state, only live-container info (in SchedulerApplicationAttempt). If no dead-container info is preserved in the RM, the newly added API can only send a kill-container event; it has no way to know whether the container was actually killed (and no way to differentiate a wrong container ID from the ID of a finished container). The CLI could do better, as it can query the running-container list first, then kill the container and wait until it is no longer active.
If we want exactly the same semantics as the kill-apps API, then we have to make the RM track info for dead containers, which sounds like overkill to me, as it forces the RM to track all containers for all applications (the complexity becomes the same as MRv1).
Maybe a better trade-off is this: the semantics of forceKillContainer() only mean that kill-container events are sent, not that the container was actually killed. A boolean response from forceKillContainer() would indicate whether we found a live container to kill. So we would lose the idempotent property for this API?
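A hypothetical sketch of that relaxed semantics (illustrative only, not the YARN-4131 patch; {{sendKillEvent}} is a made-up helper and {{scheduler}} stands for whatever scheduler reference is available):
{code}
// true  => a live container was found and a kill event was dispatched
// false => unknown ID, or a container that already finished and is no longer tracked
// Note: a true result does NOT mean the container is already dead.
public boolean forceKillContainer(ContainerId containerId) {
  RMContainer container = scheduler.getRMContainer(containerId); // live containers only
  if (container == null) {
    return false;
  }
  sendKillEvent(container); // asynchronous; completion is observed via container status
  return true;
}
{code}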

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or implement 
> Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & probability is the core activity 
> here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability to AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735786#comment-14735786
 ] 

Jian He commented on YARN-4126:
---

Yes, Oozie has fixed this on its own side. This is just the YARN-side fix.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735766#comment-14735766
 ] 

Hadoop QA commented on YARN-2410:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 21s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 19s | Tests passed in 
hadoop-mapreduce-client-shuffle. |
| | |  37m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754746/YARN-2410-v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-mapreduce-client-shuffle test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/console |


This message was automatically generated.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, 
> YARN-2410-v6.patch, YARN-2410-v7.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors, when it runs out it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file(). This will 
> theoretically
> happen 6000*40=240,000 times, which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735791#comment-14735791
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2284 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2284/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735859#comment-14735859
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  2s | Findbugs (version ) appears to 
be broken on YARN-1197. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 20 new or modified test files. |
| {color:red}-1{color} | javac |   8m 10s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 55s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |  31m  2s | The patch has 163  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 29s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 26s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 53s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  59m 24s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 154m 43s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754736/YARN-1651-4.YARN-1197.patch
 |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / f86eae1 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/diffJavacWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/console |


This message was automatically generated.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735787#comment-14735787
 ] 

Jian He commented on YARN-4126:
---

Yes, Oozie has fixed this on its own side. This is just the YARN-side fix.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4126:
--
Comment: was deleted

(was: yes, oozie has fixed its own. This is just YARN side fix.)

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735837#comment-14735837
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #345 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/345/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/CHANGES.txt


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)
zhihai xu created YARN-4133:
---

 Summary: Containers to be preempted leaks in FairScheduler 
preemption logic.
 Key: YARN-4133
 URL: https://issues.apache.org/jira/browse/YARN-4133
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu


Containers marked for preemption leak in the FairScheduler preemption logic. This can cause preemption to be missed, because containers in {{warnedContainers}} are wrongly removed. The problem is in {{preemptResources}}.
There are two issues that can cause containers to be wrongly removed from {{warnedContainers}}.
First, the container state {{RMContainerState.ACQUIRED}} is missing from the condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we should not remove the container from {{warnedContainers}}. A container should only be removed from {{warnedContainers}} if it is not in state {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or {{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}
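A minimal sketch of the adjusted check described above (the surrounding loop, {{warnedIter}}, and the helpers are assumed to stay as in the current {{preemptResources}}):
{code}
if (container.getState() == RMContainerState.RUNNING ||
    container.getState() == RMContainerState.ALLOCATED ||
    container.getState() == RMContainerState.ACQUIRED) {
  // Still a live/assignable container: keep it in warnedContainers and only
  // warn/kill while there is still resource left to preempt.
  if (isResourceGreaterThanNone(toPreempt)) {
    warnOrKillContainer(container);
    Resources.subtractFrom(toPreempt,
        container.getContainer().getResource());
  }
} else {
  // The container is no longer live; only now drop it from warnedContainers.
  warnedIter.remove();
}
{code}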



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734931#comment-14734931
 ] 

Jason Lowe commented on YARN-2410:
--

Thanks for updating the patch!

bq. The only reason was findbugs which does not allow more than 7 parameters in 
a function call
Normally a builder pattern is used to make the code more readable in those 
situations.  However I don't think we need more than 7.  ReduceContext really 
only needs mapIds, reduceId, channelCtx, user, infoMap, and outputBasePathStr.  
The other two parameters are either known to be zero (and should not be passed) or can be derived from another (the size of mapIds).  As such we don't need 
SendMapOutputParams.

bq. The reduceContext is a variable holds the value set by the setAttachment() 
method and is used by the getAttachment() answer. If I declare it in the test 
method, it needs be final which cannot be done due to it being used by the 
setter.
createMockChannel can simply have a ReduceContext parameter, marked final, and 
that should solve that problem.  But I thought we were getting rid of the use 
of channel attachments and just associating the context with the listener 
directly?

Related to the last comment, we're still using channel attachments.  sendMap 
can just take a ReduceContext parameter, and the listener can provide its 
context when calling it.  No need for channel attachments.

This can NPE since we're checking for null after we already use it:
{noformat}
+nextMap = sendMapOutput(
+reduceContext.getSendMapOutputParams().getCtx(),
+reduceContext.getSendMapOutputParams().getCtx().getChannel(),
+reduceContext.getSendMapOutputParams().getUser(), mapId,
+reduceContext.getSendMapOutputParams().getReduceId(), info);
+nextMap.addListener(new ReduceMapFileCount(reduceContext));
+if (null == nextMap) {
{noformat}
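For reference, a sketch of the reordered flow, reusing the calls from the snippet above (what the error path should do is left to the surrounding code):
{code}
nextMap = sendMapOutput(
    reduceContext.getSendMapOutputParams().getCtx(),
    reduceContext.getSendMapOutputParams().getCtx().getChannel(),
    reduceContext.getSendMapOutputParams().getUser(), mapId,
    reduceContext.getSendMapOutputParams().getReduceId(), info);
if (null == nextMap) {
  // Handle the missing map output here, without ever dereferencing nextMap.
} else {
  nextMap.addListener(new ReduceMapFileCount(reduceContext));
}
{code}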

maxSendMapCount should be cached during serviceInit like the other conf-derived 
settings so we aren't doing conf lookups on every shuffle.

The indentation in sendMap isn't correct, as code is indented after a 
conditional block at the same level as the contents of the conditional block.  
There are other places that are over-indented.

MockShuffleHandler only needs to override one thing, getShuffle, but the mock 
that method returns has to override a bunch of stuff.  It makes more sense to 
create a separate class for the mocked Shuffle than the mocked ShuffleHandler.

Should the mock Future stuff be part of creating the mocked channel?  Can 
simply pass the listener list to use as an arg to the method that mocks up the 
channel.

Nit: SHUFFLE_MAX_SEND_COUNT should probably be something like 
SHUFFLE_MAX_SESSION_OPEN_FILES to better match the property name.  Similarly 
maxSendMapCount could have a more appropriate name.

Nit: Format for 80 columns

Nit: There's still instances where we have a class definition immediately after 
variable definitions and a lack of whitespace between classes and methods or 
between methods. Whitespace would help readability in those places.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors, when it runs out it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file(). This will 
> theoretically
> happen 6000*40=240,000 times, which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.000.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
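A minimal sketch of the hysteresis idea described above (the threshold parameters are illustrative, not the actual configuration keys):
{code}
// Two thresholds avoid flapping around a single cut-off value.
boolean isDiskFull(float utilizationPercent, boolean currentlyFull,
    float fullThreshold /* e.g. 95 */, float notFullThreshold /* e.g. 90 */) {
  if (currentlyFull) {
    // Stay "full" until utilization drops below the lower threshold.
    return utilizationPercent > notFullThreshold;
  }
  // Become "full" only once utilization exceeds the higher threshold.
  return utilizationPercent > fullThreshold;
}
{code}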



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: (was: YARN-3943.000.patch)

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4131:
-
Attachment: YARN-4131-demo.patch

Attaching a demo patch; more test work is still needed.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3999:
--
Labels: 2.6.1-candidate  (was: )

Adding to 2.6.1 from Jian's comment in the mailing list that I missed before.

> RM hangs on draining events
> ---
>
> Key: YARN-3999
> URL: https://issues.apache.org/jira/browse/YARN-3999
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>  Labels: 2.6.1-candidate
> Fix For: 2.7.2
>
> Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, 
> YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, 
> YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
>
>
> If external systems like ATS or ZK become very slow, draining all the 
> events takes a lot of time. If this time becomes larger than 10 mins, all 
> applications will expire. Fixes include:
> 1. Add a timeout and stop the dispatcher even if not all events are drained.
> 2. Move ATS service out from RM active service so that RM doesn't need to 
> wait for ATS to flush the events when transitioning to standby.
> 3. Stop client-facing services (ClientRMService etc.) first so that clients 
> get fast notification that RM is stopping/transitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735952#comment-14735952
 ] 

Xianyin Xin commented on YARN-4120:
---

Hi [~kasha], there's another issue in the current preemption logic. It is in 
{{FSParentQueue.java}} and {{FSLeafQueue.java}}:
{code}
  public RMContainer preemptContainer() {
RMContainer toBePreempted = null;

// Find the childQueue which is most over fair share
FSQueue candidateQueue = null;
Comparator<Schedulable> comparator = policy.getComparator();

readLock.lock();
try {
  for (FSQueue queue : childQueues) {
if (candidateQueue == null ||
comparator.compare(queue, candidateQueue) > 0) {
  candidateQueue = queue;
}
  }
} finally {
  readLock.unlock();
}

// Let the selected queue choose which of its container to preempt
if (candidateQueue != null) {
  toBePreempted = candidateQueue.preemptContainer();
}
return toBePreempted;
  }
{code}
{code}
  public RMContainer preemptContainer() {
RMContainer toBePreempted = null;

// If this queue is not over its fair share, reject
if (!preemptContainerPreCheck()) {
  return toBePreempted;
}
{code}
If the queue hierarchy is like the one in the *Description*, suppose queue1 and 
queue2 have the same weight and the cluster has 8 containers, 4 occupied by 
queue1.1 and 4 occupied by queue2. If a new app is added in queue1.2, 2 
containers should be preempted from queue1.1. However, according to the above 
code, queue1 and queue2 are both at their fair share, so the preemption will not 
happen.

So if all of the child queues at some level are at their fair share, preemption 
will not happen even though there are resource deficits in some leaf queues.

I think we have to drop this logic in this case. As an alternative, we could 
calculate an ideal preemption distribution by traversing the queues, for example 
as sketched below. Any thoughts?
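For illustration, a rough sketch of the kind of top-down traversal I mean (the 
{{computePreemptionDemand}} method is hypothetical, not from the current 
FairScheduler code):
{code}
// Hypothetical sketch: instead of stopping when a queue is at its fair share,
// walk the whole hierarchy and sum up the deficits of the leaf queues.
Resource computePreemptionDemand(FSQueue queue) {
  if (queue instanceof FSLeafQueue) {
    // Deficit of a leaf = max(fair share - current usage, 0)
    return Resources.componentwiseMax(
        Resources.subtract(queue.getFairShare(), queue.getResourceUsage()),
        Resources.none());
  }
  Resource demand = Resources.createResource(0, 0);
  for (FSQueue child : ((FSParentQueue) queue).getChildQueues()) {
    Resources.addTo(demand, computePreemptionDemand(child));
  }
  // A parent that is exactly at its fair share (queue1 in the example) can
  // still carry a non-zero demand coming from a starved child (queue1.2).
  return demand;
}
{code}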

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated to FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, for 
> two main reasons:
> # it is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and they will be subtracted 
> from {{currentConsumption}} once the preemption is finished. It's not 
> reasonable to make arrangements for them ahead of time.
> # there's another problem here; consider the following case:
> {code}
> root
>/\
>   queue1   queue2
>   /\
> queue1.3, queue1.4
> {code}
> suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735987#comment-14735987
 ] 

Xianyin Xin commented on YARN-4133:
---

Of course, we can also address these problems one by one in separate jiras. If 
you prefer that approach, just ignore the above comment.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leaks in FairScheduler preemption logic. It may 
> cause missing preemption due to containers in {{warnedContainers}} wrongly 
> removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers  wrongly removed from 
> {{warnedContainers}}:
> Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
> shouldn't remove container from {{warnedContainers}}. We should only remove 
> container from {{warnedContainers}}, if container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also once the containers in {{warnedContainers}} are wrongly removed, it will 
> never be preempted. Because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return the containers in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0004-YARN-4126.patch

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-09-08 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4086:

Attachment: YARN-4086.002.patch

The 002 patch makes that test less brittle.  I also fixed the RAT and 
checkstyle warnings.  The test failure was because test-patch couldn't handle 
the binary part of the patch.

> Allow Aggregated Log readers to handle HAR files
> 
>
> Key: YARN-4086
> URL: https://issues.apache.org/jira/browse/YARN-4086
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-4086.001.patch, YARN-4086.002.patch
>
>
> This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
> web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735887#comment-14735887
 ] 

Joep Rottinghuis commented on YARN-3901:


Thanks [~vrushalic]. I'm going to dig through the details on the latest patch.
Separately, [~sjlee0] and I further discussed the challenges of taking the 
timestamp on the coprocessor, buffering writes, app restarts, timestamp 
collisions, and the ordering of the various writes that come in.

1) Given that we have timestamps in millis, multiplying by 1,000 should 
suffice. It is unlikely that we'd have > 1M writes for one column in one region 
server for one flow. If we multiply by 1M we get close to the total date range 
that can fit in a long (still years to come, but still).

2) If we do any shifting of time, we should do the same everywhere to keep 
things consistent, and to keep the ability to ask what a particular row 
(roughly) looked like at any particular time (like last night midnight, what 
was the state of this entire row).

3) We think in the column helper, if the ATS client supplies a timestamp, we 
should multiply by 1,000. If we read any timestamp from HBase, we'll divide by 
1,000.

4) If the ATS client doesn't supply the timestamp, we'll grab the timestamp in 
the ATS writer the moment the write arrives (and before it is batched / 
buffered in the buffered mutator, HBase client, or RS queue). We then take this 
time and multiply by 1,000. Reads again divide by 1,000 to get back to millis 
since the epoch as before.

5) For the Agg operations SUM, MIN, and MAX we take the least significant 3 
digits of the app_id and add this to the (timestamp*1000), so that we create a 
unique timestamp per app in an active flow-run. This should avoid any collisions.
This takes care of uniqueness (no collisions on a single ms), but also solves 
for older instances of a writer (in case of a second AM attempt, for example) or 
any other kind of ordering issue. The writes are timestamped when they arrive at 
the writer.

6) If some piece of client code doesn't set any timestamp (this should be an 
error) then we cannot effectively order the writes as per the previous point. 
We still need to ensure that we don't have collisions. If the client-supplied 
timestamp is Long.MAX_VALUE, then we can generate the timestamp in the 
coprocessor on the server side, with a counter to ensure uniqueness. We should 
still multiply by 1,000 to make the same amount of space for the unique 
counter.
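To make points 3, 4 and 5 concrete, roughly something like the following (a 
sketch only; {{TS_MULTIPLIER}} and the helper names are placeholders, not the 
actual column helper code):
{code}
// Sketch of the proposed timestamp handling (placeholder names, not real code).
private static final long TS_MULTIPLIER = 1000L;

// Writer side: spread the millisecond timestamp and disambiguate per app by
// folding in the last 3 digits of the app id (point 5).
static long toCellTimestamp(long writeTimeMillis, ApplicationId appId) {
  return writeTimeMillis * TS_MULTIPLIER + (appId.getId() % TS_MULTIPLIER);
}

// Reader side: divide back to get millis since the epoch (point 3).
static long toEpochMillis(long cellTimestamp) {
  return cellTimestamp / TS_MULTIPLIER;
}
{code}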

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735900#comment-14735900
 ] 

Joep Rottinghuis commented on YARN-3901:


The one remaining issue we have to tackle is when there are two app attempts. 
The previous app attempt ends up buffering some writes, and the new app attempt 
ends up writing a final_value.
Now if the flush happens before the first attempt's write comes in, we no 
longer have the unaggregated value for that app_id to discard against 
(the timestamp should have taken care of this ordering).
We can deal with this issue in three ways:
1) Ignore (risky and very hard to debug if it ever happens)
2) Keep the final value around until it has aged a certain time. Upside is that 
the value is initially kept (for, for example, 1-2 days?) and then later 
discarded. Downside is that we won't collapse values as quickly on flush as we 
could. The collapse would probably happen when a compaction happens, possibly 
only when a major compaction happens. But previous unaggregated values may have 
been written to disk anyway, so I'm not sure how much of an issue this really is.
3) Keep a list of the last x app_ids (aggregation compaction dimension values) 
on the aggregated flow-level data. What we would then do in the aggregator is 
go through all the values as we currently do. We'd collapse all the values 
to keep only the latest per flow. Before we sum an item for the flow, we'd 
check whether the app_id is in the list of the most recent x (10) apps that 
were completed and collapsed.
Pro is that with a lower app completion rate in a flow, we'd be guarded against 
stale writes for longer than a fixed time period. We'd still limit the size of 
extra storage in tags to a list of x (10?) items.
Downside is that if apps complete in very rapid succession, we would 
potentially be protected from stale writes from an app for a shorter period of 
time. Given that there is a correlation between an app completion and its 
previous run, this may not be a huge factor. It's not like random previous app 
attempts are launched. This is really to cover the case when a new app attempt 
is launched, but the previous writer had some buffered writes that somehow 
still got through.

I'm sort of tempted towards 2, since that is the most similar to the existing 
TTL functionality, and probably the easiest to code and understand. Simply 
compact only after a certain time period has passed.
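If we go with 2), the check could be as simple as something like the following 
(just a sketch; {{canCollapse}} and the {{compactionDelayMillis}} knob are 
hypothetical names):
{code}
// Sketch for option 2: only collapse a completed app's cells once they have
// aged past a configurable delay, so late buffered writes from an old attempt
// can still be matched against them. All names here are placeholders.
boolean canCollapse(long cellTimestamp, long nowMillis, long compactionDelayMillis) {
  long writeTimeMillis = cellTimestamp / 1000L;  // undo the 1,000x shift
  return (nowMillis - writeTimeMillis) > compactionDelayMillis;
}
{code}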

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735947#comment-14735947
 ] 

Hadoop QA commented on YARN-4086:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 55s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in 
hadoop-yarn-common. |
| | |  51m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754773/YARN-4086.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/console |


This message was automatically generated.

> Allow Aggregated Log readers to handle HAR files
> 
>
> Key: YARN-4086
> URL: https://issues.apache.org/jira/browse/YARN-4086
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-4086.001.patch, YARN-4086.002.patch
>
>
> This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
> web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Attachment: YARN-4133.000.patch

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leaks in FairScheduler preemption logic. It may 
> cause missing preemption due to containers in {{warnedContainers}} wrongly 
> removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers  wrongly removed from 
> {{warnedContainers}}:
> Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
> shouldn't remove container from {{warnedContainers}}, We should only remove 
> container from {{warnedContainers}}, if container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735959#comment-14735959
 ] 

Xianyin Xin commented on YARN-4133:
---

Hi [~zxu], it seems the current preemption logic has many problems. I just 
updated one in 
[https://issues.apache.org/jira/browse/YARN-4120?focusedCommentId=14735952=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735952].
 I think a logic refactor is needed. What do you think?

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leaks in FairScheduler preemption logic. It may 
> cause missing preemption due to containers in {{warnedContainers}} wrongly 
> removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers  wrongly removed from 
> {{warnedContainers}}:
> Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
> shouldn't remove container from {{warnedContainers}}, We should only remove 
> container from {{warnedContainers}}, if container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Description: 
Containers to be preempted leak in the FairScheduler preemption logic. This may 
cause missed preemptions because containers are wrongly removed from 
{{warnedContainers}}. The problem is in {{preemptResources}}:
There are two issues which can cause containers to be wrongly removed from 
{{warnedContainers}}:
First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
shouldn't remove the container from {{warnedContainers}}. We should only remove 
a container from {{warnedContainers}} if it is not in state 
{{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} or 
{{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}
Also, once containers are wrongly removed from {{warnedContainers}}, they will 
never be preempted, because these containers are already in 
{{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
return containers that are in {{FSAppAttempt#preemptionMap}}.
{code}
  public RMContainer preemptContainer() {
if (LOG.isDebugEnabled()) {
  LOG.debug("App " + getName() + " is going to preempt a running " +
  "container");
}

RMContainer toBePreempted = null;
for (RMContainer container : getLiveContainers()) {
  if (!getPreemptionContainers().contains(container) &&
  (toBePreempted == null ||
  comparator.compare(toBePreempted, container) > 0)) {
toBePreempted = container;
  }
}
return toBePreempted;
  }
{code}
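A minimal sketch of the corrected removal logic described above (the 
surrounding loop in {{preemptResources}} is abbreviated; only the condition 
handling changes):
{code}
// Sketch of the fix: keep a container in warnedContainers while it is still
// RUNNING, ALLOCATED or ACQUIRED; only warn/kill it while more resource still
// needs to be preempted, and only drop it once it has left those states.
RMContainerState state = container.getState();
boolean stillLive = state == RMContainerState.RUNNING
    || state == RMContainerState.ALLOCATED
    || state == RMContainerState.ACQUIRED;
if (stillLive) {
  if (isResourceGreaterThanNone(toPreempt)) {
    warnOrKillContainer(container);
    Resources.subtractFrom(toPreempt, container.getContainer().getResource());
  }
} else {
  warnedIter.remove();
}
{code}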

  was:
Containers to be preempted leaks in FairScheduler preemption logic. It may 
cause missing preemption due to containers in {{warnedContainers}} wrongly 
removed. The problem is in {{preemptResources}}:
There are two issues which can cause containers  wrongly removed from 
{{warnedContainers}}:
Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
shouldn't remove container from {{warnedContainers}}, We should only remove 
container from {{warnedContainers}}, if container is not in state 
{{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
{{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}


> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leaks in FairScheduler preemption logic. It may 
> cause missing preemption due to containers in {{warnedContainers}} wrongly 
> removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers  wrongly removed from 
> {{warnedContainers}}:
> Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
> shouldn't remove container from {{warnedContainers}}. We should only remove 
> container from {{warnedContainers}}, if container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> 

[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4131:
-
Attachment: YARN-4131-v1.patch

Updated patch with the following changes:
1. Add ContainerKilledType to KillContainerRequest to indicate whether the 
container will be killed as preempted or expired (failed).
2. Add an async call in YarnClient per Steve's comments above.
3. Add more unit tests and fix build failures.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4106:
---
Attachment: 0006-YARN-4106.patch

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> Node labels are not getting updated on the RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736025#comment-14736025
 ] 

Bibin A Chundatt commented on YARN-4106:


Hi [~leftnoteasy]
Thanks for the comments. Updated patch uploaded.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> Node labels are not getting updated on the RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736052#comment-14736052
 ] 

Hadoop QA commented on YARN-4133:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 41s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m 10s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m  6s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754780/YARN-4133.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/console |


This message was automatically generated.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leaks in FairScheduler preemption logic. It may 
> cause missing preemption due to containers in {{warnedContainers}} wrongly 
> removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers  wrongly removed from 
> {{warnedContainers}}:
> Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
> shouldn't remove container from {{warnedContainers}}. We should only remove 
> container from {{warnedContainers}}, if container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also once the containers in {{warnedContainers}} are wrongly removed, it will 
> never be preempted. Because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return the containers in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if 

[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736112#comment-14736112
 ] 

Hadoop QA commented on YARN-4106:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 15s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 36s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 34s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754800/0006-YARN-4106.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a153b96 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/console |


This message was automatically generated.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> Node labels are not getting updated on the RM side 
> *This jira also handles the below issue too*
> Timer Task not getting triggered in Nodemanager for Label update in 
> nodemanager for distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-08 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-4134:
-

 Summary: FairScheduler preemption stops at queue level that all 
child queues are not over their fairshare
 Key: YARN-4134
 URL: https://issues.apache.org/jira/browse/YARN-4134
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Xianyin Xin


Now FairScheduler uses a choose-a-candidate method to select a container to be 
preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}},
{code}
readLock.lock();
try {
  for (FSQueue queue : childQueues) {
if (candidateQueue == null ||
comparator.compare(queue, candidateQueue) > 0) {
  candidateQueue = queue;
}
  }
} finally {
  readLock.unlock();
}

// Let the selected queue choose which of its container to preempt
if (candidateQueue != null) {
  toBePreempted = candidateQueue.preemptContainer();
}
{code}
a candidate child queue is selected. However, if the queue's usage isn't over 
its fair share, preemption will not happen:
{code}
if (!preemptContainerPreCheck()) {
  return toBePreempted;
}
{code}
 A scenario:
{code}
root
   /\
  queue1   queue2
  /\
queue1.3, (  queue1.4  )
{code}
suppose there are 8 containers, and queues at any level have the same weight. 
queue1.3 takes 4 and queue2 takes 4, so both queue1 and queue2 are at their 
fair share. Now we submit an app in queue1.4 that needs 4 containers; it should 
preempt 2 from queue1.3, but the candidate-container selection procedure will 
stop at a level where none of the child queues are over their fair share, and 
none of the containers will be preempted.
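To spell out the arithmetic in this example (plain illustration, not scheduler 
code; the variable names are made up):
{code}
// 8 containers total; queue1 and queue2 have equal weight, so each gets a
// fair share of 4. queue1's 4 containers are all held by queue1.3.
int queue1FairShare = 4, queue2FairShare = 4;
int queue1Usage = 4, queue2Usage = 4;

// preemptContainerPreCheck() only lets a queue give up containers when its
// usage is over its fair share, so neither parent qualifies, and the starved
// queue1.4 never gets the 2 containers it should be able to preempt.
boolean queue1Preemptable = queue1Usage > queue1FairShare;  // false
boolean queue2Preemptable = queue2Usage > queue2FairShare;  // false
{code}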



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

