[jira] [Commented] (YARN-4119) Expose the NM bind address as an env, so that AM can make use of it for exposing tracking URL
[ https://issues.apache.org/jira/browse/YARN-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734333#comment-14734333 ]

Naganarasimha G R commented on YARN-4119:
-----------------------------------------

Synced up offline with [~varun_saxena]. When I modified the description of MAPREDUCE-5938, I was not able to find the JIRA with this issue (MAPREDUCE-6402). Anyway, as I have started working on it, I will continue to finish it. Thanks, [~varun_saxena].

> Expose the NM bind address as an env, so that AM can make use of it for
> exposing tracking URL
> -----------------------------------------------------------------------
>
> Key: YARN-4119
> URL: https://issues.apache.org/jira/browse/YARN-4119
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
>
> As described in MAPREDUCE-5938, many security scanning tools advise against
> binding on all network addresses, so it would be good to bind only on the
> desired address. As AMs can run on any of the nodes, it would be better for
> the NM to share its bind address with the container as an environment
> variable.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
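A minimal sketch of how an AM could consume such a variable to build its tracking URL. `NM_BIND_ADDRESS` is a hypothetical name for the variable this JIRA proposes; `NM_HOST` is an existing YARN environment variable, used here only as a fallback. This is illustrative, not the actual patch.

```java
// Hedged sketch: build an AM tracking URL from an NM-provided env variable.
// "NM_BIND_ADDRESS" is the hypothetical new variable; "NM_HOST" already exists.
public class TrackingUrlSketch {
    static String trackingUrl(java.util.Map<String, String> env, int port) {
        String bind = env.get("NM_BIND_ADDRESS");          // proposed variable
        if (bind == null || bind.isEmpty()) {
            // fall back to the host name YARN already exposes
            bind = env.getOrDefault("NM_HOST", "0.0.0.0");
        }
        return "http://" + bind + ":" + port;
    }

    public static void main(String[] args) {
        // in a real AM this would be System.getenv()
        System.out.println(trackingUrl(
            java.util.Map.of("NM_BIND_ADDRESS", "10.0.0.5"), 8042));
    }
}
```

In a real AM the map would be `System.getenv()`; the point is only that the NM-supplied bind address, when present, takes precedence over the generic host name.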
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734514#comment-14734514 ] Hudson commented on YARN-4121: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1092 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1092/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734369#comment-14734369 ] Hudson commented on YARN-4121: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #354 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/354/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

forrestchen updated YARN-4022:
------------------------------
Fix Version/s: (was: 2.7.1)

> queue not remove from webpage(/cluster/scheduler) when delete queue in
> xxx-scheduler.xml
> ----------------------------------------------------------------------
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: forrestchen
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can
> still see the queue's information block on the web page (/cluster/scheduler),
> though the 'Min Resources' items all become zero and there is no 'Max Running
> Applications' item.
> I can still submit an application to the deleted queue, and the application
> will run in the 'root.default' queue instead, but submitting to a queue that
> never existed raises an exception.
> My expectation is that the deleted queue is no longer displayed on the web
> page, and that submitting an application to it behaves as if the queue does
> not exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here:
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

forrestchen updated YARN-4022:
------------------------------
Affects Version/s: (was: 2.7.1)

> queue not remove from webpage(/cluster/scheduler) when delete queue in
> xxx-scheduler.xml
> ----------------------------------------------------------------------
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: forrestchen
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can
> still see the queue's information block on the web page (/cluster/scheduler),
> though the 'Min Resources' items all become zero and there is no 'Max Running
> Applications' item.
> I can still submit an application to the deleted queue, and the application
> will run in the 'root.default' queue instead, but submitting to a queue that
> never existed raises an exception.
> My expectation is that the deleted queue is no longer displayed on the web
> page, and that submitting an application to it behaves as if the queue does
> not exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here:
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode
Jian He created YARN-4127:
--------------------------

Summary: RM fail with noAuth error if switched from non-failover mode to failover mode
Key: YARN-4127
URL: https://issues.apache.org/jira/browse/YARN-4127
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He

The scenario is that RM failover was initially enabled, so the zkRootNodeAcl is by default set with the *RM ID* in the ACL string. If RM failover is then disabled, the RM cannot load data from ZK and fails with a NoAuth error. After I reset the root node ACL, it can access ZK again.
{code}
15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover state
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
	at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
	at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
	at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
	at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
	at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
{code}
The problem may be that in non-failover mode the RM does not use the *RM ID* to connect to ZK and thus fails with a NoAuth error. We should be able to switch failover on and off with no interruption to the user.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
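The ACL mismatch described above can be illustrated with a small, self-contained sketch. The `rootNodeAcl` helper and the ACL strings are deliberate simplifications, not what ZKRMStateStore actually writes; the point is only that a node created under a digest ACL tied to the RM ID is unreadable to a later session that does not present that identity.

```java
// Hedged, illustrative sketch of why switching HA off triggers NoAuth:
// the root znode was stored with a digest ACL while HA was enabled, but the
// non-HA RM connects without that auth. All strings here are hypothetical.
public class AclMismatchSketch {
    static String rootNodeAcl(boolean haEnabled, String rmId) {
        // assumption: HA mode restricts the root node to the RM's digest
        // identity; non-HA mode expects an unrestricted node
        return haEnabled ? "digest:" + rmId + ":rwcda" : "world:anyone:rwcda";
    }

    public static void main(String[] args) {
        String written  = rootNodeAcl(true, "rm1");  // ACL stored while HA was on
        String expected = rootNodeAcl(false, "rm1"); // what the non-HA RM presents
        // the mismatch is what ZooKeeper rejects with NoAuth
        System.out.println(!written.equals(expected)); // prints true
    }
}
```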
[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Vasudev updated YARN-4081:
--------------------------------
Attachment: YARN-4081-YARN-3926.007.patch

Fixed the whitespace issue and addressed some checkstyle issues.

> Add support for multiple resource types in the Resource class
> -------------------------------------------------------------
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, YARN-4081-YARN-3926.002.patch,
> YARN-4081-YARN-3926.003.patch, YARN-4081-YARN-3926.004.patch,
> YARN-4081-YARN-3926.005.patch, YARN-4081-YARN-3926.006.patch,
> YARN-4081-YARN-3926.007.patch
>
> For adding support for multiple resource types, we need to add support for
> this in the Resource class.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue
[ https://issues.apache.org/jira/browse/YARN-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena resolved YARN-4128.
--------------------------------
Resolution: Not A Problem

I did not have the latest code. This has been fixed in YARN-3970, which was recently committed, so I am closing this.

> Correct logs in capacity scheduler while printing priority is acceptable for
> a queue
> ----------------------------------------------------------------------------
>
> Key: YARN-4128
> URL: https://issues.apache.org/jira/browse/YARN-4128
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Priority: Trivial
>
> Spaces are missing between the queue name and "for", and between the
> application ID and "for".
> {noformat}
> [IPC Server handler 0 on 33140]: INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Priority '0' is acceptable in queue :varunqfor
> application:application_1441653547287_0003for the user: varun
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734299#comment-14734299 ]

Hudson commented on YARN-2019:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java

> Retrospect on decision of making RM crashed if any exception throw in
> ZKRMStateStore
> ---------------------------------------------------------------------
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Jian He
> Priority: Critical
> Labels: ha
> Fix For: 2.8.0, 2.7.2, 2.6.2
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal
> exception that crashes the RM. As shown in YARN-1924, this could be due to an
> internal RM HA bug itself rather than a truly fatal condition. We should
> revisit some decisions here, as the HA feature is designed to protect the key
> component, not disturb it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734298#comment-14734298 ] Hudson commented on YARN-2884: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/]) YARN-2884. Added a proxy service in NM to proxy the the communication between AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 6f72f1e6003ab11679bebeb96f27f1f62b3b3e02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Fix For: 2.8.0 > > Attachments: 
YARN-2884-V1.patch, YARN-2884-V10.patch,
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch,
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch,
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch,
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
> We introduce the notion of an RMProxy, running on each node (or once per
> rack). Upon start, the AM is forced (via tokens and configuration) to direct
> all its requests to new services running on the NM that provide a proxy to
> the central RM.
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4087) Followup fixes after YARN-2019 regarding RM behavior when state-store error occurs
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734300#comment-14734300 ] Hudson commented on YARN-4087: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/]) YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java > Followup fixes after YARN-2019 regarding RM behavior when state-store error > occurs > -- > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.2, 2.6.2 > > Attachments: YARN-4087-branch-2.6.patch, YARN-4087.1.patch, > YARN-4087.2.patch, YARN-4087.3.patch, YARN-4087.5.patch, YARN-4087.6.patch, > YARN-4087.7.patch > > > Several fixes: > 1. Set YARN_FAIL_FAST to be false by default, since this makes more sense in > production environment. > 2. If HA is enabled and if there's any state-store error, after the retry > operation failed, we always transition RM to standby state. Otherwise, we > may see two active RMs running. YARN-4107 is one example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
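The two fixes listed in the description amount to a simple policy: with fail-fast off (the new default) and HA enabled, a persistent state-store error should push the RM to standby instead of leaving it active, so two active RMs can never coexist. A self-contained, illustrative sketch of that policy (the class and method names are hypothetical, not Hadoop's):

```java
// Hedged sketch of the described RM behavior on state-store errors.
public class StoreErrorPolicySketch {
    enum RMState { ACTIVE, STANDBY }

    static RMState onStoreError(boolean haEnabled, boolean failFast) {
        if (failFast) {
            // old behavior: treat any store error as fatal
            throw new IllegalStateException("fail-fast: crash the RM");
        }
        // new default: with HA on, step down so a second RM cannot end up
        // active alongside this one (the YARN-4107 scenario)
        return haEnabled ? RMState.STANDBY : RMState.ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(onStoreError(true, false)); // prints STANDBY
    }
}
```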
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734343#comment-14734343 ] Hudson commented on YARN-4121: -- FAILURE: Integrated in Hadoop-trunk-Commit #8413 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8413/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734399#comment-14734399 ]

Hadoop QA commented on YARN-4126:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 36s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 52s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 52m 57s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 92m 38s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
| | hadoop.yarn.server.resourcemanager.TestRMRestart |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
| | hadoop.yarn.server.resourcemanager.TestClientRMService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12754582/0002-YARN-4126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f72f1e |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9028/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9028/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9028/console |

This message was automatically generated.

> RM should not issue delegation tokens in unsecure mode
> ------------------------------------------------------
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
> ClientRMService#getDelegationToken is currently returning a delegation token
> in insecure mode. We should not return the token if it's in insecure mode.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-4126:
-----------------------------------
Attachment: 0001-YARN-4126.patch

Uploading a patch for the same. If you have started working on it, please do reassign.

> RM should not issue delegation tokens in unsecure mode
> ------------------------------------------------------
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch
>
> ClientRMService#getDelegationToken is currently returning a delegation token
> in insecure mode. We should not return the token if it's in insecure mode.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4126: --- Attachment: 0002-YARN-4126.patch > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734437#comment-14734437 ] Hudson commented on YARN-4121: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #361 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/361/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734292#comment-14734292 ]

Naganarasimha G R commented on YARN-4126:
-----------------------------------------

Yes [~bibinchundatt], you are right: the check is there, but the else case should return false. I missed seeing this!

> RM should not issue delegation tokens in unsecure mode
> ------------------------------------------------------
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
>
> ClientRMService#getDelegationToken is currently returning a delegation token
> in insecure mode. We should not return the token if it's in insecure mode.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
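The missing else-branch discussed above amounts to the following guard: a delegation-token operation should only be considered when security is enabled, and should be refused outright otherwise. This sketch mirrors the intent of the check in ClientRMService, but the method below is illustrative, not the actual Hadoop code.

```java
// Hedged sketch of the guard: reject delegation-token ops in insecure mode.
public class TokenOpGuardSketch {
    static boolean isAllowedDelegationTokenOp(boolean securityEnabled,
                                              boolean hasKerberosCreds) {
        if (securityEnabled) {
            // secure mode: the caller must hold Kerberos credentials
            return hasKerberosCreds;
        }
        // insecure mode: the missing else-branch should refuse the operation
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAllowedDelegationTokenOp(false, true)); // prints false
    }
}
```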
[jira] [Assigned] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-4126: -- Assignee: Bibin A Chundatt > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3943:
----------------------------
Attachment: YARN-3943.000.patch

> Use separate threshold configurations for disk-full detection and
> disk-not-full detection.
> -----------------------------------------------------------------
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3943.000.patch
>
> Use separate threshold configurations to check when disks become full and
> when disks become good again. Currently the configurations
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
> and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are
> used both to check when disks become full and when disks become good. It
> would be better to use two configurations: one used when a disk goes from
> not-full to full, and the other when it goes from full back to not-full, so
> we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than
> the one for disk-not-full detection.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
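The two-threshold idea in the description is classic hysteresis: a disk is marked full above a high watermark and only marked good again below a lower one, so utilization hovering near a single threshold cannot flip the state back and forth. A self-contained sketch (the threshold values and class name are illustrative, not what the patch configures):

```java
// Hedged sketch of two-threshold (hysteresis) disk-full detection.
public class DiskHysteresisSketch {
    private final float markFullPct;  // e.g. 90: utilization above this -> "full"
    private final float markGoodPct;  // e.g. 80: utilization below this -> "good"
    private boolean full = false;

    DiskHysteresisSketch(float markFullPct, float markGoodPct) {
        this.markFullPct = markFullPct;
        this.markGoodPct = markGoodPct;
    }

    /** Feed the current utilization; returns whether the disk is marked full. */
    boolean update(float utilizationPct) {
        if (!full && utilizationPct > markFullPct) {
            full = true;                       // crossed the high watermark
        } else if (full && utilizationPct < markGoodPct) {
            full = false;                      // dropped below the low watermark
        }
        return full;                           // in between: state is unchanged
    }

    public static void main(String[] args) {
        DiskHysteresisSketch d = new DiskHysteresisSketch(90f, 80f);
        System.out.println(d.update(91f)); // true: became full
        System.out.println(d.update(85f)); // true: still full, no oscillation
        System.out.println(d.update(79f)); // false: became good again
    }
}
```

With a single 90% threshold, the 85% sample would have flipped the disk back to good; the gap between the two watermarks is what suppresses the oscillation.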
[jira] [Created] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events
Naganarasimha G R created YARN-4129:
------------------------------------

Summary: Refactor the SystemMetricPublisher in RM to better support newer events
Key: YARN-4129
URL: https://issues.apache.org/jira/browse/YARN-4129
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R

Currently, to add a new timeline event/entity on the RM side, one has to add a method in the publisher and a method in the handler, and create a new event class, which is cumbersome and redundant. Further, not all events may need to be published in both V1 and V2. So this adopts an approach similar to the one taken in YARN-3045 (NM side).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue
Varun Saxena created YARN-4128:
-------------------------------

Summary: Correct logs in capacity scheduler while printing priority is acceptable for a queue
Key: YARN-4128
URL: https://issues.apache.org/jira/browse/YARN-4128
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Saxena
Priority: Trivial

Spaces are missing between the queue name and "for", and between the application ID and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue :varunqfor application:application_1441653547287_0003for the user: varun
{noformat}
Relevant log in CapacityScheduler#checkAndGetApplicationPriority:
{code}
LOG.info("Priority '" + appPriority.getPriority()
    + "' is acceptable in queue : " + queueName + " for application: "
    + applicationId + " for the user: " + user);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4128) Correct logs in capacity scheduler while printing priority is acceptable for a queue
[ https://issues.apache.org/jira/browse/YARN-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4128:
-------------------------------
Description:
Spaces are missing between the queue name and "for", and between the application ID and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue :varunqfor application:application_1441653547287_0003for the user: varun
{noformat}

was:
Spaces are missing between the queue name and "for", and between the application ID and "for".
{noformat}
[IPC Server handler 0 on 33140]: INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue :varunqfor application:application_1441653547287_0003for the user: varun
{noformat}
Relevant log in CapacityScheduler#checkAndGetApplicationPriority:
{code}
LOG.info("Priority '" + appPriority.getPriority()
    + "' is acceptable in queue : " + queueName + " for application: "
    + applicationId + " for the user: " + user);
{code}

> Correct logs in capacity scheduler while printing priority is acceptable for
> a queue
> ----------------------------------------------------------------------------
>
> Key: YARN-4128
> URL: https://issues.apache.org/jira/browse/YARN-4128
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Varun Saxena
> Priority: Trivial
>
> Spaces are missing between the queue name and "for", and between the
> application ID and "for".
> {noformat}
> [IPC Server handler 0 on 33140]: INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Priority '0' is acceptable in queue :varunqfor
> application:application_1441653547287_0003for the user: varun
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events
[ https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4129: Attachment: YARN-4129.YARN-2928.001.patch Hi [~djp] & [~sjlee0], similar to the approach followed on the NM side (YARN-3045), I have modified the RM side too. The approach is as follows: * Extract all the public methods of SystemMetricsPublisher into an interface, keeping the interface name SystemMetricsPublisher (it can be renamed if suggested) * Create 2 implementations of the interface, one for V1 and one for V2, so that if some events need not be handled, the particular version's implementation can simply ignore them and return * In the specific implementation, if a timeline event/entity needs to be published, it can be created, set on a common async event and handed to the AsyncDispatcher to dispatch * Specific handlers are created for V1 & V2 to publish the events I am attaching a patch for this issue. Please review. > Refactor the SystemMetricPublisher in RM to better support newer events > --- > > Key: YARN-4129 > URL: https://issues.apache.org/jira/browse/YARN-4129 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-4129.YARN-2928.001.patch > > > Currently, to add a new timeline event/entity on the RM side, one has to add a > method in the publisher and a method in the handler and create a new event class, > which looks cumbersome and redundant. Also, all the events might not > be required to be published in both V1 & V2. So we are adopting an approach similar to > the one adopted in YARN-3045 (NM side) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
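The refactoring steps above can be sketched roughly as follows (method names and the event set are illustrative assumptions, not the patch's actual signatures):

```java
// Illustrative sketch of the YARN-4129 idea: one publisher interface with
// separate V1/V2 implementations, so each version can ignore events it does
// not publish instead of forcing every event through a single class.
interface SystemMetricsPublisher {
    void appCreated(String appId);
    void appFinished(String appId);
}

class TimelineServiceV1Publisher implements SystemMetricsPublisher {
    @Override public void appCreated(String appId) { /* build V1 entity, hand to dispatcher */ }
    @Override public void appFinished(String appId) { /* build V1 entity, hand to dispatcher */ }
}

class TimelineServiceV2Publisher implements SystemMetricsPublisher {
    @Override public void appCreated(String appId) { /* build V2 entity, hand to dispatcher */ }
    @Override public void appFinished(String appId) { /* V2 may simply ignore this event */ }
}

public class PublisherDemo {
    public static void main(String[] args) {
        SystemMetricsPublisher p = new TimelineServiceV1Publisher();
        p.appCreated("application_0001"); // real impl would dispatch an async event here
        System.out.println("published via " + p.getClass().getSimpleName());
    }
}
```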
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734407#comment-14734407 ] Hudson commented on YARN-2884: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2303 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2303/]) YARN-2884. Added a proxy service in NM to proxy the the communication between AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 6f72f1e6003ab11679bebeb96f27f1f62b3b3e02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Fix For: 2.8.0 > > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, > 
YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, > YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, > YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start, the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttle misbehaving AMs > 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
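The proxy idea can be sketched as a chain of request interceptors, loosely mirroring the names in the committed file list (the allocate signature here is a simplified stand-in for the real AM-RM protocol):

```java
// Sketch of the interceptor-chain idea behind the AMRMProxy (YARN-2884).
// Names are simplified assumptions; the real code lives under
// org.apache.hadoop.yarn.server.nodemanager.amrmproxy.
interface RequestInterceptor {
    String allocate(String request);            // stand-in for AllocateRequest/Response
    void setNext(RequestInterceptor next);
}

class PassThroughInterceptor implements RequestInterceptor {
    private RequestInterceptor next;
    @Override public void setNext(RequestInterceptor next) { this.next = next; }
    @Override public String allocate(String request) {
        // A real interceptor could throttle, rewrite, or route the request here
        // (distributed scheduling, federation masking, etc.).
        return next.allocate(request);
    }
}

class MockRMFacade implements RequestInterceptor {
    @Override public void setNext(RequestInterceptor next) { /* end of chain */ }
    @Override public String allocate(String request) { return "RM-response-to:" + request; }
}

public class AMRMProxyDemo {
    public static void main(String[] args) {
        RequestInterceptor chain = new PassThroughInterceptor();
        chain.setNext(new MockRMFacade());
        System.out.println(chain.allocate("heartbeat"));
    }
}
```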
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] forrestchen updated YARN-4022: -- Labels: (was: YARN patch) > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: forrestchen > > When I delete an existing queue by modifying the xxx-scheduler.xml, I can still > see the queue information block in the webpage (/cluster/scheduler), though the > 'Min Resources' items all become zero and there is no 'Max Running > Applications' item. > I can still submit an application to the deleted queue and the application > will run in the 'root.default' queue instead, but submitting to a nonexistent queue > will cause an exception. > My expectation is that the deleted queue will not be displayed in the webpage and that submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734303#comment-14734303 ] Hudson commented on YARN-2884: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #353 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/353/]) YARN-2884. Added a proxy service in NM to proxy the the communication between AM and RM. Contributed by Kishore Chaliparambil (jianhe: rev 6f72f1e6003ab11679bebeb96f27f1f62b3b3e02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerSecurityUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/PassThroughRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContext.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/DefaultRequestInterceptor.java > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Fix For: 2.8.0 > > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, > YARN-2884-V2.patch, 
YARN-2884-V3.patch, YARN-2884-V4.patch, > YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, > YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start, the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttle misbehaving AMs > 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-4127: -- Assignee: Varun Saxena > RM fail with noAuth error if switched from non-failover mode to failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Varun Saxena > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string. > If RM failover is then disabled, the RM cannot load data from ZK and fails with a noAuth error. After I reset the root node ACL, it can access again. > {code} > 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194) > {code} > The problem may be that in non-failover mode, the RM doesn't use the *RM-ID* to > connect with ZK and thus fails with a noAuth error. > We should be able to switch failover on and off with no interruption to the > user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
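A toy illustration of the ACL mismatch described above, assuming simplified ACL strings (real ZooKeeper digest ACLs also carry a password hash, and YARN's actual ACL handling lives in ZKRMStateStore):

```java
// Illustrative only: in HA mode the ZK root node is written with an ACL tied
// to the RM's identity; a non-HA RM then connects without that identity and
// hits NoAuth. The strings below are hypothetical stand-ins, not real ACLs.
public class ZkRootAclDemo {

    static String rootNodeAcl(boolean haEnabled, String rmId) {
        return haEnabled
            ? "digest:" + rmId + ":rwcd"   // exclusive access tied to this RM's identity
            : "world:anyone:rwcda";        // open ACL used when failover is off
    }

    public static void main(String[] args) {
        // Node written while HA was on, read back after HA is turned off:
        // the ACLs differ, which is the shape of the YARN-4127 failure.
        System.out.println(rootNodeAcl(true, "rm1"));
        System.out.println(rootNodeAcl(false, null));
    }
}
```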
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] forrestchen updated YARN-4022: -- Attachment: YARN-4022.001.patch > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: forrestchen > Attachments: YARN-4022.001.patch > > > When I delete an existing queue by modifying the xxx-scheduler.xml, I can still > see the queue information block in the webpage (/cluster/scheduler), though the > 'Min Resources' items all become zero and there is no 'Max Running > Applications' item. > I can still submit an application to the deleted queue and the application > will run in the 'root.default' queue instead, but submitting to a nonexistent queue > will cause an exception. > My expectation is that the deleted queue will not be displayed in the webpage and that submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734477#comment-14734477 ] Hadoop QA commented on YARN-4081: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 13s | Pre-patch YARN-3926 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 56s | The applied patch generated 1 new checkstyle issues (total was 10, now 3). | | {color:green}+1{color} | whitespace | 0m 20s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 54m 49s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 104m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754588/YARN-4081-YARN-3926.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-3926 / 1dbd8e3 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9030/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9030/console | This message was automatically generated. > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch, > YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, > YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, > YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734377#comment-14734377 ] Hadoop QA commented on YARN-3943: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 52s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 46s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 7m 43s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 56m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754583/YARN-3943.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6f72f1e | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9029/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9029/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9029/console | This message was automatically generated. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. 
Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
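The two-threshold scheme described above can be sketched as a small hysteresis check (class name and threshold values are illustrative, not the patch's actual configuration keys):

```java
// Sketch of the two-threshold (hysteresis) idea in YARN-3943: a disk is marked
// full above a high watermark and only marked good again below a lower one, so
// utilization hovering near a single threshold cannot flip the state rapidly.
public class DiskFullnessTracker {
    private final float fullThreshold;     // e.g. 95% -> disk becomes full
    private final float notFullThreshold;  // e.g. 90% -> disk becomes good again
    private boolean full = false;

    DiskFullnessTracker(float fullThreshold, float notFullThreshold) {
        this.fullThreshold = fullThreshold;
        this.notFullThreshold = notFullThreshold;
    }

    /** Feed in the current utilization; returns whether the disk is considered full. */
    boolean update(float utilizationPercent) {
        if (full) {
            if (utilizationPercent < notFullThreshold) full = false;
        } else {
            if (utilizationPercent > fullThreshold) full = true;
        }
        return full;
    }

    public static void main(String[] args) {
        DiskFullnessTracker t = new DiskFullnessTracker(95f, 90f);
        System.out.println(t.update(96f)); // crossed the full threshold
        System.out.println(t.update(93f)); // still full: above the recovery threshold
        System.out.println(t.update(89f)); // recovered: below the recovery threshold
    }
}
```

With a single threshold, the 96% → 93% → 96% pattern would oscillate between full and not-full on every check; the gap between the two watermarks absorbs that noise.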
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734529#comment-14734529 ] Bibin A Chundatt commented on YARN-4126: Hi [~jianhe], is this change as you expected? Is any change required other than the above? Looks like the test cases need a lot of correction. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734597#comment-14734597 ] Hudson commented on YARN-4121: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #342 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/342/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734625#comment-14734625 ] Steve Loughran commented on YARN-3337: -- # For Slider we have what we need: the API calls are in the AM and its config files # What I do need is something generic for other apps, with Spark the one I'm currently looking at # Robert's SSH-in strategy is OK for local-VM systems where I have the SSH key and can automate it; I remember doing something similar to test HA NNs in Hadoop 1.x. What SSH does well is that you can then issue a {{kill -19}} to suspend a process, and so test liveness monitoring. What I can't do with his code is # run tests against clusters that I don't have SSH keys for (possibly including the Jenkins builds) # test on Windows # have some re-usable tests which I can get into ASF code for anyone to use. API-wise, force-kill-container would be enough; while my JUnit tests wouldn't need a CLI, test runners in different languages might > Provide YARN chaos monkey > - > > Key: YARN-3337 > URL: https://issues.apache.org/jira/browse/YARN-3337 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 2.7.0 >Reporter: Steve Loughran > > To test failure resilience today you either need custom scripts or have to implement > Chaos Monkey-like logic in your application (SLIDER-202). > Killing AMs and containers on a schedule & probability is the core activity > here, one that could be handled by a CLI app/client lib that does this. > # entry point to have a startup delay before acting > # frequency of chaos wakeup/polling > # probability of AM failure generation (0-100) > # probability of non-AM container kill > # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
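The parameter list above can be sketched as a minimal chaos loop (the class, seed handling, and string results are hypothetical stand-ins; a real tool would invoke YARN's kill APIs):

```java
import java.util.Random;

// Sketch of the CLI-driven chaos monkey proposed in YARN-3337: each wakeup
// rolls against the AM-kill probability first, then the container-kill one.
// Both probabilities are integers in [0, 100], as in the issue description.
public class YarnChaosMonkey {
    private final Random random;
    private final int amKillProbability;        // 0-100
    private final int containerKillProbability; // 0-100

    YarnChaosMonkey(long seed, int amKillProbability, int containerKillProbability) {
        this.random = new Random(seed);
        this.amKillProbability = amKillProbability;
        this.containerKillProbability = containerKillProbability;
    }

    /** One chaos wakeup: returns what, if anything, was "killed". */
    String poll() {
        if (random.nextInt(100) < amKillProbability) return "killed-AM";
        if (random.nextInt(100) < containerKillProbability) return "killed-container";
        return "no-op";
    }

    public static void main(String[] args) {
        // Probability 100 always kills the AM; probability 0 never acts.
        YarnChaosMonkey monkey = new YarnChaosMonkey(42L, 100, 0);
        System.out.println(monkey.poll());
    }
}
```

The startup delay and wakeup frequency from the list would simply wrap this `poll()` in a scheduled executor; they are omitted here to keep the probability logic visible.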
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] forrestchen updated YARN-4022: -- Attachment: YARN-4022.002.patch Fix test bug & whitespace. > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch, YARN-4022.002.patch > > > When I delete an existing queue by modifying the xxx-scheduler.xml, I can still > see the queue information block in the webpage (/cluster/scheduler), though the > 'Min Resources' items all become zero and there is no 'Max Running > Applications' item. > I can still submit an application to the deleted queue and the application > will run in the 'root.default' queue instead, but submitting to a nonexistent queue > will cause an exception. > My expectation is that the deleted queue will not be displayed in the webpage and that submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734751#comment-14734751 ] Hadoop QA commented on YARN-4022: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 11 new checkstyle issues (total was 85, now 94). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 54m 7s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754618/YARN-4022.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 435f935 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9035/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9035/console | This message was automatically generated. > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch, YARN-4022.002.patch > > > When I delete an existing queue by modify the xxx-schedule.xml, I can still > see the queue information block in webpage(/cluster/scheduler) though the > 'Min Resources' items all become to zero and have no item of 'Max Running > Applications'. > I can still submit an application to the deleted queue and the application > will run using 'root.default' queue instead, but submit to an un-exist queue > will cause an exception. 
> My expectation is that the deleted queue will no longer be displayed in the webpage, and submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code} > <property> > <name>yarn.scheduler.fair.user-as-default-queue</name> > <value>false</value> > </property> > <property> > <name>yarn.scheduler.fair.allow-undeclared-pools</name> > <value>false</value> > </property> > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] forrestchen updated YARN-4022: -- Labels: scheduler (was: ) > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch > > > When I delete an existing queue by modifying the xxx-scheduler.xml, I can still > see the queue information block in the webpage (/cluster/scheduler), though the > 'Min Resources' items all become zero and there is no 'Max Running > Applications' item. > I can still submit an application to the deleted queue and the application > will run in the 'root.default' queue instead, but submitting to a non-existent queue > will cause an exception. > My expectation is that the deleted queue will no longer be displayed in the webpage, and submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code} > <property> > <name>yarn.scheduler.fair.user-as-default-queue</name> > <value>false</value> > </property> > <property> > <name>yarn.scheduler.fair.allow-undeclared-pools</name> > <value>false</value> > </property> > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events
[ https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734560#comment-14734560 ] Hadoop QA commented on YARN-4129: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 6s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 8m 12s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 21s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 27s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 35s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 12s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 94m 6s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754593/YARN-4129.YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / e6afe26 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9031/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9031/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9031/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9031/console | This message was automatically generated. > Refactor the SystemMetricPublisher in RM to better support newer events > --- > > Key: YARN-4129 > URL: https://issues.apache.org/jira/browse/YARN-4129 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-4129.YARN-2928.001.patch > > > Currently to add new timeline event/ entity in RM side, one has to add a > method in publisher and a method in handler and create a new event class > which looks cumbersome and redundant. also further all the events might not > be required to be published in V1 & V2. So adopting the approach similar to > what was adopted in YARN-3045(NM side) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734602#comment-14734602 ] nijel commented on YARN-3771: - Hi all, any comments on this change? > "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > I was going through some FindBugs rules. One issue reported there is that > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH= > are not honoring the final qualifier. The string array contents can be > reassigned! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > System.out.println(12 < 10); > String[] t1={"u"}; > //t = t1; // this will show compilation error > t[1] = t1[0]; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList > any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
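The behavior the reporter describes, and the {{Collections.unmodifiableList}} option mentioned at the end, can be shown in a few self-contained lines. This is a hedged sketch (class and field names are invented for illustration, not from the attached patch):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FinalArrayDemo {
    // "final" only pins the reference; the array elements stay mutable.
    static final String[] CLASSPATH = { "1", "2" };

    // Option from the comment: expose an unmodifiable view instead.
    static final List<String> SAFE_CLASSPATH =
        Collections.unmodifiableList(Arrays.asList("1", "2"));

    public static void main(String[] args) {
        CLASSPATH[1] = "u";               // compiles and silently mutates the "constant"
        System.out.println(CLASSPATH[1]); // prints "u"
        try {
            SAFE_CLASSPATH.set(1, "u");   // unmodifiable view rejects mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }
    }
}
```

The trade-off is that callers receive a {{List<String>}} rather than a {{String[]}}, so switching the public constant's type is an incompatible API change, which is presumably why the patch needs discussion.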
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734610#comment-14734610 ] Hudson commented on YARN-4121: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2281 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2281/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager
[ https://issues.apache.org/jira/browse/YARN-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated YARN-4130: - Attachment: YARN-4130.00.patch > Duplicate declaration of ApplicationId in RMAppManager > -- > > Key: YARN-4130 > URL: https://issues.apache.org/jira/browse/YARN-4130 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Labels: resourcemanager > Attachments: YARN-4130.00.patch > > > ApplicationId is declared twice in {{RMAppManager}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734580#comment-14734580 ] Hadoop QA commented on YARN-4022: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 28s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 30s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 90m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754601/YARN-4022.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 435f935 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9032/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9032/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9032/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9032/console | This message was automatically generated. > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch > > > When I delete an existing queue by modify the xxx-schedule.xml, I can still > see the queue information block in webpage(/cluster/scheduler) though the > 'Min Resources' items all become to zero and have no item of 'Max Running > Applications'. 
> I can still submit an application to the deleted queue and the application > will run in the 'root.default' queue instead, but submitting to a non-existent queue > will cause an exception. > My expectation is that the deleted queue will no longer be displayed in the webpage, and submitting an > application to the deleted queue will act just as if the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code} > <property> > <name>yarn.scheduler.fair.user-as-default-queue</name> > <value>false</value> > </property> > <property> > <name>yarn.scheduler.fair.allow-undeclared-pools</name> > <value>false</value> > </property> > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734647#comment-14734647 ] Junping Du commented on YARN-3337: -- OK. Let me create a sub-task focusing on adding this API and CLI. > Provide YARN chaos monkey > - > > Key: YARN-3337 > URL: https://issues.apache.org/jira/browse/YARN-3337 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 2.7.0 >Reporter: Steve Loughran > > To test failure resilience today you either need custom scripts or to implement > Chaos Monkey-like logic in your application (SLIDER-202). > Killing AMs and containers on a schedule & probability is the core activity > here, one that could be handled by a CLI app/client lib that does this. > # entry point to have a startup delay before acting > # frequency of chaos wakeup/polling > # probability of AM failure generation (0-100) > # probability of non-AM container kill > # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
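The four knobs listed in the description map naturally onto a small polling loop that rolls a percentage on each wakeup. A minimal sketch under that assumption (all class and field names here are hypothetical, not from any YARN patch):

```java
import java.util.Random;

/** Hypothetical sketch of the proposed chaos knobs; not from any YARN patch. */
public class ChaosMonkeySketch {
    private final Random random = new Random();
    private final long startupDelayMs;       // delay before acting at all
    private final long pollIntervalMs;       // frequency of chaos wakeup/polling
    private final int amKillPercent;         // probability of AM failure (0-100)
    private final int containerKillPercent;  // probability of non-AM container kill

    ChaosMonkeySketch(long startupDelayMs, long pollIntervalMs,
                      int amKillPercent, int containerKillPercent) {
        this.startupDelayMs = startupDelayMs;
        this.pollIntervalMs = pollIntervalMs;
        this.amKillPercent = amKillPercent;
        this.containerKillPercent = containerKillPercent;
    }

    /** On each wakeup: nextInt(100) is uniform over 0..99, so a threshold
     *  of 100 always fires and a threshold of 0 never does. */
    boolean shouldKillAm() {
        return random.nextInt(100) < amKillPercent;
    }

    boolean shouldKillContainer() {
        return random.nextInt(100) < containerKillPercent;
    }
}
```

The actual kill actions would go through whatever AM/container kill API the sub-task mentioned above ends up adding; this sketch only covers the scheduling/probability part.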
[jira] [Commented] (YARN-4121) Typos in capacity scheduler documentation.
[ https://issues.apache.org/jira/browse/YARN-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734611#comment-14734611 ] Hudson commented on YARN-4121: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2304 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2304/]) YARN-4121. Fix typos in capacity scheduler documentation. Contributed by Kai Sasaki. (vvasudev: rev 435f935ba7d8abb1a35796b72d1de906ded80592) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt > Typos in capacity scheduler documentation. > -- > > Key: YARN-4121 > URL: https://issues.apache.org/jira/browse/YARN-4121 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4121.00.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager
Kai Sasaki created YARN-4130: Summary: Duplicate declaration of ApplicationId in RMAppManager Key: YARN-4130 URL: https://issues.apache.org/jira/browse/YARN-4130 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.1 Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Trivial ApplicationId is declared twice in {{RMAppManager}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4110) RMappImpl and RmAppAttemptImpl should override hashcode() & equals()
[ https://issues.apache.org/jira/browse/YARN-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734717#comment-14734717 ] Hadoop QA commented on YARN-4110: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 59s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 11s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 10s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 91m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754166/YARN-4110_1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 435f935 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9034/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9034/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9034/console | This message was automatically generated. > RMappImpl and RmAppAttemptImpl should override hashcode() & equals() > > > Key: YARN-4110 > URL: https://issues.apache.org/jira/browse/YARN-4110 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4110_1.patch > > > It is observed that RMAppImpl and RMAppAttemptImpl does not have hashcode() > and equals() implementations. These state objects should override these > implementations. > # For RMAppImpl, we can use of ApplicationId#hashcode and > ApplicationId#equals. > # Similarly, RMAppAttemptImpl, ApplicationAttemptId#hashcode and > ApplicationAttemptId#equals -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4132) Nodemanagers should try harder to connect to the RM
Chang Li created YARN-4132: -- Summary: Nodemanagers should try harder to connect to the RM Key: YARN-4132 URL: https://issues.apache.org/jira/browse/YARN-4132 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Being part of the cluster, nodemanagers should try very hard (and possibly never give up) to connect to a resourcemanager. Minimally we should have a separate config to set how aggressively a nodemanager will connect to the RM separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
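A dedicated NM-side retry config, as proposed above, could for example drive a capped exponential backoff between connection attempts. The sketch below is illustrative only: the method and constants are hypothetical, and Hadoop's real retry machinery lives in its {{org.apache.hadoop.io.retry}} RetryPolicy classes.

```java
/**
 * Illustrative sketch: capped exponential backoff between RM connection
 * attempts. Not actual YARN code; a "retry forever" policy would simply
 * keep calling this with increasing attempt numbers and never give up.
 */
public class NmConnectBackoff {
    static long backoffMs(int attempt, long baseMs, long maxMs) {
        // Double the wait each attempt; clamp the shift to avoid overflow,
        // then clamp the result to the configured maximum.
        long wait = baseMs << Math.min(attempt, 20);
        return Math.min(wait, maxMs);
    }
}
```

The key point in the issue is that this aggressiveness should be configurable separately for nodemanagers (cluster members that should essentially never give up) versus ordinary clients (which should fail fast).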
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735452#comment-14735452 ] Vrushali C commented on YARN-3901: -- Thanks [~sjlee0] for the review! I will correct the variable ordering for static and private members as well as making variables final. bq. l.210: Strictly speaking, GenericObjectMapper will return an integer if the value fits within an integer; so it's not exactly a concern for min/max (timestamps) but for caution we might want to stay with Number instead of long Comparisons are not allowed for Number datatype. {code} The operator < is undefined for the argument type(s) java.lang.Number, java.lang.Number {code} So I would have to do something like {code} Number d = a.longValue() + b.longValue(); {code} Do you think this is better? bq. l.52: Is the TimestampGenerator class going to be used outside FlowRunCoprocessor? If not, I would argue that we should make it an inner class of FlowRunCoprocessor. At least we should make it non-public to keep it within the package. If it would see general use outside this class, then it might be better to make it a true public class in the common package. I suspect a non-public class might be what we want here. I am thinking I will need this when the flush/compaction scanner is added in. If you'd like, I can move it in as a non-public class for now and then move it out if needed. bq. It's up to you, but you could leave the row key improvement to YARN-4074. That might be easier to manage the changes between yours and mine. I'm restructuring all *RowKey classes uniformly. I actually needed this in the unit test while checking the FlowActivityTable contents, if you want I can take it out and you can add that test case in when you add in the RowKey changes? bq. l.144: This would mean that some cell timestamps would have the unit of the milliseconds and others would be in nanoseconds. 
I'm a little bit concerned if we ever interpret these timestamps incorrectly. Could there be a more explicit way of clearly differentiating them? I don't have good suggestions at the moment. Yeah, I was thinking about that too. Right now, metrics will get their own timestamps. For other columns, we'd be using the nanoseconds. I am trying to see if we can just use milliseconds. bq. it might be good to have short comments on what each method is testing I did try to make the unit test names themselves descriptive like testFlowActivityTable or testWriteFlowRunMinMaxToHBase or testWriteFlowRunMetricsOneFlow or testWriteFlowActivityOneFlow but I agree some more comments in the unit test will surely help. Will upload a new patch shortly, thanks! > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. 
> - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the
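The {{Number}}-versus-{{long}} point discussed in the comment above is easy to demonstrate: the boxed types cannot be compared with {{<}}, but comparing via {{longValue()}} works whether the deserialized value came back as an {{Integer}} or a {{Long}}. A hedged sketch (names are hypothetical, not patch code):

```java
/** Sketch of the Number-vs-long discussion; not YARN-3901 patch code. */
public class NumberMinMax {
    // GenericObjectMapper may hand back Integer or Long depending on
    // magnitude. "a < b" on two Numbers does not compile, but comparing
    // their longValue() handles either runtime type, and returning the
    // original Number preserves whatever boxed type arrived.
    static Number min(Number a, Number b) {
        return (a.longValue() <= b.longValue()) ? a : b;
    }
}
```

This avoids the awkward {{Number d = a.longValue() + b.longValue();}} style mentioned in the comment for pure min/max comparisons, though sums would still need explicit {{longValue()}} widening.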
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735471#comment-14735471 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 14 new checkstyle issues (total was 236, now 242). | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 13s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 93m 49s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754296/YARN-3635.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 970daaa | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9040/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9040/console | This message was automatically generated. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch, YARN-3635.7.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. 
> I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735384#comment-14735384 ] Sangjin Lee commented on YARN-3901: --- Thanks for the updated patch [~vrushalic]! I went over the new patch, and the following is the quick feedback. I'll also apply it with YARN-4074, and test it a little more. (HBaseTimelineWriterImpl.java) - l.141-155: the whole thing could be inside {{if (isApplication)...}} - l.264: this null check is not needed (FlowRunCoprocessor.java) - l.52: Is the {{TimestampGenerator}} class going to be used outside {{FlowRunCoprocessor}}? If not, I would argue that we should make it an inner class of {{FlowRunCoprocessor}}. At least we should make it non-public to keep it within the package. If it would see general use outside this class, then it might be better to make it a true public class in the common package. I suspect a non-public class might be what we want here. - l.52: let's make it final - l.54: style nit: I think the common style is to place the static variables before instance variables - Also, overall it seems we're using both the diamond operator (<>) and the old style generic declaration. It might be good to stick with one style (in which case the diamond operator might be better). - l.144: This would mean that some cell timestamps would have the unit of the milliseconds and others would be in nanoseconds. I'm a little bit concerned if we ever interpret these timestamps incorrectly. Could there be a more explicit way of clearly differentiating them? I don't have good suggestions at the moment. (FlowScanner.java) - variable ordering - l.210: Strictly speaking, {{GenericObjectMapper}} will return an integer if the value fits within an integer; so it's not exactly a concern for min/max (timestamps) but for caution we might want to stay with {{Number}} instead of long. 
(TimestampGenerator.java) - l.29: make it final - variable ordering - see above for the public/non-public comment (FlowActivityRowKey.java) - It's up to you, but you could leave the row key improvement to YARN-4074. That might be easier to manage the changes between yours and mine. I'm restructuring all *RowKey classes uniformly. (TestHBaseTimelineWriterImplFlowRun.java) - it might be good to have short comments on what each method is testing > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. 
- > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
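The min_start_time handling described above (each writer appends a tagged cell; on flush/compaction the minimum is collapsed into a single untagged cell) can be sketched in plain Java. This is an illustrative model only: real cells live in HBase, tags use the #type:value encoding, and the collapse runs inside a coprocessor, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative model of the flow_run min_start_time column: writers append
 * tagged cells; on flush/compaction the minimum value is kept in a single
 * untagged cell and all tagged cells are discarded.
 */
public class MinColumnModel {
    /** A cell value with its tag ("" means no tag, as after compaction). */
    static final class Cell {
        final String tag;
        final long value;
        Cell(String tag, long value) { this.tag = tag; this.value = value; }
    }

    private final List<Cell> cells = new ArrayList<>();

    /** RM writer: record a start time tagged with the application id. */
    void write(String appIdTag, long startTime) {
        cells.add(new Cell(appIdTag, startTime));
    }

    /** Read path: the coprocessor returns the min of all written values. */
    long readMin() {
        long min = Long.MAX_VALUE;
        for (Cell c : cells) {
            min = Math.min(min, c.value);
        }
        return min;
    }

    /** Compaction: collapse all cells into one untagged cell holding the min. */
    void compact() {
        long min = readMin();
        cells.clear();
        cells.add(new Cell("", min));
    }

    int cellCount() { return cells.size(); }
}
```

The max_end_time column would be identical with `Math.max` in place of `Math.min`.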
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735545#comment-14735545 ] Sangjin Lee commented on YARN-4074: --- It'd be great if you could take a look at the latest patch and let me know your feedback. Thanks! > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4074-YARN-2928.POC.001.patch, > YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch > > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735448#comment-14735448 ] zhihai xu commented on YARN-4096: - +1. Committing it in. > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
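The fix being committed addresses the leak described above: if the app-specific aggregator fails to initialize, the local logs must still be scheduled for deletion. A minimal sketch of that idea, using hypothetical stand-in names for the NodeManager internals:

```java
/**
 * Sketch of the YARN-4096 fix: if initializing the app-specific log
 * aggregator throws, the app's local logs are scheduled for deletion
 * instead of being leaked. DeletionService and the method names are
 * hypothetical stand-ins, not the actual NodeManager classes.
 */
public class LogInitSketch {
    interface DeletionService { void scheduleLogDeletion(String appId); }

    /** Returns true if aggregation was set up, false if cleanup ran instead. */
    static boolean initAppAggregator(String appId, boolean initFails,
                                     DeletionService deletion) {
        try {
            if (initFails) {
                throw new RuntimeException("log aggregation init failed");
            }
            return true; // aggregator will delete local logs after upload
        } catch (RuntimeException e) {
            // Without this fallback the local logs would never be removed.
            deletion.scheduleLogDeletion(appId);
            return false;
        }
    }
}
```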
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735497#comment-14735497 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-trunk-Commit #8416 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8416/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735541#comment-14735541 ] Sangjin Lee commented on YARN-4075: --- Sorry [~varun_saxena], it took me a while to review this. The patch looks good for the most part. FYI, I incorporated the XmlElement annotation for flow runs in {{FlowActivityEntity}} in YARN-4074. This change will be in the next patch (once I rebase with Vrushali's latest for YARN-3091). I also implemented the full {{compareTo()}} method already in the current patch for YARN-4074. > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-4075-YARN-2928.POC.1.patch > > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4126: --- Attachment: 0003-YARN-4126.patch Hi [~jianhe], attaching a patch with the test cases updated. {{TestRMWebServicesDelegationTokens}} hasn't been corrected yet. In non-secure mode, what should the behaviour of {{RMWebServicesDelegationTokens}} be? Currently it returns a {{500 Internal Error}}. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
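The intended behaviour, refusing to issue a delegation token when security is disabled, can be sketched as follows. The real check belongs in ClientRMService#getDelegationToken; this standalone version only models the control flow, and the returned string is a placeholder rather than a real RMDelegationToken.

```java
/**
 * Sketch of the YARN-4126 behaviour: refuse to issue a delegation token
 * when security (Kerberos) is not enabled. In Hadoop the security check
 * would come from UserGroupInformation.isSecurityEnabled(); here it is
 * passed in as a flag so the sketch stays self-contained.
 */
public class TokenGuard {
    static String getDelegationToken(boolean securityEnabled, String renewer) {
        if (!securityEnabled) {
            // In insecure mode there is no secret manager state worth signing.
            throw new IllegalStateException(
                "Delegation tokens are only supported in secure mode");
        }
        return "token-for-" + renewer; // placeholder for a real token
    }
}
```

Whether the REST endpoint should map this to a 400 rather than a 500 is exactly the open question in the comment above.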
[jira] [Updated] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4096: Hadoop Flags: Reviewed > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735537#comment-14735537 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1095 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1095/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735533#comment-14735533 ] Hadoop QA commented on YARN-4132: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 23s | The applied patch generated 3 new checkstyle issues (total was 211, now 213). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 0m 22s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 52s | Tests failed in hadoop-yarn-server-nodemanager. 
| | | | 49m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater | | | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown | | | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754710/YARN-4132.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 970daaa | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9041/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9041/console | This message was automatically generated. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
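The "separate config" idea from the description can be sketched as a simple fallback resolution: the NM consults its own dedicated max-wait setting and falls back to the shared client setting only when the NM-specific one is unset. Configuration handling is reduced to plain strings here, and the 900,000 ms fallback mirrors the usual client-side connect wait but is illustrative, not the key or default the patch actually adds.

```java
/**
 * Sketch of the idea behind YARN-4132: resolve the NodeManager's
 * RM-connection max wait from an NM-specific setting, falling back to the
 * shared client setting. Values arrive as strings, as they would from a
 * Configuration lookup; null means "not configured".
 */
public class NmRetryConfig {
    /** -1 means "retry forever"; an NM may want that, a client usually not. */
    static long resolveMaxWaitMs(String nmSetting, String clientSetting) {
        if (nmSetting != null) {
            return Long.parseLong(nmSetting);
        }
        if (clientSetting != null) {
            return Long.parseLong(clientSetting);
        }
        return 900_000L; // illustrative default: 15 minutes
    }
}
```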
[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4132: --- Attachment: YARN-4132.patch > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735490#comment-14735490 ] zhihai xu commented on YARN-4096: - thanks Jason for the contribution! Committed it to branch-2.7.2, branch-2 and trunk. > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4132: --- Attachment: YARN-4132.2.patch Fixed the broken test in TestYarnConfigurationFields. The other broken tests are not related to my changes (they seem to be caused by a network problem on the testing platform); they all pass with my .2 patch on my local machine. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735683#comment-14735683 ] Chang Li commented on YARN-4132: [~jlowe] please help review the latest patch. Thanks! > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possibly exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-2410: -- Attachment: YARN-2410-v7.patch Modified ShuffleHandler to not use channel attachments. Moved MockNetty code to a helper method. > Nodemanager ShuffleHandler can possibly exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, > YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, > YARN-2410-v6.patch, YARN-2410-v7.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors; when it runs out, it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about the same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=240,000 times, which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
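The refactoring goal stated in the description, not opening file descriptors until they're actually needed, can be modeled with a small sketch: an output queued for shuffle holds only its path, and the file is opened (and closed) only when its transfer actually runs. The class name and the open/close bookkeeping are illustrative stand-ins for the ShuffleHandler types, not the actual patch.

```java
/**
 * Sketch of the YARN-2410 idea: defer opening a map-output file until the
 * transfer for it is about to run, so thousands of queued shuffle requests
 * do not each pin an open file descriptor. openCount stands in for the
 * NodeManager's live fd count.
 */
public class LazyMapOutput {
    static int openCount = 0;

    private final String path;
    private boolean opened = false;

    LazyMapOutput(String path) { this.path = path; }

    /** Called only when this output reaches the head of the send queue. */
    void transfer() {
        if (!opened) {
            openCount++;   // a real impl would open the file here
            opened = true;
        }
        // ... zero-copy transfer of the requested partition would go here ...
        openCount--;       // close immediately after the transfer completes
        opened = false;
    }
}
```

Queuing 240,000 of these costs no descriptors; only the transfers in flight do.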
[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3901: - Attachment: YARN-3901-YARN-2928.5.patch Uploading patch v5 that incorporates Sangjin's review suggestions. > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. 
The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1651: - Attachment: YARN-1651-4.YARN-1197.patch Thanks for the comments, [~mding]! bq. I only mention this because pullNewlyAllocatedContainers() has a check for null for the same logic, so I think we may want to make it consistent? Yes, you're correct; updated the code, thanks. bq. So, based on my understanding, if an application has reserved some resource for a container resource increase request on a node, that amount of resource should never be unreserved in order for the application to allocate a regular container on some other node. But that doesn't seem to be the case right now? Can you confirm? Done. I added a check to {{getNodeIdToUnreserve}}: it now verifies that a reservation is not for a container increase before cancelling it. bq. I think it will be desirable to implement a pendingDecrease set in SchedulerApplicationAttempt, and corresponding logic, just like SchedulerApplicationAttempt.pendingRelease. This is to guard against the situation when decrease requests are received while RM is in the middle of recovery, and has not received all container statuses from NM yet. I agree with the general idea, and we should do something similar. However, I'm not sure caching in the RM is a good idea: a malicious AM could potentially send millions of unknown-to-be-decreased containers to the RM when it starts. Maybe it's better to cache on the AMRMClient side. I think we can do this in a separate JIRA? Could you file a new JIRA for this if you agree? bq. Some nits... Addressed. Uploaded the ver.4 patch. > CapacityScheduler side changes to support increase/decrease container > resource. 
> --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
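The getNodeIdToUnreserve change discussed above, skipping reservations made for container-increase requests when hunting for a reservation to cancel, can be sketched with simplified types. These are illustrative stand-ins for the CapacityScheduler internals, not the actual patch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the YARN-1651 review point: when looking for a node whose
 * reservation can be cancelled to satisfy a regular container request,
 * skip reservations held for container-resource-increase requests.
 */
public class UnreserveSketch {
    static final class Reservation {
        final boolean isIncrease;
        Reservation(boolean isIncrease) { this.isIncrease = isIncrease; }
    }

    /** Returns the first node whose reservation is safe to cancel, or null. */
    static String getNodeIdToUnreserve(Map<String, Reservation> reservedByNode) {
        for (Map.Entry<String, Reservation> e : reservedByNode.entrySet()) {
            if (!e.getValue().isIncrease) {
                return e.getKey();
            }
        }
        return null; // nothing safe to unreserve
    }
}
```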
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735677#comment-14735677 ] Hadoop QA commented on YARN-4132: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 52s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 7m 55s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 56m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754730/YARN-4132.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d9c1fab | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9043/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9043/console | This message was automatically generated. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735704#comment-14735704 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2307 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2307/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735676#comment-14735676 ] MENG DING commented on YARN-1651: - Hi, [~leftnoteasy] bq. I agree the general idea, and we should do the similar thing. However, I'm not sure caching in RM is a good idea, potentially a malicious AM can send millions of unknown-to-be-decreased-containers to RM when RM started. Maybe it's better to cache in AMRMClient side. I think we can do this in a separated JIRA? Could you file a new JIRA for this if you agree? Your proposal makes sense. I will file a JIRA for this. Thanks for addressing my comments. I don't have more comments for now. As per our discussion, I will come up with an end-to-end test based on distributedshell, and post onto this JIRA for review. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735672#comment-14735672 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #357 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/357/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/CHANGES.txt > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs
[ https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3985: Component/s: (was: fairscheduler) (was: capacityscheduler) > Make ReservationSystem persist state using RMStateStore reservation APIs > - > > Key: YARN-3985 > URL: https://issues.apache.org/jira/browse/YARN-3985 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > > YARN-3736 adds the RMStateStore apis to store and load reservation state. > This jira adds the actual storing of state from ReservationSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735714#comment-14735714 ] Hadoop QA commented on YARN-4126: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 2s | Tests passed in hadoop-common. | | {color:red}-1{color} | yarn tests | 53m 35s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 120m 41s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754713/0003-YARN-4126.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 16b9037 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9042/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9042/console | This message was automatically generated. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734800#comment-14734800 ] Junping Du commented on YARN-3337: -- I think there is one difficulty here: the RM scheduler only keeps live container info (in SchedulerApplicationAttempt), not finished containers. If no dead-container info is preserved in the RM, the newly added API can only send a kill-container event; it has no way to know whether the container was actually killed (and no way to distinguish a wrong container ID from the ID of a finished container). The CLI could do better, since it can query the running container list first, kill the container, and then wait until it is no longer active. If we want exactly the same semantics as the kill-apps API, we would have to make the RM track info for dead containers, which sounds like overkill to me, as it forces the RM to track all containers for all applications (the complexity becomes the same as MRv1). Maybe a better trade-off is this: the semantics of forceKillContainer() only mean that kill-container events are sent, not that the container was actually killed. A boolean response from forceKillContainer() would indicate whether a live container to kill was found. Would we then lose the idempotency of this API? > Provide YARN chaos monkey > - > > Key: YARN-3337 > URL: https://issues.apache.org/jira/browse/YARN-3337 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 2.7.0 >Reporter: Steve Loughran > > To test failure resilience today you either need custom scripts or implement > Chaos Monkey-like logic in your application (SLIDER-202). > Killing AMs and containers on a schedule & probability is the core activity > here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting > # frequency of chaos wakeup/polling > # probability to AM failure generation (0-100) > # probability of non-AM container kill > # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
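The trade-off Junping Du proposes for YARN-3337 can be sketched as follows. This is a minimal illustration of the suggested semantics, assuming the RM tracks only live containers: forceKillContainer() sends a kill event for a live container and returns whether one was found, rather than guaranteeing the container is dead. All class and method names here are hypothetical stand-ins, not the actual YARN API.

```java
import java.util.HashMap;
import java.util.Map;

public class KillContainerSketch {
    // Live containers only; finished containers are not tracked, matching
    // the RM behavior described in the comment.
    private final Map<String, String> liveContainers = new HashMap<>();

    public void addLiveContainer(String containerId, String node) {
        liveContainers.put(containerId, node);
    }

    /** Returns true iff a live container was found and a kill event was sent. */
    public boolean forceKillContainer(String containerId) {
        String node = liveContainers.remove(containerId);
        if (node == null) {
            // A wrong container ID and an already-finished container are
            // indistinguishable here, as the comment points out.
            return false;
        }
        sendKillEvent(containerId, node);
        return true;
    }

    private void sendKillEvent(String containerId, String node) {
        // Placeholder for dispatching the actual kill event to the NM.
    }

    public static void main(String[] args) {
        KillContainerSketch rm = new KillContainerSketch();
        rm.addLiveContainer("container_1", "node-a");
        System.out.println(rm.forceKillContainer("container_1")); // live -> true
        System.out.println(rm.forceKillContainer("container_1")); // gone -> false
    }
}
```

Note the loss of idempotency the comment mentions: a second call for the same container returns false even though the kill already succeeded.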
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735786#comment-14735786 ] Jian He commented on YARN-4126: --- yes, oozie has fixed its own. This is just YARN side fix. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possibly exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735766#comment-14735766 ] Hadoop QA commented on YARN-2410: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 0m 19s | Tests passed in hadoop-mapreduce-client-shuffle. 
| | | | 37m 47s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754746/YARN-2410-v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d9c1fab | | hadoop-mapreduce-client-shuffle test log | https://builds.apache.org/job/PreCommit-YARN-Build/9046/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9046/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9046/console | This message was automatically generated. > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, > YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, > YARN-2410-v6.patch, YARN-2410-v7.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors, when it runs out it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=24 times which will run the NM out of file descriptors and > cause it to crash. 
> The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735791#comment-14735791 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2284 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2284/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735859#comment-14735859 ] Hadoop QA commented on YARN-1651: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 2s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 20 new or modified test files. | | {color:red}-1{color} | javac | 8m 10s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 10m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 55s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 31m 2s | The patch has 163 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 29s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 26s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | tools/hadoop tests | 0m 53s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 59m 24s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 154m 43s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754736/YARN-1651-4.YARN-1197.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-1197 / f86eae1 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/diffJavacWarnings.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9045/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT 
Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9045/console | This message was automatically generated. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735787#comment-14735787 ] Jian He commented on YARN-4126: --- yes, oozie has fixed its own. This is just YARN side fix. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4126: -- Comment: was deleted (was: yes, oozie has fixed its own. This is just YARN side fix.) > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735837#comment-14735837 ] Hudson commented on YARN-4096: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #345 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/345/]) YARN-4096. App local logs are leaked if log aggregation fails to initialize for the app. Contributed by Jason Lowe. (zxu: rev 16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt > App local logs are leaked if log aggregation fails to initialize for the app > > > Key: YARN-4096 > URL: https://issues.apache.org/jira/browse/YARN-4096 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Fix For: 2.7.2 > > Attachments: YARN-4096.001.patch > > > If log aggregation fails to initialize for an application then the local logs > will never be deleted. This is similar to YARN-3476 except this is a failure > when log aggregation tries to initialize the app-specific log aggregator > rather than a failure during a log upload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4133) Containers to be preempted leak in FairScheduler preemption logic.
zhihai xu created YARN-4133: --- Summary: Containers to be preempted leak in FairScheduler preemption logic. Key: YARN-4133 URL: https://issues.apache.org/jira/browse/YARN-4133 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Containers to be preempted leak in FairScheduler preemption logic. This can cause preemption to be missed because containers are wrongly removed from {{warnedContainers}}. The problem is in {{preemptResources}}, where two issues can cause containers to be wrongly removed from {{warnedContainers}}. First, the container state {{RMContainerState.ACQUIRED}} is missing from the condition check: {code} (container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) {code} Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we shouldn't remove the container from {{warnedContainers}}. A container should only be removed from {{warnedContainers}} if it is not in state {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or {{RMContainerState.ACQUIRED}}. {code} if ((container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) && isResourceGreaterThanNone(toPreempt)) { warnOrKillContainer(container); Resources.subtractFrom(toPreempt, container.getContainer().getResource()); } else { warnedIter.remove(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
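The corrected loop described in YARN-4133 can be sketched as below: a container stays in the warned list while it is in any live state (RUNNING, ALLOCATED, or ACQUIRED), and is dropped only when it is truly dead; running out of resource to preempt no longer removes live containers. The enum, container class, and integer resources are simplified stand-ins for the real RMContainer/Resource APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PreemptionSketch {
    enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

    static class Container {
        State state;
        int resource;
        Container(State s, int r) { state = s; resource = r; }
    }

    /** Processes warned containers; returns how much resource is still left to preempt. */
    static int preemptResources(List<Container> warnedContainers, int toPreempt) {
        Iterator<Container> it = warnedContainers.iterator();
        while (it.hasNext()) {
            Container c = it.next();
            boolean live = c.state == State.RUNNING
                || c.state == State.ALLOCATED
                || c.state == State.ACQUIRED;   // ACQUIRED was missing before
            if (live) {
                if (toPreempt > 0) {            // only warn/kill while more is needed
                    // warnOrKillContainer(c) would go here
                    toPreempt = Math.max(0, toPreempt - c.resource);
                }
                // live containers stay in warnedContainers either way
            } else {
                it.remove();                    // only drop truly dead containers
            }
        }
        return toPreempt;
    }

    public static void main(String[] args) {
        List<Container> warned = new ArrayList<>();
        warned.add(new Container(State.ACQUIRED, 2));
        warned.add(new Container(State.COMPLETED, 2));
        System.out.println(preemptResources(warned, 3)); // -> 1
        System.out.println(warned.size());               // -> 1 (ACQUIRED container kept)
    }
}
```

Under the buggy condition, the ACQUIRED container would have been removed from the warned list and never preempted, which is the leak the JIRA describes.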
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possibly exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734931#comment-14734931 ] Jason Lowe commented on YARN-2410: -- Thanks for updating the patch! bq. The only reason was findbugs which does not allow more than 7 parameters in a function call Normally a builder pattern is used to make the code more readable in those situations. However I don't think we need more than 7. ReduceContext really only needs mapIds, reduceId, channelCtx, user, infoMap, and outputBasePathStr. The other two parameters are either known to be zero (should not be passed) and can be derived from another (size of mapIds). As such we don't need SendMapOutputParams. bq. The reduceContext is a variable holds the value set by the setAttachment() method and is used by the getAttachment() answer. If I declare it in the test method, it needs be final which cannot be done due to it being used by the setter. createMockChannel can simply have a ReduceContext parameter, marked final, and that should solve that problem. But I thought we were getting rid of the use of channel attachments and just associating the context with the listener directly? Related to the last comment, we're still using channel attachments. sendMap can just take a ReduceContext parameter, and the listener can provide its context when calling it. No need for channel attachments. This can NPE since we're checking for null after we already use it: {noformat} +nextMap = sendMapOutput( +reduceContext.getSendMapOutputParams().getCtx(), +reduceContext.getSendMapOutputParams().getCtx().getChannel(), +reduceContext.getSendMapOutputParams().getUser(), mapId, +reduceContext.getSendMapOutputParams().getReduceId(), info); +nextMap.addListener(new ReduceMapFileCount(reduceContext)); +if (null == nextMap) { {noformat} maxSendMapCount should be cached during serviceInit like the other conf-derived settings so we aren't doing conf lookups on every shuffle. 
The indentation in sendMap isn't correct, as code is indented after a conditional block at the same level as the contents of the conditional block. There's other places that are over-indented. MockShuffleHandler only needs to override one thing, getShuffle, but the mock that method returns has to override a bunch of stuff. It makes more sense to create a separate class for the mocked Shuffle than the mocked ShuffleHandler. Should the mock Future stuff be part of creating the mocked channel? Can simply pass the listener list to use as an arg to the method that mocks up the channel. Nit: SHUFFLE_MAX_SEND_COUNT should probably be something like SHUFFLE_MAX_SESSION_OPEN_FILES to better match the property name. Similarly maxSendMapCount could have a more appropriate name. Nit: Format for 80 columns Nit: There's still instances where we have a class definition immediately after variable definitions and a lack of whitespace between classes and methods or between methods. Whitespace would help readability in those places. > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, > YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors, when it runs out it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). 
> sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=24 times which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
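The null-check ordering bug called out in the YARN-2410 review above can be illustrated in isolation: the patch calls addListener() on nextMap and only afterwards tests it for null, so a null return NPEs before the guard ever runs. This toy sketch shows the corrected ordering; the names are simplified stand-ins for the actual ShuffleHandler/Netty code, not the real API.

```java
public class NullCheckOrder {
    interface Listener { void done(); }

    static class Future {
        void addListener(Listener l) { l.done(); }
    }

    // Stand-in for sendMapOutput(): may return null, e.g. when the channel is gone.
    static Future sendMapOutput(boolean channelOpen) {
        return channelOpen ? new Future() : null;
    }

    /** Correct ordering: guard against null BEFORE dereferencing the future. */
    static boolean sendMap(boolean channelOpen) {
        Future nextMap = sendMapOutput(channelOpen);
        if (nextMap == null) {
            return false;            // nothing sent; no NPE
        }
        nextMap.addListener(() -> { /* schedule the next map output */ });
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sendMap(true));   // -> true
        System.out.println(sendMap(false));  // -> false; original order would NPE here
    }
}
```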
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3943: Attachment: YARN-3943.000.patch > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3943: Attachment: (was: YARN-3943.000.patch) > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
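The two-threshold idea in YARN-3943 is a classic hysteresis check: a disk is marked full above a high watermark and only marked good again below a lower one, so utilization hovering near a single threshold cannot make the disk flap between states. This is a minimal sketch of that idea under assumed threshold values; the class and field names are illustrative, not the NodeManager's actual disk-health-checker code.

```java
public class DiskHealthSketch {
    private final float fullThresholdPct;     // e.g. 90: mark full above this
    private final float notFullThresholdPct;  // e.g. 80: mark good again below this
    private boolean full = false;

    DiskHealthSketch(float fullPct, float notFullPct) {
        this.fullThresholdPct = fullPct;
        this.notFullThresholdPct = notFullPct;
    }

    /** Update state from current utilization; returns true if the disk is full. */
    boolean update(float utilizationPct) {
        if (!full && utilizationPct > fullThresholdPct) {
            full = true;                      // crossed the high watermark
        } else if (full && utilizationPct < notFullThresholdPct) {
            full = false;                     // recovered below the low watermark
        }
        return full;
    }

    public static void main(String[] args) {
        DiskHealthSketch d = new DiskHealthSketch(90f, 80f);
        System.out.println(d.update(91f)); // -> true  (above 90)
        System.out.println(d.update(85f)); // -> true  (still full: not yet below 80)
        System.out.println(d.update(79f)); // -> false (recovered below 80)
    }
}
```

With a single 90% threshold, the 85% reading in the middle would flip the disk back to good and a later 91% would flip it full again, which is exactly the oscillation the JIRA wants to avoid.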
[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4131: - Attachment: YARN-4131-demo.patch Attached a demo patch; more test work is still needed. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo.patch > > > Per YARN-3337, we need a handy tool to kill containers in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3999: -- Labels: 2.6.1-candidate (was: ) Adding to 2.6.1 from Jian's comment in the mailing list that I missed before. > RM hangs on draining events > --- > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Labels: 2.6.1-candidate > Fix For: 2.7.2 > > Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, > YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, > YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch > > > If external systems like ATS, or ZK becomes very slow, draining all the > events take a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
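Fix (1) above, bounding how long the dispatcher waits for the event queue to drain at stop time, can be sketched as follows. This is a simplified stand-in, not the actual AsyncDispatcher code; the real fix wires the timeout into the dispatcher's serviceStop path.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: wait for the queue to empty, but give up after a deadline so a
// slow downstream system (ATS, ZK) cannot hang the RM indefinitely.
public class DrainWithTimeout {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();

    /** Wait up to timeoutMs for the queue to empty; return whether it drained. */
    public boolean drain(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(100);  // poll; the real code is driven by the dispatcher thread
        }
        return eventQueue.isEmpty();  // false => stopped with events still queued
    }
}
```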
[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account
[ https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735952#comment-14735952 ] Xianyin Xin commented on YARN-4120: --- Hi [~kasha], there's another issue in the current preemption logic; it's in {{FSParentQueue.java}} and {{FSLeafQueue.java}}: {code} public RMContainer preemptContainer() { RMContainer toBePreempted = null; // Find the childQueue which is most over fair share FSQueue candidateQueue = null; Comparator comparator = policy.getComparator(); readLock.lock(); try { for (FSQueue queue : childQueues) { if (candidateQueue == null || comparator.compare(queue, candidateQueue) > 0) { candidateQueue = queue; } } } finally { readLock.unlock(); } // Let the selected queue choose which of its container to preempt if (candidateQueue != null) { toBePreempted = candidateQueue.preemptContainer(); } return toBePreempted; } {code} {code} public RMContainer preemptContainer() { RMContainer toBePreempted = null; // If this queue is not over its fair share, reject if (!preemptContainerPreCheck()) { return toBePreempted; } {code} If the queue hierarchy is like that in the *Description*, suppose queue1 and queue2 have the same weight, and the cluster has 8 containers, 4 occupied by queue1.1 and 4 occupied by queue2. If a new app is added in queue1.2, 2 containers should be preempted from queue1.1. However, according to the above code, queue1 and queue2 are both at their fair share, so the preemption will not happen. So if all of the child queues at any level are at their fair share, preemption will not happen even though there are resource deficits in some leaf queues. I think we have to drop this logic in this case. As an alternative, we can calculate an ideal preemption distribution by traversing the queues. Any thoughts? 
> FSAppAttempt.getResourceUsage() should not take preemptedResource into account > -- > > Key: YARN-4120 > URL: https://issues.apache.org/jira/browse/YARN-4120 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin > > When compute resource usage for Schedulables, the following code is envolved, > {{FSAppAttempt.getResourceUsage}}, > {code} > public Resource getResourceUsage() { > return Resources.subtract(getCurrentConsumption(), getPreemptedResources()); > } > {code} > and this value is aggregated to FSLeafQueues and FSParentQueues. In my > opinion, taking {{preemptedResource}} into account here is not reasonable, > there are two main reasons, > # it is something in future, i.e., even though these resources are marked as > preempted, it is currently used by app, and these resources will be > subtracted from {{currentCosumption}} once the preemption is finished. it's > not reasonable to make arrange for it ahead of time. > # there's another problem here, consider following case, > {code} > root >/\ > queue1 queue2 > /\ > queue1.3, queue1.4 > {code} > suppose queue1.3 need resource and it can preempt resources from queue1.4, > the preemption happens in the interior of queue1. But when compute resource > usage of queue1, {{queue1.resourceUsage = it's_current_resource_usage - > preemption}} according to the current code, which is unfair to queue2 when > doing resource allocating. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
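The "ideal preemption distribution" idea from the comment above could start from a whole-tree traversal that records per-leaf deficits, instead of descending only into the single most over-fair-share child. A hedged sketch with simplified stand-in types (not the real FSQueue/FSLeafQueue classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: collect the deficit (demand beyond usage, capped at fair share)
// at every leaf queue, even when all parents sit exactly at fair share.
public class PreemptionPlan {
    static class Queue {
        String name;
        int fairShare, usage, demand;   // in containers, for simplicity
        List<Queue> children = new ArrayList<>();
        Queue(String name, int fairShare, int usage, int demand) {
            this.name = name; this.fairShare = fairShare;
            this.usage = usage; this.demand = demand;
        }
    }

    /** Record each leaf's deficit; parent fair-share status is ignored. */
    static void collectDeficits(Queue q, Map<String, Integer> out) {
        if (q.children.isEmpty()) {
            int deficit = Math.min(q.demand, q.fairShare) - q.usage;
            if (deficit > 0) out.put(q.name, deficit);
            return;
        }
        for (Queue child : q.children) {
            collectDeficits(child, out);
        }
    }
}
```

In the scenario from the comment, queue1.2's deficit of 2 containers remains visible even though queue1 as a whole is at fair share, so preemption from queue1.1 can still be planned.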
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735987#comment-14735987 ] Xianyin Xin commented on YARN-4133: --- Of course we can also address these problems one by one in different jiras. If you like this, just kindly ignore the above comment. > Containers to be preempted leaks in FairScheduler preemption logic. > --- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leaks in FairScheduler preemption logic. It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}. We should only remove > container from {{warnedContainers}}, if container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} > Also once the containers in {{warnedContainers}} are wrongly removed, it will > never be preempted. 
Because these containers are already in > {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't > return the containers in {{FSAppAttempt#preemptionMap}}. > {code} > public RMContainer preemptContainer() { > if (LOG.isDebugEnabled()) { > LOG.debug("App " + getName() + " is going to preempt a running " + > "container"); > } > RMContainer toBePreempted = null; > for (RMContainer container : getLiveContainers()) { > if (!getPreemptionContainers().contains(container) && > (toBePreempted == null || > comparator.compare(toBePreempted, container) > 0)) { > toBePreempted = container; > } > } > return toBePreempted; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
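The corrected removal rule described in this report (include {{RMContainerState.ACQUIRED}} in the keep-set, and never drop a container merely because {{toPreempt}} is already satisfied) can be sketched with stand-in types; the enum and helper below are illustrative, not the actual RM classes:

```java
// Stand-in for org.apache.hadoop.yarn...RMContainerState (illustrative only).
enum RMContainerState { NEW, ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

public class WarnedContainers {
    /** A warned container stays tracked while it is still live on the cluster. */
    static boolean shouldKeep(RMContainerState state) {
        return state == RMContainerState.RUNNING
            || state == RMContainerState.ALLOCATED
            || state == RMContainerState.ACQUIRED;  // the state missing from the buggy check
    }

    /** Per-container decision: 0 = remove from warnedContainers, 1 = keep, 2 = warn/kill. */
    static int action(RMContainerState state, boolean stillNeedResources) {
        if (!shouldKeep(state)) {
            return 0;                    // truly gone: safe to remove
        }
        // Key difference from the buggy code: when toPreempt is already
        // satisfied we keep the container instead of removing it.
        return stillNeedResources ? 2 : 1;
    }
}
```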
[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4126: --- Attachment: 0004-YARN-4126.patch > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4086: Attachment: YARN-4086.002.patch The 002 patch makes that test less brittle. I also fixed the RAT and checkstyle warnings. The test failure was because test-patch couldn't handle the binary part of the patch. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch, YARN-4086.002.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735887#comment-14735887 ] Joep Rottinghuis commented on YARN-3901: Thanks [~vrushalic]. I'm going to dig through the details on the latest patch. Separately [~sjlee0] and I further discussed the challenges of taking the timestamp on the coprocessor, buffering writes, app restarts, timestamp collisions and ordering of various writes that come in. 1) Given that we have timestamps in millis, multiplying by 1,000 should suffice. It is unlikely that we'd have > 1M writes for one column in one region server for one flow. If we multiply by 1M we get close to the total date range that can fit in a long (still years to come, but still). 2) If we do any shifting of time, we should do the same everywhere to keep things consistent, and to keep the ability to ask what a particular row (roughly) looked like at any particular time (like last night midnight, what was the state of this entire row). 3) We think in the column helper, if the ATS client supplies a timestamp, we should multiply by 1,000. If we read any timestamp from HBase, we'll divide by 1,000. 4) If the ATS client doesn't supply the timestamp, we'll grab the timestamp in the ATS writer the moment the write arrives (and before it is batched / buffered in the buffered mutator, HBase client, or RS queue). We then take this time and multiply by 1,000. Reads again divide by 1,000 to get back to millis in epoch as before. 5) For Agg operations SUM, MIN, and MAX we take the least significant 3 digits of the app_id and add this to the (timestamp*1000), so that we create a unique timestamp per app in an active flow-run. This should avoid any collisions. This takes care of uniqueness (no collisions in a single ms), but also solves for older instances of a writer (in case of a second AM attempt for example) or any other kind of ordering issue. The writes are timestamped when they arrive at the writer. 
6) If some piece of client code doesn't set any timestamp (this should be an error) then we cannot effectively order the writes as per the previous point. We still need to ensure that we don't have collisions. If the client-supplied timestamp is Long.MAX_VALUE, then we can generate the timestamp in the coprocessor on the server side, modulo the counter to ensure uniqueness. We should still multiply by 1,000 to make the same amount of space for the unique counter. > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. 
- > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA
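Points (3) and (5) in the comment above amount to a simple encoding: shift the millisecond timestamp by a factor of 1,000 and use the freed low digits for the app id's least significant digits, giving each app a unique cell timestamp within the same millisecond. A sketch (helper names are illustrative, not the actual timeline-service code):

```java
// Sketch of the proposed HBase cell-timestamp encoding: multiply the
// millisecond timestamp by 1,000 and mix in the last 3 digits of the
// application id so concurrent apps in a flow-run never collide.
public class TimestampGen {
    static final long MULTIPLIER = 1000L;

    /** tsMillis (ms since epoch) -> unique cell timestamp for this app. */
    static long cellTimestamp(long tsMillis, long appIdSequenceNum) {
        return tsMillis * MULTIPLIER + (appIdSequenceNum % MULTIPLIER);
    }

    /** Invert back to millis for reads, as in point (3): divide by 1,000. */
    static long toMillis(long cellTs) {
        return cellTs / MULTIPLIER;
    }
}
```

As the comment notes, multiplying by 1,000 (rather than 1M) keeps the shifted value comfortably inside a signed 64-bit long for dates well into the future.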
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735900#comment-14735900 ] Joep Rottinghuis commented on YARN-3901: The one remaining issue we have to tackle is when there are two app attempts. The previous app attempt ends up buffering some writes, and the new app attempt ends up writing a final_value. Now if the flush happens before the first attempt's write comes in, we no longer have the unaggregated value for that app_id to discard against (the timestamp should have taken care of this ordering). We can deal with this issue in three ways: 1) Ignore (risky and very hard to debug if it ever happens) 2) Keep the final value around until it has aged a certain time. Upside is that the value is initially kept (for, for example, 1-2 days?) and then later discarded. Downside is that we won't collapse values as quickly on flush as we can. The collapse would probably happen when a compaction happens, possibly only when a major compaction happens. But previous unaggregated values may have been written to disk anyway, so not sure how much of an issue this really is. 3) Keep a list of the last x app_ids (aggregation compaction dimension values) on the aggregated flow-level data. What we would then do in the aggregator is to go through all the values as we currently do. We'd collapse all the values to keep only the latest per flow. Before we sum an item for the flow, we'd compare if the app_id was in the list of the most recent x (10) apps that were completed and collapsed. Pro is that with a lower app completion rate in a flow, we'd be guarded against stale writes for longer than a fixed time period. We'd still limit the size of extra storage in tags to a list of x (10?) items. Downside is that if apps complete in very rapid succession, we would potentially be protected from stale writes from an app for a shorter period of time. 
Given that there is a correlation between an app completion and its previous run, this may not be a huge factor. It's not like random previous app attempts are launched. This is really to cover the case when a new app attempt is launched, but the previous writer had some buffered writes that somehow still got through. I'm sort of tempted towards 2, since that is the most similar to the existing TTL functionality, and probably the easiest to code and understand. Simply compact only after a certain time period has passed. > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. 
- > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
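Option (2) from the comment above, aging final values before collapsing them, reduces to a retention-window check at compaction time. A minimal sketch under assumed names (the real logic would live in the flow-run table's coprocessor compaction hook, and the retention period is a design choice, not a settled value):

```java
// Sketch: only allow a cell to be collapsed (compacted away) once it is
// older than a retention window, so late buffered writes from a previous
// app attempt can still be reconciled against it.
public class AgedCompaction {
    // Hypothetical retention window; the comment suggests "1-2 days?".
    static final long RETENTION_MS = 2L * 24 * 60 * 60 * 1000;

    /** A cell may be collapsed only after the retention window has passed. */
    static boolean canCollapse(long cellWriteTimeMs, long nowMs) {
        return nowMs - cellWriteTimeMs > RETENTION_MS;
    }
}
```

This mirrors HBase's existing TTL behavior, which is part of why the comment leans toward option (2) as the easiest to code and understand.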
[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735947#comment-14735947 ] Hadoop QA commented on YARN-4086: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 55s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. 
| | | | 51m 4s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754773/YARN-4086.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d9c1fab | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9048/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9048/console | This message was automatically generated. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch, YARN-4086.002.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4133: Attachment: YARN-4133.000.patch > Containers to be preempted leaks in FairScheduler preemption logic. > --- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leaks in FairScheduler preemption logic. It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}, We should only remove > container from {{warnedContainers}}, if container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735959#comment-14735959 ] Xianyin Xin commented on YARN-4133: --- Hi [~zxu], it seems the current preemption logic has many problems. I just updated one in [https://issues.apache.org/jira/browse/YARN-4120?focusedCommentId=14735952=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735952]. I think a logic refactor is needed; what do you think? > Containers to be preempted leaks in FairScheduler preemption logic. > --- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leaks in FairScheduler preemption logic. It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}. We should only remove > container from {{warnedContainers}}, if container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and > {{RMContainerState.ACQUIRED}}. 
> {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4133: Description: Containers to be preempted leaks in FairScheduler preemption logic. It may cause missing preemption due to containers in {{warnedContainers}} wrongly removed. The problem is in {{preemptResources}}: There are two issues which can cause containers wrongly removed from {{warnedContainers}}: Firstly missing the container state {{RMContainerState.ACQUIRED}} in the condition check: {code} (container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) {code} Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we shouldn't remove container from {{warnedContainers}}. We should only remove container from {{warnedContainers}}, if container is not in state {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and {{RMContainerState.ACQUIRED}}. {code} if ((container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) && isResourceGreaterThanNone(toPreempt)) { warnOrKillContainer(container); Resources.subtractFrom(toPreempt, container.getContainer().getResource()); } else { warnedIter.remove(); } {code} Also once the containers in {{warnedContainers}} are wrongly removed, it will never be preempted. Because these containers are already in {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't return the containers in {{FSAppAttempt#preemptionMap}}. 
{code} public RMContainer preemptContainer() { if (LOG.isDebugEnabled()) { LOG.debug("App " + getName() + " is going to preempt a running " + "container"); } RMContainer toBePreempted = null; for (RMContainer container : getLiveContainers()) { if (!getPreemptionContainers().contains(container) && (toBePreempted == null || comparator.compare(toBePreempted, container) > 0)) { toBePreempted = container; } } return toBePreempted; } {code} was: Containers to be preempted leaks in FairScheduler preemption logic. It may cause missing preemption due to containers in {{warnedContainers}} wrongly removed. The problem is in {{preemptResources}}: There are two issues which can cause containers wrongly removed from {{warnedContainers}}: Firstly missing the container state {{RMContainerState.ACQUIRED}} in the condition check: {code} (container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) {code} Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we shouldn't remove container from {{warnedContainers}}, We should only remove container from {{warnedContainers}}, if container is not in state {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and {{RMContainerState.ACQUIRED}}. {code} if ((container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED) && isResourceGreaterThanNone(toPreempt)) { warnOrKillContainer(container); Resources.subtractFrom(toPreempt, container.getContainer().getResource()); } else { warnedIter.remove(); } {code} > Containers to be preempted leaks in FairScheduler preemption logic. > --- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leaks in FairScheduler preemption logic. 
It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}. We should only remove > container from {{warnedContainers}}, if container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { >
[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4131: - Attachment: YARN-4131-v1.patch Updated patch with the following changes: 1. Added ContainerKilledType in KillContainerRequest to indicate whether the container is killed as preempted or expired (failed). 2. Added an async call in YarnClient per Steve's comments above. 3. Added more unit tests and fixed build failures. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tool to kill containers in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4106:
---
Attachment: 0006-YARN-4106.patch

> NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 0006-YARN-4106.patch
>
> NodeLabels for NM in distributed mode are not updated even after clusterNodelabel addition in RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM and then NM
> # Once NM registration is done, add node labels in RM
> Node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> Timer task not getting triggered in NodeManager for label update in distributed scheduling.
> The task is supposed to trigger every {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736025#comment-14736025 ] Bibin A Chundatt commented on YARN-4106:

Hi [~leftnoteasy], thanks for the comments. Updated patch uploaded.

> NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 0006-YARN-4106.patch
>
> NodeLabels for NM in distributed mode are not updated even after clusterNodelabel addition in RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM and then NM
> # Once NM registration is done, add node labels in RM
> Node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> Timer task not getting triggered in NodeManager for label update in distributed scheduling.
> The task is supposed to trigger every {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736052#comment-14736052 ] Hadoop QA commented on YARN-4133: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 41s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 54m 10s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 6s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754780/YARN-4133.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d9c1fab | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9049/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9049/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9049/console | This message was automatically generated. > Containers to be preempted leaks in FairScheduler preemption logic. > --- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leaks in FairScheduler preemption logic. It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}. 
We should only remove > container from {{warnedContainers}}, if container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} > Also once the containers in {{warnedContainers}} are wrongly removed, it will > never be preempted. Because these containers are already in > {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't > return the containers in {{FSAppAttempt#preemptionMap}}. > {code} > public RMContainer preemptContainer() { > if (LOG.isDebugEnabled()) { > LOG.debug("App " + getName() + " is going to preempt a running " + > "container"); > } > RMContainer toBePreempted = null; > for (RMContainer container : getLiveContainers()) { > if
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736112#comment-14736112 ] Hadoop QA commented on YARN-4106: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 15s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 7m 36s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 46m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754800/0006-YARN-4106.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a153b96 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9051/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9051/console | This message was automatically generated. 
> NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 0006-YARN-4106.patch
>
> NodeLabels for NM in distributed mode are not updated even after clusterNodelabel addition in RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start RM and then NM
> # Once NM registration is done, add node labels in RM
> Node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> Timer task not getting triggered in NodeManager for label update in distributed scheduling.
> The task is supposed to trigger every {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
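The periodic fetch described in the YARN-4106 report is an ordinary fixed-rate {{java.util.Timer}} schedule. A minimal sketch of that pattern, assuming a hypothetical fetch callback in place of the real label provider (the interval name mirrors {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}):

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;

public class LabelFetchSketch {
    // Schedules a TimerTask at a fixed rate and blocks until it has fired
    // `times` times; returns the number of firings awaited. A real provider
    // would fetch node labels inside run() instead of counting down.
    static int runFetcher(long intervalMs, int times) throws InterruptedException {
        CountDownLatch remaining = new CountDownLatch(times);
        Timer timer = new Timer("node-label-fetch", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() {
                remaining.countDown(); // stand-in for the label fetch
            }
        }, intervalMs, intervalMs);
        remaining.await(); // the reported bug: this task never fired at all
        timer.cancel();
        return times;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fired " + runFetcher(10, 3) + " times");
    }
}
```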
[jira] [Created] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare
Xianyin Xin created YARN-4134:
-
Summary: FairScheduler preemption stops at queue level that all child queues are not over their fairshare
Key: YARN-4134
URL: https://issues.apache.org/jira/browse/YARN-4134
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Reporter: Xianyin Xin

FairScheduler now uses a choose-a-candidate method to select a container to be preempted from the leaf queues. In {{FSParentQueue.preemptContainer()}},
{code}
readLock.lock();
try {
  for (FSQueue queue : childQueues) {
    if (candidateQueue == null ||
        comparator.compare(queue, candidateQueue) > 0) {
      candidateQueue = queue;
    }
  }
} finally {
  readLock.unlock();
}

// Let the selected queue choose which of its container to preempt
if (candidateQueue != null) {
  toBePreempted = candidateQueue.preemptContainer();
}
{code}
a candidate child queue is selected. However, if the queue's usage isn't over its fairshare, preemption will not happen:
{code}
if (!preemptContainerPreCheck()) {
  return toBePreempted;
}
{code}
A scenario:
{code}
        root
       /    \
  queue1    queue2
   /   \
queue1.3  (queue1.4)
{code}
Suppose there are 8 containers and queues at every level have the same weight. queue1.3 takes 4 and queue2 takes 4, so both queue1 and queue2 are at their fairshare. Now we submit an app in queue1.4 that needs 4 containers; it should preempt 2 from queue1.3. But the candidate-container selection procedure stops at the level where none of the child queues is over its fairshare, so no containers are preempted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
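The scenario above can be reproduced with a toy model of the queue hierarchy: because {{preemptContainerPreCheck()}} runs at every level, a parent exactly at its fairshare (queue1: usage 4, fairshare 4) yields no candidate even though its child queue1.3 (usage 4, fairshare 2) is over its own share. The class below is a local stand-in, not the FSQueue API:

```java
import java.util.ArrayList;
import java.util.List;

public class FairsharePrecheckSketch {
    static class Queue {
        final String name;
        final int usage;
        final int fairShare;
        final List<Queue> children = new ArrayList<>();

        Queue(String name, int usage, int fairShare) {
            this.name = name; this.usage = usage; this.fairShare = fairShare;
        }

        // Mirrors the precheck: descend only if this queue is over its share.
        String preemptCandidate() {
            if (usage <= fairShare) return null; // precheck stops the descent
            if (children.isEmpty()) return name; // a leaf over its share
            for (Queue c : children) {
                String got = c.preemptCandidate();
                if (got != null) return got;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Queue queue13 = new Queue("queue1.3", 4, 2); // over its fairshare
        Queue queue1 = new Queue("queue1", 4, 4);    // exactly at fairshare
        queue1.children.add(queue13);
        // The precheck at queue1 hides queue1.3, so no candidate is found,
        // even though queue1.3 alone would be preemptable.
        System.out.println(queue1.preemptCandidate());
        System.out.println(queue13.preemptCandidate());
    }
}
```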