[jira] [Commented] (YARN-3028) Better syntax for replaceLabelsOnNode in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295184#comment-14295184 ] Hudson commented on YARN-3028: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #84 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/84/]) YARN-3028. Better syntax for replaceLabelsOnNode in RMAdmin CLI. Contributed by Rohith Sharmaks (wangda: rev fd93e5387b554a78413bc0f14b729e58fea604ea) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java Better syntax for replaceLabelsOnNode in RMAdmin CLI Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
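To make the proposed syntax concrete, here is a minimal, hypothetical Java sketch of parsing the new node[:port]=label1,label2 form; the class and method names are illustrative only and this is not the actual RMAdminCLI parsing code.
{code}
import java.util.*;

// Hypothetical helper illustrating the "node[:port]=label1,label2" syntax
// discussed above; this is not the actual RMAdminCLI code.
public class NodeLabelArgParser {
  public static Map<String, Set<String>> parse(String[] args) {
    Map<String, Set<String>> labelsByNode = new HashMap<>();
    for (String arg : args) {
      // Each argument looks like "node1:8041=label1,label2" (or "node1=label1").
      String[] parts = arg.split("=", 2);
      String node = parts[0].trim();
      Set<String> labels = new HashSet<>();
      if (parts.length == 2 && !parts[1].isEmpty()) {
        for (String label : parts[1].split(",")) {
          labels.add(label.trim());
        }
      }
      labelsByNode.put(node, labels);
    }
    return labelsByNode;
  }

  public static void main(String[] args) {
    // Example: the two mappings from the JIRA description.
    System.out.println(parse(new String[] {
        "node1:8041=label1,label2", "node2:8041=label1"}));
  }
}
{code}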
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295188#comment-14295188 ] Hudson commented on YARN-3011: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/]) YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch, YARN-3011.004.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
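The stack trace above comes from constructing a Path from an empty string inside the dispatcher thread (the second resource URL ends in "/", so its file name is empty), which takes the whole NodeManager down. Below is a minimal sketch of the defensive-check idea only; it assumes nothing about the actual YARN-3011 patch.
{code}
import org.apache.hadoop.fs.Path;

// Illustrative guard only -- not the actual YARN-3011 fix. The crash above
// comes from building a Path from an empty string, which kills the
// AsyncDispatcher thread and therefore the NodeManager.
public class LocalizationPathGuard {
  static Path childPath(Path parent, String name) {
    if (name == null || name.isEmpty()) {
      // Reject the bad resource instead of letting IllegalArgumentException
      // propagate out of the event dispatcher.
      throw new IllegalStateException(
          "Refusing to localize resource with empty file name under " + parent);
    }
    return new Path(parent, name);
  }

  public static void main(String[] args) {
    Path base = new Path("/tmp/nm-local-dir/filecache/10");
    System.out.println(childPath(base, "json-simple-1.1.jar")); // fine
    try {
      childPath(base, ""); // what the trailing "/" URL effectively produces
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}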
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295190#comment-14295190 ] Hudson commented on YARN-3086: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/]) YARN-3086. Make NodeManager memory configurable in MiniYARNCluster. Contributed by Robert Metzger. (ozawa: rev f56da3ce040b16582ce8153df0d7cea00becd843) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Fix For: 2.7.0 Attachments: YARN-3086-2.patch, YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests requests more containers than are available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4 GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
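A short sketch of how the new knob might be used from a test. The configuration key is assumed here to follow the mini-cluster prefix convention ("yarn.minicluster." + the NM memory property); verify the exact constant in YarnConfiguration for your Hadoop version.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

// Rough sketch of what the improvement enables: overriding the per-NodeManager
// memory before starting a MiniYARNCluster.
public class MiniClusterMemoryExample {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // Assumed key: "yarn.minicluster." + yarn.nodemanager.resource.memory-mb,
    // i.e. the memory (in MB) each mini-cluster NodeManager reports.
    conf.setInt("yarn.minicluster." + YarnConfiguration.NM_PMEM_MB, 1024);

    MiniYARNCluster cluster =
        new MiniYARNCluster("test-cluster", 2 /* NMs */, 1, 1);
    cluster.init(conf);
    cluster.start();
    try {
      // ... submit applications against cluster.getConfig() ...
    } finally {
      cluster.stop();
    }
  }
}
{code}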
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295191#comment-14295191 ] Hudson commented on YARN-2932: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/]) YARN-2932. Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging. (Eric Payne via wangda) (wangda: rev 18741adf97f4fda5f8743318b59c440928e51297) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
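For context, a hedged sketch of the per-queue check this JIRA surfaces in the web UI and queue init/refresh logging. The property name below is assumed from the capacity-scheduler naming convention introduced by YARN-2056 and should be verified against CapacitySchedulerConfiguration.
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only -- not the actual CapacityScheduler code.
public class QueuePreemptionStatus {
  static boolean isPreemptionDisabled(Configuration csConf, String queuePath) {
    // Assumed per-queue key from YARN-2056: verify against your Hadoop version.
    String key = "yarn.scheduler.capacity." + queuePath + ".disable_preemption";
    return csConf.getBoolean(key, false);
  }

  public static void main(String[] args) {
    Configuration csConf = new Configuration(false);
    csConf.setBoolean(
        "yarn.scheduler.capacity.root.default.disable_preemption", true);
    // What a queue init/refresh log line could report for each queue:
    System.out.println("root.default preemption disabled = "
        + isPreemptionDisabled(csConf, "root.default"));
  }
}
{code}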
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295174#comment-14295174 ] Hudson commented on YARN-2897: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #84 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/84/]) YARN-2897. CrossOriginFilter needs more log statements (Mit Desai via jeagles) (jeagles: rev a8ad1e8089e4bf5854085d2d38d1c0133b5a41bc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.0 Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295187#comment-14295187 ] Hudson commented on YARN-2897: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/]) YARN-2897. CrossOriginFilter needs more log statements (Mit Desai via jeagles) (jeagles: rev a8ad1e8089e4bf5854085d2d38d1c0133b5a41bc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.0 Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295175#comment-14295175 ] Hudson commented on YARN-3011: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #84 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/84/]) YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch, YARN-3011.004.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295171#comment-14295171 ] Jason Lowe commented on YARN-2005: -- bq. App name is the first point came in to my thoughts. The problem with app name in the workflow spamming case is that many workflows I've seen use a different app name each time they submit, since the app name often includes some timestamp indicating which data window it's consuming/producing. If the workflow is retrying the same failed apps then the app name may not be changing, but if it's plowing ahead submitting other jobs then it very likely is changing. bq. If an app from user1 with name job2 fails on node1, it is very much appropriate to try its second attempt in a different node. Totally agree. I think it's worthwhile to consider implementing relatively simple app-specific blacklisting logic to avoid this fairly common scenario. We can then follow that up with a much more sophisticated blacklisting algorithm with fancy weighting, time decays, etc., but the biggest problem we're seeing probably doesn't need anything that fancy to solve 80% of the cases we see. bq. I feel i could jot down few points and share as a doc for same Sounds good, feel free to post one. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
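A rough, hypothetical sketch of the "relatively simple app-specific blacklisting" idea discussed above (not RM code): count AM launch failures per node for a single application and skip a node once it exceeds a threshold.
{code}
import java.util.*;

// Hypothetical per-application AM blacklist tracker; names and structure are
// illustrative only.
public class AmBlacklistTracker {
  private final int maxAmFailuresPerNode;
  private final Map<String, Integer> amFailuresByNode = new HashMap<>();

  public AmBlacklistTracker(int maxAmFailuresPerNode) {
    this.maxAmFailuresPerNode = maxAmFailuresPerNode;
  }

  /** Record that an AM attempt of this application failed on the given node. */
  public void onAmAttemptFailed(String nodeId) {
    Integer prev = amFailuresByNode.get(nodeId);
    amFailuresByNode.put(nodeId, prev == null ? 1 : prev + 1);
  }

  /** Nodes the scheduler should skip when placing the next AM attempt. */
  public Set<String> getBlacklistedNodes() {
    Set<String> blacklisted = new HashSet<>();
    for (Map.Entry<String, Integer> e : amFailuresByNode.entrySet()) {
      if (e.getValue() >= maxAmFailuresPerNode) {
        blacklisted.add(e.getKey());
      }
    }
    return blacklisted;
  }

  public static void main(String[] args) {
    AmBlacklistTracker tracker = new AmBlacklistTracker(1);
    tracker.onAmAttemptFailed("node1:45454");
    System.out.println(tracker.getBlacklistedNodes()); // [node1:45454]
  }
}
{code}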
[jira] [Commented] (YARN-3028) Better syntax for replaceLabelsOnNode in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295197#comment-14295197 ] Hudson commented on YARN-3028: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/]) YARN-3028. Better syntax for replaceLabelsOnNode in RMAdmin CLI. Contributed by Rohith Sharmaks (wangda: rev fd93e5387b554a78413bc0f14b729e58fea604ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt Better syntax for replaceLabelsOnNode in RMAdmin CLI Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295177#comment-14295177 ] Hudson commented on YARN-3086: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #84 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/84/]) YARN-3086. Make NodeManager memory configurable in MiniYARNCluster. Contributed by Robert Metzger. (ozawa: rev f56da3ce040b16582ce8153df0d7cea00becd843) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Fix For: 2.7.0 Attachments: YARN-3086-2.patch, YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294961#comment-14294961 ] Sunil G commented on YARN-3098: --- Yes [~leftnoteasy], I understood your point. Sometimes we can take the call to keep a clean interface at the top level, and I also understood your intention. As for the second point, ResourceUsage and QueueCapacities are two new classes, and each has its own separate lock now. Earlier this data was protected by the parent lock alone, so the lock order is now LeafQueue -> ResourceUsage. I was worried about a scenario where LeafQueue and ParentQueue invoke these two new classes in the opposite order; as per my review the locking order is correct. But when future additions happen in LeafQueue/ParentQueue/FiCaSchedulerApp etc., keeping the correct lock order is of utmost priority. On that note, I was thinking that these two new locks should not complicate the existing locks. That is what I meant earlier. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch, YARN-3098.3.patch, YARN-3098.4.patch Similar to YARN-3092, after YARN-796 queues (ParentQueue and LeafQueue) now need to track capacities by label (e.g. capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class that encapsulates these capacities to provide both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
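To make the lock-ordering concern concrete, a hypothetical illustration (not the real CapacityScheduler classes): as long as callers always take the queue lock before the usage/capacities lock, and the inner object never calls back into the queue while holding its own lock, the LeafQueue -> ResourceUsage order stays consistent and deadlock-free.
{code}
// Hypothetical illustration of the lock ordering discussed above.
class ResourceUsageLike {
  private long usedMb;                       // guarded by this object's lock
  synchronized void incUsed(long mb) { usedMb += mb; }
  synchronized long getUsed()        { return usedMb; }
}

class LeafQueueLike {
  private final ResourceUsageLike usage = new ResourceUsageLike();

  // Correct order: queue lock first, then usage lock (via synchronized methods).
  synchronized void allocate(long mb) {
    usage.incUsed(mb);
  }

  // Reads also go queue -> usage. What must never happen is ResourceUsageLike
  // holding its own lock and then calling back into a synchronized queue
  // method, which would create the opposite (deadlock-prone) order.
  synchronized long snapshotUsed() {
    return usage.getUsed();
  }

  public static void main(String[] args) {
    LeafQueueLike queue = new LeafQueueLike();
    queue.allocate(2048);
    System.out.println("used MB = " + queue.snapshotUsed());
  }
}
{code}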
[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
[ https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294932#comment-14294932 ] Tim Robertson commented on YARN-2875: - Done - bumped to 1.7.10 which the release notes suggest should be fine. Bump SLF4J to 1.7.7 from 1.7.5 --- Key: YARN-2875 URL: https://issues.apache.org/jira/browse/YARN-2875 Project: Hadoop YARN Issue Type: Bug Reporter: Tim Robertson Priority: Minor hadoop-yarn-common [uses log4j directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167] and when trying to redirect that through an SLF4J bridge version 1.7.5 has issues, due to use of AppenderSkeleton which is missing in log4j-over-slf4j version 1.7.5. This is documented on the [1.7.6 release notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable. This is applicable to all the projects using Hadoop motherpom, but Yarn appears to be bringing Log4J in, rather than coding to the SLF4J API. The issue shows in the logs as follows in Yarn MR apps, which is painful to diagnose. {code} WARN [2014-11-18 09:58:06,390+0100] [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in callback postStart java.lang.reflect.InvocationTargetException: null at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) ~[job.jar:0.22-SNAPSHOT] at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) [job.jar:0.22-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) [job.jar:0.22-SNAPSHOT] Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71] at java.lang.ClassLoader.defineClass(ClassLoader.java:800) ~[na:1.7.0_71] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[na:1.7.0_71] at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) ~[na:1.7.0_71] at java.net.URLClassLoader.access$100(URLClassLoader.java:71) ~[na:1.7.0_71] at 
java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295033#comment-14295033 ] Hudson commented on YARN-3086: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #87 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/87/]) YARN-3086. Make NodeManager memory configurable in MiniYARNCluster. Contributed by Robert Metzger. (ozawa: rev f56da3ce040b16582ce8153df0d7cea00becd843) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Fix For: 2.7.0 Attachments: YARN-3086-2.patch, YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295038#comment-14295038 ] Hudson commented on YARN-2897: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #87 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/87/]) YARN-2897. CrossOriginFilter needs more log statements (Mit Desai via jeagles) (jeagles: rev a8ad1e8089e4bf5854085d2d38d1c0133b5a41bc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.0 Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295034#comment-14295034 ] Hudson commented on YARN-2932: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #87 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/87/]) YARN-2932. Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging. (Eric Payne via wangda) (wangda: rev 18741adf97f4fda5f8743318b59c440928e51297) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295051#comment-14295051 ] Hudson commented on YARN-3086: -- FAILURE: Integrated in Hadoop-Yarn-trunk #821 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/821/]) YARN-3086. Make NodeManager memory configurable in MiniYARNCluster. Contributed by Robert Metzger. (ozawa: rev f56da3ce040b16582ce8153df0d7cea00becd843) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Fix For: 2.7.0 Attachments: YARN-3086-2.patch, YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replaceLabelsOnNode in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295063#comment-14295063 ] Hudson commented on YARN-3028: -- FAILURE: Integrated in Hadoop-Yarn-trunk #821 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/821/]) YARN-3028. Better syntax for replaceLabelsOnNode in RMAdmin CLI. Contributed by Rohith Sharmaks (wangda: rev fd93e5387b554a78413bc0f14b729e58fea604ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java Better syntax for replaceLabelsOnNode in RMAdmin CLI Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295058#comment-14295058 ] Hudson commented on YARN-3011: -- FAILURE: Integrated in Hadoop-Yarn-trunk #821 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/821/]) YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch, YARN-3011.004.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295052#comment-14295052 ] Hudson commented on YARN-2932: -- FAILURE: Integrated in Hadoop-Yarn-trunk #821 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/821/]) YARN-2932. Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging. (Eric Payne via wangda) (wangda: rev 18741adf97f4fda5f8743318b59c440928e51297) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295216#comment-14295216 ] Jason Lowe commented on YARN-3103: -- On second thought, maybe the client doesn't need to know the service name the RM used. The RM is already sending an updated token _that the RM generated_ to the AM. If the AM blindly stuffs it into the credentials _before_ it tries to fixup the token then it will use whatever service name the RM left on the token. As long as that service name matches the one the RM put in originally (and ideally it's not going to collide with any other token) then we know it will clobber the old AMRM token as intended. Then the client can fixup the token service name _after_ it's been stored in the credentials, just like it does during AM startup. So we just need the AM to generate something that will not collide with non-AMRM tokens and also not collide with tokens from other cluster RMs. Cluster ID is tempting, but if the AM is talking to two, non-HA clusters then I'm not sure we know the user bothered to configure the cluster ID. However I think we _have_ to use the cluster ID otherwise two RMs in the same HA-enabled cluster could generate different service names which breaks things. So I think the cluster ID is our best bet, with the caveat that if an AM needs to wield multiple AMRM tokens then all clusters involved need to have unique cluster IDs configured. Thoughts? AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
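A minimal sketch of the ordering described in the comment above, assuming a Token<AMRMTokenIdentifier> has already been converted from the RM response; this is not the actual AMRMClientImpl fix.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

// Sketch only: store the RM-supplied token in the credentials first, so it
// overwrites the old AMRM token under whatever service name the RM originally
// used, and only then rewrite the service name for local RPC token selection.
public class AmrmTokenUpdateSketch {
  static void updateAMRMToken(Token<AMRMTokenIdentifier> newToken,
                              Configuration conf) throws Exception {
    // 1. Add the token to the *current* user's credentials before touching its
    //    service field, so it clobbers the previous AMRM token entry.
    UserGroupInformation currentUser = UserGroupInformation.getCurrentUser();
    currentUser.addToken(newToken);

    // 2. Now fix up the service name so token selection for RM RPCs finds it,
    //    the same way the AM does with the original token at startup.
    newToken.setService(ClientRMProxy.getAMRMTokenService(conf));
  }
}
{code}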
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295218#comment-14295218 ] Eric Payne commented on YARN-2932: -- Thank you for your input and review, [~leftnoteasy] Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295241#comment-14295241 ] Hudson commented on YARN-3086: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/]) YARN-3086. Make NodeManager memory configurable in MiniYARNCluster. Contributed by Robert Metzger. (ozawa: rev f56da3ce040b16582ce8153df0d7cea00becd843) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/CHANGES.txt Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Fix For: 2.7.0 Attachments: YARN-3086-2.patch, YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replaceLabelsOnNode in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295253#comment-14295253 ] Hudson commented on YARN-3028: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/]) YARN-3028. Better syntax for replaceLabelsOnNode in RMAdmin CLI. Contributed by Rohith Sharmaks (wangda: rev fd93e5387b554a78413bc0f14b729e58fea604ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java Better syntax for replaceLabelsOnNode in RMAdmin CLI Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295248#comment-14295248 ] Hudson commented on YARN-3011: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/]) YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch, YARN-3011.004.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2922) ConcurrentModificationException in CapacityScheduler's LeafQueue
[ https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2922: - Fix Version/s: 2.7.0 ConcurrentModificationException in CapacityScheduler's LeafQueue Key: YARN-2922 URL: https://issues.apache.org/jira/browse/YARN-2922 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-2922.patch, 0001-YARN-2922.patch java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
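For illustration, a small self-contained example of the failure mode in the stack trace and the usual remedy: guard both mutation and iteration of the application map with the same lock (or copy under that lock) instead of iterating it unsynchronized. This is not the actual LeafQueue code.
{code}
import java.util.*;

// Iterating a TreeMap while another thread mutates it throws
// ConcurrentModificationException, which is what the stack trace above shows
// in LeafQueue.collectSchedulerApplications.
public class QueueAppsSnapshot {
  private final TreeMap<String, String> activeApps = new TreeMap<>();

  public synchronized void addApp(String appId, String user) {
    activeApps.put(appId, user);
  }

  public synchronized void removeApp(String appId) {
    activeApps.remove(appId);
  }

  /** Collect application ids under the same lock that guards mutation. */
  public synchronized List<String> collectApps() {
    return new ArrayList<>(activeApps.keySet());
  }

  public static void main(String[] args) {
    QueueAppsSnapshot queue = new QueueAppsSnapshot();
    queue.addApp("application_1_0001", "alice");
    queue.addApp("application_1_0002", "bob");
    System.out.println(queue.collectApps());
  }
}
{code}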
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295291#comment-14295291 ] Jun Gong commented on YARN-3094: Hi [~jianhe], could you please help review it? Thank you. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.2.patch, YARN-3094.patch When the RM restarts, it recovers RMAppAttempts and registers them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295453#comment-14295453 ] Varun Saxena commented on YARN-3047: [~zjshen], kindly have a look at the initial patch generated based on the code I had written so far. The store-related code is mainly for stubbing, as is the WebServices code (copied from the current timeline server code). This code will evolve as we finalize the object model. I will add test cases once you are fine with the code structure, including class and package names. Moreover, I guess we need to support RPC calls coming from the yarn client. This can be taken directly from {{ApplicationHistoryManagerOnTimelineStore}}. Let me know if we are on the same page or not. set up ATS reader with basic request serving structure and lifecycle Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047.001.patch set up ATS reader with basic request serving structure and lifecycle Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
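For readers unfamiliar with the service pattern referenced in YARN-3047, below is a minimal sketch of how a reader daemon is typically wired up as a Hadoop CompositeService with init/start/stop lifecycle management. The class name and the child services mentioned in the comments are assumptions for illustration; this is not the code from YARN-3047.001.patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineReaderServerSketch extends CompositeService {

  public TimelineReaderServerSketch() {
    super(TimelineReaderServerSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Child services (e.g. a storage reader, a web/REST endpoint) would be created and
    // registered here via addService() so CompositeService drives their lifecycle with this one.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // The HTTP endpoint serving reader requests would be started here.
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    super.serviceStop();
  }

  public static void main(String[] args) {
    TimelineReaderServerSketch server = new TimelineReaderServerSketch();
    server.init(new YarnConfiguration());
    server.start();
  }
}
{code}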
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295814#comment-14295814 ] Anubhav Dhoot commented on YARN-3094: - Hi [~Jun Gong], can you please use a ControlledClock for manipulating time instead of sleeps? AbstractLivenessMonitor should take a Clock argument instead of creating a new SystemClock. That way loadState can call ControlledClock#setTime instead of sleeping, and AbstractLivenessMonitor can read the same time. Thanks. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.2.patch, YARN-3094.patch When the RM restarts, it recovers RMAppAttempts and registers them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
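A minimal sketch of the test pattern suggested above: inject a settable clock into the monitor and advance it explicitly instead of sleeping. The SettableClock and FakeLivenessMonitor classes below are hypothetical test scaffolding standing in for ControlledClock and the real RM monitors; they are not from the attached patches.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.util.AbstractLivenessMonitor;
import org.apache.hadoop.yarn.util.Clock;

public class LivenessMonitorClockSketch {

  /** Stand-in for the suggested ControlledClock: the test sets the time explicitly. */
  static class SettableClock implements Clock {
    private volatile long time = 0;
    public void setTime(long t) { time = t; }
    @Override
    public long getTime() { return time; }
  }

  /** Hypothetical monitor subclass used only to observe expirations. */
  static class FakeLivenessMonitor extends AbstractLivenessMonitor<String> {
    volatile String expired;
    FakeLivenessMonitor(Clock clock) {
      super("FakeLivenessMonitor", clock);  // inject the clock instead of new SystemClock()
      setExpireInterval(1000);              // expire after 1s of (clock) inactivity
      setMonitorInterval(100);
    }
    @Override
    protected void expire(String id) { expired = id; }
  }

  public static void main(String[] args) throws Exception {
    SettableClock clock = new SettableClock();
    FakeLivenessMonitor monitor = new FakeLivenessMonitor(clock);
    monitor.init(new Configuration());
    monitor.start();

    monitor.register("attempt_1");
    // Advance the shared clock past the expiry interval instead of sleeping for it;
    // the monitor thread reads the same clock and should expire the attempt.
    clock.setTime(clock.getTime() + 5000);
    Thread.sleep(500);  // small wait for the monitor thread to notice
    System.out.println("expired: " + monitor.expired);
    monitor.stop();
  }
}
{code}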
[jira] [Updated] (YARN-3104) RM continues to send new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3104: - Attachment: YARN-3104.001.patch As discussed in YARN-2314 the IPC layer makes it near impossible to close the connection, and there's no support for re-negotiating the authentication of the connection. This patch isn't a total fix, since it doesn't address the issue of re-authenticating the connection using the new token. However it does prevent the RM from constantly generating tokens during the period between rolling and activating the next AMRM key and the corresponding three lines of logging per application per second. RM continues to send new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly close the connection or otherwise try to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
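To make the described guard concrete, here is a self-contained illustration (not the RM or AMRMTokenSecretManager code; all names are hypothetical) of issuing a token only when the signing key the server would use differs from the key the client last received, rather than re-issuing on every heartbeat.
{code}
import java.util.HashMap;
import java.util.Map;

public class TokenRollSketch {
  static final Map<String, Integer> lastIssuedKeyPerApp = new HashMap<>();

  static Integer maybeIssueToken(String appAttemptId, int currentSigningKeyId) {
    Integer lastIssued = lastIssuedKeyPerApp.get(appAttemptId);
    if (lastIssued != null && lastIssued == currentSigningKeyId) {
      return null;  // client already holds a token for this key: do not re-issue every heartbeat
    }
    lastIssuedKeyPerApp.put(appAttemptId, currentSigningKeyId);
    return currentSigningKeyId;  // stand-in for generating a real token signed with this key
  }

  public static void main(String[] args) {
    System.out.println(maybeIssueToken("attempt_1", 7));  // 7: first heartbeat after the roll
    System.out.println(maybeIssueToken("attempt_1", 7));  // null: no repeat issuance
    System.out.println(maybeIssueToken("attempt_1", 8));  // 8: key rolled again
  }
}
{code}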
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295929#comment-14295929 ] Jian He commented on YARN-3103: --- Patch looks good to me. bq. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled [~xgong], do you recall why this change was made? Maybe it didn't work on a secure cluster when using the current UGI? Jason, does the patch work on a secure cluster too? bq. That will be problematic with multiple AMRM tokens, since we won't know which token is which. Also, we have the token in-hand from the AllocateResponse call, no need to go hunting for it – or are you thinking of a different scenario? Makes sense; I missed the part that we already have a handle from the AllocateResponse. {{ClientRMProxy#setAMRMTokenService}} now loops over the existing tokens and sets the token service name. If we support multiple AMRM tokens, this code will break too. Anyway, this can be addressed later. bq. AM startup already fixes the service name of the token, but it does not (and cannot) change the key/alias associated with the token in the credentials. In MRAppMaster#initAndStartAppMaster, will it work if we insert the correct key/alias when adding credentials into appMasterUgi? bq. If we do above, then we don't need the cluster ID? I meant that, given we already have a way to uniquely identify the AMRMToken on the AM side based on the concatenated RM addresses, we may not need an extra cluster ID to uniquely identify the token. AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
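For context, a hedged sketch of the ordering discussed above: add the new token to the current user's credentials while its service is still the one the RM keyed it with, and only set the service afterwards for the RPC layer. This simplifies the actual AMRMClientImpl change and omits the conversion from the protobuf token record.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

public final class AmrmTokenUpdateSketch {
  private AmrmTokenUpdateSketch() {}

  static void updateToken(Token<AMRMTokenIdentifier> newToken, Configuration conf)
      throws IOException {
    // Use the current user (the UGI the AM actually talks to the RM with), not the login user.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();

    // Add first, so the credentials key matches the service the RM issued the token with,
    // replacing the old AMRM token instead of creating a second entry...
    ugi.addToken(newToken);

    // ...then point the token at the RM by filling in the service name.
    newToken.setService(ClientRMProxy.getAMRMTokenService(conf));
  }
}
{code}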
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295856#comment-14295856 ] Hadoop QA commented on YARN-3099: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695067/YARN-3099.3.patch against trunk revision 9850e15. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6441//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6441//console This message is automatically generated. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch, YARN-3099.3.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295924#comment-14295924 ] Siqi Li commented on YARN-3101: --- It looks like you are right. The if condition somehow got reversed, so that reservations will always be rejected. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296046#comment-14296046 ] Siqi Li commented on YARN-3101: --- But I am not quite sure whether reservedAppSchedulable.getResource(reservedPriority) gives us the correct reserved resource. Maybe consider node.getReservedContainer().getReservedResource() instead. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
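A small sketch of the check being debated, under the assumption that the reservation itself should be counted against the queue's max share and that the comparison runs in the direction that rejects oversized reservations. Parameter names are illustrative; this is not the FairScheduler patch.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

final class FitInMaxShareSketch {
  private FitInMaxShareSketch() {}

  static boolean fitsInMaxShare(Resource queueUsage, Resource reservedResource,
      Resource queueMaxShare) {
    // Count the reservation itself, and keep the comparison the right way around:
    // usage + reservation must fit within maxShare, otherwise reject the reservation.
    Resource usageWithReservation = Resources.add(queueUsage, reservedResource);
    return Resources.fitsIn(usageWithReservation, queueMaxShare);
  }
}
{code}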
[jira] [Created] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
chang li created YARN-3108: -- Summary: ApplicationHistoryServer doesn't process -D arguments Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-3108: --- Attachment: yarn3108.patch have provided a solution for this ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Attachments: yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
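For reference, the usual Hadoop way to honor -D arguments at daemon startup is to run them through GenericOptionsParser so that -Dkey=value pairs land in the Configuration before the server is initialized. The sketch below illustrates that pattern only; it is not necessarily what yarn3108.patch does.
{code}
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HistoryServerArgsSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // Parses generic options such as -Dyarn.timeline-service.hostname=host
    // and applies them to conf; anything left over is returned.
    String[] remaining = new GenericOptionsParser(conf, args).getRemainingArgs();
    System.out.println("remaining args: " + java.util.Arrays.toString(remaining));
    // ... the real daemon would now build and start its services with this conf.
  }
}
{code}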
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296066#comment-14296066 ] Yongjun Zhang commented on YARN-3021: - Hi [~rkanter] and [~adhoot], Thanks for your comments. Yes, we are talking about HDFS delegation token, this patch provides an option to turn off initialization validation inside RM, and the token verification will happen inside HDFS when distcp job runs. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
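To illustrate the behavior being discussed, here is a hypothetical sketch of guarding the submission-time renewal behind a switch while leaving HDFS-side token verification untouched, and otherwise tolerating a failed renewal instead of failing the submission. The property name and method are made up for illustration and are not the actual YARN-3021 patch.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

final class TokenRenewalSketch {
  // Made-up property name for illustration only.
  static final String SKIP_RM_RENEWAL = "example.rm.skip-delegation-token-renewal";

  static void handleTokenAtSubmission(Token<?> token, Configuration conf) {
    if (conf.getBoolean(SKIP_RM_RENEWAL, false)) {
      // Do not ask the remote realm to renew; HDFS still verifies the token when the
      // job actually reads or writes data.
      return;
    }
    try {
      token.renew(conf);  // normal path: validate the token and schedule it for renewal
    } catch (IOException | InterruptedException e) {
      // Old JobTracker-style behavior: stop scheduling renewals for this token,
      // but do not bubble the error back and fail the application submission.
      System.err.println("Renewal failed, skipping scheduled renewal: " + e);
    }
  }
}
{code}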
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296189#comment-14296189 ] Hadoop QA commented on YARN-3100: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695120/YARN-3100.2.patch against trunk revision d244574. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6444//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6444//console This message is automatically generated. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3101: -- Attachment: (was: YARN-3101-Siqi.v1.patch) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3101: -- Attachment: YARN-3101-Siqi.v1.patch FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3104) RM continues to send new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-3104: Assignee: Jason Lowe RM continues to send new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly close the connection or otherwise try to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295919#comment-14295919 ] Jian He commented on YARN-3100: --- bq. I hope you recognize that from an outside perspective how ridiculous it sounds if YARN's ACL system is pluggable but HDFS's is not. HDFS-6826 is the work to make HDFS authorization pluggable. The current proposal is to only handle YARN specific ACL(admin and queue ACL), not the common service acl (HADOOP-4348) which is being used by HDFS too. hdfs will not get affected at all. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3022) Expose Container resource information from NodeManager for monitoring
[ https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296030#comment-14296030 ] Robert Kanter commented on YARN-3022: - +1 Expose Container resource information from NodeManager for monitoring - Key: YARN-3022 URL: https://issues.apache.org/jira/browse/YARN-3022 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3022.001.patch, YARN-3022.002.patch, YARN-3022.003.patch Along with exposing resource consumption of each container such as (YARN-2141) its worth exposing the actual resource limit associated with them to get better insight into YARN allocation and consumption -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296041#comment-14296041 ] Robert Kanter commented on YARN-3021: - {quote}I don't see any security holes. This token is only for the application's own use. The validation and renewal that you are turning off via the new parameter should not impact security of YARN or other applications.{quote} I'm not sure that's entirely correct. We're talking about the HDFS delegation token, right? Might it be possible to circumvent token expiration times by telling YARN not to renew the token? I'm not sure when the expiration check is done, so I could be wrong here. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Attachment: YARN-3100.2.patch Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
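The JIRA names a YarnAuthorizationProvider interface but does not spell out its methods here, so the following is only an illustrative guess at what such a plug-in point might look like; the method names and signatures are assumptions, not the interface from the attached patches.
{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

interface AuthorizationProviderSketch {
  void init(Configuration conf);

  // Replace the per-component ACL managers (admin, queue, application, ...) with one check point.
  boolean checkPermission(String accessType, String entityName, UserGroupInformation user);

  // Let the default implementation (or a Ranger/Sentry plug-in) maintain the ACLs themselves.
  void setPermission(String entityName, List<String> acls, UserGroupInformation user);
}
{code}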
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296077#comment-14296077 ] zhihai xu commented on YARN-3079: - Hi [~rkanter], thanks for the review. I found that since we can reuse updateMaximumAllocation in updateNodeResource, we don't need to create a separate refreshMaximumAllocation function. So I removed refreshMaximumAllocation and moved all of its code into updateMaximumAllocation, which is the same as the original code. The change will also address your concern. Please review it to see whether it looks good to you. Thanks, zhihai Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
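A self-contained sketch of the bookkeeping being described, assuming the goal is simply to recompute the scheduler-wide maximum allocation from per-node totals whenever a node's resource changes; it is an illustration of the idea, not the AbstractYarnScheduler code.
{code}
import java.util.HashMap;
import java.util.Map;

public class MaxAllocationSketch {
  private final Map<String, Long> nodeMemoryMB = new HashMap<>();
  private long maxAllocationMB = 0;

  public synchronized void updateNodeResource(String nodeId, long newMemoryMB) {
    nodeMemoryMB.put(nodeId, newMemoryMB);
    updateMaximumAllocation();
  }

  // Equivalent in spirit to reusing updateMaximumAllocation instead of a separate
  // refreshMaximumAllocation: one place recomputes the max over all nodes.
  private void updateMaximumAllocation() {
    long max = 0;
    for (long mem : nodeMemoryMB.values()) {
      max = Math.max(max, mem);
    }
    maxAllocationMB = max;
  }

  public synchronized long getMaximumAllocationMB() {
    return maxAllocationMB;
  }
}
{code}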
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296113#comment-14296113 ] Robert Kanter commented on YARN-3021: - Ok, then I think it should probably be fine; though I'd like to let some others take a look. Also, is there a reason why the patch moves {{context.setResource(resource)}}? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296122#comment-14296122 ] Hadoop QA commented on YARN-3108: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695122/yarn3108.patch against trunk revision d244574. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6445//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6445//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6445//console This message is automatically generated. ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Attachments: yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296147#comment-14296147 ] Yongjun Zhang commented on YARN-3021: - Hi [~rkanter], I moved {{context.setResource(resource)}} because it seemed to be misplaced: all other parameters appear to be set in the order they are passed to the method, except this one. Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296153#comment-14296153 ] Jian He commented on YARN-3099: --- - usage.getUsed() should be usage.getUsed(label) {code} LOG.debug(getQueueName() + " Check assign to queue, label=" + label + " usedResources: " + usage.getUsed() {code} - maybe rename {{usage}} to {{queueUsage}} Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch, YARN-3099.3.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3101: -- Attachment: YARN-3101-Siqi.v1.patch I feel like taking reserved memory into consideration is not necessary, since the FairScheduler does not consider AM size when the queue is full. I have attached a patch to this JIRA. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296174#comment-14296174 ] Hadoop QA commented on YARN-3101: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695141/YARN-3101-Siqi.v1.patch against trunk revision 5a0051f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6447//console This message is automatically generated. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Attachment: YARN-3099.4.patch Thanks for your comment, [~jianhe]. Updated patch addressed your comments. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch, YARN-3099.3.patch, YARN-3099.4.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
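As background for the per-label tracking this JIRA describes, here is a self-contained sketch of a queue keeping its used resources keyed by node label; it illustrates the idea only and is not the ResourceUsage class from YARN-3092/YARN-3099.
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class LabelUsageSketch {
  private final Map<String, Resource> usedByLabel = new HashMap<>();

  synchronized void incUsed(String label, Resource res) {
    Resource current = usedByLabel.get(label);
    if (current == null) {
      current = Resources.createResource(0, 0);
      usedByLabel.put(label, current);
    }
    Resources.addTo(current, res);  // accumulate usage under this node label
  }

  synchronized Resource getUsed(String label) {
    Resource current = usedByLabel.get(label);
    return current == null ? Resources.none() : current;
  }
}
{code}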
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296056#comment-14296056 ] Hudson commented on YARN-3103: -- FAILURE: Integrated in Hadoop-trunk-Commit #6953 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6953/]) YARN-3103. AMRMClientImpl does not update AMRM token properly. Contributed by Jason Lowe (jianhe: rev 6d2bdbd7dab179dfb4f19bb41809e97f1db88c6b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296059#comment-14296059 ] Anubhav Dhoot commented on YARN-3021: - We are talking about turning off renewal of tokens and some initialization validation checks done inside RM on behalf of the user. This should not be turning off token verification inside HDFS. That should still happen YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296108#comment-14296108 ] Robert Kanter commented on YARN-3079: - Sounds good. +1 Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296197#comment-14296197 ] Hadoop QA commented on YARN-3079: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695123/YARN-3079.004.patch against trunk revision d244574. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6446//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6446//console This message is automatically generated. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295278#comment-14295278 ] Hudson commented on YARN-3011: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2038/]) YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch, YARN-3011.004.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295275#comment-14295275 ] Hudson commented on YARN-2897: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2038/]) YARN-2897. CrossOriginFilter needs more log statements (Mit Desai via jeagles) (jeagles: rev a8ad1e8089e4bf5854085d2d38d1c0133b5a41bc) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.0 Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295272#comment-14295272 ] Hudson commented on YARN-2932: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2038/]) YARN-2932. Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging. (Eric Payne via wangda) (wangda: rev 18741adf97f4fda5f8743318b59c440928e51297) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295242#comment-14295242 ] Hudson commented on YARN-2932: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/]) YARN-2932. Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging. (Eric Payne via wangda) (wangda: rev 18741adf97f4fda5f8743318b59c440928e51297) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java Add entry for preemptable status (enabled/disabled) to scheduler web UI and queue initialize/refresh logging -- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Fix For: 2.7.0 Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295245#comment-14295245 ] Hudson commented on YARN-2897: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/]) YARN-2897. CrossOriginFilter needs more log statements (Mit Desai via jeagles) (jeagles: rev a8ad1e8089e4bf5854085d2d38d1c0133b5a41bc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 2.7.0 Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log enough to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replaceLabelsOnNode in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295283#comment-14295283 ] Hudson commented on YARN-3028: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2038/]) YARN-3028. Better syntax for replaceLabelsOnNode in RMAdmin CLI. Contributed by Rohith Sharmaks (wangda: rev fd93e5387b554a78413bc0f14b729e58fea604ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt Better syntax for replaceLabelsOnNode in RMAdmin CLI Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
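[Editor's illustration] A minimal sketch of how the proposed node1:port=label1,label2 form could be parsed; the class and method names here are hypothetical, not the code from the YARN-3028 patch:
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NodeToLabelsParser {
  // Parses arguments of the form "node1:8041=gpu,large-mem". An entry with
  // nothing after '=' (or no '=' at all) maps the node to an empty set,
  // which a replace operation can treat as "clear all labels on this node".
  public static Map<String, Set<String>> parse(String[] args) {
    Map<String, Set<String>> nodeToLabels = new HashMap<>();
    for (String arg : args) {
      String[] parts = arg.split("=", 2);
      Set<String> labels = new HashSet<>();
      if (parts.length == 2 && !parts[1].isEmpty()) {
        labels.addAll(Arrays.asList(parts[1].split(",")));
      }
      nodeToLabels.put(parts[0], labels);
    }
    return nodeToLabels;
  }
}
{code}
The appeal of the '=' separator is visible in the parse: the node address and its label list split unambiguously even though both sides may themselves contain ':' or ','.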
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295952#comment-14295952 ] Jason Lowe commented on YARN-3103: -- bq. Jason, does the patch work on secure cluster too? I didn't test this particular patch on a secure cluster; however, I did test the same kind of change in MAPREDUCE-6230 on a secure cluster. That patch did not work if I let it update the login user instead of the current user. The current user is what the RPC layer is going to use (indeed, most of the reason doAs exists is to specify which UGI the RPC layer will use), so I have no idea why we would try to circumvent that and update some other UGI. bq. In MRAppMaster#initAndStartAppMaster, will it work if we insert the correct key/alias when adding credentials into appMasterUgi ? Yes, I suppose there is one way to change what key/alias is associated with a token in Credentials and that's to create a complete copy of the credentials and specify the new alias when adding the token to the copy. Since initAndStartAppMaster is doing that, it could explicitly specify the key/alias by manually copying the credentials and special-casing the AMRM token. Seems simpler to just re-use the alias set by the RM if that can work. Or just add a UGI/Credentials API to update the service name of a token that also updates its key/alias in the credentials map so subsequent token stores will overwrite that token. Or maybe a bit cleaner to have an API that explicitly says to replace one token with another, since the client can hunt down the old AMRM token using its updated service name. AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
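[Editor's illustration] To make the credential-copying option discussed above concrete, here is a minimal sketch of re-adding tokens under explicit aliases so a refreshed AMRM token overwrites the stale entry instead of sitting beside it. The helper name realias is hypothetical and secret keys are deliberately ignored; this is not the YARN-3103 patch itself:
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

public class AmRmTokenRealias {
  // Copies every token into fresh Credentials, keying the AMRM token by a
  // caller-chosen alias and everything else by its service name. A later
  // addToken with the same alias then replaces the old AMRM token rather
  // than leaving two of them in the credentials map.
  public static Credentials realias(Credentials creds, Text amrmAlias) {
    Credentials copy = new Credentials();
    for (Token<?> token : creds.getAllTokens()) {
      if (AMRMTokenIdentifier.KIND_NAME.equals(token.getKind())) {
        copy.addToken(amrmAlias, token);  // special-case the AMRM token
      } else {
        copy.addToken(token.getService(), token);
      }
    }
    return copy;  // secret keys omitted for brevity in this sketch
  }
}
{code}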
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295974#comment-14295974 ] Varun Saxena commented on YARN-3029: [~ozawa], added a test case. Kindly review. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a Turkish-locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
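[Editor's illustration] A quick, self-contained demonstration of the locale trap described above, using nothing beyond the JDK: in a Turkish locale, 'I' lower-cases to the dotless 'ı', so a locale-sensitive extension check silently fails.
{code}
import java.util.Locale;

public class LocaleLowercaseDemo {
  public static void main(String[] args) {
    String name = "ARCHIVE.ZIP";
    // Turkish lower-casing turns 'I' into dotless 'ı' ("archıve.zıp"),
    // so the extension check fails:
    System.out.println(name.toLowerCase(new Locale("tr")).endsWith(".zip")); // false
    // A fixed, locale-insensitive conversion keeps the check working everywhere:
    System.out.println(name.toLowerCase(Locale.ENGLISH).endsWith(".zip"));   // true
  }
}
{code}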
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296005#comment-14296005 ] Wangda Tan commented on YARN-3079: -- [~rkanter], wanna take a look? I plan to commit this today or tomorrow. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296003#comment-14296003 ] Jian He commented on YARN-3103: --- bq. That patch did not work if I let it update the login user instead of the current user. I see. The previous version of the patch in YARN-3103 actually used the current user, and was then changed to use the login user in secure mode for some reason. +1 to the patch. [~xgong], do you have any comments? AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296004#comment-14296004 ] Jian He commented on YARN-3103: --- bq. The previous version of the patch in YARN-3103 Sorry, I meant YARN-2212. AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296008#comment-14296008 ] Xuan Gong commented on YARN-3103: - I do not have any other comments. +1 for the patch. bq. The previous version of the patch in YARN-3103 actually used the current user, and was then changed to use the login user in secure mode for some reason. I do not remember why I made this change in YARN-2212. AMRMClientImpl does not update AMRM token properly -- Key: YARN-3103 URL: https://issues.apache.org/jira/browse/YARN-3103 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3103.001.patch AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it to the credentials, so the token is mapped using the newly updated service rather than the empty service that was used when the RM created the original AMRM token. This leads to two AMRM tokens in the credentials and can still fail if the AMRMTokenSelector picks the wrong one. In addition the AMRMClientImpl grabs the login user rather than the current user when security is enabled, so it's likely the UGI being updated is not the UGI that will be used when reconnecting to the RM. The end result is that AMs can fail with invalid token errors when trying to reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3107) Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error
Ray Chiang created YARN-3107: Summary: Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error Key: YARN-3107 URL: https://issues.apache.org/jira/browse/YARN-3107 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang TestYarnConfigurationFields currently makes sure each property in yarn-default.xml is documented in one of the YARN configuration Java classes. The reverse check can be turned on once each YARN property is: A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
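[Editor's illustration] A rough sketch of what such a reverse check could look like; the whitelist name KNOWN_INTERNAL_PROPERTIES and the "yarn." prefix filter are assumptions for illustration, not the actual TestYarnConfigurationFields code:
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ReverseConfigCheck {
  // Exception list for internal-use keys, each ideally with a comment.
  static final Set<String> KNOWN_INTERNAL_PROPERTIES = new HashSet<>();

  // Returns every YarnConfiguration constant that looks like a property key
  // but has no entry in yarn-default.xml and no whitelist exception.
  public static List<String> undocumentedProperties() throws Exception {
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");
    Set<String> documented = new HashSet<>();
    for (Map.Entry<String, String> e : defaults) {
      documented.add(e.getKey());
    }
    List<String> missing = new ArrayList<>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (!Modifier.isStatic(f.getModifiers()) || f.getType() != String.class) {
        continue;
      }
      String value = (String) f.get(null);
      if (value != null && value.startsWith("yarn.")
          && !documented.contains(value)
          && !KNOWN_INTERNAL_PROPERTIES.contains(value)) {
        missing.add(value);
      }
    }
    return missing;
  }
}
{code}
A test would then assert that the returned list is empty and print the offenders in the failure message.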
[jira] [Commented] (YARN-3104) RM continues to send new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296013#comment-14296013 ] Hadoop QA commented on YARN-3104: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695094/YARN-3104.001.patch against trunk revision caf7298. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6442//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6442//console This message is automatically generated. RM continues to send new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly close the connection or otherwise try to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3029: --- Attachment: YARN-3029.002.patch FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a Turkish-locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296006#comment-14296006 ] Robert Kanter commented on YARN-3079: - I'm actually looking at it right now :) Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296018#comment-14296018 ] Robert Kanter commented on YARN-3079: - I think we should acquire {{maxAllocWriteLock}} on entering {{refreshMaximumAllocation()}} and release it on exiting. I could easily see someone in a future JIRA calling {{refreshMaximumAllocation()}} without realizing that they need to acquire {{maxAllocWriteLock}} first. {{maxAllocWriteLock}} is reentrant, so there should be no harm in re-acquiring it. +1 after that Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
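[Editor's illustration] The reentrancy point above, shown with generic names rather than the scheduler's actual fields: because the same thread may re-acquire a reentrant write lock, a method that already holds it can safely call a helper that locks it again.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MaxAllocationHolder {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int maxMemoryMb = 8192;

  public void updateNodeResource(int nodeMemoryMb) {
    lock.writeLock().lock();               // first acquisition
    try {
      refreshMaximumAllocation(nodeMemoryMb);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Safe to call with or without the lock already held: a second
  // acquisition by the same thread just bumps the hold count, so callers
  // in future code paths cannot forget to take the lock.
  public void refreshMaximumAllocation(int nodeMemoryMb) {
    lock.writeLock().lock();
    try {
      maxMemoryMb = Math.max(maxMemoryMb, nodeMemoryMb);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}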
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296028#comment-14296028 ] Hadoop QA commented on YARN-3029: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695106/YARN-3029.002.patch against trunk revision caf7298. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6443//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6443//console This message is automatically generated. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a Turkish-locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296222#comment-14296222 ] Junping Du commented on YARN-41: [~devaraj.k], thanks for updating the patch! [~vinodkv] is on vacation, so I will help to review here. Just a quick glance at your patch (v3), a couple of comments and questions: {code} STATUS_UPDATE, REBOOTING, RECONNECTED, + SHUTDOWN, {code} Looks like we are adding a new event. Given we already have a decommission event, this is for other cases, e.g. shutting down the NM daemon through the CLI, isn't it? If so, we should consider the case that NM work preserving is enabled (for rolling upgrade), and these nodes shouldn't be unregistered from the RM. {code} protected void serviceStop() throws Exception { +// the isStopped check is for avoiding multiple unregistrations. +if (this.registeredWithRM && !this.isStopped) { + unRegisterNM(); +} {code} Like I said above, we only need to unregister the NM from the RM when NM recovery is disabled. We should probably put a check here. More comments will come later. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
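[Editor's illustration] A sketch of the reviewer's suggestion above, not the committed YARN-41 patch: skip the unregister call when NM recovery (work-preserving restart) is enabled, so a restarting NM keeps its containers registered. The registeredWithRM, isStopped, and unRegisterNM names follow the quoted snippet; the config keys are the real YarnConfiguration constants for yarn.nodemanager.recovery.enabled.
{code}
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

abstract class StatusUpdaterSketch extends AbstractService {
  protected boolean registeredWithRM;
  protected boolean isStopped;

  StatusUpdaterSketch() { super("StatusUpdaterSketch"); }

  protected abstract void unRegisterNM();

  @Override
  protected void serviceStop() throws Exception {
    boolean recoveryEnabled = getConfig().getBoolean(
        YarnConfiguration.NM_RECOVERY_ENABLED,
        YarnConfiguration.DEFAULT_NM_RECOVERY_ENABLED);
    // A restarting NM with recovery enabled keeps its containers, so it
    // should stay registered; only a real shutdown unregisters.
    if (this.registeredWithRM && !this.isStopped && !recoveryEnabled) {
      unRegisterNM();
    }
    super.serviceStop();
  }
}
{code}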
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296228#comment-14296228 ] zhihai xu commented on YARN-3079: - Hi [~leftnoteasy], All these tests (TestFSRMStateStore and TestRMDelegationTokens) passed in my latest local build with my patch. So these failed tests are not related to my patch. thanks zhihai Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296238#comment-14296238 ] Hadoop QA commented on YARN-3099: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695151/YARN-3099.4.patch against trunk revision 5a0051f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6448//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6448//console This message is automatically generated. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch, YARN-3099.3.patch, YARN-3099.4.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296263#comment-14296263 ] Chun Chen commented on YARN-3077: - [~ozawa], OK, uploaded a new patch to update it and changed the name to be self-explanatory. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
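[Editor's illustration] The recursive-create idea reads naturally as a walk over the configured parent path, creating any znode that is missing. A simplified, hedged sketch against the plain ZooKeeper client (error handling and ACLs reduced to a minimum; not the YARN-3077 patch):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public final class ZkPaths {
  // Creates "/rmstore/cluster1" by first creating "/rmstore", then
  // "/rmstore/cluster1", ignoring nodes that already exist.
  public static void mkdirs(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    StringBuilder current = new StringBuilder();
    for (String part : path.split("/")) {
      if (part.isEmpty()) {
        continue;  // skip the leading slash (and any double slashes)
      }
      current.append('/').append(part);
      try {
        zk.create(current.toString(), new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      } catch (KeeperException.NodeExistsException alreadyThere) {
        // fine: an earlier component, another cluster, or a previous run
        // created it already
      }
    }
  }
}
{code}
Clients built on Curator get the same behaviour from creatingParentsIfNeeded(); the sketch above is the manual equivalent for code that talks to ZooKeeper directly.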
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296331#comment-14296331 ] Hadoop QA commented on YARN-3077: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695162/YARN-3077.3.patch against trunk revision 5a0051f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6450//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6450//console This message is automatically generated. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296297#comment-14296297 ] Hadoop QA commented on YARN-3101: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695156/YARN-3101-Siqi.v1.patch against trunk revision 5a0051f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6449//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6449//console This message is automatically generated. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
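[Editor's illustration] A toy illustration, in plain Java rather than the FairScheduler code, of how the two defects described in YARN-3101 could mask each other: the comparison is reversed and the reservation is left out of the sum, so a test written against the reversed expectation passes for the buggy version.
{code}
public class FitInMaxShareToy {
  // Buggy version: comparison reversed AND the pending reservation ignored.
  static boolean fitsBuggy(int used, int reservation, int maxShare) {
    return used > maxShare;
  }

  // Intended check: the reservation counts toward the queue's max share.
  static boolean fitsFixed(int used, int reservation, int maxShare) {
    return used + reservation <= maxShare;
  }

  public static void main(String[] args) {
    System.out.println(fitsBuggy(10, 4, 8));  // true: wrongly "fits"
    System.out.println(fitsFixed(10, 4, 8));  // false: correctly rejected
  }
}
{code}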
[jira] [Updated] (YARN-3109) Broken link in cluster apps for application when queue full and failed
[ https://issues.apache.org/jira/browse/YARN-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3109: --- Description: Application link in cluster/apps broken when queue full and failed Configure capacity scheduler with default queue size as 1 Submit 2 mapreduce jobs to the default queue Select application detail link in ID for /cluster/app/application_1422467063659_0006 {quote} <property> <name>yarn.scheduler.capacity.root.default.maximum-applications</name> <value>1</value> <description> </description> </property> {quote} {quote} 15/01/29 14:29:43 ERROR webapp.Dispatcher: error handling URI: /cluster/app/application_1422467063659_0006 java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:84) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: (was: YARN-3079.004.patch) Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3109) Broken link in cluster apps for application when queue full an rejected
Bibin A Chundatt created YARN-3109: -- Summary: Broken link in cluster apps for application when queue full an rejected Key: YARN-3109 URL: https://issues.apache.org/jira/browse/YARN-3109 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Affects Versions: 2.6.0 Environment: Linux , 2 NM and 1 RM Reporter: Bibin A Chundatt Priority: Minor Fix For: 2.6.0 Application link in cluster/apps broken when queue full and rejected Configure capacity scheduler with default queue size as 1 Submit 2 mapreduce jobs to the default queue Select application detail link in ID for /cluster/app/application_1422467063659_0006 {quote} <property> <name>yarn.scheduler.capacity.root.default.maximum-applications</name> <value>1</value> <description> </description> </property> {quote} {quote} 15/01/29 14:29:43 ERROR webapp.Dispatcher: error handling URI: /cluster/app/application_1422467063659_0006 java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:84) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at
[jira] [Updated] (YARN-3109) Broken link in cluster apps for application when queue full and failed
[ https://issues.apache.org/jira/browse/YARN-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3109: --- Summary: Broken link in cluster apps for application when queue full and failed (was: Broken link in cluster apps for application when queue full an rejected) Broken link in cluster apps for application when queue full and failed -- Key: YARN-3109 URL: https://issues.apache.org/jira/browse/YARN-3109 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Affects Versions: 2.6.0 Environment: Linux , 2 NM and 1 RM Reporter: Bibin A Chundatt Priority: Minor Fix For: 2.6.0 Application link in cluster/apps broken when queue full and rejected Configure capacity scheduler with default queue size as 1 Submit 2 mapreduce jobs to the default queue Select application detail link in ID for /cluster/app/application_1422467063659_0006 {quote} <property> <name>yarn.scheduler.capacity.root.default.maximum-applications</name> <value>1</value> <description> </description> </property> {quote} {quote} 15/01/29 14:29:43 ERROR webapp.Dispatcher: error handling URI: /cluster/app/application_1422467063659_0006 java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:84) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at
[jira] [Created] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
Bibin A Chundatt created YARN-3110: -- Summary: Faulty link and state in ApplicationHistory when application is in unassigned state Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Bug Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Priority: Minor Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity:10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Applcation history State= null and History link shown as N/A -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3110: --- Assignee: Naganarasimha G R Faulty link and state in ApplicationHistory when application is in unassigned state -- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Bug Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Applcation history State= null and History link shown as N/A -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3110: --- Description: Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity:10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistoty page was: Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity:10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Applcation history State= null and History link shown as N/A Faulty link and state in ApplicationHistory when application is in unassigned state -- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Bug Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistoty page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3110: --- Description: Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity:10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page was: Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity:10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistoty page Faulty link and state in ApplicationHistory when application is in unassigned state -- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Bug Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Application state and History link wrong when Application is in unassigned state 1.Configure capacity scheduler with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2680) Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is disabled.
[ https://issues.apache.org/jira/browse/YARN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-2680. -- Resolution: Duplicate Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is disabled. --- Key: YARN-2680 URL: https://issues.apache.org/jira/browse/YARN-2680 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Junping Du Priority: Critical After YARN-1336 (specifically YARN-1337), we now support preserving containers across NM restart. While the NM is down, the node shouldn't be listed as RUNNING in the yarn node CLI or shown on the RM web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2680) Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is disabled.
[ https://issues.apache.org/jira/browse/YARN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2680: - Summary: Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is disabled. (was: Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is enabled.) Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is disabled. --- Key: YARN-2680 URL: https://issues.apache.org/jira/browse/YARN-2680 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Junping Du Priority: Critical After YARN-1336 (specifically YARN-1337), we now support preserving containers across NM restart. While the NM is down, the node shouldn't be listed as RUNNING in the yarn node CLI or shown on the RM web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-3077: Attachment: YARN-3077.3.patch RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a ZooKeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value for ${yarn.resourcemanager.zk-state-store.parent-path} that is not a top-level path, YARN should create the parent path first. A sketch of the idea appears below.
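As a rough, editor-added sketch of the idea above (not the attached patches; the helper name and surrounding setup are hypothetical), a missing parent path can be created recursively with the plain ZooKeeper client by walking the components from the top down and creating any znode that does not yet exist:
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZKParentPathUtil {
  /**
   * Hypothetical helper: create every missing component of an absolute path
   * such as "/rmstore/cluster-1/ZKRMStateRoot", top level first.
   */
  public static void ensureParentPath(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    StringBuilder current = new StringBuilder();
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue; // skip the empty segment before the leading "/"
      }
      current.append("/").append(component);
      if (zk.exists(current.toString(), false) == null) {
        try {
          zk.create(current.toString(), new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException e) {
          // Another RM may have created the znode concurrently; safe to ignore.
        }
      }
    }
  }
}
{code}
The exists/create race is tolerated rather than avoided, which matches the scenario in the description where several RMs share one ZooKeeper ensemble.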
[jira] [Commented] (YARN-2680) Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is enabled.
[ https://issues.apache.org/jira/browse/YARN-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296209#comment-14296209 ] Junping Du commented on YARN-2680: -- Hi [~jlowe], I think I meant that the node shouldn't be listed as RUNNING when recovery work is disabled. I corrected the title here and found an existing JIRA, YARN-41, so I will mark this as a duplicate. Node shouldn't be listed as RUNNING when NM daemon is stopped even when recovery work is enabled. -- Key: YARN-2680 URL: https://issues.apache.org/jira/browse/YARN-2680 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Junping Du Priority: Critical After YARN-1336 (specifically YARN-1337), we now support preserving containers across NM restart. While the NM is down, the node shouldn't be listed as RUNNING in the yarn node CLI or shown on the RM web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296214#comment-14296214 ] Wangda Tan commented on YARN-3079: -- [~zxu], could you verify that the failed tests are unrelated? Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even when the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be updated. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
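To make the intent concrete, here is an editor-added illustration (not the attached patches; the class and method names are stand-ins for the real AbstractYarnScheduler bookkeeping) of growing the advertised maximum allocation whenever an updated node becomes the largest in the cluster:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Illustrative tracker for the cluster-wide maximum allocation. */
public class MaxAllocationTracker {
  private final Resource maximumAllocation = Resources.createResource(0, 0);

  /** Call this from the node-resource update path as well as node registration. */
  public synchronized void updateMaximumAllocation(Resource updatedNodeResource) {
    if (updatedNodeResource.getMemory() > maximumAllocation.getMemory()) {
      maximumAllocation.setMemory(updatedNodeResource.getMemory());
    }
    if (updatedNodeResource.getVirtualCores() > maximumAllocation.getVirtualCores()) {
      maximumAllocation.setVirtualCores(updatedNodeResource.getVirtualCores());
    }
  }

  public synchronized Resource getMaximumResourceCapability() {
    return Resources.clone(maximumAllocation);
  }
}
{code}
A complete fix would also need to shrink the maximum when the largest node is downsized or removed, which requires rescanning the remaining nodes; the sketch only covers the growing case.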
[jira] [Updated] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3110: --- Description: Application state and History link are wrong when Application is in unassigned state 1. Configure capacity scheduler with queue size as 1 and Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2. Submit application to queue and check the state and link in Application history State = null and History link shown as N/A in applicationhistory page Kill the same application. In the timeline server logs, the following is shown when the application link is selected. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269) at
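The trace above points at a null AM container id being passed into ApplicationHistoryManagerOnTimelineStore#getContainer while the attempt is still unassigned. As an editor-added illustration only (simplified stand-ins, not the eventual fix), a defensive guard in the rendering path would avoid the NullPointerException and let the page fall back to N/A:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerReport;

/** Simplified stand-in for the AM-container lookup in the history web UI. */
public class AmContainerGuard {

  /** Hypothetical minimal interface for the history store lookup used below. */
  public interface HistoryStore {
    ContainerReport getContainer(ContainerId containerId);
  }

  /**
   * Return the AM container report, or null when the attempt never received
   * an AM container (application still unassigned), so the caller can render
   * "N/A" instead of dereferencing a null container id.
   */
  public ContainerReport lookupAmContainer(ContainerId amContainerId,
      HistoryStore store) {
    if (amContainerId == null) {
      return null;
    }
    return store.getContainer(amContainerId);
  }
}
{code}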
[jira] [Updated] (YARN-3110) Faulty link and state in ApplicationHistory when application is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3110: --- Description: Application state and History link are wrong when Application is in unassigned state 1. Configure capacity scheduler with queue size as 1 and Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2. Submit application to queue and check the state and link in Application history State = null and History link shown as N/A in applicationhistory page Kill the same application. In the timeline server logs: {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296383#comment-14296383 ] Hadoop QA commented on YARN-3079: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695175/YARN-3079.004.patch against trunk revision 5a0051f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6451//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6451//console This message is automatically generated. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch, YARN-3079.004.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even when the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be updated. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3094: --- Attachment: YARN-3094.3.patch reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.patch When the RM restarts, it will recover RMAppAttempts and register them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
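As an editor-added illustration of the reset idea (a generic sketch, not Hadoop's AbstractLivelinessMonitor or the attached patches), a liveness monitor can simply re-stamp every tracked attempt once recovery finishes, so expiry is measured from the moment the RM starts serving rather than from registration during replay:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Generic liveness tracker used only to illustrate the timer reset. */
public class LivenessTracker<T> {
  private final long expireIntervalMs;
  private final Map<T, Long> lastHeardFrom = new ConcurrentHashMap<>();

  public LivenessTracker(long expireIntervalMs) {
    this.expireIntervalMs = expireIntervalMs;
  }

  public void register(T id) {
    lastHeardFrom.put(id, System.currentTimeMillis());
  }

  public void receivedPing(T id) {
    lastHeardFrom.replace(id, System.currentTimeMillis());
  }

  /** Call once recovery completes: give every tracked id a fresh deadline. */
  public void resetAllTimers() {
    long now = System.currentTimeMillis();
    for (T id : lastHeardFrom.keySet()) {
      lastHeardFrom.put(id, now);
    }
  }

  public boolean isExpired(T id) {
    Long last = lastHeardFrom.get(id);
    return last != null && System.currentTimeMillis() - last > expireIntervalMs;
  }
}
{code}
With this shape, attempts registered early during a slow recovery are not penalized for the time spent replaying the state store.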