[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232753#comment-14232753 ] Rohith commented on YARN-2892: -- +1 (non-binding), LGTM Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client, it makes a simple security check on whether it should include the AMRMToken in the report (see createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user, it includes the AMRMToken; otherwise it does not. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (see submitApplication in ClientRMService). Afterwards, when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the realm is stripped from the principal when we request a short username. So, for example, the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([YARN-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
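For readers unfamiliar with the two name forms, the failing comparison can be distilled into a few lines. This is an illustrative sketch, not the actual RMAppImpl code; the literal names are made up:
{code}
// Sketch of the mismatch described in YARN-2892 (illustrative only).
public class ShortVsFullNameSketch {
  public static void main(String[] args) {
    // What the RM stores at submission time: the short name.
    String storedAtSubmission = "foo";
    // What a Kerberized caller presents at report time: the full principal.
    String requester = "foo@COMPANY.COM";

    // The check described above effectively compares the two directly,
    // so on a secure cluster the AMRMToken is omitted from the report:
    boolean includeAMRMToken = storedAtSubmission.equals(requester);
    System.out.println(includeAMRMToken); // false

    // The fix direction is to compare like with like, e.g. deriving the
    // requester's short name first (UserGroupInformation#getShortUserName
    // applies the cluster's auth_to_local rules) before comparing.
  }
}
{code}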
[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated YARN-2081: --- Target Version/s: (was: 2.4.1) TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Labels: 2.4.1 Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated YARN-2081: --- Labels: 2.4.1 (was: ) TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Labels: 2.4.1 Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232900#comment-14232900 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Yarn-trunk #763 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/763/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test and confirm)
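The description's suggestion, getting the manager from the RM instance rather than binding the instance up front, can be sketched with a Guice provider binding. This is a hedged illustration, not the committed patch; the module name is made up, while {{ResourceManager#getApplicationACLsManager}} is the accessor already shown in the snippet above:
{code}
import com.google.inject.AbstractModule;
import com.google.inject.Provider;
import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;
import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

// Hypothetical module: instead of capturing the ACLs manager instance when
// the webapp is built (stale after an RM failover), defer the lookup so each
// injection asks the ResourceManager for its current manager.
public class RMWebAppBindingSketch extends AbstractModule {
  private final ResourceManager rm;

  public RMWebAppBindingSketch(ResourceManager rm) {
    this.rm = rm;
  }

  @Override
  protected void configure() {
    bind(ResourceManager.class).toInstance(rm);
    // Provider defers the call: after a failover, whatever manager the RM
    // currently holds is returned, not the one captured at webapp creation.
    bind(ApplicationACLsManager.class).toProvider(
        (Provider<ApplicationACLsManager>) rm::getApplicationACLsManager);
  }
}
{code}
With a provider binding, a web block that injects ApplicationACLsManager gets whatever the current RM returns at request time, which is the behaviour the description asks for.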
[jira] [Commented] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232907#comment-14232907 ] Junping Du commented on YARN-1156: -- +1. Patch looks good to me. Committing this in. Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
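The truncation is easy to reproduce outside the NodeManager. A self-contained sketch (the class name is made up; the real counters live in NodeManagerMetrics):
{code}
public class GBTruncationSketch {
  public static void main(String[] args) {
    int allocatedGBInt = 0;      // current behaviour: integer gauge
    float allocatedGBFloat = 0f; // proposed behaviour: float gauge

    for (int i = 0; i < 4; i++) {      // four 500MB container allocations
      allocatedGBInt += 500 / 1024;    // integer division truncates to 0
      allocatedGBFloat += 500 / 1024f; // float division keeps the fraction
    }

    System.out.println(allocatedGBInt);   // 0        -> 2000MB reported as 0GB
    System.out.println(allocatedGBFloat); // 1.953125 -> close to the real 2000MB
  }
}
{code}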
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Summary: Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values (was: Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Priority: Major (was: Minor) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232919#comment-14232919 ] Hudson commented on YARN-2136: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232916#comment-14232916 ] Hudson commented on YARN-2472: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232913#comment-14232913 ] Hudson commented on YARN-2894: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Labels: metrics (was: metrics newbie) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232939#comment-14232939 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-trunk-Commit #6639 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6639/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/CHANGES.txt Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233006#comment-14233006 ] Wilfred Spiegelenburg commented on YARN-2910: - I have the code change done with all the synchronisation around the for loops. Based on the javadoc, all iterator access of the {{Collections.synchronizedList}} needs to be synchronised, which might impact performance as much as, or more than, the copy-on-write approach. The junit test is almost done and I will update the patch when it is finished. FSLeafQueue can throw ConcurrentModificationException - Key: YARN-2910 URL: https://issues.apache.org/jira/browse/YARN-2910 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.patch The lists that maintain the runnable and the non-runnable apps are standard ArrayLists, but there is no guarantee that they will only be manipulated by one thread in the system. This can lead to the following exception: {noformat} 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) at java.util.ArrayList$Itr.next(ArrayList.java:831) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516) {noformat} Full stack trace in the attached file. We should guard against this by using a thread-safe version, java.util.concurrent.CopyOnWriteArrayList. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
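The trade-off discussed in the comment, synchronizing every iteration versus copy-on-write, looks like this in isolation. A sketch with made-up types, not FSLeafQueue itself:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IterationSafetySketch {
  static class App { int memoryMB; }

  // Option A: Collections.synchronizedList. Per its javadoc, iteration is
  // only safe while holding the list's own monitor, so every for loop must
  // be wrapped -- writers block while any reader iterates.
  static int sumSynchronized(List<App> apps) {
    int sum = 0;
    synchronized (apps) {
      for (App app : apps) {
        sum += app.memoryMB;
      }
    }
    return sum;
  }

  // Option B: CopyOnWriteArrayList. Iterators see an immutable snapshot and
  // never throw ConcurrentModificationException; each write copies the array.
  static int sumCopyOnWrite(CopyOnWriteArrayList<App> apps) {
    int sum = 0;
    for (App app : apps) {
      sum += app.memoryMB;
    }
    return sum;
  }

  public static void main(String[] args) {
    List<App> a = Collections.synchronizedList(new ArrayList<>());
    CopyOnWriteArrayList<App> b = new CopyOnWriteArrayList<>();
    System.out.println(sumSynchronized(a) + " " + sumCopyOnWrite(b));
  }
}
{code}
Which option costs more depends on the read/write ratio: copy-on-write penalizes frequent writers, while the synchronized wrapper penalizes concurrent readers.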
[jira] [Created] (YARN-2917) RM get hanged if fail to store NodeLabels into store.
Rohith created YARN-2917: Summary: RM get hanged if fail to store NodeLabels into store. Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encoutered scenario where RM hanged while shutting down and keep on logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) RM get hanged if fail to store NodeLabels into store.
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233023#comment-14233023 ] Rohith commented on YARN-2917: -- Attaching the thread dump from when the RM hung {code} Thread-1 prio=10 tid=0x006e1000 nid=0x55a4 in Object.wait() [0x7f2ce9493000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xf26b0d48 (a java.lang.Object) at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:141) - locked 0xf26b0d48 (a java.lang.Object) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf26b0aa8 (a java.lang.Object) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:238) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf26b0968 (a java.lang.Object) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf2842458 (a java.lang.Object) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1002) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1057) - locked 0xc0c96c98 (a org.apache.hadoop.yarn.server.resourcemanager.ResourceManager) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1104) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xc0cab280 (a java.lang.Object) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65) at org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) AsyncDispatcher event handler daemon prio=10 tid=0x7f2cf0b81000 nid=0x54a1 in Object.wait() [0x7f2cf7bfa000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc01b83e8 (a org.apache.hadoop.util.ShutdownHookManager$1) at java.lang.Thread.join(Thread.java:1281) - locked 0xc01b83e8 (a org.apache.hadoop.util.ShutdownHookManager$1) at java.lang.Thread.join(Thread.java:1355) at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106) at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46) at java.lang.Shutdown.runHooks(Shutdown.java:123) at java.lang.Shutdown.sequence(Shutdown.java:167) at java.lang.Shutdown.exit(Shutdown.java:212) - locked 0xc04ae9c0 (a java.lang.Class for java.lang.Shutdown) at java.lang.Runtime.exit(Runtime.java:109) at java.lang.System.exit(System.java:962) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:185) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} RM hangs if it fails to store NodeLabels into the store. - Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) RM hangs if it fails to store NodeLabels into the store.
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233035#comment-14233035 ] Rohith commented on YARN-2917: -- The main problem: Thread-1: CommonNodeLabelsManager#handle() throws the exception back to the AsyncDispatcher; in turn, the AsyncDispatcher calls the shutdown hook and waits for it to complete. Thread-2: the shutdown hook stops the RM gracefully, but the graceful stop waits for the AsyncDispatcher to drain its events. RM hangs if it fails to store NodeLabels into the store. - Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
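The circular wait in the thread dump can be distilled to a few lines. This standalone sketch (made-up names, not the YARN classes) never terminates, which is the reported hang:
{code}
public class ShutdownDeadlockSketch {
  static volatile boolean drained = false;

  public static void main(String[] args) throws InterruptedException {
    // Stands in for the shutdown hook that stops the RM and, inside
    // AsyncDispatcher#serviceStop, waits for the event queue to drain.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      while (!drained) {
        // "Waiting for AsyncDispatcher to drain." -- never becomes true,
        // because the event thread below is stuck inside System.exit().
      }
    }, "shutdown-hook"));

    // Stands in for the AsyncDispatcher event thread: a store failure makes
    // it call System.exit(), which blocks until all shutdown hooks finish.
    Thread eventThread = new Thread(() -> System.exit(1), "event-handler");
    eventThread.start();
    eventThread.join(); // main also waits forever: a classic circular wait
  }
}
{code}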
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233044#comment-14233044 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233047#comment-14233047 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233041#comment-14233041 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test and
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233063#comment-14233063 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233057#comment-14233057 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233060#comment-14233060 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue.
Rohith created YARN-2918: Summary: RM startup fails if accessible-node-labels are configured for a queue. Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. The current steps to configure node labels are to first add them via rmadmin and then configure them for the queues. It would be good if cluster and queue node labels were consistent in how they are configured. {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203) Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
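Under the current behaviour, the order of operations matters: the label must exist in the cluster set before any queue references it. A hedged sketch of the working sequence (the label {{x}} is from the report above; the queue name {{a}} is illustrative):
{noformat}
# 1. Add the label to the cluster first:
yarn rmadmin -addToClusterNodeLabels x

# 2. Only then reference it from capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.root.a.accessible-node-labels</name>
  <value>x</value>
</property>
{noformat}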
[jira] [Updated] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2918: - Summary: RM startup fails if accessible-node-labels are configured for a queue without cluster labels (was: RM startup fails if accessible-node-labels are configured for a queue.) RM startup fails if accessible-node-labels are configured for a queue without cluster labels --- Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. The current steps to configure node labels are to first add them via rmadmin and then configure them for the queues. It would be good if cluster and queue node labels were consistent in how they are configured. {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203) Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233113#comment-14233113 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/CHANGES.txt Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233112#comment-14233112 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java When ACL's are enabled, if RM switches then application can not be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding aclManager to RMWebApp would cause problem if RM is switched. There could be some validation check may fail. I think , we should not bind aclManager for RMWebApp, instead we should get from RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render below check may fail(Need to test
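A minimal sketch of the fix direction described in YARN-2894 above: bind only the ResourceManager in the webapp module and resolve the ACL manager from it on each request, so a failover cannot leave the web UI holding a stale manager. The class and method below are illustrative, not the committed patch; getApplicationACLsManager() and the four-argument checkAccess() are existing YARN APIs.
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;

// Illustrative only: fetch the ACL manager from the RM per request instead
// of binding the manager instance once at webapp start-up.
class AppViewAclCheck {
  private final ResourceManager rm; // the RM object itself stays valid across failover

  AppViewAclCheck(ResourceManager rm) {
    this.rm = rm;
  }

  boolean canView(UserGroupInformation caller, String appOwner, ApplicationId appId) {
    // resolved per call, so it always reflects the active RM context
    return rm.getApplicationACLsManager().checkAccess(
        caller, ApplicationAccessType.VIEW_APP, appOwner, appId);
  }
}
{code}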
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233128#comment-14233128 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, the memory actually allocated is 2000MB, but the metrics show 0GB. Let's use float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACL's are enabled, if RM switches then application can not be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233127#comment-14233127 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java When ACL's are enabled, if RM switches then application can not be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding aclManager to RMWebApp would cause problem if RM is switched. There could be some validation check may fail. I think , we should not bind aclManager for RMWebApp, instead we should get from RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render below check may fail(Need
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233134#comment-14233134 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
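A small sketch of that short-circuit idea (hypothetical names; the real RMStateStore dispatches typed events through its state machine rather than Runnables):
{code}
// Hypothetical sketch: drop store/update events up front when fenced,
// instead of issuing ZooKeeper operations that are bound to fail.
class FencedAwareStoreSketch {
  enum StoreState { ACTIVE, FENCED }

  private volatile StoreState state = StoreState.ACTIVE;

  void handleStoreEvent(Runnable zkWrite) {
    if (state == StoreState.FENCED) {
      return; // another RM took over; ignore rather than touch ZK
    }
    zkWrite.run(); // perform the actual ZooKeeper write
  }

  void fence() {
    state = StoreState.FENCED;
  }
}
{code}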
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233131#comment-14233131 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2301: Attachment: YARN-2301.20141203-1.patch rebasing and updating the patch Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running the yarn container -list <Application Attempt ID> command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to print it in a time format. 3) finish-time is 0 if the container is not yet finished. Maybe print N/A 4) May have an option to run as yarn container -list <appId> OR yarn application -list-containers <appId> also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it, and may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
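For observations 2) and 3), a self-contained sketch of the requested formatting (the date pattern is an assumption, not what the eventual patch uses):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerTimeFormat {
  public static void main(String[] args) {
    long startTime = 1405540544844L; // epoch millis from the description
    long finishTime = 0L;            // 0 == container not finished yet

    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
    System.out.println("Start-Time  : " + fmt.format(new Date(startTime)));
    System.out.println("Finish-Time : "
        + (finishTime == 0 ? "N/A" : fmt.format(new Date(finishTime))));
  }
}
{code}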
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233192#comment-14233192 ] Naganarasimha G R commented on YARN-2874: - Hi [~ozawa] [~kasha], Thanks for the review and feedback. I put some effort into writing test code to reproduce this issue, but as more and more sleeps and wait/notify were required and it was not consistently going into deadlock, I thought it was not worth the effort as the deadlock scenario was easily detectable. bq. RenewalTimerTask is a method which has a side effect, so the state can be invalid after the patch. We need to update the long error handling before merging it. I was not so clear about this statement, as I was not able to work out which state gets invalidated because of the fix; further, you ([~ozawa]) had mentioned ??Rethinking of this, this is not related to this JIRA.??, so please inform me if anything more needs to be updated for this issue. Regarding Sid's comment in MAPREDUCE-5384: if it is required to be handled, IIUC I need to revert my patch and redo it as below (correct me if wrong, and also let me know if it is required to be fixed in this way) {quote}
{noformat}
@Override
public void run() {
  if (cancelled) {
    return;
  }
  Token<?> token = dttr.token;
  try {
    synchronized (this) {
      if (cancelled) {
        return;
      }
      requestNewHdfsDelegationTokenIfNeeded(dttr);
      // if the token is not replaced by a new token, renew the token
      if (appTokens.get(dttr.applicationId).contains(dttr)) {
        renewToken(dttr);
        setTimerForTokenRenewal(dttr); // set the next one
      } else {
        LOG.info("The token was removed already. Token = [" + dttr + "]");
      }
    }
  } catch (Exception e) {
    LOG.error("Exception renewing token " + token + ". Not rescheduled", e);
    removeFailedDelegationToken(dttr);
  }
}
{noformat}
{quote} Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at
[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233199#comment-14233199 ] Junping Du commented on YARN-2892: -- +1. Patch looks good. Will commit it shortly. Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRMService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
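The short-vs-full username mismatch above can be summarized with UserGroupInformation (a sketch of the fix direction, not the committed patch): getUserName() returns the full Kerberos principal, while getShortUserName() applies the auth_to_local rules, so the comparison has to be done short name against short name.
{code}
import org.apache.hadoop.security.UserGroupInformation;

class UserNameCheck {
  // Sketch: compare like with like. With Kerberos, caller.getUserName()
  // would be foo@COMPANY.COM while the RM stored only the short name foo,
  // so comparing full vs. short never matches.
  static boolean isSameUser(UserGroupInformation caller, String storedShortName) {
    return caller.getShortUserName().equals(storedShortName);
  }
}
{code}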
[jira] [Updated] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2892: - Hadoop Flags: Reviewed Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRMService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233205#comment-14233205 ] Jason Lowe commented on YARN-2056: -- Last call for comments, as I'm planning to commit by the end of this week. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, YARN-2056.201411142002.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
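For reference, the per-queue switch under discussion would be set through the capacity-scheduler configuration; the property name below follows the capacity-scheduler naming pattern and is an assumption about the final committed form, not a confirmed key.
{code}
import org.apache.hadoop.conf.Configuration;

public class DisablePreemptionExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // leave preemption enabled globally, opt a single queue out of it
    // (assumed property name, per the naming pattern of this JIRA)
    conf.setBoolean(
        "yarn.scheduler.capacity.root.queueA.disable_preemption", true);
    System.out.println(conf.getBoolean(
        "yarn.scheduler.capacity.root.queueA.disable_preemption", false));
  }
}
{code}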
[jira] [Commented] (YARN-2728) Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup
[ https://issues.apache.org/jira/browse/YARN-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233233#comment-14233233 ] Naganarasimha G R commented on YARN-2728: - Hi [~wangda], In view of the earlier review comments ([comment1|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14169984page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14169984], [comment2|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14169984page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14169984]) in YARN-2495, {quote} Now if user want to leverage change of capacity scheduler, user MUST specify 1) labels can be accessed by the queue and 2) proportion of resource can be accessed by a queue of each label. Back to the central node label validation discussion, without this, we cannot get capacity scheduler work for now. (user cannot specify capacity for a unknown node-label for a queue, etc.). {quote} I feel we can keep the design the same and have a configuration flag based on which we decide to do the following: # Disable (/throw an exception) in CommonNodeLabelsManager.addToCluserNodeLabels and removeFromClusterNodeLabels (so that cluster node labels are not taken from REST or CLI) # Support a protected method in CommonNodeLabelsManager which updates the label mgr with new labels (as cluster node labels) and invoke it from CommonNodeLabelsManager.addLabelsToNode By doing this, we will have the flexibility to enable or disable this centralized valid-cluster-node-labels functionality in both centralized and distributed Node Labels configuration. Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup Key: YARN-2728 URL: https://issues.apache.org/jira/browse/YARN-2728 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Currently without a Central List of Valid Labels, Capacity scheduler will not be able to work (user cannot specify capacity for an unknown node-label for a queue, etc.). But without disabling the central label validation, the Distributed Node Label configuration feature is not complete, so we need to support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
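A sketch of the flag-gated behaviour proposed in the comment above (the flag and class are hypothetical; the method spelling follows the comment):
{code}
import java.util.Set;

class LabelValidationGate {
  private final boolean centralizedValidation; // e.g. read from yarn-site.xml

  LabelValidationGate(boolean centralizedValidation) {
    this.centralizedValidation = centralizedValidation;
  }

  void addToCluserNodeLabels(Set<String> labels) {
    if (!centralizedValidation) {
      // distributed node-label setup: labels must not come via REST or CLI
      throw new UnsupportedOperationException(
          "Centralized node-label modification is disabled");
    }
    // otherwise persist the labels as the valid cluster set
  }
}
{code}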
[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting
[ https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233234#comment-14233234 ] Zhijie Shen commented on YARN-2837: --- bq. Maybe we want to make the version control logic a unified interface in future? I think it's a good suggestion, but how about doing the code refactoring separately? In addition to the timeline server, other components have state stores built on top of leveldb, with similar version-related code. We can do a one-pass refactoring to make all leveldb store impls share the common code. Let's file a Jira for it. Timeline server needs to recover the timeline DT when restarting Key: YARN-2837 URL: https://issues.apache.org/jira/browse/YARN-2837 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch Timeline server needs to recover the stateful information when restarting as RM/NM/JHS does now. So far the stateful information only includes the timeline DT. Without recovery, the timeline DT of the existing YARN apps is no longer valid, and cannot be renewed any more after the timeline server is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233244#comment-14233244 ] Craig Welch commented on YARN-2637: --- First, the easy parts :-) bq. typo for manualy fixed bq. usedAMResources is not used by sub-class, so suggest to replace it with private done bq. Exception messages here should be more meaningful than c1, or c2. yup - fixed bq. The log level here should be info or warn level rather than debug level. In most cases, LOG.debug() should be under block of LOG.isDebugEnabled(). So, I had made this debug rather than something higher because I'm not sure we always care, and it doesn't represent a failure case - this is normal/expected case, and other similar cases for not starting the app don't log at all. But, I can see that it will be helpful to know this, and I don't think that it will result in excessive logging - so I went ahead and made it an info level, sound good? BTW, the isXYZenabled idiom is to save the cost of evaluating the argument construction for the log message as these can be very expensive, but for cheap cases like this (a string literal) it's not necessary as the only cost is going to be the same evaluation for logging which will happen during the call Now for the more complicated one: bq. Looks like maxAMResourcePerQueuePercent is a allowed percent for AM resource in each queue. So we may should calculate amLimit per queue rather than aggregate all applications together. So, yes and no - the current behavior actually takes the maxAM... which is set globally and it apportions it out based on the queue's baseline share of the cluster - so if the maxam was say, 10%, and a given queue had 50% of the cluster, it would have an effective maxampercent value of 5% (it's translated into how many apps can I have running based on the minallocation of the cluster rather than actual am usage - which is the problem which prompted the fix - but the important thing to get here is the way the overall maxampercent is apportioned out to the queues) There is also the option to override on a per queue basis, so that, in the above scenario, if you didn't like the queue getting the 5% based on the overall process, but you were happy with how other queues were working using the config, you could just override for the given queue. When I tried to translate this into something which was actually paying attention to the real usage of the ams, two approaches seemed reasonable: 1. Just have a global used am resource value, use the global am percent everywhere (not apportioned) - this way the total cluster level effect is what we want - in this case, the subdivision of the amresource percent value is replaced with a total summing of the used resource amongst the queues. You can still override for a given queue if you want this queue to be able to go higher, which has the effective result of allowing one queue to go higher than the others, this could starve other queues (bad) but that was already possible with the other approach, albeit in a different way (when the cluster came to be filled with AM's from one particular queue.). 2. 
We could subdivide the global maxampercent based on the queue share of the baseline (as before) and then have a per-queue amresource percent (and amused) which are evaluated - this would not be a difficult change from the current approach, but I think it is problematic for the reason below. The main reason I took approach number one over two is that I was concerned that with a complex queue structure where there was a reasonable level of subdivision in a smallish cluster you could end up with a queue which can effectively never start anything because the final value is too small to ever be able to start one of the larger AM's we have these days. By sharing it globally this is less likely to happen because that unused am resource allocated out to other queues which have a larger share of the cluster is not potentially sitting idle while leaf queue a.b.c has a derived maxampercent of say 2%, which translates into 512mb, and so can never start an application master which needs 1G (even though, globally, there's more than enough ampercent to do so). It's the "this queue can never start an AM over x size" case that concerns me. There are other possible ways to handle this with option 2, but I'm concerned that they would add complexity to the behavior and change the behavior more than is needed to correct the defect. [~djp] Make sense? Thoughts? I may take a go at option 2 so we can evaluate it, but I'm concerned about the small cluster/too much subdivision scenario being problematic. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key:
[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting
[ https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233294#comment-14233294 ] Li Lu commented on YARN-2837: - Agree with [~zjshen]'s suggestion. Let's do that in a separate Jira. I'd +1 this patch, and maybe some committers would like to take a look at it? Timeline server needs to recover the timeline DT when restarting Key: YARN-2837 URL: https://issues.apache.org/jira/browse/YARN-2837 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch Timeline server needs to recover the stateful information when restarting as RM/NM/JHS does now. So far the stateful information only includes the timeline DT. Without recovery, the timeline DT of the existing YARN apps is no longer valid, and cannot be renewed any more after the timeline server is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
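A hypothetical shape for the "unified version control" interface floated in the comments above, capturing what each leveldb-backed state store (RM, NM, timeline) re-implements today; all names here are illustrative.
{code}
interface VersionedLevelDbStore {
  int currentVersion();            // schema version this code writes
  int loadedVersion();             // schema version found on disk
  void storeVersion(int version);

  default void checkVersion() {
    if (loadedVersion() > currentVersion()) {
      throw new IllegalStateException(
          "Incompatible state-store schema version " + loadedVersion());
    }
    if (loadedVersion() < currentVersion()) {
      storeVersion(currentVersion()); // compatible upgrade: stamp new version
    }
  }
}
{code}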
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233323#comment-14233323 ] Wangda Tan commented on YARN-2637: -- [~cwelch], I think option #2 makes more sense to me, since each allocation will check the queue's capacity limit only. IIUC, option #1 could lead to some queues being entirely occupied by AMs, which is why we introduced the max-am-resource parameter. For option #2, we can allow the user to run at least one AM in spite of the max AM resource, to avoid the problem mentioned. In a real-world cluster, the capacity of a queue should be >= the maximum size of container we can launch. Do you agree? Thanks, Wangda maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue will be calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it will check whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the #am that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only a max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233341#comment-14233341 ] Bikas Saha commented on YARN-2139: -- So to be clear, currently vdisks counts the number of physical drives present on the box. Something to keep in mind would be whether this also entails a change in the NM policy of providing a directory on every local dir (which typically maps to every disk) to every task. And tasks are free to choose one or more of those dirs (disks) to write to. This puts the spinning disk head under contention and affects performance of all writers on that disk because seeks are expensive. The rule of thumb tends to be to allocate as many tasks to a machine as the number of disks (maybe 2x) so as to keep this seek cost low. Should we consider evaluating a change in this policy that gives 1 local dir to a container with 1 vdisk? This way a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with their own dedicated disk. Offhand it's hard to say how this would compare with all 6 disks allocated to all 6 tasks and letting cgroups enforce sharing. If multiple tasks end up choosing the same disk for their writes, then they may not end up getting the allocation that they thought they would get. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2437: -- Assignee: Varun Saxena start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2903) Timeline server span receiver for htrace traces
[ https://issues.apache.org/jira/browse/YARN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-2903: - Attachment: timelinespanreceiver design 1.pdf Timeline server span receiver for htrace traces --- Key: YARN-2903 URL: https://issues.apache.org/jira/browse/YARN-2903 Project: Hadoop YARN Issue Type: Task Components: timelineserver Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: timelinespanreceiver design 1.pdf HDFS is tracing using htrace now, as are other applications including HBase and Accumulo. It would be a nice feature if we enabled writing traces to the timeline server. I envision an htrace SpanReceiver implementation that uses the TimelineClient to store tracing data. The htrace API may end up being a more convenient way to instrument applications to store timeline data in the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2847: --- Attachment: yarn2847.patch Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2847: --- Attachment: yarn2847.patch latest patch, previous one has a minor comment error Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2428: --- Attachment: YARN-2428.patch LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Attachments: YARN-2428.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2428: -- Assignee: Varun Saxena LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Attachments: YARN-2428.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233486#comment-14233486 ] Hadoop QA commented on YARN-2301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684920/YARN-2301.20141203-1.patch against trunk revision 03ab24a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5987//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5987//console This message is automatically generated. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to print as time format. 3) finish-time is 0 if container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenewer
Karthik Kambatla created YARN-2919: -- Summary: Potential race between renew and cancel in DelegationTokenRenewer Key: YARN-2919 URL: https://issues.apache.org/jira/browse/YARN-2919 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Karthik Kambatla Priority: Critical YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a race because of which a renewal in flight isn't interrupted by a cancel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233530#comment-14233530 ] Craig Welch commented on YARN-2637: --- Hmmm, [~leftnoteasy], option 1 does have the possible issue you describe, as well as the issue I mentioned above of possibly starving all other queues if one queue has the AM percent set higher than the others. The approach of only enforcing the limit if at least one application is running was the approach I was thinking of if we went with 2 - the other being to not add the new app in when doing the check (so it's only retroactive to what has started), but I like the former better as it will reduce the overage as much as possible. Obviously, either approach has the potential to allow things to exceed the maxampercent if there are a large number of queues, but there are tradeoffs either way, and it's probably a smaller risk... I'll see about a patch for approach 2. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue will be calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it will check whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the #am that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only a max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
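A rough sketch of option #2 with the "always allow at least one AM" safeguard discussed above (field and method names are illustrative, not the patch's):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class AmLimitSketch {
  private Resource amLimit;        // queue_max_capacity * max_am_resource_percent
  private Resource usedAmResource; // sum of the running AMs' actual resource
  private int activeApplications;

  boolean canActivate(Resource amResourceRequest) {
    if (activeApplications == 0) {
      return true; // never starve the queue entirely, however small its limit
    }
    Resource wouldUse = Resources.add(usedAmResource, amResourceRequest);
    return Resources.fitsIn(wouldUse, amLimit);
  }
}
{code}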
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233543#comment-14233543 ] Tsuyoshi OZAWA commented on YARN-2874: -- [~Naganarasimha], never mind, your patch looks good to me. +1 Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) - locked 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) at java.lang.Thread.run(Thread.java:745) Timer-4: at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) - locked 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Found 1 deadlock. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
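The cycle in the jstack above reduces to two monitors taken in opposite orders; a minimal stand-alone reproduction (illustrative, not YARN code):
{code}
public class RenewerDeadlockSketch {
  static final Object TASK_MONITOR = new Object(); // RenewalTimerTask
  static final Object SET_MONITOR = new Object();  // synchronized token set

  public static void main(String[] args) {
    new Thread(() -> {                  // models Timer-4
      synchronized (TASK_MONITOR) {
        pause();
        synchronized (SET_MONITOR) { }  // removeFailedDelegationToken
      }
    }).start();
    new Thread(() -> {                  // models DelayedTokenCanceller
      synchronized (SET_MONITOR) {
        pause();
        synchronized (TASK_MONITOR) { } // RenewalTimerTask.cancel
      }
    }).start();
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}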
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233558#comment-14233558 ] Karthik Kambatla commented on YARN-2874: Checking this in. Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) - locked 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) at java.lang.Thread.run(Thread.java:745) Timer-4: at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) - locked 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Found 1 deadlock. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233577#comment-14233577 ] Hudson commented on YARN-2891: -- FAILURE: Integrated in Hadoop-trunk-Commit #6645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6645/]) YARN-2891. Failed Container Executor does not provide a clear error message. Contributed by Dustin Cote. (harsh) (harsh: rev a31e0164912236630c485e5aeb908b43e3a67c61) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Fix For: 2.7.0 Attachments: YARN-2891-1.patch When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233575#comment-14233575 ] Hudson commented on YARN-2874: -- FAILURE: Integrated in Hadoop-trunk-Commit #6645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6645/]) YARN-2874. Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps. (Naganarasimha G R via kasha) (kasha: rev 799353e2c7db5af6e40e3521439b5c8a3c5c6a51) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes, this deadlock can occur. Jstack dump:
{quote}
Found one Java-level deadlock:
=============================
"DelegationTokenRenewer #181865":
  waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
  which is held by "DelayedTokenCanceller"
"DelayedTokenCanceller":
  waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
  which is held by "Timer-4"
"Timer-4":
  waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
  which is held by "DelayedTokenCanceller"

Java stack information for the threads listed above:
===================================================
"DelegationTokenRenewer #181865":
  at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
  - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
"DelayedTokenCanceller":
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
  - waiting to lock <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
  - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
  at java.lang.Thread.run(Thread.java:745)
"Timer-4":
  at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
  - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
  - locked <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
  at java.util.TimerThread.mainLoop(Timer.java:555)
  at java.util.TimerThread.run(Timer.java:505)

Found 1 deadlock.
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233586#comment-14233586 ] Tsuyoshi OZAWA commented on YARN-1156: -- Thanks for committing and reviewing, Junping! Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If a container is allocated 500 MB four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000 MB of memory is actually allocated, but the metric shows 0 GB. Let's use float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
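A minimal sketch of the truncation the description is pointing at (plain Java arithmetic; the variable names are illustrative, not the actual NodeManagerMetrics fields):
{code}
public class AllocatedGBTruncation {
  public static void main(String[] args) {
    int allocatedGBInt = 0;      // current behavior: integer gauge
    float allocatedGBFloat = 0f; // proposed behavior: float gauge

    // Four allocations of 500 MB each (2000 MB total).
    for (int i = 0; i < 4; i++) {
      allocatedGBInt += (int) (500 / 1024); // 500/1024 == 0 in integer math
      allocatedGBFloat += 500 / 1024f;      // ~0.488 per allocation
    }
    System.out.println(allocatedGBInt);   // 0 -- metric reports nothing allocated
    System.out.println(allocatedGBFloat); // ~1.95 GB, the real picture
  }
}
{code}
With a float gauge the four allocations report roughly 1.95 GB instead of 0.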
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233605#comment-14233605 ] Karthik Kambatla commented on YARN-2139: bq. currently vdisks is counting the number of physical drives present on the box. We see vdisks as a multiple of the number of physical disks on the box. Again, it is just one of the ways, and we can add more ways to share disk resources in the future. bq. Should we consider evaluating a change in this policy that gives a container 1 local dir to a container with 1 vdisk. This way for a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with their own dedicated disk. Good point. We were thinking of giving the AM the option to choose the amount of disk IO parallelism at the time of launching the container, as part of the spindle locality work. I see AMs wanting to either (1) pick a single local directory for guaranteed performance or (2) stripe accesses across multiple disks for potentially higher throughput based on other work on the node. Initially, we could provide a global config for all containers - vdisks to span fewest or most disks. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233615#comment-14233615 ] Wei Yan commented on YARN-2847: --- Thanks for the fix, [~lichangleo]. There are some unnecessary changes in the latest patch (the blank lines). And do we really need a test case for this fix, given that it requires the mapred user? Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233622#comment-14233622 ] Tsuyoshi OZAWA commented on YARN-2914: -- [~varun_saxena], Thanks for your contribution. I think we need to init the singleton object with configuration. On the other hand, getInstance() doesn't take configuration as an argument. This semantic gap prevents us from calling initSingleton inside getInstance. Comments: * We should also take a lock of Singleton.INSTANCE in the method initSingleton. Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233630#comment-14233630 ] Carlo Curino commented on YARN-2664: [~mazzu] thanks for the update. Regarding the release audit we should: # Make sure never to add the Apache license to any file you did not write yourself (please confirm) # Add entries in the LICENSE.txt and NOTICE.txt files to declare that we are using (d3, nvd3, underscore). Do you need all three? I am uploading what I think are the needed bits ([~jghoman], can you double check this? I followed your advice, but I'd like a double check) I will be looking at the code more closely next. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233632#comment-14233632 ] Tsuyoshi OZAWA commented on YARN-2914: -- I found that the configuration which Singleton#init receives is never used. We can call init inside getInstance by passing null to initSingleton or changing the signature of initSingleton not to receive an object of configuration. Do you mind updating? Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: legal.patch Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233644#comment-14233644 ] Wangda Tan commented on YARN-2495: -- [~Naganarasimha], Since the size of the patch has grown, it will be hard for new people to review. I suggest moving the conf-based node label provider implementation to a separate ticket under YARN-2492, and updating the title of this ticket accordingly. Thanks, Wangda Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM. This covers: - Users can set labels on each NM (by setting yarn-site.xml or using a script, as suggested by [~aw]) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in the NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233658#comment-14233658 ] Carlo Curino commented on YARN-2664: [~mazzu], the patch looks good. I will give you a bunch of code-level comments to polish it a little further: # in YarnWebParams.java: can we name those parameters (JSON_USER, JSON_RES_NAME, JSON_FROM, JSON_TO) something more descriptive, like PLAN_*? # is graph.js your code? If so, format it a little more if you can (there are some very long lines). Also, there is no need to declare that it is related to YARN-2664 in the header. # in DataPage.createJSON: shall we also add a null check for getAllReservations()? Or are we sure it is never null? # in NavBlock it would be good to add the Planner link only if reservations are enabled (look in YarnConfiguration for the switch). I am now trying to set this up on a small cluster to see how it looks/functions. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233657#comment-14233657 ] Matteo Mazzucchelli commented on YARN-2664: --- bq. Make sure never to add the Apache license to any file you did not write yourself (please confirm) I added the Apache license only to graph.js, a file that I wrote. \\ bq. Add entries in the LICENSE.txt and NOTICE.txt files to declare that we are using (d3, nvd3, underscore). Do you need all three? Yes. d3 is the basic library, nvd3 is an extension with some improvements, and underscore provides useful functions. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233660#comment-14233660 ] Carlo Curino commented on YARN-2664: Cool. Please include the changes I have in the legal.patch into your next patch, and address my other comments if you can. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
Wangda Tan created YARN-2920: Summary: CapacityScheduler should be notified when labels on nodes changed Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233672#comment-14233672 ] Tsuyoshi OZAWA commented on YARN-2800: -- [~wangda] thanks for your update! Minor nits:
{code}
+  public static final String NODE_LABELS_NOT_ENABLED_ERR = "Node labels not"
+      + " enabled, you cannot make any changes on node labels, you can set "
+      + YarnConfiguration.NODE_LABELS_ENABLED
+      + " to true to enable this feature, please reference to user guide.";
{code}
I think we should simplify the error message. How about fixing it like this?
{code}
"Label-based scheduling is disabled. Please check "
    + YarnConfiguration.NODE_LABELS_ENABLED;
{code}
Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, and after an RM restart the labels will be gone (we store them in memory). The RM also cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let the user specify whether he/she wants this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233677#comment-14233677 ] Chris Trezzo commented on YARN-2914: Thanks [~ted_yu] and [~varun_saxena] for the find and patch! Talking with [~sjlee0], we were thinking that it might be simplest to just get rid of the init method and the enum altogether. We can make it a more straightforward singleton pattern with a line like the following: {noformat} private static final CSM = create(); {noformat} The getInstance() method would then just return CSM. It will also be necessary to make the ClientSCMMetrics constructor private. What do you guys think? As another note, SharedCacheUploaderMetrics also has this bug, so we can either change that class as part of this patch or file a separate JIRA. Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233679#comment-14233679 ] Sangjin Lee commented on YARN-2914: --- Just to clarify, {code} private static final ClientSCMMetrics instance = create(); {code} Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
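Spelled out, the eager-initialization pattern [~ctrezzo] and [~sjlee0] are describing would look roughly like this (a sketch under those suggestions, not the attached patch; the class name is illustrative):
{code}
// Eager initialization: the JVM runs the static initializer exactly once,
// before any thread can call getInstance(), so the null check, the lock,
// and the IllegalStateException all become unnecessary.
public class ClientSCMMetricsSketch {
  private static final ClientSCMMetricsSketch INSTANCE = create();

  // Private constructor prevents callers from creating extra instances.
  private ClientSCMMetricsSketch() { }

  private static ClientSCMMetricsSketch create() {
    ClientSCMMetricsSketch metrics = new ClientSCMMetricsSketch();
    // ... register the metrics source here, as the existing create() does ...
    return metrics;
  }

  public static ClientSCMMetricsSketch getInstance() {
    return INSTANCE; // never null, never racy
  }
}
{code}
Class loading guarantees safe publication of INSTANCE, which is why no explicit synchronization is needed.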
[jira] [Created] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
Karthik Kambatla created YARN-2921: -- Summary: MockRM#waitForState methods can be too slow and flaky Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233685#comment-14233685 ] Tsuyoshi OZAWA commented on YARN-2914: -- [~ctrezzo], Thanks for your suggestion! Your idea makes sense to me. I prototyped it and it seems to work well. +1 for the design. [~varun_saxena], could you update the patch based on Chris's idea? Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2900: Attachment: YARN-2900.patch [~zjshen], [~jeagles]: Attaching final patch with the fix and unit tests to verify it. Can you review? Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
  at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
  ... 59 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
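For context on the fix direction the title implies: when the history store has no record for the requested id, the NPE above surfaces as a 500, and mapping the missing entity to a 404 is the usual shape. A hedged sketch, not the attached patch; HistoryLookup is a hypothetical stand-in for the ApplicationHistoryManager call in the trace:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.webapp.NotFoundException; // YARN maps this to HTTP 404

// Sketch: turn a missing application into a 404 instead of an NPE-driven 500.
class AppLookupSketch {
  interface HistoryLookup {
    ApplicationReport getApplication(ApplicationId appId);
  }

  private final HistoryLookup history;

  AppLookupSketch(HistoryLookup history) {
    this.history = history;
  }

  ApplicationReport getApp(ApplicationId appId) {
    ApplicationReport report = history.getApplication(appId); // may be null today
    if (report == null) {
      throw new NotFoundException("app with id: " + appId + " not found");
    }
    return report;
  }
}
{code}
The same guard applies to the attempt and container lookups named in the title.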
[jira] [Updated] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2914: - Summary: Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() (was: Potential race condition in ClientSCMMetrics#getInstance()) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2914: - Description: {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. was: {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233689#comment-14233689 ] Tsuyoshi OZAWA commented on YARN-2914: -- Updated description. Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: screenshot_reservation_UI.pdf Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233690#comment-14233690 ] Carlo Curino commented on YARN-2664: I am running this on a 35 machine cluster, with a (modified) gridmix generating reservations and submitting jobs. It looks really nice (I am attaching a screenshot), well done [~mazzu]. One simple nice addition would be to show, beside the absolute memory assigned to the reservations, something that gives an idea of the overall plan utilization. For example, you can add next to the absolute values on the Y axis, some reference to the plan percentage (even just for the highest value). Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233692#comment-14233692 ] Hadoop QA commented on YARN-2301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684920/YARN-2301.20141203-1.patch against trunk revision a31e016. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5989//console This message is automatically generated. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to print as time format. 3) finish-time is 0 if container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-2921: Assignee: Tsuyoshi OZAWA MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233694#comment-14233694 ] Jian He commented on YARN-2301: --- looks good overall, - we do not need to expose the setter in the RMContext interface {{public void setYarnConfiguration(Configuration yarnConfiguration);}} - the changes in TestApplicationClientProtocolOnHA may not be needed. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running the yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed in milliseconds (e.g. 1405540544844). Better to print it in a time format. 3) finish-time is 0 if the container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233698#comment-14233698 ] Jakob Homan commented on YARN-2664: --- There's a few extra lines of whitespace, but the actual content looks good to me, per what I understand the current requirements to be. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233702#comment-14233702 ] Zhijie Shen commented on YARN-2900: --- Will take a look Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
  at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
  ... 59 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2921: - Attachment: YARN-2921.001.patch Attaching a first patch. 1. Making polling interval smaller(100msec). 2. Adding waitStateExecutor for polling the state. 3. Using CountDownLatch. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233729#comment-14233729 ] Hadoop QA commented on YARN-2880: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684800/YARN-2880.2.patch against trunk revision a31e016. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5988//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5988//console This message is automatically generated. Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, YARN-2880.1.patch, YARN-2880.2.patch As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have a such test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications
Jason Tufo created YARN-2922: Summary: Concurrent Modification Exception in LeafQueue when collecting applications Key: YARN-2922 URL: https://issues.apache.org/jira/browse/YARN-2922 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.1 Reporter: Jason Tufo
java.util.ConcurrentModificationException
  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
  at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
  at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
  at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
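The top two frames show an iterator over a tree-based collection failing while another thread mutates it (a TreeSet is backed by a TreeMap, hence the TreeMap frames). A minimal sketch of the pattern and the usual fix, with illustrative names rather than LeafQueue's real fields:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Iterating a TreeSet/TreeMap while another thread adds or removes entries
// throws ConcurrentModificationException, as in the trace above. Guarding
// both the iteration and the mutations with the same lock (here the queue
// object's monitor) is the usual fix.
public class QueueAppsSketch {
  private final Set<String> activeApplications = new TreeSet<>();

  // Called from the client RPC path (getQueueInfo in the trace).
  public synchronized List<String> collectSchedulerApplications() {
    return new ArrayList<>(activeApplications); // copy taken under the lock
  }

  // Called from the scheduler event path.
  public synchronized void addApplication(String appAttemptId) {
    activeApplications.add(appAttemptId);
  }

  public synchronized void removeApplication(String appAttemptId) {
    activeApplications.remove(appAttemptId);
  }
}
{code}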
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233741#comment-14233741 ] Karthik Kambatla commented on YARN-2921: Are 2 and 3 required here? MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233745#comment-14233745 ] Tsuyoshi OZAWA commented on YARN-2921: -- If we use the CountDownLatch, we can understand the timeout value more easily: {code} latch.await(40, TimeUnit.SECONDS) {code} If I shouldn't do this here, I'll revert it. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233748#comment-14233748 ] Tsuyoshi OZAWA commented on YARN-2921: -- About the MockAM#waitForState, should we fix it on this JIRA? MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233751#comment-14233751 ] Karthik Kambatla commented on YARN-2921: Using the latch would make sense if the threads have any other advantage; otherwise, I would leave both out and control time waited through number of iterations of the for loop. Fixing MockAM#waitFor here would be good too. We should make sure the cumulative wait time stays at least as long as what it is now to avoid any test failures. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233758#comment-14233758 ] Karthik Kambatla commented on YARN-2921: Other than the smaller sleep, we should also handle the case where the App or AppAttempt enters the required state and then moves to a latter state. e.g. App moving to RUNNING state when we are waiting for it to get ACCEPTED. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
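Pulling the thread's points together, a leaner waitForState could poll at a short interval, keep the cumulative wait at today's total, and accept a set of states so an app that races past ACCEPTED into RUNNING still passes. A sketch under those assumptions, not the attached patch:
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.junit.Assert;

// Sketch: 100 ms polling with a bounded total wait, accepting any of a
// set of states instead of exactly one.
public class WaitForStateSketch {
  private static final int POLL_INTERVAL_MS = 100;
  private static final int MAX_WAIT_MS = 40 * 1000; // keep today's total budget

  public static void waitForState(RMApp app, EnumSet<RMAppState> acceptable)
      throws InterruptedException {
    int waited = 0;
    while (!acceptable.contains(app.getState()) && waited < MAX_WAIT_MS) {
      Thread.sleep(POLL_INTERVAL_MS);
      waited += POLL_INTERVAL_MS;
    }
    Assert.assertTrue("App state " + app.getState() + " not in " + acceptable,
        acceptable.contains(app.getState()));
  }
}
{code}
Passing, say, EnumSet.of(RMAppState.ACCEPTED, RMAppState.RUNNING) covers the pass-through case [~kasha] describes.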
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Target Version/s: 2.7.0 CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Attachment: YARN-2920.1.patch Attached ver.1 patch for this. CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Attachment: YARN-2800-20141203-1.patch [~ozawa], thanks for your comment, makes sense to me; updated. [~jianhe], could you take a look please? Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, and after an RM restart the labels will be gone (we store them in memory). The RM also cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let the user specify whether he/she wants this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233812#comment-14233812 ] Wangda Tan commented on YARN-2762: -- Hi [~rohithsharma], Thanks for working on this. The trimming itself looks good to me, but I have some comments about error-message handling. I think we should make the error messages more consistent; my suggestion is: - If no labels are specified when adding/removing labels, the message is No cluster node-labels are specified - If no node-to-labels mapping is specified when replacing labels, the message is No node-to-labels mappings are specified And we should make the two kinds of error message pre-defined final fields of RMAdminCLI. Thoughts? RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.patch All NodeLabel args validations are done at the server side. The same can be done in RMAdminCLI so that unnecessary RPC calls can be avoided. And for input such as x,y,,z,, there is no need to add an empty string; it can be skipped instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
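A sketch of the client-side trimming and empty-token skipping under discussion (an illustrative helper, not the actual RMAdminCLI code):
{code}
import java.util.ArrayList;
import java.util.List;

// Turns "x,y,,z," into ["x", "y", "z"]: trims each token and skips empty
// ones before making the RPC, so the RM never sees blank or padded labels.
public class LabelArgsSketch {
  public static List<String> parseLabels(String arg) {
    List<String> labels = new ArrayList<>();
    for (String token : arg.split(",")) {
      String label = token.trim();
      if (!label.isEmpty()) {
        labels.add(label);
      }
    }
    return labels;
  }
}
{code}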
[jira] [Updated] (YARN-2869) CapacityScheduler should trim sub queue names when parse configuration
[ https://issues.apache.org/jira/browse/YARN-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2869: - Attachment: YARN-2869-3.patch Attached ver.3 patch to re-kick Jenkins; not sure which change causes the javadocs WARNING. CapacityScheduler should trim sub queue names when parse configuration -- Key: YARN-2869 URL: https://issues.apache.org/jira/browse/YARN-2869 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2869-1.patch, YARN-2869-2.patch, YARN-2869-3.patch Currently, the capacity scheduler doesn't trim sub queue names when parsing queue names. For example, the configuration
{code}
<configuration>
  <property>
    <name>...root.queues</name>
    <value>a, b , c</value>
  </property>
  <property>
    <name>...root.b.capacity</name>
    <value>100</value>
  </property>
  ...
</configuration>
{code}
will fail with the error:
{code}
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for queue root. a
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getCapacity(CapacitySchedulerConfiguration.java:332)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getCapacityFromConf(LeafQueue.java:196)
{code}
It will try to find queues named "a", " b ", and " c", which is apparently wrong; we should trim these sub queue names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
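The fix is mechanically simple: trim each sub-queue name as the comma-separated list is parsed (a sketch with a hypothetical method name, not the CapacitySchedulerConfiguration code itself):
{code}
// "a, b , c" currently yields queue names "a", " b ", " c"; trimming at
// parse time makes lookups such as root.b.capacity resolve correctly.
public class QueueNameParsingSketch {
  public static String[] getQueues(String rawValue) {
    String[] names = rawValue.split(",");
    for (int i = 0; i < names.length; i++) {
      names[i] = names[i].trim();
    }
    return names;
  }
}
{code}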
[jira] [Updated] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2189: --- Attachment: YARN-2189-trunk-v6.patch Thanks for the comments [~kasha]. Attached is V6 which addresses most of the comments. Diff between V5 and V6: https://github.com/ctrezzo/hadoop/commit/e8d47fb3e8cea03c4f3545571f1b2c9593f0574e One thing that I didn't change is making SCMAdminProtocolService#checkAcls use RMServerUtils#verifyAccess. I started to do this, but then realized this would require the SharedCacheManager package to depend on the ResourceManager package. I started to move the verifyAccess method to yarn-server-common, but then realized that it uses the RMAuditLogger. I could create a slightly more generic verifyAccess method in yarn-server-common and make both servers use that if you want. Let me know. Thanks! Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
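For concreteness, a hypothetical shape for the more generic helper discussed above: it would live in yarn-server-common and take a logging callback, so it depends on neither the ResourceManager package nor RMAuditLogger. All names and signatures here are illustrative, not the actual patch:
{code}
import java.util.function.BiConsumer;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public final class ServerAclUtils {
  private ServerAclUtils() {}

  // Returns true if the caller is allowed by the admin ACL. The audit
  // callback receives (shortUserName, allowed) so each server can plug in
  // its own audit logger (RMAuditLogger, an SCM equivalent, etc.).
  public static boolean verifyAccess(AccessControlList adminAcl,
      UserGroupInformation caller,
      BiConsumer<String, Boolean> auditCallback) {
    boolean allowed = adminAcl.isUserAllowed(caller);
    auditCallback.accept(caller.getShortUserName(), allowed);
    return allowed;
  }
}
{code}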
[jira] [Assigned] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications
[ https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2922: Assignee: Rohith Concurrent Modification Exception in LeafQueue when collecting applications --- Key: YARN-2922 URL: https://issues.apache.org/jira/browse/YARN-2922 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Rohith
java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
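The trace shows a ClientRMService RPC handler iterating LeafQueue's application set (a TreeMap) while a scheduler thread mutates it. A minimal sketch of the usual remedy, taking a snapshot under the same lock that guards mutation, is below; the field and method names echo the stack trace but are illustrative, not the actual LeafQueue code:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeMap;

public class AppTracker {
  private final TreeMap<String, Object> activeApplications = new TreeMap<>();

  // Mutation and read paths synchronize on the same monitor, so an RPC
  // handler can no longer observe the TreeMap mid-update.
  public synchronized void addApplication(String appId, Object app) {
    activeApplications.put(appId, app);
  }

  public synchronized Collection<String> collectSchedulerApplications() {
    // Copy while holding the lock; callers iterate the snapshot safely.
    return new ArrayList<>(activeApplications.keySet());
  }
}
{code}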
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233871#comment-14233871 ] Karthik Kambatla commented on YARN-2189: bq. I could create a slightly more generic verifyAccess method in yarn-server-common and make both servers use that if you want. Let me know. If it is not too much trouble, that would be nice. Other than that, there is one unused import:
{code}
import org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocol;
{code}
Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233877#comment-14233877 ] Rohith commented on YARN-2918: -- IIUC, adding cluster node labels is supported only through rmadmin or the REST API, but accessible node labels are configured via an xml file. As an admin, one might want to pre-configure both cluster node labels and accessible node labels in xml instead of triggering an rmadmin command. [~leftnoteasy] Would you help me understand why the behavior is like this? A discussion about it may have happened in another jira that I am not aware of. RM startup fails if accessible-node-labels are configured for a queue without cluster labels --- Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. I see the current steps to configure node labels are to first add them via rmadmin and then configure them for queues, but it would be good if cluster and queue node labels were configured consistently.
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203)
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
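To make the expected ordering concrete: the label must exist in the cluster before any queue references it. The command and property below illustrate that order for the label x from the log; the queue name root.a is hypothetical:
{code}
# 1. Register the label with the cluster first (the rmadmin path):
yarn rmadmin -addToClusterNodeLabels x

# 2. Only then reference it from capacity-scheduler.xml:
#    <property>
#      <name>yarn.scheduler.capacity.root.a.accessible-node-labels</name>
#      <value>x</value>
#    </property>
{code}
Starting the RM with step 2 but not step 1 is exactly what produces the "NodeLabelManager doesn't include label = x" failure above.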