[jira] [Commented] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol
[ https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823513#comment-13823513 ]

Luke Lu commented on YARN-312:
------------------------------

v5.1 looks reasonable (the verbosity of the pb impl stuff still boggles my mind). +1.

Add updateNodeResource in ResourceManagerAdministrationProtocol
---------------------------------------------------------------

Key: YARN-312
URL: https://issues.apache.org/jira/browse/YARN-312
Project: Hadoop YARN
Issue Type: Sub-task
Components: api
Affects Versions: 2.2.0
Reporter: Junping Du
Assignee: Junping Du
Attachments: YARN-312-v1.patch, YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, YARN-312-v5.1.patch, YARN-312-v5.patch

Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support changing a node's resources. For design details, please refer to the parent JIRA: YARN-291.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1401) With zero sleep-delay-before-sigkill.ms, no signal is ever sent
[ https://issues.apache.org/jira/browse/YARN-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823559#comment-13823559 ]

Hudson commented on YARN-1401:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #392 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/392/])
YARN-1401. With zero sleep-delay-before-sigkill.ms, no signal is ever sent (Gera Shegalov via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542038)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java

With zero sleep-delay-before-sigkill.ms, no signal is ever sent
---------------------------------------------------------------

Key: YARN-1401
URL: https://issues.apache.org/jira/browse/YARN-1401
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.2.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Fix For: 2.3.0
Attachments: YARN-1401.v02.patch, YARN-401.v01.patch

If you set yarn.nodemanager.sleep-delay-before-sigkill.ms=0 in yarn-site.xml, then an unresponsive child JVM is never killed. In MRv1, the TT used to SIGKILL immediately in this case.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
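For reference, the property in question lives in yarn-site.xml; a minimal fragment reproducing the reported condition (a zero delay, which before this fix meant the SIGKILL was never sent rather than sent immediately) would be:

```xml
<property>
  <!-- milliseconds to wait after SIGTERM before sending SIGKILL;
       0 previously disabled the SIGKILL entirely (the bug) -->
  <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
  <value>0</value>
</property>
```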
[jira] [Commented] (YARN-1392) Allow sophisticated app-to-queue placement policies in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823555#comment-13823555 ]

Hudson commented on YARN-1392:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #392 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/392/])
YARN-1392: Add new files (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542106)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SimpleGroupsMapping.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
YARN-1392. Allow sophisticated app-to-queue placement policies in the Fair Scheduler (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542105)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Allow sophisticated app-to-queue placement policies in the Fair Scheduler
-------------------------------------------------------------------------

Key: YARN-1392
URL: https://issues.apache.org/jira/browse/YARN-1392
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.3.0
Attachments: YARN-1392-1.patch, YARN-1392-1.patch, YARN-1392-2.patch, YARN-1392-3.patch, YARN-1392.patch

Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
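For context, the placement rules introduced by this feature are configured in the Fair Scheduler allocations file. A minimal sketch of such a policy, with rule names assumed from the patch's QueuePlacementPolicy/QueuePlacementRule classes rather than taken from this thread, might look like:

```xml
<queuePlacementPolicy>
  <!-- use the queue named at submission time, if any -->
  <rule name="specified" />
  <!-- otherwise fall back to a per-user queue -->
  <rule name="user" />
  <!-- final fallback: the default queue -->
  <rule name="default" />
</queuePlacementPolicy>
```

Rules are tried in order until one places the app, which is the "fallbacks" behavior the description asks for.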
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823558#comment-13823558 ]

Hudson commented on YARN-1222:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #392 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/392/])
YARN-1222. Make improvements in ZKRMStateStore for fencing (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541995)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ZKUtil.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestZKUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/StoreFencedException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java

Make improvements in ZKRMStateStore for fencing
-----------------------------------------------

Key: YARN-1222
URL: https://issues.apache.org/jira/browse/YARN-1222
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.3.0
Attachments: yarn-1222-1.patch, yarn-1222-10.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch, yarn-1222-7.patch, yarn-1222-8.patch, yarn-1222-8.patch, yarn-1222-9.patch

Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
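The fencing scheme in the description can be sketched as follows. This is an illustrative model, not the actual ZKRMStateStore code: every store update is wrapped in one atomic ZooKeeper multi-operation that creates and then deletes a lock znode under the root znode, so an RM whose create/delete ACLs on the root have been revoked fails the entire batch. The class and lock-znode names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FencingSketch {
    // Wrap a store operation in a fencing batch: the create and delete of the
    // lock znode require create/delete permission on the root znode, so a
    // fenced RM cannot complete any update. The three entries model a single
    // atomic ZooKeeper multi() call.
    static List<String> wrapWithFence(String rootZnode, String storeOp) {
        List<String> batch = new ArrayList<String>();
        batch.add("create " + rootZnode + "/FENCING_LOCK"); // needs CREATE on root
        batch.add(storeOp);                                 // the real state update
        batch.add("delete " + rootZnode + "/FENCING_LOCK"); // needs DELETE on root
        return batch;
    }

    public static void main(String[] args) {
        for (String op : wrapWithFence("/rmstore", "setData /rmstore/app_1")) {
            System.out.println(op);
        }
    }
}
```

An active RM fences a previous one by rewriting the root znode's ACL to exclude the old RM's create/delete permissions; from then on, every one of the old RM's batches fails atomically.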
[jira] [Commented] (YARN-732) YARN support for container isolation on Windows
[ https://issues.apache.org/jira/browse/YARN-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823595#comment-13823595 ]

Remus Rusanu commented on YARN-732:
-----------------------------------

I uploaded your diff to Review Board: https://reviews.apache.org/r/15575/

YARN support for container isolation on Windows
-----------------------------------------------

Key: YARN-732
URL: https://issues.apache.org/jira/browse/YARN-732
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Affects Versions: trunk-win
Reporter: Kyle Leckie
Labels: security
Fix For: trunk-win
Attachments: winutils.diff

There is no ContainerExecutor on Windows that can launch containers in a manner that provides:
1) container isolation
2) container execution with reduced rights

I am working on patches that will add the ability to launch containers in a process with a reduced access token. Update: after examining several approaches, I have settled on launching the task as a domain user. I have attached the current winutils diff, which is a work in progress. Work remaining:
- Create an isolated desktop for task processes.
- Set the integrity of spawned processes to low.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1411) HA config shouldn't affect NodeManager RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823607#comment-13823607 ]

Karthik Kambatla commented on YARN-1411:
----------------------------------------

In the cluster tests, the NM was coming up at the default port (0) and not the specified port. I checked the RM-specific ports and made sure they were all correct, but missed the NM ports. Thanks to [~wypoon], we found this problem in further QE.

HA config shouldn't affect NodeManager RPC addresses
----------------------------------------------------

Key: YARN-1411
URL: https://issues.apache.org/jira/browse/YARN-1411
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Labels: ha
Attachments: yarn-1411-1.patch, yarn-1411-2.patch

When HA is turned on, {{YarnConfiguration#getSocketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should apply only to RM rpc-addresses; other confs, like NM rpc-addresses, shouldn't be affected. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
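The intended behavior can be sketched like this. It is a simplified model of the fix being discussed, not the actual HAUtil/YarnConfiguration code; the key names in the set are illustrative examples of RM rpc-address keys, and the method name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HaAddressSketch {
    // Illustrative subset of RM rpc-address conf keys; only these should be
    // resolved per rm-id when HA is enabled.
    static final Set<String> RM_RPC_ADDRESS_KEYS = new HashSet<String>(Arrays.asList(
        "yarn.resourcemanager.address",
        "yarn.resourcemanager.scheduler.address",
        "yarn.resourcemanager.admin.address"));

    // Suffix the key with the rm-id only for RM rpc-address keys; all other
    // keys (e.g. NM addresses) must be read unmodified, which is the bug
    // this issue fixes.
    static String resolveKey(String key, String rmId, boolean haEnabled) {
        if (haEnabled && RM_RPC_ADDRESS_KEYS.contains(key)) {
            return key + "." + rmId;
        }
        return key;
    }
}
```

With this split, `yarn.nodemanager.address` resolves to itself even under HA, so the NM ports configured in yarn-site.xml take effect.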
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823642#comment-13823642 ]

Hudson commented on YARN-1222:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1583 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1583/])
YARN-1222. Make improvements in ZKRMStateStore for fencing (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541995)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ZKUtil.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestZKUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/StoreFencedException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java

Make improvements in ZKRMStateStore for fencing
-----------------------------------------------

Key: YARN-1222
URL: https://issues.apache.org/jira/browse/YARN-1222
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.3.0
Attachments: yarn-1222-1.patch, yarn-1222-10.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch, yarn-1222-7.patch, yarn-1222-8.patch, yarn-1222-8.patch, yarn-1222-9.patch

Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1392) Allow sophisticated app-to-queue placement policies in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823639#comment-13823639 ]

Hudson commented on YARN-1392:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1583 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1583/])
YARN-1392: Add new files (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542106)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SimpleGroupsMapping.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
YARN-1392. Allow sophisticated app-to-queue placement policies in the Fair Scheduler (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542105)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Allow sophisticated app-to-queue placement policies in the Fair Scheduler
-------------------------------------------------------------------------

Key: YARN-1392
URL: https://issues.apache.org/jira/browse/YARN-1392
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.3.0
Attachments: YARN-1392-1.patch, YARN-1392-1.patch, YARN-1392-2.patch, YARN-1392-3.patch, YARN-1392.patch

Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1401) With zero sleep-delay-before-sigkill.ms, no signal is ever sent
[ https://issues.apache.org/jira/browse/YARN-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823643#comment-13823643 ]

Hudson commented on YARN-1401:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1583 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1583/])
YARN-1401. With zero sleep-delay-before-sigkill.ms, no signal is ever sent (Gera Shegalov via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542038)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java

With zero sleep-delay-before-sigkill.ms, no signal is ever sent
---------------------------------------------------------------

Key: YARN-1401
URL: https://issues.apache.org/jira/browse/YARN-1401
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.2.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Fix For: 2.3.0
Attachments: YARN-1401.v02.patch, YARN-401.v01.patch

If you set yarn.nodemanager.sleep-delay-before-sigkill.ms=0 in yarn-site.xml, then an unresponsive child JVM is never killed. In MRv1, the TT used to SIGKILL immediately in this case.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1401) With zero sleep-delay-before-sigkill.ms, no signal is ever sent
[ https://issues.apache.org/jira/browse/YARN-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823662#comment-13823662 ]

Hudson commented on YARN-1401:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1609/])
YARN-1401. With zero sleep-delay-before-sigkill.ms, no signal is ever sent (Gera Shegalov via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542038)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java

With zero sleep-delay-before-sigkill.ms, no signal is ever sent
---------------------------------------------------------------

Key: YARN-1401
URL: https://issues.apache.org/jira/browse/YARN-1401
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.2.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Fix For: 2.3.0
Attachments: YARN-1401.v02.patch, YARN-401.v01.patch

If you set yarn.nodemanager.sleep-delay-before-sigkill.ms=0 in yarn-site.xml, then an unresponsive child JVM is never killed. In MRv1, the TT used to SIGKILL immediately in this case.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823661#comment-13823661 ]

Hudson commented on YARN-1222:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1609/])
YARN-1222. Make improvements in ZKRMStateStore for fencing (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1541995)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ZKUtil.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestZKUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEvent.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreOperationFailedEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/StoreFencedException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java

Make improvements in ZKRMStateStore for fencing
-----------------------------------------------

Key: YARN-1222
URL: https://issues.apache.org/jira/browse/YARN-1222
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.3.0
Attachments: yarn-1222-1.patch, yarn-1222-10.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch, yarn-1222-7.patch, yarn-1222-8.patch, yarn-1222-8.patch, yarn-1222-9.patch

Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823672#comment-13823672 ]

Vaughn E. Clinton commented on YARN-1271:
-----------------------------------------

Can someone point me to a link that explains how launch_container.sh is supposed to get to a local file system during a run? I can't find any documentation about how this and other scripts are shipped out across the allocation, and I feel this would be helpful toward a better understanding of related failures.

Text file busy errors launching containers again
------------------------------------------------

Key: YARN-1271
URL: https://issues.apache.org/jira/browse/YARN-1271
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 2.2.0
Attachments: YARN-1271-branch-2.patch, YARN-1271.patch

The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (YARN-1412) Allocating Containers on a particular Node in Yarn
[ https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaurav gupta updated YARN-1412:
-------------------------------

Description:
I am trying to allocate containers on a particular node in Yarn but Yarn is returning me containers on a different node although the requested node has resources available. Here is the snippet of the code that I am using:
{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
String host = "h1";
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(memory);
nodes = new String[] {host};
// in order to request a host, we also have to request the rack
racks = new String[] {"/default-rack"};
List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
containerRequests.add(new ContainerRequest(capability, nodes, racks,
    Priority.newInstance(priority)));
if (containerRequests.size() > 0) {
  LOG.info("Asking RM for containers: " + containerRequests);
  for (ContainerRequest cr : containerRequests) {
    LOG.info("Requested container: {}", cr.toString());
    amRmClient.addContainerRequest(cr);
  }
}
for (ContainerId containerId : releasedContainers) {
  LOG.info("Released container, id={}", containerId.getId());
  amRmClient.releaseAssignedContainer(containerId);
}
return amRmClient.allocate(0);
{code}

was:
I am trying to allocate containers on a particular node in Yarn but Yarn is returning me containers on a different node although the requested node has resources available. Here is the snippet of the code that I am using:
{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
String host = "h1";
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(memory);
nodes = new String[] {host};
// in order to request a host, we also have to request the rack
racks = new String[] {"/default-rack"};
List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
containerRequests.add(new ContainerRequest(capability, nodes, racks,
    Priority.newInstance(priority), false));
if (containerRequests.size() > 0) {
  LOG.info("Asking RM for containers: " + containerRequests);
  for (ContainerRequest cr : containerRequests) {
    LOG.info("Requested container: {}", cr.toString());
    amRmClient.addContainerRequest(cr);
  }
}
for (ContainerId containerId : releasedContainers) {
  LOG.info("Released container, id={}", containerId.getId());
  amRmClient.releaseAssignedContainer(containerId);
}
return amRmClient.allocate(0);
{code}

Allocating Containers on a particular Node in Yarn
--------------------------------------------------

Key: YARN-1412
URL: https://issues.apache.org/jira/browse/YARN-1412
Project: Hadoop YARN
Issue Type: Bug
Environment: centos, Hadoop 2.2.0
Reporter: gaurav gupta

I am trying to allocate containers on a particular node in Yarn but Yarn is returning me containers on a different node although the requested node has resources available. Here is the snippet of the code that I am using:
{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
String host = "h1";
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(memory);
nodes = new String[] {host};
// in order to request a host, we also have to request the rack
racks = new String[] {"/default-rack"};
List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
containerRequests.add(new ContainerRequest(capability, nodes, racks,
    Priority.newInstance(priority)));
if (containerRequests.size() > 0) {
  LOG.info("Asking RM for containers: " + containerRequests);
  for (ContainerRequest cr : containerRequests) {
    LOG.info("Requested container: {}", cr.toString());
    amRmClient.addContainerRequest(cr);
  }
}
for (ContainerId containerId : releasedContainers) {
  LOG.info("Released container, id={}", containerId.getId());
  amRmClient.releaseAssignedContainer(containerId);
}
return amRmClient.allocate(0);
{code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1411) HA config shouldn't affect NodeManager RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823870#comment-13823870 ]

Bikas Saha commented on YARN-1411:
----------------------------------

That explains why the NM would come up. Can we please make this a helper method in HAUtil? Or alternatively move RPC_ADDRESS_CONF_KEYS into YarnConfiguration?
{code}+HAUtil.RPC_ADDRESS_CONF_KEYS.contains(name)) {{code}

HA config shouldn't affect NodeManager RPC addresses
----------------------------------------------------

Key: YARN-1411
URL: https://issues.apache.org/jira/browse/YARN-1411
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Labels: ha
Attachments: yarn-1411-1.patch, yarn-1411-2.patch

When HA is turned on, {{YarnConfiguration#getSocketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should apply only to RM rpc-addresses; other confs, like NM rpc-addresses, shouldn't be affected. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (YARN-1412) Allocating Containers on a particular Node in Yarn
[ https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823902#comment-13823902 ] gaurav gupta commented on YARN-1412: Here is the synopsis of the various combinations:

Node_Set | Rack_Set | Relax Locality | Result
Yes | No | FALSE | I get back on the node, but then fallback doesn't work
Yes | No | TRUE | I don't get back the correct node
Yes | Yes | T/F | I don't get back the correct node

I am attaching the logs when Node is Yes, Rack is No, and Relax is true. The containers for which it is not working are container_1384534729839_0001_01_02 and container_1384534729839_0001_01_04.

{code}
2013-11-15 09:00:38,116 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(335)) - showRequests: application=application_1384534729839_0001 headRoom=memory:9091072, vCores:0 currentConsumption=2048
2013-11-15 09:00:38,116 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 0, Capability: memory:8192, vCores:1, # Containers: 1, Location: /default-rack, Relax Locality: true}
2013-11-15 09:00:38,116 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 0, Capability: memory:8192, vCores:1, # Containers: 1, Location: *, Relax Locality: true}
2013-11-15 09:00:38,116 IPC Server handler 43 on 8031 DEBUG security.UserGroupInformation (UserGroupInformation.java:logPrivilegedAction(1513)) - PrivilegedAction as:hadoop (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
2013-11-15 09:00:38,116 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 0, Capability: memory:8192, vCores:1, # Containers: 1, Location: node10.morado.com, Relax Locality: true}
2013-11-15 09:00:38,117 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(335)) - showRequests: application=application_1384534729839_0001 headRoom=memory:9091072, vCores:0 currentConsumption=2048
2013-11-15 09:00:38,117 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 1, Capability: memory:8192, vCores:1, # Containers: 1, Location: *, Relax Locality: true}
2013-11-15 09:00:38,117 AsyncDispatcher event handler DEBUG event.AsyncDispatcher (AsyncDispatcher.java:dispatch(125)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: STATUS_UPDATE
2013-11-15 09:00:38,117 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(335)) - showRequests: application=application_1384534729839_0001 headRoom=memory:9091072, vCores:0 currentConsumption=2048
2013-11-15 09:00:38,117 AsyncDispatcher event handler DEBUG rmnode.RMNodeImpl (RMNodeImpl.java:handle(354)) - Processing node6.morado.com:39327 of type STATUS_UPDATE
2013-11-15 09:00:38,117 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 2, Capability: memory:8192, vCores:1, # Containers: 1, Location: /default-rack, Relax Locality: true}
2013-11-15 09:00:38,117 AsyncDispatcher event handler DEBUG event.AsyncDispatcher (AsyncDispatcher.java:dispatch(125)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE
2013-11-15 09:00:38,118 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 2, Capability: memory:8192, vCores:1, # Containers: 1, Location: *, Relax Locality: true}
2013-11-15 09:00:38,118 ResourceManager Event Processor DEBUG fica.FiCaSchedulerApp (FiCaSchedulerApp.java:showRequests(339)) - showRequests: application=application_1384534729839_0001 request={Priority: 2, Capability: memory:8192, vCores:1, # Containers: 1, Location: node18.morado.com, Relax Locality: true}
2013-11-15 09:00:38,118 ResourceManager Event Processor DEBUG capacity.LeafQueue (LeafQueue.java:computeUserLimit(1056)) - User limit computation for gaurav in queue default userLimit=100 userLimitFactor=1.0 required: memory:8192, vCores:1 consumed: memory:2048, vCores:1 limit: memory:9093120, vCores:1 queueCapacity: memory:9093120, vCores:1 qconsumed:
{code}
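The combinations in the table above boil down to a node → rack → ANY fallback gated by the relaxLocality flag. As a reading aid, here is a hypothetical, much-simplified model of that decision — the names (`LocalityFallback`, `place`) are invented for illustration, and this is not the actual CapacityScheduler code:

```java
// Hypothetical, simplified model of YARN's locality fallback for one
// ResourceRequest: node-local first, then rack-local, then ANY ("*"),
// with relaxLocality controlling whether the fallback steps are allowed.
import java.util.Set;

class LocalityFallback {

    /** Where a container may be placed for a request. */
    enum Placement { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH, NONE }

    /**
     * Decide placement for a request naming specific nodes/racks.
     * @param requestedNodes nodes named in the request (may be empty)
     * @param requestedRacks racks named in the request (may be empty)
     * @param candidateNode  node currently offering resources
     * @param candidateRack  rack of that node
     * @param relaxLocality  whether falling back to rack/ANY is permitted
     */
    static Placement place(Set<String> requestedNodes,
                           Set<String> requestedRacks,
                           String candidateNode,
                           String candidateRack,
                           boolean relaxLocality) {
        if (requestedNodes.contains(candidateNode)) {
            return Placement.NODE_LOCAL;   // exact node match always wins
        }
        if (!relaxLocality) {
            return Placement.NONE;         // strict: no fallback allowed
        }
        if (requestedRacks.contains(candidateRack)) {
            return Placement.RACK_LOCAL;   // relaxed to a named rack
        }
        return Placement.OFF_SWITCH;       // relaxed all the way to ANY
    }
}
```

Under this model, the "Node=Yes, Rack=No, Relax=true" row falls straight through to OFF_SWITCH whenever a non-requested node makes the offer, which matches the off-node allocations in the logs.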
[jira] [Updated] (YARN-1411) HA config shouldn't affect NodeManager RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1411: --- Attachment: yarn-1411-3.patch Moved RPC_ADDRESS_CONF_KEYS to YarnConfiguration HA config shouldn't affect NodeManager RPC addresses Key: YARN-1411 URL: https://issues.apache.org/jira/browse/YARN-1411 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Attachments: yarn-1411-1.patch, yarn-1411-2.patch, yarn-1411-3.patch When HA is turned on, {{YarnConfiguration#getSocketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. Other confs, like NM rpc-addresses, shouldn't be affected by this. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1411) HA config shouldn't affect NodeManager RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823967#comment-13823967 ] Hadoop QA commented on YARN-1411: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614102/yarn-1411-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2462//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2462//console This message is automatically generated. 
HA config shouldn't affect NodeManager RPC addresses Key: YARN-1411 URL: https://issues.apache.org/jira/browse/YARN-1411 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Attachments: yarn-1411-1.patch, yarn-1411-2.patch, yarn-1411-3.patch When HA is turned on, {{YarnConfiguration#getSocketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. Other confs, like NM rpc-addresses, shouldn't be affected by this. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1411) HA config shouldn't affect NodeManager RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823974#comment-13823974 ] Hudson commented on YARN-1411: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4747 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4747/]) YARN-1411. HA config shouldn't affect NodeManager RPC addresses (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1542367) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java HA config shouldn't affect NodeManager RPC addresses Key: YARN-1411 URL: https://issues.apache.org/jira/browse/YARN-1411 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Labels: ha Attachments: yarn-1411-1.patch, yarn-1411-2.patch, yarn-1411-3.patch When HA is turned on, {{YarnConfiguration#getSocketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. 
Other confs, like NM rpc-addresses shouldn't be affected by this. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports. -- This message was sent by Atlassian JIRA (v6.1#6144)
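The YARN-1411 thread above hinges on which configuration keys get qualified with an rm-id under HA. Here is a hedged sketch of the idea behind the fix — the key names mirror yarn-site.xml, but `HAConfKeys` and `effectiveKey` are invented stand-ins for the real HAUtil/YarnConfiguration logic:

```java
// Hypothetical sketch: only RM RPC-address keys get the rm-id suffix
// appended when HA is enabled; NM (and other) address keys are returned
// untouched, so NM settings in yarn-site.xml keep working.
import java.util.Set;

class HAConfKeys {

    // RM address keys that are per-rm-id under HA (a subset, for illustration).
    static final Set<String> RM_RPC_ADDRESS_CONF_KEYS = Set.of(
        "yarn.resourcemanager.address",
        "yarn.resourcemanager.scheduler.address",
        "yarn.resourcemanager.webapp.address");

    /** Return the effective conf key: rm-id-suffixed only for RM addresses. */
    static String effectiveKey(String key, boolean haEnabled, String rmId) {
        if (haEnabled && RM_RPC_ADDRESS_CONF_KEYS.contains(key)) {
            return key + "." + rmId;   // e.g. yarn.resourcemanager.address.rm1
        }
        return key;                    // NM addresses are never rm-id qualified
    }
}
```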
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823989#comment-13823989 ] Sandy Ryza commented on YARN-584: - Thanks for making these changes. The nodes in ExpandNode and StoreExpandedNode correspond to scheduler queues, right? My suggestion was to add Queue to the names to make this correspondence more explicit. Am I misunderstanding? In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
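One plausible way to keep queue expansion state across refreshes, as discussed in YARN-584 above, is to serialize the expanded queue names into a single value (for a cookie or URL fragment) and restore it after reload. The actual patch operates in the page's JavaScript; this Java round-trip model, with invented names, only illustrates the approach:

```java
// Hypothetical model of persisting expanded-queue state across a page
// refresh: encode the set of expanded queue names into one string and
// decode it back after reload. Insertion order is preserved.
import java.util.LinkedHashSet;
import java.util.Set;

class ExpandedQueueState {

    /** Encode the expanded queue names, e.g. as a cookie value. */
    static String save(Set<String> expandedQueues) {
        return String.join(":", expandedQueues);
    }

    /** Decode the cookie value back into queue names to re-expand. */
    static Set<String> restore(String cookieValue) {
        Set<String> out = new LinkedHashSet<>();
        if (cookieValue != null && !cookieValue.isEmpty()) {
            for (String q : cookieValue.split(":")) {
                out.add(q);
            }
        }
        return out;
    }
}
```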
[jira] [Commented] (YARN-1242) AHS start as independent process
[ https://issues.apache.org/jira/browse/YARN-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823987#comment-13823987 ] Zhijie Shen commented on YARN-1242: --- 1. Do not depend on RM's log4j {code} + CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties {code} 2. YARN_HISTORYSERVER_HEAPSIZE should be commented in yarn-env as well 3. hadoop-yarn-dist.xml needs to be updated as well. Would you please double-check the complete project to see whether anything else is missing for creating a correct distribution? Thanks! 4. Would you please verify starting AHS locally, in particular starting AHS on a machine where the RM is not running (including starting it as a daemon)? Then you can verify whether AHS is completely independent of the RM. AHS start as independent process Key: YARN-1242 URL: https://issues.apache.org/jira/browse/YARN-1242 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1242-1.patch, YARN-1242-2.patch Add the commands in yarn and yarn.cmd to start and stop AHS -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1332) In TestAMRMClient, replace assertTrue with assertEquals where possible
[ https://issues.apache.org/jira/browse/YARN-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824032#comment-13824032 ] Sandy Ryza commented on YARN-1332: -- Thanks for picking this up Sebastian. There are a few places assertEquals could be substituted for assertTrue for non-zero values. For example: {code} assertTrue(allocatedContainerCount == containersRequestedAny); assertTrue(amClient.release.size() == 2); {code} Would you mind replacing those too? In TestAMRMClient, replace assertTrue with assertEquals where possible -- Key: YARN-1332 URL: https://issues.apache.org/jira/browse/YARN-1332 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Priority: Minor Labels: newbie Attachments: YARN-1332.patch TestAMRMClient uses a lot of assertTrue(amClient.ask.size() == 0) where assertEquals(0, amClient.ask.size()) would make it easier to see why it's failing at a glance. -- This message was sent by Atlassian JIRA (v6.1#6144)
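To see why the substitution suggested in YARN-1332 matters, compare the failure messages: assertTrue(a == b) can only report "expected true", while assertEquals reports both operands. A minimal JUnit-free illustration (all names here are invented, not the TestAMRMClient code):

```java
// Why assertEquals beats assertTrue(a == b) for diagnosing failures:
// the AssertionError carries both the expected and the actual value.
class AssertStyle {

    // assertTrue-style check: the failure says nothing about the operands.
    static void assertTrue(boolean cond) {
        if (!cond) throw new AssertionError("expected true");
    }

    // assertEquals-style check: the failure names both values.
    static void assertEquals(long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    /** Capture the assertEquals failure message ("" if the check passes). */
    static String failureMessage(long expected, long actual) {
        try {
            assertEquals(expected, actual);
            return "";
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }
}
```

With assertEquals, a glance at the report shows, say, that amClient.release.size() was 3 rather than the expected 2 — exactly the at-a-glance readability the JIRA description asks for.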
[jira] [Commented] (YARN-1407) apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824088#comment-13824088 ] Zhijie Shen commented on YARN-1407: --- NavBlock needs to be fixed as well. apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response --- Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407.patch RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values that come from it. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
Siqi Li created YARN-1414: - Summary: with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1407: -- Description: RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). was: RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values that come from it. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). Summary: RM Web UI and REST APIs should uniformly use YarnApplicationState (was: apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response) RM Web UI and REST APIs should uniformly use YarnApplicationState - Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407.patch RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. 
We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1407: - Attachment: YARN-1407-2.patch RM Web UI and REST APIs should uniformly use YarnApplicationState - Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407-2.patch, YARN-1407.patch RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824094#comment-13824094 ] Sandy Ryza commented on YARN-1407: -- Thanks Zhijie, good catch. Uploading a new patch. RM Web UI and REST APIs should uniformly use YarnApplicationState - Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407-2.patch, YARN-1407.patch RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1414: -- Attachment: YARN-1221-subtask.v1.patch.txt with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1414: -- Affects Version/s: 2.0.5-alpha with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824128#comment-13824128 ] Hadoop QA commented on YARN-1407: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614127/YARN-1407-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2463//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2463//console This message is automatically generated. RM Web UI and REST APIs should uniformly use YarnApplicationState - Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407-2.patch, YARN-1407.patch RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. 
However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1). -- This message was sent by Atlassian JIRA (v6.1#6144)
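The underlying idea of the YARN-1407 fix is that internal RMAppState values must collapse onto the public YarnApplicationState enum before anything reaches the UI or REST layer. The sketch below is hypothetical — the enums are abbreviated and the mapping of transitional states is an assumption for illustration, not the actual YARN code:

```java
// Hypothetical mapping from the internal RMAppState onto the public
// YarnApplicationState, applied before rendering UI or REST responses.
// Enum members are abbreviated; the real classes live in hadoop-yarn.
class AppStateMapping {

    enum RMAppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING,
                      FINAL_SAVING, FINISHING, FINISHED, FAILED, KILLED }

    enum YarnApplicationState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED,
                                RUNNING, FINISHED, FAILED, KILLED }

    /** Collapse internal-only states onto the public enum. */
    static YarnApplicationState toPublic(RMAppState s) {
        switch (s) {
            case NEW:         return YarnApplicationState.NEW;
            case NEW_SAVING:  return YarnApplicationState.NEW_SAVING;
            case SUBMITTED:   return YarnApplicationState.SUBMITTED;
            case ACCEPTED:    return YarnApplicationState.ACCEPTED;
            case RUNNING:     return YarnApplicationState.RUNNING;
            // Assumption: transitional internal states are reported as
            // RUNNING until the app actually reaches a terminal state.
            case FINAL_SAVING:
            case FINISHING:   return YarnApplicationState.RUNNING;
            case FINISHED:    return YarnApplicationState.FINISHED;
            case FAILED:      return YarnApplicationState.FAILED;
            case KILLED:      return YarnApplicationState.KILLED;
            default:          throw new IllegalArgumentException(s.toString());
        }
    }
}
```

Because every RMAppState lands on some public value, the set of strings the REST API can return only shrinks, which supports the backwards-compatibility argument in the description.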
[jira] [Updated] (YARN-1239) Save version information in the state store
[ https://issues.apache.org/jira/browse/YARN-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1239: -- Attachment: YARN-1239.3.patch Save version information in the state store --- Key: YARN-1239 URL: https://issues.apache.org/jira/browse/YARN-1239 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1239.1.patch, YARN-1239.2.patch, YARN-1239.3.patch, YARN-1239.patch When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1403: - Attachment: YARN-1403.patch Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1239) Save version information in the state store
[ https://issues.apache.org/jira/browse/YARN-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824138#comment-13824138 ] Jian He commented on YARN-1239: --- Thanks [~ozawa] for the reviews. bq. DUMMY_VERSION_INFO and VERSION_INFO should be initialized as 1L, not 1 bq. These code should pass long value. Fixed. bq. We can define strings of error messages as static final in RMStateStore I think it's fine as that static final field will only be used by the test case. I wanted to assert the exception type instead of the exception message, but it turns out that the exception is stringified and passed into the ExitException. Save version information in the state store --- Key: YARN-1239 URL: https://issues.apache.org/jira/browse/YARN-1239 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1239.1.patch, YARN-1239.2.patch, YARN-1239.3.patch, YARN-1239.patch When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config. -- This message was sent by Atlassian JIRA (v6.1#6144)
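The version handshake described in YARN-1239 — stamp version 1 when the state-store root is first created, and validate the stored version on later starts — can be sketched as follows. `StateStoreVersion` and its methods are invented for illustration and do not match the RMStateStore API:

```java
// Hypothetical model of the state-store version check: first start writes
// the current version; subsequent starts fail fast on a mismatch. The
// version constant is a long (1L), per the review comment above.
class StateStoreVersion {

    static final long CURRENT_VERSION_INFO = 1L;

    private Long storedVersion;   // null models a root dir that doesn't exist yet

    /** Called at store startup: initialize or validate the stored version. */
    long loadOrInitVersion() {
        if (storedVersion == null) {
            storedVersion = CURRENT_VERSION_INFO;   // first start: stamp version 1
        } else if (storedVersion != CURRENT_VERSION_INFO) {
            throw new IllegalStateException(
                "Incompatible state-store version " + storedVersion
                + ", expected " + CURRENT_VERSION_INFO);
        }
        return storedVersion;
    }

    void setStoredVersion(long v) { storedVersion = v; }  // test hook
}
```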
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824140#comment-13824140 ] Hadoop QA commented on YARN-1414: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614129/YARN-1221-subtask.v1.patch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2464//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2464//console This message is automatically generated. with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1405) RM should crash and print permission error for nonwritable/readable local path in yarn.resourcemanager.fs.state-store.uri
[ https://issues.apache.org/jira/browse/YARN-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1405. --- Resolution: Cannot Reproduce RM should crash and print permission error for nonwritable/readable local path in yarn.resourcemanager.fs.state-store.uri - Key: YARN-1405 URL: https://issues.apache.org/jira/browse/YARN-1405 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Yesha Vora Enable yarn.resourcemanager.recovery.enabled=true and Pass a local path to yarn.resourcemanager.fs.state-store.uri. such as file:///tmp/MYTMP if the directory /tmp/MYTMP is not readable or writable, RM should crash and should print Permission denied Error Currently, RM throws java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist Error. RM returns Exiting status 1 but RM process does not shutdown. Snapshot of Resource manager log: 2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens 2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635) at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855) 2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1 -- This message was sent by Atlassian JIRA (v6.1#6144)
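The behavior the YARN-1405 reporter asks for amounts to a fail-fast access probe on the state-store root before any load is attempted, so the operator sees "Permission denied" instead of a confusing FileNotFoundException. A hedged sketch using plain java.io.File in place of the Hadoop FileSystem API (names invented):

```java
// Hypothetical preflight check for a local state-store root: if the
// directory exists but is not readable/writable, fail fast with an
// explicit permission error before attempting to load any state.
import java.io.File;

class StateStorePreflight {

    /** Throws if the state-store root exists but is not readable/writable. */
    static void checkAccess(File root) {
        if (root.exists() && (!root.canRead() || !root.canWrite())) {
            throw new IllegalStateException(
                "Permission denied on state-store root: " + root.getPath());
        }
    }
}
```

A nonexistent root is deliberately not an error here: the store is expected to create it on first start, as in the YARN-1239 discussion above.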
[jira] [Updated] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1241: - Attachment: YARN-1241-9.patch In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241-6.patch, YARN-1241-7.patch, YARN-1241-8.patch, YARN-1241-9.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824146#comment-13824146 ] Sandy Ryza commented on YARN-1241: -- Rebased patch after YARN-1392 In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241-6.patch, YARN-1241-7.patch, YARN-1241-8.patch, YARN-1241-9.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
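The rule YARN-1241 needs can be stated simply: an app may start only if every queue on its path to the root is under its maxRunningApps limit, so a parent's limit caps the sum across all its subqueues. A toy model of that check (invented names, not the Fair Scheduler code):

```java
// Hypothetical model of hierarchical maxRunningApps enforcement: an app is
// runnable only if this queue AND every ancestor still has headroom, and
// starting an app increments counts all the way up to the root.
class FSQueueModel {
    final String name;
    final FSQueueModel parent;
    final int maxRunningApps;
    int runningApps;               // for a parent: sum over its subqueues

    FSQueueModel(String name, FSQueueModel parent, int maxRunningApps) {
        this.name = name;
        this.parent = parent;
        this.maxRunningApps = maxRunningApps;
    }

    /** True iff this queue and every ancestor are below their limits. */
    boolean canRunNewApp() {
        for (FSQueueModel q = this; q != null; q = q.parent) {
            if (q.runningApps >= q.maxRunningApps) {
                return false;      // some queue on the path is full
            }
        }
        return true;
    }

    /** Start an app here, incrementing counts up to the root. */
    void startApp() {
        for (FSQueueModel q = this; q != null; q = q.parent) {
            q.runningApps++;
        }
    }
}
```

For example, with root.maxRunningApps=2 and two children each allowing 5, the third app is rejected in either child because the root is already full.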
[jira] [Created] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
Siqi Li created YARN-1415: - Summary: In scheduler UI, including used memory in Memory Total seems to be inaccurate Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siqi Li Attachments: 1.png, 2.png -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1415: -- Attachment: 2.png 1.png In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-90: - Attachment: YARN-90.patch Thanks a lot Nigel and Song. Making the changes that I requested to push it over the line. The same patch applies cleanly to branch-2 as well. Could someone kindly review and commit it? NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs a restart. This JIRA is to improve NodeManager to reuse good disks (which could have been bad some time back). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1415: -- Description: Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, either the Memory Total should not include allocatedMB, or availableMB isn't getting updated properly. In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, either the Memory Total should not include allocatedMB, or availableMB isn't getting updated properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824154#comment-13824154 ] Siqi Li commented on YARN-1415: --- The first image shows that no job is running, and total memory is 1.9 TB The second image shows that a simple wordcount job is running, and using 49 GB memory. However, the total memory also got increased by 49 GB. The same issue will occur when some jobs reserve a certain amount of memory. It will get added into the total memory. In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, the Memory Total should not include allocatedMB, or availableMB doesn't get updated properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
Omkar Vinit Joshi created YARN-1416: --- Summary: InvalidStateTransitions getting reported in multiple test cases even though they pass Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He It might be worth checking why they are reporting this. Testcases: TestRMAppTransitions and TestRM; there are a large number of such errors: can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824161#comment-13824161 ] Sandy Ryza commented on YARN-1415: -- availableMB is meant to only include non-allocated memory. So the issue is more likely that availableMB is not being updated properly. In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, the Memory Total should not include allocatedMB, or availableMB doesn't get updated properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
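A toy version of the metric arithmetic (numbers loosely based on the screenshots; the class and method names are made up for illustration) shows how the total inflates when availableMB is not decremented on allocation:

```java
// Hypothetical sketch of the scheduler-UI metric discussed above:
// Memory Total = availableMB + allocatedMB + reservedMB.
public class MemoryTotalSketch {
    static long totalMB(long availableMB, long allocatedMB, long reservedMB) {
        return availableMB + allocatedMB + reservedMB;
    }

    public static void main(String[] args) {
        long capacityMB = 1_992_294; // roughly 1.9 TB, as in the first screenshot
        long allocatedMB = 50_176;   // roughly 49 GB for the wordcount job

        // Reported symptom: availableMB stays at full capacity after allocation,
        // so the displayed total grows by exactly the allocated amount.
        long buggyTotal = totalMB(capacityMB, allocatedMB, 0);

        // Intended behavior: availableMB shrinks as memory is allocated,
        // so the total stays equal to cluster capacity.
        long fixedTotal = totalMB(capacityMB - allocatedMB, allocatedMB, 0);

        System.out.println(buggyTotal - fixedTotal); // difference equals allocatedMB
    }
}
```

The same arithmetic explains why a reservation would inflate the total too if availableMB is not reduced when memory is reserved.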
[jira] [Commented] (YARN-1239) Save version information in the state store
[ https://issues.apache.org/jira/browse/YARN-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824172#comment-13824172 ] Hadoop QA commented on YARN-1239: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614138/YARN-1239.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2465//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2465//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2465//console This message is automatically generated. 
Save version information in the state store --- Key: YARN-1239 URL: https://issues.apache.org/jira/browse/YARN-1239 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1239.1.patch, YARN-1239.2.patch, YARN-1239.3.patch, YARN-1239.patch When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config. -- This message was sent by Atlassian JIRA (v6.1#6144)
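The version check described in the issue could look roughly like this; the in-memory map stands in for the real state-store root dir, and all class and method names here are hypothetical, not the actual RMStateStore code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the versioning rule above: on first start, write version 1;
// on later starts, verify the stored version matches what we expect.
public class StateStoreVersionSketch {
    static final int CURRENT_VERSION = 1;
    private final Map<String, Integer> store = new HashMap<>(); // stands in for the root dir

    void initOrCheck() {
        Integer stored = store.get("version");
        if (stored == null) {
            // Creating the root dir for the first time: record the version.
            store.put("version", CURRENT_VERSION);
        } else if (stored != CURRENT_VERSION) {
            // Root dir exists: refuse to load state written by an
            // incompatible version rather than misread it.
            throw new IllegalStateException(
                "state store version " + stored + " != expected " + CURRENT_VERSION);
        }
    }

    int storedVersion() {
        return store.get("version");
    }
}
```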
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824171#comment-13824171 ] Hadoop QA commented on YARN-1403: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614139/YARN-1403.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2466//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2466//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2466//console This message is automatically generated. 
Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.
[ https://issues.apache.org/jira/browse/YARN-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824175#comment-13824175 ] Omkar Vinit Joshi commented on YARN-1417: - Fixing this as part of YARN-713, where I am restructuring the token generation logic. RM may issue expired container tokens to AM while issuing new containers. - Key: YARN-1417 URL: https://issues.apache.org/jira/browse/YARN-1417 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Today we create a new container token when we create a container in the RM as part of the schedule cycle. However, that container may get reserved or assigned. If the container gets reserved and remains in that reserved state for longer than the container token expiry interval, then the RM will end up issuing a container with an expired token. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.
Omkar Vinit Joshi created YARN-1417: --- Summary: RM may issue expired container tokens to AM while issuing new containers. Key: YARN-1417 URL: https://issues.apache.org/jira/browse/YARN-1417 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Today we create a new container token when we create a container in the RM as part of the schedule cycle. However, that container may get reserved or assigned. If the container gets reserved and remains in that reserved state for longer than the container token expiry interval, then the RM will end up issuing a container with an expired token. -- This message was sent by Atlassian JIRA (v6.1#6144)
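A minimal sketch of the expiry scenario, with illustrative names and durations (the actual token expiry interval is configurable, not the 10 minutes assumed here):

```java
// Illustrative sketch: a container token minted at schedule time can expire
// while the container sits in the reserved state, as described above.
public class TokenExpirySketch {
    static boolean isExpired(long issuedAtMs, long expiryIntervalMs, long nowMs) {
        return nowMs - issuedAtMs > expiryIntervalMs;
    }

    public static void main(String[] args) {
        long issuedAt = 0;                       // token created during the schedule cycle
        long expiryInterval = 10 * 60 * 1000;    // assume tokens expire after 10 minutes

        // Container assigned to the AM within a minute: token is still valid.
        System.out.println(isExpired(issuedAt, expiryInterval, 60_000));

        // Container stays reserved for 15 minutes before being handed out:
        // the RM would be issuing an already-expired token.
        System.out.println(isExpired(issuedAt, expiryInterval, 15 * 60 * 1000));
    }
}
```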
[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824178#comment-13824178 ] Zhijie Shen commented on YARN-1266: --- IMHO, application_base_protocol.proto should not be necessary, because the base interface is to extract the common code, not to be directly used from the RPC interface. Then, 1. There's no need to mark ApplicationClientProtocolPB as well 2. ApplicationClientProtocolPB and ApplicationHistoryProtocolPB don't need to extend ApplicationBaseProtocolService.BlockingInterface inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a seprate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824184#comment-13824184 ] Hadoop QA commented on YARN-1241: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614142/YARN-1241-9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2467//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2467//console This message is automatically generated. 
In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241-6.patch, YARN-1241-7.patch, YARN-1241-8.patch, YARN-1241-9.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1332) In TestAMRMClient, replace assertTrue with assertEquals where possible
[ https://issues.apache.org/jira/browse/YARN-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824181#comment-13824181 ] Sebastian Wong commented on YARN-1332: -- Sure thing. Would you like me to replace the assertTrue's for things like assertTrue(allocatedContainerCount == containersRequestedAny)? In TestAMRMClient, replace assertTrue with assertEquals where possible -- Key: YARN-1332 URL: https://issues.apache.org/jira/browse/YARN-1332 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sebastian Wong Priority: Minor Labels: newbie Attachments: YARN-1332.patch TestAMRMClient uses a lot of assertTrue(amClient.ask.size() == 0) where assertEquals(0, amClient.ask.size()) would make it easier to see why it's failing at a glance. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824221#comment-13824221 ] Hadoop QA commented on YARN-1403: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614151/YARN-1403-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2468//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2468//console This message is automatically generated. Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1332) In TestAMRMClient, replace assertTrue with assertEquals where possible
[ https://issues.apache.org/jira/browse/YARN-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Wong updated YARN-1332: - Attachment: YARN-1332-2.patch Replaced all the assertTrue's that I could find in the file. Passes all build tests in yarn-client as far as I know. In TestAMRMClient, replace assertTrue with assertEquals where possible -- Key: YARN-1332 URL: https://issues.apache.org/jira/browse/YARN-1332 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sebastian Wong Priority: Minor Labels: newbie Attachments: YARN-1332-2.patch, YARN-1332.patch TestAMRMClient uses a lot of assertTrue(amClient.ask.size() == 0) where assertEquals(0, amClient.ask.size()) would make it easier to see why it's failing at a glance. -- This message was sent by Atlassian JIRA (v6.1#6144)
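The readability argument can be seen without JUnit; the stand-in assertEquals below mimics the shape of JUnit's failure message but is not the actual JUnit code.

```java
// Why assertEquals(0, ask.size()) beats assertTrue(ask.size() == 0):
// on failure, assertEquals reports both values instead of a bare AssertionError.
public class AssertStyleSketch {
    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }

    // Minimal stand-in for a JUnit-style assertEquals.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual))
            throw new AssertionError(failureMessage(expected, actual));
    }

    public static void main(String[] args) {
        int askSize = 3; // pretend the test failed with 3 outstanding asks
        try {
            assertEquals(0, askSize);
        } catch (AssertionError e) {
            // assertTrue(askSize == 0) would have said nothing about askSize;
            // here the message pinpoints the mismatch at a glance.
            System.out.println(e.getMessage());
        }
    }
}
```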
[jira] [Resolved] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1389. --- Resolution: Duplicate ApplicationClientProtocol and ApplicationHistoryProtocol is going to share the common base interface (YARN-1266). Therefore, we don't need to duplicate the APIs in both protocols. Close it as duplicate. ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824224#comment-13824224 ] Jian He commented on YARN-1416: --- Thanks for reporting this. The problem is that duplicate events are sent in the unit tests. Thankfully, no issue with core code :) Uploaded a patch: - remove duplicate events. - some side changes: -- Rename RMAppSavingTransition to RMAppNewlySavingTransition -- Ignore the APP_NEW_SAVED event at the Final_Saving state; this event is ignorable because the app can move from New_Saving to Final_Saving before the APP_NEW_SAVED event arrives. -- Remove the test scenarios for ignoring APP_NEW_SAVED at the Failed/Killed states, as that should not happen. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
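A tiny sketch of the "ignorable event" idea, with a drastically simplified state machine (the real RMApp state machine is much larger, and these names only echo the ones in the report):

```java
import java.util.Set;

// A transition can be registered as "ignorable": the event is expected to
// arrive late at that state and should be dropped silently instead of
// producing an InvalidStateTransition log line.
public class IgnorableEventSketch {
    enum State { NEW_SAVING, FINAL_SAVING, FAILED }
    enum Event { APP_NEW_SAVED, APP_UPDATE_SAVED }

    static final Set<String> IGNORED = Set.of(
        // The app can move NEW_SAVING -> FINAL_SAVING before APP_NEW_SAVED
        // arrives, so that late event is expected there.
        State.FINAL_SAVING + ":" + Event.APP_NEW_SAVED);

    static String handle(State state, Event event) {
        if (IGNORED.contains(state + ":" + event)) return "ignored";
        // Otherwise we get the log line quoted in the bug report.
        return "can't handle " + event + " at " + state;
    }
}
```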
[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1416: -- Attachment: YARN-1416.1.patch InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Reopened] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reopened YARN-1389: - ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824208#comment-13824208 ] Mayank Bansal commented on YARN-1389: - Hi [~zjshen] We need this for implementing the attempt reports and container reports for the YARN CLI. Reopening it. Thanks, Mayank ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824233#comment-13824233 ] Hadoop QA commented on YARN-1416: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614157/YARN-1416.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2469//console This message is automatically generated. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1332) In TestAMRMClient, replace assertTrue with assertEquals where possible
[ https://issues.apache.org/jira/browse/YARN-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824187#comment-13824187 ] Sandy Ryza commented on YARN-1332: -- Yes please. In TestAMRMClient, replace assertTrue with assertEquals where possible -- Key: YARN-1332 URL: https://issues.apache.org/jira/browse/YARN-1332 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sebastian Wong Priority: Minor Labels: newbie Attachments: YARN-1332.patch TestAMRMClient uses a lot of assertTrue(amClient.ask.size() == 0) where assertEquals(0, amClient.ask.size()) would make it easier to see why it's failing at a glance. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1403: - Attachment: YARN-1403-1.patch Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824194#comment-13824194 ] Sandy Ryza commented on YARN-1403: -- Uploaded a patch that should fix the findbugs warning. I don't think the test failure is related because the test doesn't even use the fair scheduler and the patch only touches code in the fair scheduler. Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1416: -- Attachment: YARN-1416.1.patch InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch It might be worth checking why they are reporting this. Testcase : TestRMAppTransitions, TestRM there are large number of such errors. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1418) Add Tracing to YARN
Masatake Iwasaki created YARN-1418: -- Summary: Add Tracing to YARN Key: YARN-1418 URL: https://issues.apache.org/jira/browse/YARN-1418 Project: Hadoop YARN Issue Type: Improvement Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274. The most part of changes needed for basis such as RPC seems to be almost ready in HDFS-5274. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824268#comment-13824268 ] Alejandro Abdelnur commented on YARN-1403: -- * AllocationFileLoaderService.java: ** start()/stop() should set a volatile boolean 'running' to true/false; the reloadThread should loop while 'running'. The stop() should interrupt the thread to force a wake up if sleeping. ** reloadThread run(): the try block should include the reload; then, when interrupted by stop(), it would skip the reloading on exit. ** reloadAllocs(): we are not charged by the character, so the method name should be reloadAllocations() ** if (allocFile == null) return; use {} ** what happens if reloadListener.queueConfigurationReloaded(info); throws an exception? in what state do things end up? ** not sure the logic using lastReloadAttemptFailed is correct, in the exception handling in the thread's run() * QueueConfiguration.java ** QueueConfiguration() constructor: shouldn't placementpolicy be the default? * QueueManager.java ** shouldn't this be a composite service? ** it is starting but not stopping the AllocationFileLoaderService ** the initialize() setting the reload-listener is too hidden; this should be done next to where the AllocationFileService is created. Wouldn't it be simpler/cleaner if the QueueManager were a service that encapsulates the reloading, queue allocations, ACLs, and queue placement, with the FS just seeing its methods? Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
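The start/stop lifecycle suggested in the review can be sketched as follows; the class and method names are illustrative, not the patch's actual AllocationFileLoaderService:

```java
// Hedged sketch of a reload thread controlled by a volatile 'running' flag,
// with stop() interrupting the thread to force a wake-up if it is sleeping.
public class ReloadServiceSketch {
    private volatile boolean running;
    private Thread reloadThread;

    public void start() {
        running = true;
        reloadThread = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(10_000);   // poll interval between reload checks
                    reloadAllocations();    // reload inside the try, so an
                                            // interrupt from stop() skips it
                } catch (InterruptedException e) {
                    // woken by stop(); the while (running) check exits the loop
                }
            }
        });
        reloadThread.setDaemon(true);
        reloadThread.start();
    }

    public void stop() throws InterruptedException {
        running = false;
        reloadThread.interrupt();   // force a wake-up if the thread is sleeping
        reloadThread.join();
    }

    public boolean isRunning() {
        return running;
    }

    void reloadAllocations() { /* parse the allocation file here */ }
}
```

Because the flag is flipped before the interrupt, the thread exits promptly whether it is sleeping, reloading, or between iterations.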
[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-967: --- Attachment: YARN-967-3.patch Attaching a patch with the CLI changes, and making the YARN CLI use the history class as well. Thanks, Mayank [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1416) InvalidStateTransitions getting reported in multiple test cases even though they pass
[ https://issues.apache.org/jira/browse/YARN-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824280#comment-13824280 ] Hadoop QA commented on YARN-1416: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614171/YARN-1416.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2470//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2470//console This message is automatically generated. InvalidStateTransitions getting reported in multiple test cases even though they pass - Key: YARN-1416 URL: https://issues.apache.org/jira/browse/YARN-1416 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Attachments: YARN-1416.1.patch, YARN-1416.1.patch It might be worth checking why they are reporting this. Testcases: TestRMAppTransitions and TestRM show a large number of such errors. 
can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
[ https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824281#comment-13824281 ] Hadoop QA commented on YARN-967: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614173/YARN-967-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2471//console This message is automatically generated. [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data --- Key: YARN-967 URL: https://issues.apache.org/jira/browse/YARN-967 Project: Hadoop YARN Issue Type: Sub-task Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-967-1.patch, YARN-967-2.patch, YARN-967-3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-713: --- Attachment: YARN-713.2.patch ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol
[ https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824370#comment-13824370 ] Junping Du commented on YARN-312: - Thanks, Luke, for the review and comments. Yes, it looks a little verbose, but we already have cases, e.g. NodeStatusPBImpl, that are more complex. So I guess this is not a problem? :) Add updateNodeResource in ResourceManagerAdministrationProtocol --- Key: YARN-312 URL: https://issues.apache.org/jira/browse/YARN-312 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.2.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-312-v1.patch, YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, YARN-312-v5.1.patch, YARN-312-v5.patch Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support node resource changes. For design details, please refer to the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Moved] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles moved MAPREDUCE-5630 to YARN-1419: -- Component/s: (was: scheduler) scheduler Target Version/s: 3.0.0, 2.3.0, 0.23.10 (was: 3.0.0, 2.3.0, 0.23.10) Affects Version/s: (was: 0.23.10) (was: 2.3.0) (was: 3.0.0) 0.23.10 2.3.0 3.0.0 Key: YARN-1419 (was: MAPREDUCE-5630) Project: Hadoop YARN (was: Hadoop Map/Reduce) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics must be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1419: -- Attachment: YARN-1419.patch TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics must be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824394#comment-13824394 ] Hadoop QA commented on YARN-1419: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614201/YARN-1419.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2472//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2472//console This message is automatically generated. 
TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics must be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-1419: -- Attachment: YARN-1419.patch Instead of heavily changing the QueueMetrics class, with its static class variables and its failure to unregister the beans, I've chosen to take a simpler approach: just measure the apps-submitted delta. TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch, YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics must be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
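The delta-measurement idea in that comment can be sketched as follows. This is a hypothetical illustration, not the actual patch: the static 'appsSubmitted' counter stands in for QueueMetrics' shared static state, and the method names are invented for the demo. The point is that snapshotting the counter before the action and asserting only on the difference makes the test immune to residue left by earlier tests, with no need to clear static state or unregister beans.

```java
// Sketch of delta-based measurement over shared static metrics state.
// Names are hypothetical stand-ins for QueueMetrics internals.
public class MetricsDeltaDemo {
    // Shared mutable state, like QueueMetrics' static variable: another
    // test running first may leave a nonzero value here.
    static int appsSubmitted = 0;

    static void submitApp() {
        appsSubmitted++;
    }

    // Returns how many apps *this* call submitted, regardless of any
    // residue earlier tests left in the shared counter.
    static int submitAndMeasureDelta(int n) {
        int before = appsSubmitted;      // snapshot instead of clearing
        for (int i = 0; i < n; i++) {
            submitApp();
        }
        return appsSubmitted - before;   // assert on the difference only
    }
}
```

Even with the counter pre-seeded by an "earlier test", the delta comes out the same, which is why this approach sidesteps the test-ordering problem jdk7 exposed.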
[jira] [Commented] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824404#comment-13824404 ] Hadoop QA commented on YARN-1419: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614207/YARN-1419.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2473//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2473//console This message is automatically generated. TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch, YARN-1419.patch QueueMetrics holds its data in a static variable causing metrics to bleed over from test to test. 
clearQueueMetrics must be called by tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which makes the metrics unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)