[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905095#comment-13905095 ]

Hudson commented on YARN-1428:

SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/])
YARN-1428. Fixed RM to write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state. (Contributed by Zhijie Shen) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569585)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3-branch-2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in the transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
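The getState() pitfall behind YARN-1428 is easy to reproduce in miniature. Below is a hypothetical sketch (simplified names, not the actual RMAppImpl code): inside a final transition, the object's state field still holds the pre-transition state, so the history record has to be handed the target state explicitly.

{code}
// Hypothetical simplification of the YARN-1428 bug; not actual RMAppImpl code.
enum AppState { RUNNING, FINISHED }

class App {
  private AppState state = AppState.RUNNING;

  AppState getState() { return state; }

  void finalTransition() {
    // Buggy variant: getState() still reports RUNNING here, because the
    // state field is only updated after the transition completes:
    //   writeFinishRecord(getState());

    // Fixed variant: pass the state the app is about to enter.
    writeFinishRecord(AppState.FINISHED);
    state = AppState.FINISHED;
  }

  private void writeFinishRecord(AppState finalState) {
    System.out.println("history store records final state: " + finalState);
  }
}
{code}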
[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
[ https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905097#comment-13905097 ]

Hudson commented on YARN-1590:

SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/])
YARN-1590. Fixed ResourceManager, web-app proxy and MR JobHistoryServer to expand _HOST properly in their kerberos principles. Contributed by Mohammad Kamrul Islam. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569537)
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServer.java

_HOST doesn't expand properly for RM, NM, ProxyServer and JHS

Key: YARN-1590
URL: https://issues.apache.org/jira/browse/YARN-1590
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Fix For: 2.4.0
Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, YARN-1590.4.patch

_HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it is working fine for webservice authentication. On the other hand, the same thing is working fine for NN and SNN in RPC as well as webservice.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
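As a rough illustration of the substitution at issue (the principal string and bind address below are made-up examples, not values from the patch): Hadoop's SecurityUtil.getServerPrincipal replaces _HOST with whatever host name it is handed, so the fix direction is to hand it the host from the configured service address rather than the machine's own host name.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.security.SecurityUtil;

// Sketch only; the principal and address are illustrative examples.
public class HostExpansionExample {
  public static String expandForService(String principalConf,
                                        InetSocketAddress serviceBindAddr)
      throws IOException {
    // getServerPrincipal substitutes _HOST with the host name it is given;
    // using the configured service address (which may be a VIP/alias)
    // instead of the local machine's host name is the crux of the fix.
    return SecurityUtil.getServerPrincipal(principalConf,
        serviceBindAddr.getHostName());
  }

  public static void main(String[] args) throws IOException {
    System.out.println(expandForService("rm/_HOST@EXAMPLE.COM",
        new InetSocketAddress("rm-vip.example.com", 8032)));
  }
}
{code}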
[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on
[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905094#comment-13905094 ]

Hudson commented on YARN-1724:

SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/])
YARN-1724. Race condition in Fair Scheduler when continuous scheduling is turned on (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569447)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

Race condition in Fair Scheduler when continuous scheduling is turned on

Key: YARN-1724
URL: https://issues.apache.org/jira/browse/YARN-1724
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1724-1.patch, YARN-1724.patch

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator), we'll hit:
java.lang.IllegalArgumentException: Comparison method violates its general contract!

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
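For context on why the sort blows up: a comparator that reads live, mutable resource values can return inconsistent answers mid-sort, which the JDK's sort detects as a broken total order. A minimal sketch of one common mitigation, sorting against a point-in-time snapshot (illustrative names, not the actual FairScheduler patch):

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not the YARN-1724 fix itself: comparisons read a
// snapshot, so concurrent updates to the live map can no longer change
// comparison results while the sort is running (the condition that triggers
// "Comparison method violates its general contract!").
public class SnapshotSortExample {
  public static void sortByAvailableDesc(List<String> nodeIds,
                                         Map<String, Long> liveAvailableMb) {
    final Map<String, Long> snapshot = new HashMap<>(liveAvailableMb);
    nodeIds.sort((a, b) -> Long.compare(snapshot.get(b), snapshot.get(a)));
  }
}
{code}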
[jira] [Commented] (YARN-1721) When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905089#comment-13905089 ]

Hudson commented on YARN-1721:

SUCCESS: Integrated in Hadoop-Yarn-trunk #486 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/486/])
YARN-1721. When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569443)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp

Key: YARN-1721
URL: https://issues.apache.org/jira/browse/YARN-1721
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1721-1.patch, YARN-1721.patch

FairScheduler.moveApplication should grab lock on FSSchedulerApp, so that allocate() can't be modifying it at the same time.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
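A minimal sketch of the locking discipline the issue describes (class and method bodies are illustrative, not the actual FairScheduler code): the move path and the allocation path synchronize on the same per-application object, so neither can observe the other's half-finished update.

{code}
// Illustrative sketch of the YARN-1721 idea; not the actual FairScheduler.
class AppHandle {
  String queue;        // mutated by moves
  int liveContainers;  // mutated by allocations
}

class TinyScheduler {
  void moveApplication(AppHandle app, String targetQueue) {
    synchronized (app) {
      // Detach from the old queue and attach to the new one atomically with
      // respect to allocate(), which locks the same object.
      app.queue = targetQueue;
    }
  }

  void allocate(AppHandle app) {
    synchronized (app) {
      // Sees a consistent queue assignment for the whole allocation.
      app.liveContainers++;
    }
  }
}
{code}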
[jira] [Commented] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905105#comment-13905105 ]

Hadoop QA commented on YARN-1171:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629713/YARN-1171-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3120//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3120//console

This message is automatically generated.

Add default queue properties to Fair Scheduler documentation

Key: YARN-1171
URL: https://issues.apache.org/jira/browse/YARN-1171
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Attachments: YARN-1171-1.patch

The Fair Scheduler doc is missing the following properties.
- defaultMinSharePreemptionTimeout
- queueMaxAppsDefault

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
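For reference, the two properties being documented are top-level elements of the Fair Scheduler allocation file; a small hypothetical example follows (the values shown are arbitrary, chosen only to illustrate the units: apps for the cap, seconds for the timeout).

{code}
<?xml version="1.0"?>
<allocations>
  <!-- Default cap on running apps for queues that don't set maxRunningApps. -->
  <queueMaxAppsDefault>20</queueMaxAppsDefault>
  <!-- Default seconds a queue may sit under its min share before it can preempt. -->
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
</allocations>
{code}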
[jira] [Commented] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905321#comment-13905321 ]

Hadoop QA commented on YARN-1071:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629714/YARN-1071.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions
The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3118//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3118//console

This message is automatically generated.

ResourceManager's decommissioned and lost node count is 0 after restart

Key: YARN-1071
URL: https://issues.apache.org/jira/browse/YARN-1071
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Jian He
Attachments: YARN-1071.1.patch

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 1, NumLostNMs : 2, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 0, NumLostNMs : 0, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
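One way to picture the fix direction (a hypothetical sketch, not the committed patch): counts like NumDecommissionedNMs live only in memory, so after a restart they have to be re-derived rather than left at zero, for example by counting the entries of the exclude file via Hadoop's HostsFileReader when the node lists are first loaded.

{code}
import java.io.IOException;

import org.apache.hadoop.util.HostsFileReader;

// Hypothetical sketch: re-seed the decommissioned-NM metric from the
// exclude file at startup instead of leaving the in-memory counter at zero.
public class SeedDecommissionedMetric {
  public static int decommissionedAtStartup(String includesFile,
                                            String excludesFile)
      throws IOException {
    HostsFileReader hostsReader =
        new HostsFileReader(includesFile, excludesFile);
    return hostsReader.getExcludedHosts().size();
  }
}
{code}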
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905322#comment-13905322 ]

Hadoop QA commented on YARN-713:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629700/YARN-713.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3119//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3119//console

This message is automatically generated.

ResourceManager can exit unexpectedly if DNS is unavailable

Key: YARN-713
URL: https://issues.apache.org/jira/browse/YARN-713
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jian He
Priority: Critical
Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch

As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905363#comment-13905363 ]

Hadoop QA commented on YARN-1666:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629697/YARN-1666.6.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3121//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3121//console

This message is automatically generated.

Make admin refreshNodes work across RM failover

Key: YARN-1666
URL: https://issues.apache.org/jira/browse/YARN-1666
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on
[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905441#comment-13905441 ]

Hudson commented on YARN-1724:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/])
YARN-1724. Race condition in Fair Scheduler when continuous scheduling is turned on (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569447)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

Race condition in Fair Scheduler when continuous scheduling is turned on

Key: YARN-1724
URL: https://issues.apache.org/jira/browse/YARN-1724
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1724-1.patch, YARN-1724.patch

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator), we'll hit:
java.lang.IllegalArgumentException: Comparison method violates its general contract!

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1721) When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905436#comment-13905436 ]

Hudson commented on YARN-1721:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/])
YARN-1721. When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569443)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp

Key: YARN-1721
URL: https://issues.apache.org/jira/browse/YARN-1721
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1721-1.patch, YARN-1721.patch

FairScheduler.moveApplication should grab lock on FSSchedulerApp, so that allocate() can't be modifying it at the same time.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
[ https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905444#comment-13905444 ]

Hudson commented on YARN-1590:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/])
YARN-1590. Fixed ResourceManager, web-app proxy and MR JobHistoryServer to expand _HOST properly in their kerberos principles. Contributed by Mohammad Kamrul Islam. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569537)
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServer.java

_HOST doesn't expand properly for RM, NM, ProxyServer and JHS

Key: YARN-1590
URL: https://issues.apache.org/jira/browse/YARN-1590
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Fix For: 2.4.0
Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, YARN-1590.4.patch

_HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it is working fine for webservice authentication. On the other hand, the same thing is working fine for NN and SNN in RPC as well as webservice.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905442#comment-13905442 ]

Hudson commented on YARN-1428:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1678 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1678/])
YARN-1428. Fixed RM to write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state. (Contributed by Zhijie Shen) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569585)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3-branch-2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in the transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on
[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905515#comment-13905515 ]

Hudson commented on YARN-1724:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/])
YARN-1724. Race condition in Fair Scheduler when continuous scheduling is turned on (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569447)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

Race condition in Fair Scheduler when continuous scheduling is turned on

Key: YARN-1724
URL: https://issues.apache.org/jira/browse/YARN-1724
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1724-1.patch, YARN-1724.patch

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator), we'll hit:
java.lang.IllegalArgumentException: Comparison method violates its general contract!

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1721) When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905510#comment-13905510 ]

Hudson commented on YARN-1721:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/])
YARN-1721. When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569443)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp

Key: YARN-1721
URL: https://issues.apache.org/jira/browse/YARN-1721
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1721-1.patch, YARN-1721.patch

FairScheduler.moveApplication should grab lock on FSSchedulerApp, so that allocate() can't be modifying it at the same time.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
[ https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905518#comment-13905518 ]

Hudson commented on YARN-1590:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/])
YARN-1590. Fixed ResourceManager, web-app proxy and MR JobHistoryServer to expand _HOST properly in their kerberos principles. Contributed by Mohammad Kamrul Islam. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569537)
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServer.java

_HOST doesn't expand properly for RM, NM, ProxyServer and JHS

Key: YARN-1590
URL: https://issues.apache.org/jira/browse/YARN-1590
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Fix For: 2.4.0
Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, YARN-1590.4.patch

_HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it is working fine for webservice authentication. On the other hand, the same thing is working fine for NN and SNN in RPC as well as webservice.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905516#comment-13905516 ]

Hudson commented on YARN-1428:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1703/])
YARN-1428. Fixed RM to write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state. (Contributed by Zhijie Shen) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569585)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3-branch-2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in the transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1694) RM is shutting down when an NM is added to cluster without updating the hostname in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved YARN-1694.

Resolution: Duplicate

Yes it is.

RM is shutting down when an NM is added to cluster without updating the hostname in /etc/hosts

Key: YARN-1694
URL: https://issues.apache.org/jira/browse/YARN-1694
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Sunil G
Priority: Critical

A new NM is added to the cluster, but the hostname mapping of this NM is not updated in /etc/hosts on the RM. NM registration is successful without any problems. When a job is submitted, the RM shuts down with the exception below.

{noformat}
2013-10-04 04:37:37,611 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: host-10-18-40-120
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1296)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1344)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1210)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1169)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:870)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:707)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:751)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:93)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:449)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: host-10-18-40-120
    ... 15 more
2013-10-04 04:37:37,614 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
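The stack trace above and YARN-713 point at the same failure mode: an unresolvable host name thrown out of the scheduler's event loop takes the whole RM down. A minimal sketch of the defensive shape such handling can take (illustrative, not the committed fix):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative sketch: a DNS hiccup should fail the single allocation that
// needs the address, not escape the event loop and terminate the RM.
public class SafeResolveExample {
  static InetAddress resolveOrNull(String host) {
    try {
      return InetAddress.getByName(host);
    } catch (UnknownHostException e) {
      // Skip this node for now; a later heartbeat can retry once DNS works.
      return null;
    }
  }
}
{code}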
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905740#comment-13905740 ]

Jonathan Eagles commented on YARN-1479:

+1. Making a minor tweak to the sleep time since it was causing the test to take 1 minute longer than needed on my box.

Invalid NaN values in Hadoop REST API JSON response

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Attachments: Yarn-1479.patch, Yarn-1479v2.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
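Since JSON has no NaN literal, the usual remedy is to sanitize the float before it is serialized; a minimal sketch (hypothetical helper, not the committed patch, which mapped the value to whatever sentinel the project agreed on):

{code}
// Hypothetical helper: map NaN/Infinity to a JSON-representable value.
public class JsonSafeFloat {
  public static float sanitize(float value) {
    return (Float.isNaN(value) || Float.isInfinite(value)) ? 0f : value;
  }

  public static void main(String[] args) {
    System.out.println(sanitize(Float.NaN)); // prints 0.0
    System.out.println(sanitize(0.42f));     // prints 0.42
  }
}
{code}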
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905754#comment-13905754 ]

Vinod Kumar Vavilapalli commented on YARN-1666:

+1, looks good. Checking this in.

Make admin refreshNodes work across RM failover

Key: YARN-1666
URL: https://issues.apache.org/jira/browse/YARN-1666
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response
[ https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905765#comment-13905765 ]

Hudson commented on YARN-1479:

SUCCESS: Integrated in Hadoop-trunk-Commit #5189 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5189/])
YARN-1479. Invalid NaN values in Hadoop REST API JSON response (Chen He via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569853)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java

Invalid NaN values in Hadoop REST API JSON response

Key: YARN-1479
URL: https://issues.apache.org/jira/browse/YARN-1479
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
Fix For: 3.0.0, 2.5.0
Attachments: Yarn-1479.patch, Yarn-1479v2.patch

I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN. NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN".

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905780#comment-13905780 ]

Hudson commented on YARN-1666:

SUCCESS: Integrated in Hadoop-trunk-Commit #5190 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5190/])
YARN-1666. Modified RM HA handling of include/exclude node-lists to be available across RM failover by making using of a remote configuration-provider. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569856)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/ConfigurationProvider.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/FileSystemBasedConfigurationProvider.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/LocalConfigurationProvider.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/hadoop-policy.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml

Make admin refreshNodes work across RM failover

Key: YARN-1666
URL: https://issues.apache.org/jira/browse/YARN-1666
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Fix For: 2.4.0
Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
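A sketch of the remote configuration-provider pattern named in the commit message above (the interface shape below is simplified and illustrative, not the actual Hadoop API): both RMs in an HA pair resolve the include/exclude node lists through a provider, so a refreshNodes issued before failover is still visible to the RM that takes over, for example via a FileSystem-backed provider reading a shared location instead of a node-local file.

{code}
import java.io.IOException;
import java.io.InputStream;

// Illustrative shape only; the real ConfigurationProvider differs.
interface RemoteConfigProvider {
  InputStream getConfigurationInputStream(String name) throws IOException;
}

class NodesList {
  private final RemoteConfigProvider provider;

  NodesList(RemoteConfigProvider provider) {
    this.provider = provider;
  }

  InputStream openExcludesFile() throws IOException {
    // Both the active and the standby read the same remotely stored list.
    return provider.getConfigurationInputStream("nodes.exclude");
  }
}
{code}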
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:

Attachment: YARN-1071.2.patch

ResourceManager's decommissioned and lost node count is 0 after restart

Key: YARN-1071
URL: https://issues.apache.org/jira/browse/YARN-1071
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Jian He
Attachments: YARN-1071.1.patch, YARN-1071.2.patch

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 1, NumLostNMs : 2, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 0, NumLostNMs : 0, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905785#comment-13905785 ]

Hadoop QA commented on YARN-1071:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629826/YARN-1071.2.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3122//console

This message is automatically generated.

ResourceManager's decommissioned and lost node count is 0 after restart

Key: YARN-1071
URL: https://issues.apache.org/jira/browse/YARN-1071
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Jian He
Attachments: YARN-1071.1.patch, YARN-1071.2.patch

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 1, NumLostNMs : 2, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 0, NumLostNMs : 0, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905799#comment-13905799 ]

Mit Desai commented on YARN-1281:

Is this failure just related to the test or is there some bug in hadoop?

TestZKRMStateStoreZKClientConnections fails intermittently

Key: YARN-1281
URL: https://issues.apache.org/jira/browse/YARN-1281
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

The test fails intermittently - haven't been able to reproduce the failure deterministically.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1071:

Attachment: YARN-1071.3.patch

ResourceManager's decommissioned and lost node count is 0 after restart

Key: YARN-1071
URL: https://issues.apache.org/jira/browse/YARN-1071
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Srimanth Gunturi
Assignee: Jian He
Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 1, NumLostNMs : 2, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
NumActiveNMs : 3, NumDecommissionedNMs : 0, NumLostNMs : 0, NumUnhealthyNMs : 0, NumRebootedNMs : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1297:

Description:
I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses and it halved the percent of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects
* I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called.

was:
I ran the Fair Scheduler's core scheduling loop through a profiler to and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses and it halved the percent of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects
* I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called.

Miscellaneous Fair Scheduler speedups

Key: YARN-1297
URL: https://issues.apache.org/jira/browse/YARN-1297
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, YARN-1297.patch

I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses and it halved the percent of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects
* I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
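On the Math.signum point in the description's list: a comparator that returns (int) Math.signum(a - b) pays for a floating-point subtraction and a signum call on every comparison, and the subtraction itself can overflow for large operands; plain branches do neither. A sketch with illustrative names:

{code}
// Illustrative sketch of the comparator micro-optimization.
public class FastCompareExample {
  // Instead of: return (int) Math.signum(aUsage - bUsage);
  static int compareUsage(long aUsage, long bUsage) {
    if (aUsage < bUsage) return -1;
    if (aUsage > bUsage) return 1;
    return 0;
  }
}
{code}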
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905816#comment-13905816 ]

Karthik Kambatla commented on YARN-1297:

+1

Miscellaneous Fair Scheduler speedups

Key: YARN-1297
URL: https://issues.apache.org/jira/browse/YARN-1297
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, YARN-1297.patch

I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses and it halved the percent of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects
* I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905814#comment-13905814 ]

Jian He commented on YARN-1428:

Committed to branch-2.4 also.

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3-branch-2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in the transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905803#comment-13905803 ]

Karthik Kambatla commented on YARN-1678:

Thanks Sandy. Looks good to me except for the following nit.
* If we are editing javadoc, we should add all the params. Or, we should not add any at all.
{code}
 *
 * @param reserved
 *          Whether there's already a container reserved for this app on the node.
{code}

Fair scheduler gabs incessantly about reservations

Key: YARN-1678
URL: https://issues.apache.org/jira/browse/YARN-1678
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1678-1.patch, YARN-1678.patch

Come on FS. We really don't need to know every time a node with a reservation on it heartbeats.
{code}
2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8
2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347
2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144
2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
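The underlying change the issue asks for is the standard demote-and-guard logging pattern; a sketch (illustrative class, not the actual FairScheduler diff):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative sketch: per-heartbeat reservation chatter moves from INFO to
// DEBUG, and the guard avoids even building the message string when DEBUG
// logging is off.
public class ReservationLogExample {
  private static final Log LOG = LogFactory.getLog(ReservationLogExample.class);

  void onReservation(String node, String appId) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Making reservation: node=" + node + " app_id=" + appId);
    }
  }
}
{code}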
[jira] [Commented] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905831#comment-13905831 ]

Sandy Ryza commented on YARN-1171:

+1, LGTM. Thanks Naren!

Add default queue properties to Fair Scheduler documentation

Key: YARN-1171
URL: https://issues.apache.org/jira/browse/YARN-1171
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Attachments: YARN-1171-1.patch

The Fair Scheduler doc is missing the following properties.
- defaultMinSharePreemptionTimeout
- queueMaxAppsDefault

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905833#comment-13905833 ]

Karthik Kambatla commented on YARN-1281:

I believe it is just related to the test, as other testing didn't reveal anything. Haven't been able to reliably reproduce it either.

TestZKRMStateStoreZKClientConnections fails intermittently

Key: YARN-1281
URL: https://issues.apache.org/jira/browse/YARN-1281
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

The test fails intermittently - haven't been able to reproduce the failure deterministically.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
[ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905888#comment-13905888 ] Hadoop QA commented on YARN-1071: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629834/YARN-1071.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3123//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3123//console This message is automatically generated. ResourceManager's decommissioned and lost node count is 0 after restart --- Key: YARN-1071 URL: https://issues.apache.org/jira/browse/YARN-1071 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Jian He Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count: {noformat} NumActiveNMs : 3, NumDecommissionedNMs : 1, NumLostNMs : 2, NumUnhealthyNMs : 0, NumRebootedNMs : 0 {noformat} After restarting RM, the counts were shown as below in JMX. {noformat} NumActiveNMs : 3, NumDecommissionedNMs : 0, NumLostNMs : 0, NumUnhealthyNMs : 0, NumRebootedNMs : 0 {noformat} Notice that the lost and decommissioned NM counts are both 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1735) AvailableMB in QueueMetrics is the same as AllocateMB
Siqi Li created YARN-1735: - Summary: AvailableMB in QueueMetrics is the same as AllocateMB Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1735) AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Component/s: scheduler resourcemanager AvailableMB in QueueMetrics is the same as AllocateMB - Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Siqi Li in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1735) AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Description: in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. AvailableMB in QueueMetrics is the same as AllocateMB - Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Siqi Li in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1735) AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Description: in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. was: in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. AvailableMB in QueueMetrics is the same as AllocateMB - Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Siqi Li in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1735) AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Description: in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bounded by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. was: in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. AvailableMB in QueueMetrics is the same as AllocateMB - Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Siqi Li in Viz graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the pool max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bounded by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
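The relationship the reporter proposes is simple arithmetic: a queue's available memory should be derived from its configured maximum, not from its fair share. A minimal sketch (field names are illustrative, not actual QueueMetrics members):
{code}
// AvailableMB should never exceed the queue's maximum allocation and should
// shrink as memory is allocated, rather than tracking the fair share.
long availableMB = Math.max(0L, queueMaxShareMB - allocatedMB);
{code}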
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906020#comment-13906020 ] Mit Desai commented on YARN-1281: - I had tried it on my machine and it was passing too. Just wanted to make sure it is a test issue and not a real bug TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved YARN-1281. - Resolution: Cannot Reproduce Target Version/s: (was: ) This JIRA has been open for a long time and the issue does not seem to be reproducible. I am closing it for now. We can open it again if we find out that it is failing again. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906100#comment-13906100 ] Naren Koneru commented on YARN-1171: Changed the issue to reflect the current state of code vs documentation and fixed the documentation. Add default queue properties to Fair Scheduler documentation - Key: YARN-1171 URL: https://issues.apache.org/jira/browse/YARN-1171 Project: Hadoop YARN Issue Type: Improvement Components: documentation, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Attachments: YARN-1171-1.patch The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-713: - Attachment: YARN-713.5.patch ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1171: - Assignee: Naren Koneru (was: Karthik Kambatla) Add default queue properties to Fair Scheduler documentation - Key: YARN-1171 URL: https://issues.apache.org/jira/browse/YARN-1171 Project: Hadoop YARN Issue Type: Improvement Components: documentation, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Naren Koneru Attachments: YARN-1171-1.patch The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906102#comment-13906102 ] Jian He commented on YARN-713: -- bq. May be resend the container-allocated event in a thread after 500ms Agree, better than a tight loop. Uploaded a new patch that fixed the comments. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
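A hedged sketch of the delayed-resend idea agreed on above, using a java.util.concurrent scheduled executor; the dispatcher and event names are placeholders, not necessarily what the patch does:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Rather than retrying in a tight loop when NMToken creation fails (e.g.
// during a DNS outage), re-dispatch the container-allocated event after a
// short delay so the dispatcher thread is never blocked.
ScheduledExecutorService retryExecutor =
    Executors.newSingleThreadScheduledExecutor();
retryExecutor.schedule(new Runnable() {
  @Override
  public void run() {
    dispatcher.getEventHandler().handle(containerAllocatedEvent);
  }
}, 500, TimeUnit.MILLISECONDS);
{code}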
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-713: - Attachment: YARN-713.6.patch New patch with a minor additional fix ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906125#comment-13906125 ] Karthik Kambatla commented on YARN-1281: I actually see this failing in our nightly builds every so often. It is just that I haven't figured out a way to reliably reproduce it. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1171: - Fix Version/s: (was: 2.5.0) 2.4.0 Add default queue properties to Fair Scheduler documentation - Key: YARN-1171 URL: https://issues.apache.org/jira/browse/YARN-1171 Project: Hadoop YARN Issue Type: Improvement Components: documentation, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Naren Koneru Fix For: 2.4.0 Attachments: YARN-1171-1.patch The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906128#comment-13906128 ] Sandy Ryza commented on YARN-1171: -- and branch-2.4 Add default queue properties to Fair Scheduler documentation - Key: YARN-1171 URL: https://issues.apache.org/jira/browse/YARN-1171 Project: Hadoop YARN Issue Type: Improvement Components: documentation, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Naren Koneru Fix For: 2.4.0 Attachments: YARN-1171-1.patch The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1736) In Fair Scheduler, AppSchedulable.assignContainer Priority argument is redundant with ResourceRequest
Sandy Ryza created YARN-1736: Summary: In Fair Scheduler, AppSchedulable.assignContainer Priority argument is redundant with ResourceRequest Key: YARN-1736 URL: https://issues.apache.org/jira/browse/YARN-1736 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Priority: Minor The ResourceRequest includes a Priority, so no need to pass in a Priority alongside it -- This message was sent by Atlassian JIRA (v6.1.5#6160)
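A hypothetical before/after for this cleanup; the signature is an approximation of AppSchedulable.assignContainer, not the exact method:
{code}
// Before: assignContainer(node, priority, request, type, reserved)
// After: the Priority is read from the ResourceRequest itself.
private Resource assignContainer(FSSchedulerNode node,
    ResourceRequest request, NodeType type, boolean reserved) {
  Priority priority = request.getPriority();
  // ... allocation logic unchanged; placeholder return for this sketch ...
  return Resources.none();
}
{code}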
[jira] [Updated] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1678: - Attachment: YARN-1678-1.patch Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-986) YARN should use cluster-id as token service address
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906154#comment-13906154 ] Vinod Kumar Vavilapalli commented on YARN-986: -- Any update? Can we take it over if you can't find time? Tx. YARN should use cluster-id as token service address --- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906202#comment-13906202 ] Karthik Kambatla commented on YARN-1678: +1, pending Jenkins. Thanks Sandy. Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
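The INFO lines quoted in this issue fire on every heartbeat from a node holding a reservation. One plausible shape for the fix (hedged; the attached patch may differ) is to demote the per-heartbeat messages to DEBUG:
{code}
// Reservation chatter becomes visible only when debug logging is enabled;
// 'reservedAppSchedulable' and 'node' are assumed local names.
if (LOG.isDebugEnabled()) {
  LOG.debug("Trying to fulfill reservation for application "
      + reservedAppSchedulable.getName() + " on node: " + node);
}
{code}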
[jira] [Commented] (YARN-986) YARN should use cluster-id as token service address
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906203#comment-13906203 ] Karthik Kambatla commented on YARN-986: --- Was pulled away for something else. I have spent some time on this and have addressed the preliminary issues - running into others that I am actively debugging. Let me keep digging until the end of this week. If I don't make much progress, someone else can take it up. YARN should use cluster-id as token service address --- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1736) In Fair Scheduler, AppSchedulable.assignContainer Priority argument is redundant with ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naren Koneru reassigned YARN-1736: -- Assignee: Naren Koneru In Fair Scheduler, AppSchedulable.assignContainer Priority argument is redundant with ResourceRequest - Key: YARN-1736 URL: https://issues.apache.org/jira/browse/YARN-1736 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Naren Koneru Priority: Minor The ResourceRequest includes a Priority, so no need to pass in a Priority alongside it -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1734: Attachment: YARN-1734.2.patch RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906224#comment-13906224 ] Xuan Gong commented on YARN-1734: - After YARN-1 is checked in, we do have an InputStream object returned from ConfigurationProvider, so let us keep it. The new patch includes changes in AdminService. I create a set which includes the function name, parameter types and parameter objects for all refresh*s, and will manually call them after transitionToActive. In that case, the active RM can get the updated configuration. A test case is also included. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
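A hedged sketch of the replay mechanism described in the comment above; RefreshCall, recordedRefreshCalls and adminService are illustrative names, not the actual patch:
{code}
// After transitionToActive succeeds, replay every recorded refresh* call so
// the newly-active RM re-reads the FileSystemBasedConfiguration.
try {
  for (RefreshCall call : recordedRefreshCalls) {
    java.lang.reflect.Method m = adminService.getClass()
        .getMethod(call.getMethodName(), call.getParameterTypes());
    m.invoke(adminService, call.getArguments());
  }
} catch (ReflectiveOperationException e) {
  throw new IOException("Could not replay refresh* calls", e);
}
{code}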
[jira] [Reopened] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened YARN-1281: - I see Karthik. Reopening it TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906248#comment-13906248 ] Hadoop QA commented on YARN-713: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629893/YARN-713.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3124//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3124//console This message is automatically generated. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906287#comment-13906287 ] Vinod Kumar Vavilapalli commented on YARN-713: -- +1, looks good. Checking this in. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906290#comment-13906290 ] Hadoop QA commented on YARN-713: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629893/YARN-713.6.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3127//console This message is automatically generated. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906304#comment-13906304 ] Arpit Agarwal commented on YARN-713: Is this error in branch-2.4 related? {code} WARN: Please see http://www.slf4j.org/codes.html for an explanation. [ERROR] COMPILATION ERROR : [ERROR] /Users/aagarwal/src/commit/branch-2.4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Allocation.java:[32,16] cannot find symbol symbol : class RecordFactory location: class org.apache.hadoop.yarn.server.resourcemanager.scheduler.Allocation [ERROR] /Users/aagarwal/src/commit/branch-2.4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Allocation.java:[33,6] cannot find symbol {code} Thanks. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Fix For: 2.4.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906301#comment-13906301 ] Hudson commented on YARN-713: - SUCCESS: Integrated in Hadoop-trunk-Commit #5192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5192/]) YARN-713. Fixed ResourceManager to not crash while building tokens when DNS issues happen intermittently. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569979) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptContainerAllocatedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Allocation.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/NMTokenSecretManagerInRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Fix For: 2.4.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1171) Add default queue properties to Fair Scheduler documentation
[ https://issues.apache.org/jira/browse/YARN-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906297#comment-13906297 ] Hudson commented on YARN-1171: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5192/]) YARN-1171. Add default queue properties to Fair Scheduler documentation (Naren Koneru via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569923) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Add default queue properties to Fair Scheduler documentation - Key: YARN-1171 URL: https://issues.apache.org/jira/browse/YARN-1171 Project: Hadoop YARN Issue Type: Improvement Components: documentation, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Naren Koneru Fix For: 2.4.0 Attachments: YARN-1171-1.patch The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1718) Fix a couple isTerminals in Fair Scheduler queue placement rules
[ https://issues.apache.org/jira/browse/YARN-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906299#comment-13906299 ] Hudson commented on YARN-1718: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5192/]) YARN-1718. Fix a couple isTerminals in Fair Scheduler queue placement rules (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1569928) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java Fix a couple isTerminals in Fair Scheduler queue placement rules - Key: YARN-1718 URL: https://issues.apache.org/jira/browse/YARN-1718 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-1718.patch SecondaryGroupExistingQueue and Default are incorrect -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906352#comment-13906352 ] Hadoop QA commented on YARN-1734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629914/YARN-1734.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3126//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3126//console This message is automatically generated. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1588: -- Attachment: YARN-1588.3.patch Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906361#comment-13906361 ] Jian He commented on YARN-1588: --- Fixed the naming getContainersFromPreviousAttempt to be plural and rebased on top of YARN-713. Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1731) ResourceManager should record killed ApplicationMasters for History
[ https://issues.apache.org/jira/browse/YARN-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-1731: Attachment: YARN-1731.patch Updated patch ResourceManager should record killed ApplicationMasters for History --- Key: YARN-1731 URL: https://issues.apache.org/jira/browse/YARN-1731 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1731.patch, YARN-1731.patch Yarn changes required for MAPREDUCE-5641 to make the RM record when an AM is killed so the JHS (or something else) can know about it. See MAPREDUCE-5641 for the design I'm trying to follow. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906425#comment-13906425 ] Vinod Kumar Vavilapalli commented on YARN-713: -- Yes, I did see the issue on branch-2.4 during review itself and fixed it manually. Forgot during commit. Fixing it right away. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Fix For: 2.4.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906445#comment-13906445 ] Xuan Gong commented on YARN-1734: - bq. Why is refreshAdminAcls() required to be done when transitioning state? It is possible that previous active rm has updated the AdminAcls. In that case, the current user may not have permission to do transitionToActive or transitionToStandby. That is why I want to do the checking before transitioning the state. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
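To make that ordering concrete, a hedged sketch (method names are assumptions about AdminService, not the exact code): refresh the ACLs from the shared configuration first, then perform the access check, then transition:
{code}
// Re-read yarn.admin.acl from the shared configuration before checking
// access, so a user revoked by the previous active RM cannot drive the
// transition.
refreshAdminAcls(false);
checkAcls("transitionToActive");
// ... proceed with the actual transition to active ...
{code}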
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906452#comment-13906452 ] Vinod Kumar Vavilapalli commented on YARN-713: -- Done. Compiled branches branch-2 and branch-2.4 and things look okay now. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Fix For: 2.4.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906458#comment-13906458 ] Hadoop QA commented on YARN-1588: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629939/YARN-1588.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3128//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3128//console This message is automatically generated. Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1734: Attachment: YARN-1734.3.patch RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906468#comment-13906468 ] Arpit Agarwal commented on YARN-713: Thanks Vinod! ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Fix For: 2.4.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.3.patch, YARN-713.4.patch, YARN-713.5.patch, YARN-713.6.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1734: Attachment: YARN-1734.4.patch RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906526#comment-13906526 ] Xuan Gong commented on YARN-1734: - Throw the IOException instead of just logging exceptions. {code} try { refreshAdminAcls(false); } catch (YarnException ex) { throw new IOException("Can not execute refreshAdminAcls", ex); } {code} RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1525: --- Attachment: YARN1525.patch Vinod, I made changes according to your comment. Resetting RM_ID lets me find the address of the RM with that id, and I set it back afterwards. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
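A minimal sketch of the set-and-restore trick described in that comment, assuming the standard YARN HA key conventions (yarn.resourcemanager.ha.id and per-RM suffixed keys); lookupCurrentRmWebAddress is a stand-in helper defined here only to keep the example self-contained, not the RM's actual resolution code.
{code}
import org.apache.hadoop.conf.Configuration;

class RmAddressSketch {
  // Stand-in for the RM's address-resolution helpers, which key off the
  // currently configured HA id.
  static String lookupCurrentRmWebAddress(Configuration conf) {
    String rmId = conf.get("yarn.resourcemanager.ha.id");
    return conf.get("yarn.resourcemanager.webapp.address." + rmId);
  }

  // Temporarily point the HA id at the target RM, resolve its address,
  // then set the original value back afterwards.
  static String webAppAddressFor(Configuration conf, String rmId) {
    String original = conf.get("yarn.resourcemanager.ha.id");
    conf.set("yarn.resourcemanager.ha.id", rmId);
    try {
      return lookupCurrentRmWebAddress(conf);
    } finally {
      if (original != null) {
        conf.set("yarn.resourcemanager.ha.id", original);
      } else {
        conf.unset("yarn.resourcemanager.ha.id");
      }
    }
  }
}
{code}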
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906547#comment-13906547 ] Hadoop QA commented on YARN-1734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629961/YARN-1734.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3129//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3129//console This message is automatically generated. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is used, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking
[ https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1363: -- Attachment: YARN-1363.6.patch Created a new patch: 1. Updated against the latest trunk. 2. Refactored some code. 3. Made cancel/renew in RMDelegationTokenIdentifier async as well. 4. Fixed some test issues. Get / Cancel / Renew delegation token api should be non blocking Key: YARN-1363 URL: https://issues.apache.org/jira/browse/YARN-1363 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Zhijie Shen Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch, YARN-1363.4.patch, YARN-1363.5.patch, YARN-1363.6.patch Today GetDelegationToken, CancelDelegationToken and RenewDelegationToken are all blocking APIs. * As part of these calls we try to update the RMStateStore, and that may slow them down. * Since we have a limited number of client request handlers, we may fill them up quickly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
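As a rough sketch of the non-blocking direction (not the actual patch), the usual pattern is to release the RPC handler thread immediately and push the slow RMStateStore write onto a background executor; all names below are assumptions made for this illustration.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic pattern only: hand the slow state-store write to a background
// thread so client RPC handlers are not tied up by a slow store.
class AsyncStoreSketch {
  private final ExecutorService stateStoreExecutor =
      Executors.newSingleThreadExecutor();

  // The token is created/renewed/cancelled in memory on the RPC thread;
  // only the persistence is deferred to the background thread.
  void persistAsync(Runnable stateStoreWrite) {
    stateStoreExecutor.submit(stateStoreWrite);
  }
}
{code}
The implied trade-off is that a client can briefly hold a token that has not yet been persisted, so the recovery path has to tolerate that window.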
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906588#comment-13906588 ] Hadoop QA commented on YARN-1734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629982/YARN-1734.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3131//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3131//console This message is automatically generated. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is used, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1734: Attachment: YARN-1734.5.patch RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is used, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906592#comment-13906592 ] Xuan Gong commented on YARN-1734: - We should do the same (throw an IOException instead of just logging the exception) for the other refresh* methods; see the sketch after this message. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is used, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
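What that generalization might look like, extending the snippet quoted in the earlier YARN-1734 comment; the exact set of refresh* methods and their boolean-argument forms mirror that snippet and are assumptions, not the patch itself.
{code}
// Assumed to live alongside the refresh* methods from the snippet above.
private void refreshAll() throws IOException {
  try {
    refreshQueues(false);
    refreshNodes(false);
    refreshSuperUserGroupsConfiguration(false);
    refreshUserToGroupsMappings(false);
    refreshAdminAcls(false);
  } catch (YarnException ex) {
    // Fail loudly so the Standby -> Active transition can abort,
    // rather than logging and continuing with stale state.
    throw new IOException("Can not execute a refresh* call", ex);
  }
}
{code}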
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906591#comment-13906591 ] Hadoop QA commented on YARN-1525: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629988/YARN1525.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3130//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3130//console This message is automatically generated. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906614#comment-13906614 ] Hadoop QA commented on YARN-1734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629992/YARN-1734.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3133//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3133//console This message is automatically generated. RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is used, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1410: Attachment: YARN-1410.4.patch Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create the app id), it may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906632#comment-13906632 ] Xuan Gong commented on YARN-1410: - Discussed offline with [~vinodkv]. We will still use the duplicate check before submitting the application; a sketch of that idea follows this message. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create the app id), it may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
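One plausible client-side reading of that duplicate check, sketched for illustration only; the real patch may place this logic inside YarnClientImpl rather than in user code.
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

class DedupSubmitter {
  static ApplicationId submitOnce(YarnClient client,
      ApplicationSubmissionContext context) throws IOException, YarnException {
    ApplicationId appId = context.getApplicationId();
    try {
      // If the RM already knows this appId (e.g. the first submit landed
      // just before a failover), don't submit again.
      client.getApplicationReport(appId);
      return appId;
    } catch (ApplicationNotFoundException e) {
      // Unknown to the RM: submit. This pre-check is the extra round trip
      // weighed in the pros/cons discussion later in this thread.
      return client.submitApplication(context);
    }
  }
}
{code}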
[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper failed due to the AbstractYarnScheduler introduced in YARN-1041
[ https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906646#comment-13906646 ] Wei Yan commented on YARN-1726: --- Thanks, [~vinodkv]. Sure, I'll add a test case. ResourceSchedulerWrapper failed due to the AbstractYarnScheduler introduced in YARN-1041 Key: YARN-1726 URL: https://issues.apache.org/jira/browse/YARN-1726 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-1726.patch The YARN scheduler simulator failed when running the Fair Scheduler, due to the AbstractYarnScheduler introduced in YARN-1041. ResourceSchedulerWrapper should inherit from AbstractYarnScheduler instead of implementing the ResourceScheduler interface directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
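The change being described is essentially a change of the class declaration; an illustrative diff-style sketch (not the actual patch, and any additional interfaces or overrides the real class carries are elided):
{code}
-public class ResourceSchedulerWrapper implements ResourceScheduler {
+public class ResourceSchedulerWrapper extends AbstractYarnScheduler {
   // ... existing delegation to the wrapped real scheduler is unchanged ...
 }
{code}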
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906664#comment-13906664 ] Hadoop QA commented on YARN-1410: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629998/YARN-1410.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3134//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3134//console This message is automatically generated. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create the app id), it may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906670#comment-13906670 ] Xuan Gong commented on YARN-1410: - The test case failure is not related. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create the app id), it may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking
[ https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906669#comment-13906669 ] Hadoop QA commented on YARN-1363: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629991/YARN-1363.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3132//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3132//console This message is automatically generated. Get / Cancel / Renew delegation token api should be non blocking Key: YARN-1363 URL: https://issues.apache.org/jira/browse/YARN-1363 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Zhijie Shen Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch, YARN-1363.4.patch, YARN-1363.5.patch, YARN-1363.6.patch Today GetDelegationToken, CancelDelegationToken and RenewDelegationToken are all blocking APIs. * As part of these calls we try to update the RMStateStore, and that may slow them down. * Since we have a limited number of client request handlers, we may fill them up quickly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906703#comment-13906703 ] Bikas Saha commented on YARN-1410: -- What are the pros and cons of duplicate checking before submission vs. saving the RPC request id along with the stored application submission context? The big con of dup checking before submission is adding an extra hop that is pure overhead in 99+% of submissions. Unrelated to the above choice, what is the decision on the annotations issue raised by Karthik above? Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create the app id), it may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
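To make the alternative in that question concrete, here is a hypothetical server-side sketch of persisting the request id alongside the submission, so a retry carrying the same id is recognized as idempotent without the extra pre-check hop. Every name in it is invented for illustration; the real design would persist the id with the ApplicationSubmissionContext in the RMStateStore.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentSubmissionSketch {
  // appId -> request id that created it; stands in for state persisted
  // with the submission context in the state store.
  private final Map<String, Long> storedRequestIds = new ConcurrentHashMap<>();

  // Returns true if the submission was applied, false if it was a
  // duplicate retry of an already-stored submission.
  boolean submit(String appId, long requestId) {
    Long existing = storedRequestIds.putIfAbsent(appId, requestId);
    if (existing == null) {
      // First time we see this appId: store the context and start the app.
      return true;
    }
    if (existing == requestId) {
      // Same request retried after a failover: already applied, so the
      // client can be told the submission succeeded.
      return false;
    }
    throw new IllegalStateException("appId reused by a different request");
  }
}
{code}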