[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857375#comment-13857375 ]

Akira AJISAKA commented on YARN-1527:

Thank you for reviewing and committing, [~jianhe]!

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1474:

Attachment: YARN-1474.1.patch

Created a patch based on YARN-1172's approach.

Make schedulers services
Key: YARN-1474
URL: https://issues.apache.org/jira/browse/YARN-1474
Project: Hadoop YARN
Issue Type: Sub-task
Components: scheduler
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
Attachments: YARN-1474.1.patch

Schedulers currently have a reinitialize method but no start and stop. Fitting them into the YARN service model would make things more coherent.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
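To make the "service model" concrete: YARN services move through an init/start/stop lifecycle. The following is a minimal, self-contained sketch of that idea, not the real org.apache.hadoop.service.AbstractService (whose actual hooks are serviceInit/serviceStart/serviceStop); SketchService and SketchScheduler are invented names for illustration.

```java
// Lifecycle states, in the order a service moves through them.
enum State { NOTINITED, INITED, STARTED, STOPPED }

// Hypothetical stand-in for the YARN service base class.
abstract class SketchService {
    private State state = State.NOTINITED;
    public final void init()  { serviceInit();  state = State.INITED;  }
    public final void start() { serviceStart(); state = State.STARTED; }
    public final void stop()  { serviceStop();  state = State.STOPPED; }
    public State getState()   { return state; }
    // Subclasses override these hooks instead of a single ad-hoc reinitialize().
    protected void serviceInit()  {}
    protected void serviceStart() {}
    protected void serviceStop()  {}
}

// A scheduler expressed as a service: config loading belongs in init,
// thread/dispatcher startup in start, teardown in stop.
class SketchScheduler extends SketchService {
    boolean configLoaded, dispatcherRunning;
    @Override protected void serviceInit()  { configLoaded = true; }
    @Override protected void serviceStart() { dispatcherRunning = true; }
    @Override protected void serviceStop()  { dispatcherRunning = false; }
}
```

The payoff of the refactoring is that a composite parent (like the RM) can init/start/stop all its children uniformly instead of each component exposing its own lifecycle convention.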
[jira] [Commented] (YARN-1540) Add an easy way to turn on HA
[ https://issues.apache.org/jira/browse/YARN-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857380#comment-13857380 ]

Tsuyoshi OZAWA commented on YARN-1540:

+1. As I mentioned on the yarn-dev ML, we currently need to set lots of configs, like this: https://gist.github.com/oza/7055279

Add an easy way to turn on HA
Key: YARN-1540
URL: https://issues.apache.org/jira/browse/YARN-1540
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

Users will have to modify the configuration significantly to turn on HA. It would be nice to have a simpler way of doing this.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
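For a sense of the configuration burden being discussed, this is roughly the minimum yarn-site.xml for a two-RM HA setup. The property names are the ones Hadoop 2.x uses; the cluster id, rm ids, hostnames, and ZooKeeper quorum are placeholder values.

```xml
<!-- Sketch of a minimal two-node RM HA config; hostnames are placeholders. -->
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>cluster1</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>rm1.example.com</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>rm2.example.com</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>zk1.example.com:2181</value></property>
```

Beyond these, a real deployment also duplicates the per-RM service addresses (scheduler, admin, webapp, and so on) for each rm-id, which is what makes the gist linked above so long and why a simpler switch is attractive.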
[jira] [Updated] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1481:

Attachment: yarn-1481-addendum.patch

Addendum patch that fixes the synchronization for AdminService#isRMActive.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla reopened YARN-1481:

Re-opening to fix the synchronization.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857386#comment-13857386 ]

Karthik Kambatla commented on YARN-1029:

bq. Please take care of it wherever appropriate.
Re-opened YARN-1481 to take care of it there. If it isn't too much trouble, please take a look at it.

bq. Again, if we organize the newly added code such that it's a common event for any module to inform the RM about a fatal error then we are good for the future. Embedded elector can use that event instead of a custom named event.
Oh! I understand it now - will add an RMFatalErrorEvent, the handler for which just terminates the RM, and update RMStateStoreOperationFailedEvent to use that event instead of calling terminate directly.

bq. I am sorry I could not understand your comment explaining how the test passes with these timeouts.
# The ZK timeout comes from RM_ZK_TIMEOUT_MS (2 seconds); the failover could take as long as this. MiniYARNCluster#getActiveRMIndex() waits for this duration to find the active RM.
# The NM-RM connection is verified after a successful failover. The timeout there corresponds to the maximum time taken by failovers until the NM connects to an RM. 5 seconds seems long enough for this.

Allow embedding leader election into the RM
Key: YARN-1029
URL: https://issues.apache.org/jira/browse/YARN-1029
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-approach.patch

It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
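The "common fatal-error event" idea in the comment above can be sketched as follows. Only the name RMFatalErrorEvent comes from the comment; the class shapes, the FatalEventDispatcher name, and the terminated flag (standing in for actually exiting the process) are assumptions for illustration, not the real Hadoop classes.

```java
// Hypothetical sketch: one event type that any RM module (state store,
// embedded elector, ...) can post when it hits an unrecoverable error.
class RMFatalErrorEvent {
    final String source;       // which module raised the error
    final String diagnostics;  // human-readable cause
    RMFatalErrorEvent(String source, String diagnostics) {
        this.source = source;
        this.diagnostics = diagnostics;
    }
}

// A single handler owns the decision to shut the RM down, so modules
// never call terminate directly.
class FatalEventDispatcher {
    boolean terminated = false;  // stands in for terminating the JVM
    void handle(RMFatalErrorEvent e) {
        System.err.println("Fatal error from " + e.source + ": " + e.diagnostics);
        terminated = true;
    }
}
```

The design point is decoupling: modules report the fact ("I hit a fatal error"), and the policy (terminate, fail over, log) lives in one place, which is why the elector and the state store can share the same event.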
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857390#comment-13857390 ]

Hadoop QA commented on YARN-1481:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620598/yarn-1481-addendum.patch
against trunk revision .

{color:red}-1 patch{color}. Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2737//console

This message is automatically generated.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857394#comment-13857394 ]

Hadoop QA commented on YARN-1474:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620597/YARN-1474.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2736//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2736//console

This message is automatically generated.

Make schedulers services
Key: YARN-1474
URL: https://issues.apache.org/jira/browse/YARN-1474
Project: Hadoop YARN
Issue Type: Sub-task
Components: scheduler
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
Attachments: YARN-1474.1.patch

Schedulers currently have a reinitialize method but no start and stop. Fitting them into the YARN service model would make things more coherent.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857414#comment-13857414 ]

shenhong commented on YARN-647:

Thanks Zhijie! Like caolong, we also set yarn.nodemanager.log.retain-seconds=259200, so NM local logs won't be deleted after the container stops. I think if yarn.log-aggregation-enable=false and yarn.nodemanager.log.retain-seconds > 0, we can change the logsLink.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
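The proposal in the comment above is: when aggregation is disabled but the NM retains local logs, link to the NodeManager's container-log page instead of printing "Aggregation is not enabled." A self-contained sketch of that decision follows; LogsLinkChooser, the URL shapes, and the parameter names are illustrative assumptions, not the actual HistoryServer code.

```java
// Hypothetical logsLink chooser for the HistoryServer, per the comment:
// aggregation on  -> link to the aggregated-log page;
// aggregation off but logs retained -> link to the NM's local log page;
// otherwise -> no link (nothing to show).
class LogsLinkChooser {
    static String logsLink(boolean aggregationEnabled, long retainSeconds,
                           String nmHttpAddress, String containerId) {
        if (aggregationEnabled) {
            return "/logs/" + containerId;                 // aggregated logs
        }
        if (retainSeconds > 0) {                           // NM still has files
            return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId;
        }
        return null;                                       // logs already gone
    }
}
```

With the settings from the comment (aggregation off, retain-seconds=259200), this would point users at the NM's web UI rather than a dead end.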
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenhong updated YARN-647:

Attachment: yarn-647-2.patch

Added a new patch.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647-2.patch, yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857431#comment-13857431 ]

Hadoop QA commented on YARN-647:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620602/yarn-647-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2738//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2738//console

This message is automatically generated.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647-2.patch, yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857438#comment-13857438 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Yarn-trunk #434 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/434/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857436#comment-13857436 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Yarn-trunk #434 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/434/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857485#comment-13857485 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1625 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1625/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857483#comment-13857483 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1625 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1625/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857507#comment-13857507 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1651 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1651/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857509#comment-13857509 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1651 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1651/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1481:

Attachment: yarn-1481-addendum.patch

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857594#comment-13857594 ]

Hadoop QA commented on YARN-1481:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620625/yarn-1481-addendum.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2739//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2739//console

This message is automatically generated.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857611#comment-13857611 ]

Sandy Ryza commented on YARN-1481:

+1 to the addendum patch

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits
Vinod Kumar Vavilapalli created YARN-1545:

Summary: [Umbrella] Prevent DoS of YARN components by putting in limits
Key: YARN-1545
URL: https://issues.apache.org/jira/browse/YARN-1545
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli

I did a pass and found many places that can cause DoS on various YARN services. Need to fix them.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1546) Prevent DoS of ApplicationClientProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1546:

Summary: Prevent DoS of ApplicationClientProtocol by putting in limits
Key: YARN-1546
URL: https://issues.apache.org/jira/browse/YARN-1546
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli

Points of DoS in ApplicationClientProtocol:
- Get new AppId: app IDs are ints and so can be exhausted.
- Unbounded submit-app payload: queueName, ContainerLaunchContext fields etc. in ApplicationSubmissionContext
- Unbounded byte-buffers as part of tokens during renew/cancel of a delegation token

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1547: - Summary: Prevent DoS of ApplicationMasterProtocol by putting in limits Key: YARN-1547 URL: https://issues.apache.org/jira/browse/YARN-1547 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Points of DoS in ApplicationMasterProtocol - Host and trackingURL in RegisterApplicationMasterRequest - Diagnostics, final trackingURL in FinishApplicationMasterRequest - Unlimited number of resourceAsks, containersToBeReleased and resourceBlacklistRequest in AllocateRequest -- Unbounded number of priorities and/or resourceRequests in each ask. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1548) Prevent DoS of ContainerManagementProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1548: - Summary: Prevent DoS of ContainerManagementProtocol by putting in limits Key: YARN-1548 URL: https://issues.apache.org/jira/browse/YARN-1548 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Points of concern in ContainerManagementProtocol - Unbounded number of requests in StartContainersRequest - Similarly, StartContainerRequest can have uncontrolled token buffers and localResources, environment, commands, serviceData, ACLs as part of ContainerLaunchContext - Unlimited number of requests in StopContainersRequest and GetContainerStatusesRequest -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.2.patch Added a test in TestRM to verify that the AM host and RPC port are invalidated after the attempt is done. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1493: -- Attachment: YARN-1493.7.patch Upload the patch to only make schedulers aware of the app, this patch is the same as YARN-1493.4.patch Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1493: -- Attachment: YARN-1493.7.patch Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857672#comment-13857672 ] Jian He commented on YARN-1493: --- New patch changes: - Make schedulers send the App_accepted/App_rejected event to the RMApp instead of RMAppAttempt. - Create two new events AppAddedSchedulerEvent and AppRemovedSchedulerEvent for adding and removing apps in the schedulers. - Change the state transitions to not start a new attempt until the app is accepted by the scheduler. - Rename SchedulerApplication to SchedulerApplicationAttempt, and create a new SchedulerApplication for keeping track of the app-specific info. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
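The separation described in the last bullet — an app-level SchedulerApplication that owns attempt-level SchedulerApplicationAttempt objects — can be sketched roughly as below. This is a simplified illustration of the data-structure split, not the actual YARN classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the app/attempt split: app-level state (queue,
// user) lives in one object, while per-attempt state lives in its own
// class, so the scheduler can track apps separately from attempts.
public class SchedulerApplication {
    public static class SchedulerApplicationAttempt {
        public final int attemptId;
        public SchedulerApplicationAttempt(int attemptId) {
            this.attemptId = attemptId;
        }
    }

    public final String queue;
    public final String user;
    private final List<SchedulerApplicationAttempt> attempts = new ArrayList<>();

    public SchedulerApplication(String queue, String user) {
        this.queue = queue;
        this.user = user;
    }

    /** A new attempt (e.g. after AM restart) is added under the same app. */
    public void addAttempt(SchedulerApplicationAttempt attempt) {
        attempts.add(attempt);
    }

    public int attemptCount() {
        return attempts.size();
    }
}
```

With this shape, an AppRemovedSchedulerEvent can drop the whole app, while attempt-level events only touch the list entries.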
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857674#comment-13857674 ] Hadoop QA commented on YARN-1541: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620632/YARN-1541.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2740//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2740//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857677#comment-13857677 ] Hadoop QA commented on YARN-1493: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620638/YARN-1493.7.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2741//console This message is automatically generated. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1539) Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy.
[ https://issues.apache.org/jira/browse/YARN-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1539: -- Priority: Major (was: Critical) Hm.. you are right. I checked my notes again and cross-verified in code. The real bug my notes had was about queue-admins needing ACLs across the hierarchy to be able to submit jobs. Irrespective of the operation (submit/kill), if I am a queue-admin for a leaf-queue, I should be able to simply perform the operation irrespective of my permissions on the parent queue. Lowering priority given this. Can you please confirm FairScheduler's behaviour w.r.t this? We should refactor this to be non-scheduler-specific. Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy. -- Key: YARN-1539 URL: https://issues.apache.org/jira/browse/YARN-1539 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Today, Queue admin ACLs are similar to submit-acls w.r.t hierarchy in that if one has to be able to administer a queue, he/she should be an admin of all the queues in the ancestry - an unnecessary burden. This was added in YARN-899 and I believe is wrong semantics as well as implementation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857695#comment-13857695 ] Vinod Kumar Vavilapalli commented on YARN-1541: --- It's interesting to realize via the test that we are invalidating host/port information for finished apps. Perhaps we shouldn't be doing that for succeeded apps? Apps may use this similarly to the final tracking URL after YARN-1225. In addition, one more place where this invalidation is really helpful is when an AM crashed or expired and the client doesn't get the same stale address while the RM is in the process of launching a new AM. Can you add a test for that? Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857702#comment-13857702 ] Sandy Ryza commented on YARN-1493: -- Thanks for clarifying about the FinalTransition, Jian. Took another look at the patch and I'm +1 once it applies and the findbugs issues are resolved. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857726#comment-13857726 ] Jian He commented on YARN-1121: --- Now that we have the isThreadAlive check, I also believe it will prevent serviceStop from getting stuck. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
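A minimal sketch of the drain-before-stop idea discussed above, including the thread-liveness guard that prevents serviceStop from getting stuck. This simplified busy-polling dispatcher is illustrative only; YARN's real AsyncDispatcher and RMStateStore are considerably more involved:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: on stop(), the worker thread keeps handling queued
// events until the queue is empty, and stop() only joins the thread if it
// is still alive -- so stop() cannot hang on an already-dead worker.
public class DrainingDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(() -> {
        // Keep draining until both: stop requested AND queue empty.
        while (!stopped || !queue.isEmpty()) {
            Runnable ev = queue.poll();
            if (ev != null) {
                ev.run();
            }
        }
    });

    public void start() {
        worker.start();
    }

    public void dispatch(Runnable ev) {
        queue.add(ev);
    }

    public void stop() {
        stopped = true;
        try {
            // Liveness check: skip the join if the worker already died.
            if (worker.isAlive()) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Every event dispatched before stop() is guaranteed to run before stop() returns, which is the flush behavior this JIRA asks for.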
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857740#comment-13857740 ] Hudson commented on YARN-1481: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4931 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4931/]) YARN-1481. Addendum patch to fix synchronization in AdminService (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1553738) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java Move internal services logic from AdminService to ResourceManager - Key: YARN-1481 URL: https://issues.apache.org/jira/browse/YARN-1481 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.4.0 Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues - Not easy to follow RM's service life cycle -- RM adds only AdminService as its service directly. -- Other services are added to RM when AdminService's init calls RM.activeServices.init() - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-5.patch Patch that adds RMFatalEvent to handle all fatal RM events, embedded elector uses this. [~vinodkv], I believe the latest patch addresses all of Bikas' comments. Please take a look at the patch (at least MiniYARNCluster changes) when possible. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857746#comment-13857746 ] Karthik Kambatla commented on YARN-1474: I don't think the approach used in YARN-1172 is the best approach for schedulers. We have different schedulers with different internal states and hence differ in their setup and cleanup. Also, given they are all part of YARN, we could force them all individually to be services. Here, it might make more sense to convert existing schedulers to services before changing the corresponding instantiation code in the ResourceManager. [~sandyr], [~vinodkv] - thoughts? Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
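As a rough illustration of what fitting a scheduler into the service model means, here is a minimal init/start/stop lifecycle. YARN's real AbstractService has a much richer state model (state-change listeners, failure states, composite services); this sketch only shows the ordering constraints a service-based scheduler would obey:

```java
// Illustrative sketch only -- not YARN's AbstractService. Shows the
// lifecycle a scheduler gains by becoming a service: explicit init,
// start, and stop phases with enforced ordering.
public class ServiceScheduler {
    public enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    /** Configuration loading would happen here. */
    public void init() {
        if (state != State.NOTINITED) {
            throw new IllegalStateException("init() called in state " + state);
        }
        state = State.INITED;
    }

    /** Update threads, timers, etc. would be started here. */
    public void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("start() called in state " + state);
        }
        state = State.STARTED;
    }

    /** Internal state is released here on shutdown. */
    public void stop() {
        state = State.STOPPED;
    }

    public State getState() {
        return state;
    }
}
```

Compared with a bare reinitialize(), this makes setup and cleanup explicit and lets the ResourceManager manage the scheduler like its other services.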
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857761#comment-13857761 ] Jian He commented on YARN-1541: --- bq. We shouldn't perhaps be doing that for succeeded apps? Yup, the new patch does not invalidate the host and port for succeeded apps. Also added a test for the window between AM restarts. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
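The behavior described in this comment — clearing the AM host/port when an attempt finishes, except for succeeded apps — might look roughly like the following. The field names and the "N/A" placeholder are assumptions for illustration, not the actual patch:

```java
// Illustrative sketch only. When an attempt finishes without success,
// the AM address is replaced with a placeholder so clients polling the
// RM don't read a stale host/port while a new attempt is launching.
public class AttemptReport {
    public String amHost;
    public int amRpcPort;

    /** Clears the AM address on attempt completion, unless the app succeeded. */
    public void onAttemptDone(boolean appSucceeded) {
        if (!appSucceeded) {
            amHost = "N/A";   // assumed placeholder a client can recognize
            amRpcPort = -1;   // assumed sentinel for "no port"
        }
        // Succeeded apps keep their final host/port, analogous to the
        // final tracking URL behavior mentioned in the review comments.
    }
}
```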
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857764#comment-13857764 ] Hadoop QA commented on YARN-1541: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620659/YARN-1541.3.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2743//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.3.patch Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857776#comment-13857776 ] Hadoop QA commented on YARN-1463: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620590/YARN-1463-20131226.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2742//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2742//console This message is automatically generated. 
TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857785#comment-13857785 ] Karthik Kambatla commented on YARN-1463: Thanks for taking this up, Vinod. The changes look good to me. +1. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857786#comment-13857786 ] Karthik Kambatla commented on YARN-1463: Will commit this later in the day, if no one has any objections. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857798#comment-13857798 ] Hadoop QA commented on YARN-1541: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620662/YARN-1541.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2744//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2744//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857831#comment-13857831 ] Jian He commented on YARN-1493: --- bq. When submission is rejected by a parent queue, you need to call removeApplication. This existed before but your patch removed it. The earlier addApplication is renamed to addApplicationAttempt; this addApplicationAttempt is called when the SchedulerAttemptAddedEvent comes. So we are not adding or removing any application data structure in the leaf queue at all; we are adding/removing attempts in the leaf queue. bq. finishApplicationAttempt: Should Inform the parent queue so that it can call finishApplicationAttempt itself. Similarly for submitApplicationAttempt. ParentQueue's finishApplicationAttempt and submitApplicationAttempt logic is empty; ParentQueue only deals with app-specific logic in the current implementation. Do we still want to call the parent queue in attempt-specific APIs? bq. We shouldn't move to ACCEPTED directly before informing scheduler in case of recovery? YARN-1507 is saving the application after the app is accepted. So after YARN-1507, an app being saved means it is accepted. Maybe leave it for now and fix it in YARN-1507? bq. RMAppEventType.ATTEMPT_FAILED event should not come in at ACCEPTED state? This is possible because RMAppRecoveredTransition is changed to return the ACCEPTED state and wait for the AttemptFailed event to come (waiting for the previous AM to exit). I changed it to ACCEPTED instead of RUNNING because, as said, after YARN-1507 an app being saved means it is ACCEPTED; the app may not necessarily have been at the RUNNING state earlier. bq. When can this happen? During recovery? May be we should fix that correctly? This can happen because I changed the app to return the ACCEPTED state on recovery, and on recovery the app once again goes through the scheduler and triggers one more APP_ACCEPTED event at the ACCEPTED state. bq.
TestFairScheduler: Why the conditional? Because testAclSubmitApplication expects app2 to be null (assertNull("The application was allowed", app2)): the app is rejected, so no app exists. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857839#comment-13857839 ] Jian He commented on YARN-1493: --- bq. The information about submission to various queues is lost? It is not needed? I investigated this: the queue passed in the earlier code is only used in ParentQueue.submitApplication to compare the queue's name with the parent queue's name for validation. None of the tests relies on this requirement, and every single test uses the same queue. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.4.patch The test was failing because the unit-test timeout was set too small. Uploaded a new patch. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch, YARN-1541.4.patch
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-3.patch Updated the patch so that REST-based accesses work. Verified on a pseudo-dist cluster along with the patch posted for MAPREDUCE-5699. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857860#comment-13857860 ] Hadoop QA commented on YARN-1461: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620677/yarn-1461-3.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2746//console This message is automatically generated.
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857863#comment-13857863 ] Hadoop QA commented on YARN-1541: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620676/YARN-1541.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2745//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2745//console This message is automatically generated.
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857888#comment-13857888 ] Vinod Kumar Vavilapalli commented on YARN-1493: --- bq. The earlier addApplication is renamed to addApplicationAttempt; this addApplicationAttempt is called when the SchedulerAttemptAddedEvent comes. So we are not adding or removing any application data structure in the leaf queue at all; we are adding/removing attempts in the leaf queue. Yeah, that is the current state. Let's do the same to future-proof it. bq. ParentQueue's finishApplicationAttempt and submitApplicationAttempt logic is empty; ParentQueue only deals with app-specific logic in the current implementation. Do we still want to call parentQueue in attempt-specific APIs? Again, that's today. Let's do it the way one would see it in the future. bq. YARN-1507 is saving the application after the app is accepted. So after YARN-1507, an app being saved means it is accepted. Maybe leave it for now and fix it in YARN-1507? Sure, but you should leave a code comment nonetheless pointing to the JIRA. bq. I changed it to ACCEPTED state instead of RUNNING because, as said, after YARN-1507 an app being saved means it is ACCEPTED; the app may not necessarily have been at RUNNING state earlier. Again, let's leave a code comment saying the same. bq. This can happen because I changed the app to return ACCEPTED state on recovery, and on recovery the app once again goes through the scheduler and triggers one more APP_ACCEPTED event at ACCEPTED state. Ditto.
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857904#comment-13857904 ] Vinod Kumar Vavilapalli commented on YARN-1493: --- bq. When submission is rejected by a parent queue, you need to call removeApplication. This existed before but your patch removed it. I withdraw this comment, misread it. Also, in LeafQueue, ParentQueue etc., please put @Override annotations where necessary. It avoids confusion about internal and implemented methods.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857915#comment-13857915 ] Vinod Kumar Vavilapalli commented on YARN-1121: --- This is unnecessary locking to go into the heart of our dispatcher loop. Let's do it in the loop only if drainEventsOnStop is enabled? Even otherwise, I don't see how the code in the dispatcher loop is useful other than optimizing away the 1-sec wait in stop. I think the main change is checking in stop whether the dispatcher thread is alive. Just doing that should be enough? {code} while (!drained && eventHandlingThread.isAlive()) { Thread.sleep(1000); } {code} This should take care of all the races and pay a small 1-second cost during stop for some race conditions, instead of paying the locking cost for every event. Some JVMs do bias the lock away, but it can be avoided altogether. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping.
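The stop-time check suggested in the comment above can be sketched in a self-contained form. The field and method names (drained, eventHandlingThread) follow the discussion, but the surrounding class is a simplified stand-in, not the real AsyncDispatcher; the polling interval is shortened from 1000 ms for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for the dispatcher under discussion. The event loop
// records whether the queue was empty; stop() polls that flag instead of
// taking a lock for every event, so the hot path stays lock-free.
class DispatcherSketch {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    private volatile boolean drained = false;
    private volatile boolean stopped = false;
    private Thread eventHandlingThread;

    void submit(Runnable event) {
        eventQueue.offer(event);
    }

    void start() {
        eventHandlingThread = new Thread(() -> {
            while (!stopped) {
                drained = eventQueue.isEmpty();
                Runnable event;
                try {
                    event = eventQueue.poll(100, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    return;
                }
                if (event != null) {
                    event.run();
                }
            }
        });
        eventHandlingThread.start();
    }

    // The suggested shape: spin with a short sleep until the queue has
    // drained or the handler thread has died, paying the small wait only
    // at stop time rather than a locking cost per event.
    void stop() throws InterruptedException {
        while (!drained && eventHandlingThread.isAlive()) {
            Thread.sleep(100);
        }
        stopped = true;
        eventHandlingThread.join();
    }
}
```

The trade-off named in the thread is visible here: the per-event path touches only volatile fields, and the bounded sleep is paid once, during shutdown.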
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857916#comment-13857916 ] Vinod Kumar Vavilapalli commented on YARN-1399: --- Tags are a way to filter or search for applications. Let's not conflate that with their display and allow users arbitrary formats. I'd push for case-insensitive, limited-length, and maybe just the ASCII charset. It seems limiting, but that's more than enough. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag systems of online photo/video/music sites, etc.
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857918#comment-13857918 ] Vinod Kumar Vavilapalli commented on YARN-1399: --- One thing that just occurred to me: tags, or the source/group originally proposed, won't help the oozie case as described on YARN-1390. Or, to be more accurate, they make it unwieldy. Let's say oozie uses a tag workflow_123_566 for all apps in a workflow; any other application from any other user SHOULD not set that tag, or it runs the risk of getting killed by oozie. That seems like unintended behavior. To avoid it, we'll need to depend on oozie not killing as a privileged user. Further, I could make any other user's application-search cumbersome by reusing his/her tags for my own applications. It seems like tag-search should be linked to, and limited by, some other entity like the user - search for apps matching a tag for a given user/queue, etc.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857956#comment-13857956 ] Bikas Saha commented on YARN-1121: -- From the patch it wasn't clear to me that the lock was in the main loop. I should have applied the patch to the code and looked at the adjoining code. I agree it does not make sense to put a lock in the inner loop for every event. It may not be a 1-sec wait: these are all state-store operations against an external system, with unbounded time to complete, and one thread will be spinning all that time. Perhaps we could do the notify only when blockNewEvents is set to true - that would be when we are actually waiting for the drain signal.
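The notify-only-when-blocked idea from the comment above can be sketched as follows. The names blockNewEvents and drained follow the discussion; everything else is a hypothetical stand-in, and the bounded wait guards against the window where a notify fires before the waiter enters the monitor.

```java
// Sketch: the event loop signals the waiter only when draining has
// actually been requested (blockNewEvents == true), so the common case
// pays no synchronization cost per event. Simplified stand-in class.
class DrainSignalSketch {
    private final Object waitLock = new Object();
    private volatile boolean blockNewEvents = false;
    private volatile boolean drained = false;

    // Called by the event loop after handling each event.
    void afterEvent(boolean queueEmpty) {
        drained = queueEmpty;
        if (blockNewEvents && queueEmpty) {
            // Only take the lock when stop() is actually waiting.
            synchronized (waitLock) {
                waitLock.notifyAll();
            }
        }
    }

    // Called on serviceStop when drain-on-stop is enabled.
    void waitForDrain() throws InterruptedException {
        blockNewEvents = true;
        synchronized (waitLock) {
            while (!drained) {
                // Bounded wait: covers a notify that raced ahead of us.
                waitLock.wait(1000);
            }
        }
    }
}
```

The guarded loop plus timed wait is the standard defense against both spurious wakeups and the missed-notify race that a flag-then-wait sequence can otherwise hit.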
[jira] [Updated] (YARN-1463) Tests should avoid starting http-server where possible or creates spnego keytab/principals
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1463: --- Summary: Tests should avoid starting http-server where possible or creates spnego keytab/principals (was: TestContainerManagerSecurity#testContainerManager fails) Tests should avoid starting http-server where possible or creates spnego keytab/principals -- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code}
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857975#comment-13857975 ] Bikas Saha commented on YARN-1029: -- Thanks for addressing the comments. I was expecting RMStateStoreOperationFailedEvent to be replaced by the new RMFatalErrorEvent, just like the embedded elector event got replaced. Not much use in the store sending an event to the RM and then the RM sending an event to itself again, right? Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM so that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857977#comment-13857977 ] Bikas Saha commented on YARN-1481: -- It's not clear why removing the synchronization is safe. This private method is called from multiple public methods. If the earlier behavior was that the Active state of the RM would not be visible until the locking setter method had completely transitioned the RM to active, then the new code might change that behavior: the callers would see the RM as active as soon as the state variable changes, even though the RM might still be in the process of becoming active. Move internal services logic from AdminService to ResourceManager - Key: YARN-1481 URL: https://issues.apache.org/jira/browse/YARN-1481 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.4.0 Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues - Not easy to follow RM's service life cycle -- RM adds only AdminService as its service directly. -- Other services are added to RM when AdminService's init calls RM.activeServices.init() - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857979#comment-13857979 ] Karthik Kambatla commented on YARN-1481: RMContextImpl has the HA state. Both getter and setter methods are synchronized on the HA state. bq. The callers would see the RM as active as soon as the state variable changes, even though the RM might be in the process of becoming active. The setter is called only after the corresponding transition. Ref: ResourceManager#transitionTo* Am I missing something?
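The getter/setter arrangement Karthik describes can be sketched like this. It is a simplified stand-in illustrating the pattern under discussion, not the actual RMContextImpl source; the enum values are illustrative.

```java
// Simplified stand-in for the HA-state handling discussed above. The key
// invariant from the thread: the setter is invoked only AFTER the
// corresponding transition completes (cf. ResourceManager#transitionTo*),
// so a synchronized reader never observes ACTIVE mid-transition.
class HAContextSketch {
    enum HAServiceState { INITIALIZING, ACTIVE, STANDBY }

    private HAServiceState haServiceState = HAServiceState.INITIALIZING;

    synchronized void setHAServiceState(HAServiceState state) {
        this.haServiceState = state;
    }

    synchronized HAServiceState getHAServiceState() {
        return haServiceState;
    }
}
```

Under this ordering, removing synchronization from the transition method itself is safe only if the state flip really is the last step of the transition, which is the crux of Bikas's question.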
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857980#comment-13857980 ] Karthik Kambatla commented on YARN-1029: RMStateStoreOperationFailedEvent is not always fatal and might not require terminating the RM; events of type RMStateStoreOperationFailedEventType.FENCED require the RM to transition to standby, and terminate the RM if the transition fails. bq. Not much use in the store sending an event to the RM and then the RM sending an event to itself again, right? Right. That was the reason for my reluctance earlier. But, I guess this addresses any future fatal events.
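The dispatch Karthik describes - FENCED triggers transition-to-standby, and only a failed transition (or any other failure type) is fatal - can be sketched as below. The event-type name follows the discussion; the handler class and RM interface are hypothetical stand-ins.

```java
// Sketch of the failure handling described above: a FENCED store failure
// is recoverable via transition-to-standby; anything else, or a failed
// transition, terminates the RM. Names other than the FENCED type are
// illustrative, not the actual Hadoop API.
class StoreFailureHandlerSketch {
    enum FailureType { FENCED, OTHER }

    interface RM {
        void transitionToStandby() throws Exception;
        void shutdown(String reason);
    }

    void handle(FailureType type, RM rm) {
        if (type == FailureType.FENCED) {
            try {
                rm.transitionToStandby();
                // Fenced but recovered: keep the process alive on standby.
            } catch (Exception e) {
                rm.shutdown("transition to standby failed after fencing");
            }
        } else {
            rm.shutdown("fatal state-store failure");
        }
    }
}
```

This is why folding everything into a single fatal-error event is not a pure simplification: the FENCED branch needs a non-fatal path.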