[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817995#comment-13817995 ]

Sandy Ryza commented on YARN-807:
---------------------------------

Build is failing because of the runaway process / javah problem seen in other JIRAs.

> When querying apps by queue, iterating over all apps is inefficient and limiting
> --------------------------------------------------------------------------------
>
>                 Key: YARN-807
>                 URL: https://issues.apache.org/jira/browse/YARN-807
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-807-1.patch, YARN-807.patch
>
> The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name.
>
> All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that "root.default" and "default" refer to the same queue in the fair scheduler.
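The contrast the description draws -- scanning every RMApp versus consulting the scheduler's own queue-to-applications mapping -- can be sketched roughly as below. This is an illustrative sketch only: the surrounding class and helper names are assumptions, and getAppsInQueue is shown as discussed in this JIRA rather than copied from the patch.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

public class QueueAppLookupSketch {
  // Current behavior: linear scan over every RMApp, filtering on queue name.
  static List<ApplicationId> appsByScan(RMContext rmContext, String queue) {
    List<ApplicationId> result = new ArrayList<ApplicationId>();
    for (RMApp app : rmContext.getRMApps().values()) {
      if (queue.equals(app.getQueue())) {
        result.add(app.getApplicationId());
      }
    }
    return result;
  }

  // Proposed behavior: delegate to the scheduler, which already maintains a
  // queue -> applications mapping and is also positioned to resolve aliases
  // such as "default" vs "root.default" or expand a parent queue to its
  // leaf queues.
  static List<ApplicationId> appsByQueue(YarnScheduler scheduler, String queue) {
    return scheduler.getAppsInQueue(queue);
  }
}
{code}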
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817989#comment-13817989 ]

Hadoop QA commented on YARN-807:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612967/YARN-807-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2400//console

This message is automatically generated.
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817985#comment-13817985 ]

Sandy Ryza commented on YARN-807:
---------------------------------

Attaching a rebased patch. It also modifies getAppsInQueue to only return IDs, not the full scheduler application.

I'd prefer not to have this depend on YARN-1317. If they're both ready at close to the same time, I'm happy to have YARN-1317 go in first and do the work of rebasing this on top of it.
[jira] [Updated] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-807:
----------------------------

    Attachment: YARN-807-1.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817975#comment-13817975 ]

Hadoop QA commented on YARN-1210:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612966/YARN-1210.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2399//console

This message is automatically generated.

> During RM restart, RM should start a new attempt only when previous attempt exits for real
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-1210
>                 URL: https://issues.apache.org/jira/browse/YARN-1210
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch
>
> When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart.
>
> In the mean while, new apps will proceed as usual as existing apps wait for recovery.
>
> This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-1210:
------------------------------------

    Attachment: YARN-1210.4.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817964#comment-13817964 ]

Hadoop QA commented on YARN-1210:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612955/YARN-1210.4.patch
against trunk revision .

{color:red}-1 patch{color}. Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2398//console

This message is automatically generated.
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-1210:
------------------------------------

    Attachment: YARN-1210.4.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817941#comment-13817941 ]

Omkar Vinit Joshi commented on YARN-1210:
-----------------------------------------

Attaching a rebased patch. I slightly modified the logic of the RM-restart app recovery code:

* If an application doesn't have any attempts, then it will start a new attempt when we do submitApplication as a part of recovery.
* If an application has one or more application attempts, then attempt recovery will take place in 2 steps:
** All the application attempts except the last attempt will be recovered first.
** When we do submitApplication as a part of application recovery, we will replay the last attempt:
*** If the last attempt doesn't have any finalRecoveredState stored, then it will be considered as one for which the AM may or may not have been started/finished. So we will move this application attempt into LAUNCHED state, add it to the AMLivenessMonitor, and move the application to RUNNING state.
*** If the last attempt was in FAILED/KILLED/FINISHED state, then we will replay that attempt's BaseFinalTransition by recovering the attempt synchronously here. (A rough sketch of this decision follows below.)

Adding tests to cover the scenarios below:

* A new application attempt is not started until the previous AM container's finish event is reported back to the RM as a part of NM registration.
* If the previous AM container's finish event is never reported back (i.e. the node manager on which the AM container was running also went down), the AMLivenessMonitor should time out the previous attempt and start a new attempt.
* If all the stored attempts had finished, then a new attempt should be started immediately.
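The last-attempt decision above boils down to whether a recovered final state exists. A minimal sketch, assuming a per-attempt recovered-final-state value as described in the comment -- the names are illustrative, not the patch code:

{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

public class LastAttemptRecoverySketch {
  // Returns true when the RM can start a new attempt right away during
  // recovery, per the logic described in the comment above.
  static boolean startNewAttemptImmediately(RMAppAttemptState recoveredFinalState) {
    if (recoveredFinalState == null) {
      // No stored final state: the previous AM may or may not still be
      // running. The attempt is moved to LAUNCHED and registered with the
      // AMLivenessMonitor; the RM waits for either the NM-reported container
      // finish or the liveness timeout before starting a new attempt.
      return false;
    }
    // FAILED / KILLED / FINISHED was stored: the previous attempt is known
    // to be done, so its final transition is replayed synchronously and a
    // new attempt can start immediately.
    return true;
  }
}
{code}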
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817931#comment-13817931 ]

Sandy Ryza commented on YARN-584:
---------------------------------

I applied the patch and tested it as well, and it appears to work well.

One thing I noticed: if I expand a parent queue and subqueue, and then close the parent queue, but not the subqueue, they both appear as open when I refresh the page. Would this be difficult to fix?

> In fair scheduler web UI, queues unexpand on refresh
> ----------------------------------------------------
>
>                 Key: YARN-584
>                 URL: https://issues.apache.org/jira/browse/YARN-584
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>              Labels: newbie
>         Attachments: YARN-584-branch-2.2.0.patch
>
> In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time.
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817928#comment-13817928 ]

Sandy Ryza commented on YARN-584:
---------------------------------

bq. One question can't we put this in some already existing common class ? If you know let me know else will try to find, if not able to get any class with such common usage then will go on with adding SchedulerPageUtil class.

I looked and couldn't find an existing class where this would fit well. I think adding a new one is fine.

bq. I added my patch to checked out trunk using patch -p0 < YARN-584-branch-2.2.0.patch ( at root folder )

How up to date is the version of trunk you checked out?
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817915#comment-13817915 ]

Hudson commented on YARN-1121:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4707 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4707/])

YARN-1121. Changed ResourceManager's state-store to drain all events on shut-down. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1540232)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java

> RMStateStore should flush all pending store events before closing
> ------------------------------------------------------------------
>
>                 Key: YARN-1121
>                 URL: https://issues.apache.org/jira/browse/YARN-1121
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Jian He
>             Fix For: 2.3.0
>         Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch
>
> on serviceStop it should wait for all internal pending events to drain before stopping.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817908#comment-13817908 ]

Vinod Kumar Vavilapalli commented on YARN-1121:
-----------------------------------------------

Looks good overall. Checking this in.

One thing of note is that you are removing locking for service-life-cycle methods in RMStateStore. I verified that it seems fine: events coming in during serviceStop are ignored due to draining, and other blocking calls are okay to happen.
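The draining behavior this change hinges on can be sketched roughly as follows; the field names here are assumptions for illustration, not the committed AsyncDispatcher code:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.yarn.event.Event;

public class DrainingStopSketch {
  private volatile boolean blockNewEvents = false;
  private final Object waitForDrained = new Object();
  private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<Event>();

  // Sketch of stop-time draining: refuse new events, then wait until the
  // event-handling thread (not shown) has emptied the queue and notified us.
  public void stopAndDrain() throws InterruptedException {
    blockNewEvents = true; // events arriving from now on are ignored
    synchronized (waitForDrained) {
      while (!eventQueue.isEmpty()) {
        waitForDrained.wait(1000);
      }
    }
    // ... then stop the event-handling thread and complete serviceStop()
  }
}
{code}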
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817871#comment-13817871 ]

Harshit Daga commented on YARN-584:
-----------------------------------

1. Sandy, the logic is the same for both schedulers; I will add a new class as you suggested to prevent code duplication. One question: can't we put this in some already existing common class? If you know of one, let me know; otherwise I will try to find one, and if I can't find a class with such common usage, I will go ahead with adding a SchedulerPageUtil class. With respect to the coding pattern, I will make the changes and update the patch.

2. Steps taken to manually verify the patch. I tested the patch on my machine with queues and sub-queues, for example:
a. root -> [root.queue1 and root.queue2]
b. root -> [root.queue1 -> [root.queue1.queuea, root.queue1.queueb] and root.queue2]

Test cases:
i. The first time, the page loads properly: success.
ii. Open (expand) a queue and reload the page; the queue that was opened earlier should be open after the page is reloaded: success.
iii. Expand and unexpand a few queues and reload the page again; the queues that were unexpanded before the reload should stay unexpanded after it, and those that were expanded should stay expanded, i.e. the page should look the same after reload with respect to the queues' open/close state: success.

Tested in Chrome (version 30.0.1599.101), Safari (version 6.1 (7537.71)) and Firefox (version 20.0). System: Mac OS X (version 10.7.5).

3. Regarding "The patch appears to cause the build to fail": I applied my patch to a checked-out trunk using patch -p0 < YARN-584-branch-2.2.0.patch (at the root folder) and then built it using mvn install -DskipTests, and I got a BUILD SUCCESS. Can you see why the log is showing a build failure?
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817831#comment-13817831 ]

Hadoop QA commented on YARN-1279:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612890/YARN-1279.9.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The following test timeouts occurred in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2397//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2397//console

This message is automatically generated.

> Expose a client API to allow clients to figure if log aggregation is complete
> ------------------------------------------------------------------------------
>
>                 Key: YARN-1279
>                 URL: https://issues.apache.org/jira/browse/YARN-1279
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.2.0
>            Reporter: Arun C Murthy
>            Assignee: Xuan Gong
>         Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch, YARN-1279.9.patch
>
> Expose a client API to allow clients to figure if log aggregation is complete
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817797#comment-13817797 ]

Jian He commented on YARN-1279:
-------------------------------

- updateLogAggregationStatus doesn't need write-lock protection; the state machine has write-lock protection already.
- LOG_AGGREGATION_WATTING_MS: the unit can be seconds instead of milliseconds, like LOG_AGGREGATION_WAIT_SECONDS.
- LogAggregationState.COMPLETED: rename to FINISHED?
- Why doesn't the RMApp Failed state receive the LOG_AGGREGATION_STATUS_UPDATE event?
- The wrong configuration value is used (a possible fix is sketched after this message):
{code}
this.logAggregationTimeOut = YarnConfiguration.DEFAULT_LOG_AGGREGATION_RETAIN_CHECK_INTERVAL_SECONDS;
{code}
- I think better unit tests would use MockRM to submit a job and finish that job. Use the MockNM.nodeHeartBeat() method, inside which you customize the NodeStatus with ApplicationLogAggregationStatus, and call that method to interact with the RM. Also use ClientRMService.getApplicationReport to assert the expected logAggregationState. This way we cover the whole picture, including the NM-side changes and the client-side changes. You can see an example in TestRMRestart.testRMRestartSucceededApp.

Given YARN-1376 is not that big, we can incorporate that into this patch also.
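To make the wrong-configuration-value point concrete, a hedged sketch of the shape of the fix. The *_WAIT_SECONDS key and default below are hypothetical placeholders -- the review only says a retain-check-interval default was used where the log-aggregation wait value belongs:

{code}
// Hypothetical fix sketch; substitute whatever constants the patch actually
// defines for the log-aggregation wait period.
this.logAggregationTimeOut = conf.getLong(
    YarnConfiguration.LOG_AGGREGATION_WAIT_SECONDS,           // assumed key
    YarnConfiguration.DEFAULT_LOG_AGGREGATION_WAIT_SECONDS);  // assumed default
{code}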
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817692#comment-13817692 ]

Xuan Gong commented on YARN-1279:
---------------------------------

bq. this code logic can be simplified to say, if exceeds timeout period return FAILED or Timeout, otherwise return In_Progress. And so we can remove the logAggregationTimeOutDisabled boolean.

Makes sense. Deleted the logAggregationTimeOutDisabled boolean. If clients set the logAggregationTimeOut value to a negative number, the default value will be used instead.
[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1279:
----------------------------

    Attachment: YARN-1279.9.patch
[jira] [Commented] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817586#comment-13817586 ]

Robert Kanter commented on YARN-1390:
-------------------------------------

Ultimately, what we want is a way to "tag" jobs in some way with the Oozie action ID so that we can find them in the case where the AM launcher job fails but the action's AM does not, in order to properly handle that situation. (It would also allow us to finally add a long-requested feature of Oozie being able to actually kill running actions instead of letting them finish.)

The idea of having an "applicationSource" or multiple applicationTypes was to make this more generic than an "oozieActionID" field so other projects could use this feature for their own purposes as well.

> Add applicationSource to ApplicationSubmissionContext and RMApp
> ----------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, (2) potentially adding source-specific optimizations in the future.
>
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop etc.
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817560#comment-13817560 ]

Bikas Saha commented on YARN-1222:
----------------------------------

Quick comments:

1) The new event is not following the convention we have for events. Events are grouped by the destination of the events, i.e. the handler. So all RMStateStoreEvents are handled by the state store. We now have a new class of event that is handled by the ResourceManager, so we should not overload the RMStateStoreEvents. Let's create a new type that is handled by the new handler in the ResourceManager. When HA is enabled, then on exception we should transitionToStandby() but not exit. When HA is not enabled, then we should die like we currently do.

2) I don't quite get why the ResourceManager would send a failed_store event back to the store that had sent it to the RM in the first place. From 1) above, the RM should either transitionToStandby or die when it gets that event.

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.
[jira] [Commented] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817535#comment-13817535 ]

Vinod Kumar Vavilapalli commented on YARN-1390:
-----------------------------------------------

Don't see the point of multiple appTypes. A Pig query is a Pig query, and jobs spawned for a Pig query should be of type Pig. I think the problem is that MR hardcodes the app-type. If we change that to be pluggable, then it should be enough for you?

Please change the title to reflect your requirement and not the solution. Tx.
[jira] [Commented] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817526#comment-13817526 ]

Vinod Kumar Vavilapalli commented on YARN-1023:
-----------------------------------------------

Had a quick look through the patch. I don't see a reason why AHS and RM cannot share the same web-service code (for the most part).

> [YARN-321] Webservices REST API's support for Application History
> ------------------------------------------------------------------
>
>                 Key: YARN-1023
>                 URL: https://issues.apache.org/jira/browse/YARN-1023
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-321
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>         Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch
[jira] [Commented] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History
[ https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817523#comment-13817523 ]

Hadoop QA commented on YARN-974:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612858/YARN-974.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2396//console

This message is automatically generated.

> RMContainer should collect more useful information to be recorded in Application-History
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-974
>                 URL: https://issues.apache.org/jira/browse/YARN-974
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-974.1.patch, YARN-974.2.patch, YARN-974.3.patch, YARN-974.4.patch
>
> To record the history of a container, users may be also interested in the following information:
> 1. Start Time
> 2. Stop Time
> 3. Diagnostic Information
> 4. URL to the Log File
> 5. Actually Allocated Resource
> 6. Actually Assigned Node
> These should be remembered during the RMContainer's life cycle.
[jira] [Commented] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817516#comment-13817516 ]

Wei Yan commented on YARN-1393:
-------------------------------

This only updates the README file. I manually checked the instruction steps locally.

> Add how-to-use instruction in README for Yarn Scheduler Load Simulator
> -----------------------------------------------------------------------
>
>                 Key: YARN-1393
>                 URL: https://issues.apache.org/jira/browse/YARN-1393
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: YARN-1393.patch
>
> The instructions are put in the .pdf document and site page. The README needs to include a simple instruction for users to quickly pick up.
[jira] [Commented] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817515#comment-13817515 ]

Hadoop QA commented on YARN-1393:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612854/YARN-1393.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2395//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2395//console

This message is automatically generated.
[jira] [Updated] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History
[ https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-974:
-----------------------------

    Attachment: YARN-974.4.patch

Did a minor update: adding a readLock for getLogURL as well.
[jira] [Updated] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1393:
--------------------------

    Attachment: YARN-1393.patch
[jira] [Updated] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1222:
-----------------------------------

    Attachment: yarn-1222-6.patch

Here is an updated patch that:
# Creates a new event type for failed store operations (a rough sketch of the shape follows below).
# Has the RMDispatcher handle these failed-store-op events: it transitions to standby on a fenced exception, and shuts the RM down otherwise.
# Marks VisibleForTesting methods in ZKRMStateStore @Private @Unstable.

Pending:
# Documentation in yarn-default.xml
# Manual testing on a real cluster
# Creating a JIRA to change the RMStateStore#notifyDone* methods to not take an Exception

[~bikassaha] - please take a look when you get a chance. I'll address any feedback in the next patch. Thanks.
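A sketch of the shape of item 1, following the events-grouped-by-handler convention Bikas describes: a dedicated event type handled by the ResourceManager rather than the store. The class, enum, and method names here are assumptions for illustration, not the patch code:

{code}
import org.apache.hadoop.util.ExitUtil;
import org.apache.hadoop.yarn.event.AbstractEvent;
import org.apache.hadoop.yarn.event.EventHandler;

// Hypothetical event delivered to the RM when a state-store operation fails.
enum StoreOpFailedEventType { FENCED, OP_FAILED }

class StoreOpFailedEvent extends AbstractEvent<StoreOpFailedEventType> {
  private final Exception cause;

  StoreOpFailedEvent(StoreOpFailedEventType type, Exception cause) {
    super(type);
    this.cause = cause;
  }

  Exception getCause() { return cause; }
}

// Hypothetical RM-side handler: go standby on fencing when HA is enabled,
// otherwise die, as described in the comments above.
class StoreOpFailedEventHandler implements EventHandler<StoreOpFailedEvent> {
  private final boolean haEnabled;

  StoreOpFailedEventHandler(boolean haEnabled) { this.haEnabled = haEnabled; }

  @Override
  public void handle(StoreOpFailedEvent event) {
    if (haEnabled && event.getType() == StoreOpFailedEventType.FENCED) {
      // the RM's transitionToStandby() would be invoked here (assumed)
    } else {
      ExitUtil.terminate(1, event.getCause());
    }
  }
}
{code}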
[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817148#comment-13817148 ]

Steve Loughran commented on YARN-896:
-------------------------------------

Link to YARN-1394: RM to inform AMs when a container completed due to planned/unplanned NM outage.

> Roll up for long-lived services in YARN
> ---------------------------------------
>
>                 Key: YARN-896
>                 URL: https://issues.apache.org/jira/browse/YARN-896
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Robert Joseph Evans
>
> YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to
> # discuss what is needed to support long lived processes
> # track the resulting JIRA.
[jira] [Created] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned
Steve Loughran created YARN-1394:
------------------------------------

             Summary: RM to inform AMs when a container completed due to NM going offline - planned or unplanned
                 Key: YARN-1394
                 URL: https://issues.apache.org/jira/browse/YARN-1394
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Steve Loughran

YARN-914 proposes graceful decommission of an NM, and NMs already have the right to go offline. If AMs could be told whether a container completed because its NM went offline versus being decommissioned, the AM could use that in its future blacklisting and placement policy. This matters for long-lived services, which may like to place new instances where they were placed before and track host failure rates.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817145#comment-13817145 ]

Steve Loughran commented on YARN-914:
-------------------------------------

YARN-1394 adds the need for AMs to be told of NM failure/decommission as causes for container completion.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.
>
> Currently if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
>
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.