[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817995#comment-13817995 ]

Sandy Ryza commented on YARN-807:
---------------------------------

Build is failing because of the runaway process / javah problem seen in other JIRAs.

> When querying apps by queue, iterating over all apps is inefficient and limiting
> --------------------------------------------------------------------------------
>
>                 Key: YARN-807
>                 URL: https://issues.apache.org/jira/browse/YARN-807
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-807-1.patch, YARN-807.patch
>
> The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name.
>
> All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that "root.default" and "default" refer to the same queue in the fair scheduler.
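The contrast the description draws -- scanning every RMApp versus consulting the scheduler's own queue-to-applications mapping -- can be sketched roughly as below. This is an illustrative sketch only: the surrounding class and helper names are assumptions, and getAppsInQueue is shown as discussed in this JIRA rather than copied from the patch.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

public class QueueAppLookupSketch {
  // Current behavior: linear scan over every RMApp, filtering on queue name.
  static List<ApplicationId> appsByScan(RMContext rmContext, String queue) {
    List<ApplicationId> result = new ArrayList<ApplicationId>();
    for (RMApp app : rmContext.getRMApps().values()) {
      if (queue.equals(app.getQueue())) {
        result.add(app.getApplicationId());
      }
    }
    return result;
  }

  // Proposed behavior: delegate to the scheduler, which already maintains a
  // queue -> applications mapping and is also positioned to resolve aliases
  // such as "default" vs "root.default" or expand a parent queue to its
  // leaf queues.
  static List<ApplicationId> appsByQueue(YarnScheduler scheduler, String queue) {
    return scheduler.getAppsInQueue(queue);
  }
}
{code}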
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817989#comment-13817989 ]

Hadoop QA commented on YARN-807:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612967/YARN-807-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2400//console

This message is automatically generated.
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817985#comment-13817985 ]

Sandy Ryza commented on YARN-807:
---------------------------------

Attaching a rebased patch. It also modifies getAppsInQueue to only return IDs, not the full scheduler application.

I'd prefer not to have this depend on YARN-1317. If they're both ready at close to the same time, I'm happy to have YARN-1317 go in first and do the work of rebasing this on top of it.
[jira] [Updated] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-807:
----------------------------

    Attachment: YARN-807-1.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817975#comment-13817975 ]

Hadoop QA commented on YARN-1210:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612966/YARN-1210.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2399//console

This message is automatically generated.

> During RM restart, RM should start a new attempt only when previous attempt exits for real
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-1210
>                 URL: https://issues.apache.org/jira/browse/YARN-1210
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch
>
> When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart.
>
> In the mean while, new apps will proceed as usual as existing apps wait for recovery.
>
> This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-1210:
------------------------------------

    Attachment: YARN-1210.4.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817964#comment-13817964 ]

Hadoop QA commented on YARN-1210:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612955/YARN-1210.4.patch
against trunk revision .

{color:red}-1 patch{color}. Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2398//console

This message is automatically generated.
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-1210:
------------------------------------

    Attachment: YARN-1210.4.patch
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817941#comment-13817941 ]

Omkar Vinit Joshi commented on YARN-1210:
-----------------------------------------

Attaching a rebased patch. I slightly modified the logic of the RM-restart app recovery code:

* If an application doesn't have any attempts, then it will start a new attempt when we do submitApplication as a part of recovery.
* If an application has one or more application attempts, then attempt recovery will take place in 2 steps:
** All the application attempts except the last attempt will be recovered first.
** When we do submitApplication as a part of application recovery, we will replay the last attempt:
*** If the last attempt doesn't have any finalRecoveredState stored, then it will be considered as one for which the AM may or may not have been started/finished. So we will move this application attempt into LAUNCHED state, add it to the AMLivenessMonitor, and move the application to RUNNING state.
*** If the last attempt was in FAILED/KILLED/FINISHED state, then we will replay that attempt's BaseFinalTransition by recovering the attempt synchronously here. (A rough sketch of this decision follows below.)

Adding tests to cover the scenarios below:

* A new application attempt is not started until the previous AM container's finish event is reported back to the RM as a part of NM registration.
* If the previous AM container's finish event is never reported back (i.e. the node manager on which the AM container was running also went down), the AMLivenessMonitor should time out the previous attempt and start a new attempt.
* If all the stored attempts had finished, then a new attempt should be started immediately.
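The last-attempt decision above boils down to whether a recovered final state exists. A minimal sketch, assuming a per-attempt recovered-final-state value as described in the comment -- the names are illustrative, not the patch code:

{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

public class LastAttemptRecoverySketch {
  // Returns true when the RM can start a new attempt right away during
  // recovery, per the logic described in the comment above.
  static boolean startNewAttemptImmediately(RMAppAttemptState recoveredFinalState) {
    if (recoveredFinalState == null) {
      // No stored final state: the previous AM may or may not still be
      // running. The attempt is moved to LAUNCHED and registered with the
      // AMLivenessMonitor; the RM waits for either the NM-reported container
      // finish or the liveness timeout before starting a new attempt.
      return false;
    }
    // FAILED / KILLED / FINISHED was stored: the previous attempt is known
    // to be done, so its final transition is replayed synchronously and a
    // new attempt can start immediately.
    return true;
  }
}
{code}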
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817931#comment-13817931 ]

Sandy Ryza commented on YARN-584:
---------------------------------

I applied the patch and tested it as well, and it appears to work well.

One thing I noticed: if I expand a parent queue and subqueue, and then close the parent queue, but not the subqueue, they both appear as open when I refresh the page. Would this be difficult to fix?

> In fair scheduler web UI, queues unexpand on refresh
> ----------------------------------------------------
>
>                 Key: YARN-584
>                 URL: https://issues.apache.org/jira/browse/YARN-584
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>              Labels: newbie
>         Attachments: YARN-584-branch-2.2.0.patch
>
> In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time.
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817928#comment-13817928 ]

Sandy Ryza commented on YARN-584:
---------------------------------

bq. One question can't we put this in some already existing common class ? If you know let me know else will try to find, if not able to get any class with such common usage then will go on with adding SchedulerPageUtil class.

I looked and couldn't find an existing class where this would fit well. I think adding a new one is fine.

bq. I added my patch to checked out trunk using patch -p0 < YARN-584-branch-2.2.0.patch ( at root folder )

How up to date is the version of trunk you checked out?
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817915#comment-13817915 ]

Hudson commented on YARN-1121:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4707 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4707/])

YARN-1121. Changed ResourceManager's state-store to drain all events on shut-down. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1540232)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java

> RMStateStore should flush all pending store events before closing
> ------------------------------------------------------------------
>
>                 Key: YARN-1121
>                 URL: https://issues.apache.org/jira/browse/YARN-1121
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Jian He
>             Fix For: 2.3.0
>         Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch
>
> on serviceStop it should wait for all internal pending events to drain before stopping.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817908#comment-13817908 ]

Vinod Kumar Vavilapalli commented on YARN-1121:
-----------------------------------------------

Looks good overall. Checking this in.

One thing of note is that you are removing locking for service-life-cycle methods in RMStateStore. I verified that it seems fine: events coming in during serviceStop are ignored due to draining, and other blocking calls are okay to happen.
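The draining behavior this change hinges on can be sketched roughly as follows; the field names here are assumptions for illustration, not the committed AsyncDispatcher code:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.yarn.event.Event;

public class DrainingStopSketch {
  private volatile boolean blockNewEvents = false;
  private final Object waitForDrained = new Object();
  private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<Event>();

  // Sketch of stop-time draining: refuse new events, then wait until the
  // event-handling thread (not shown) has emptied the queue and notified us.
  public void stopAndDrain() throws InterruptedException {
    blockNewEvents = true; // events arriving from now on are ignored
    synchronized (waitForDrained) {
      while (!eventQueue.isEmpty()) {
        waitForDrained.wait(1000);
      }
    }
    // ... then stop the event-handling thread and complete serviceStop()
  }
}
{code}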
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817871#comment-13817871 ]

Harshit Daga commented on YARN-584:
-----------------------------------

1. Sandy, the logic is the same for both schedulers; I will add a new class as you suggested to prevent code duplication. One question: can't we put this in some already existing common class? If you know of one, let me know; otherwise I will try to find one, and if I can't find a class with such common usage, I will go ahead with adding a SchedulerPageUtil class. With respect to the coding pattern, I will make the changes and update the patch.

2. Steps taken to manually verify the patch. I tested the patch on my machine with queues and sub-queues, for example:
a. root -> [root.queue1 and root.queue2]
b. root -> [root.queue1 -> [root.queue1.queuea, root.queue1.queueb] and root.queue2]

Test cases:
i. The first time, the page loads properly: success.
ii. Open (expand) a queue and reload the page; the queue that was opened earlier should be open after the page is reloaded: success.
iii. Expand and unexpand a few queues and reload the page again; the queues that were unexpanded before the reload should stay unexpanded after it, and those that were expanded should stay expanded, i.e. the page should look the same after reload with respect to the queues' open/close state: success.

Tested in Chrome (version 30.0.1599.101), Safari (version 6.1 (7537.71)) and Firefox (version 20.0). System: Mac OS X (version 10.7.5).

3. Regarding "The patch appears to cause the build to fail": I applied my patch to a checked-out trunk using patch -p0 < YARN-584-branch-2.2.0.patch (at the root folder) and then built it using mvn install -DskipTests, and I got a BUILD SUCCESS. Can you see why the log is showing a build failure?
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817831#comment-13817831 ]

Hadoop QA commented on YARN-1279:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612890/YARN-1279.9.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The following test timeouts occurred in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2397//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2397//console

This message is automatically generated.

> Expose a client API to allow clients to figure if log aggregation is complete
> ------------------------------------------------------------------------------
>
>                 Key: YARN-1279
>                 URL: https://issues.apache.org/jira/browse/YARN-1279
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.2.0
>            Reporter: Arun C Murthy
>            Assignee: Xuan Gong
>         Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch, YARN-1279.6.patch, YARN-1279.7.patch, YARN-1279.8.patch, YARN-1279.8.patch, YARN-1279.9.patch
>
> Expose a client API to allow clients to figure if log aggregation is complete
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817797#comment-13817797 ]

Jian He commented on YARN-1279:
-------------------------------

- updateLogAggregationStatus doesn't need write-lock protection; the state machine has write-lock protection already.
- LOG_AGGREGATION_WATTING_MS: the unit can be seconds instead of milliseconds, like LOG_AGGREGATION_WAIT_SECONDS.
- LogAggregationState.COMPLETED: rename to FINISHED?
- Why doesn't the RMApp Failed state receive the LOG_AGGREGATION_STATUS_UPDATE event?
- The wrong configuration value is used (a possible fix is sketched after this message):
{code}
this.logAggregationTimeOut = YarnConfiguration.DEFAULT_LOG_AGGREGATION_RETAIN_CHECK_INTERVAL_SECONDS;
{code}
- I think better unit tests would use MockRM to submit a job and finish that job. Use the MockNM.nodeHeartBeat() method, inside which you customize the NodeStatus with ApplicationLogAggregationStatus, and call that method to interact with the RM. Also use ClientRMService.getApplicationReport to assert the expected logAggregationState. This way we cover the whole picture, including the NM-side changes and the client-side changes. You can see an example in TestRMRestart.testRMRestartSucceededApp.

Given YARN-1376 is not that big, we can incorporate that into this patch also.
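To make the wrong-configuration-value point concrete, a hedged sketch of the shape of the fix. The *_WAIT_SECONDS key and default below are hypothetical placeholders -- the review only says a retain-check-interval default was used where the log-aggregation wait value belongs:

{code}
// Hypothetical fix sketch; substitute whatever constants the patch actually
// defines for the log-aggregation wait period.
this.logAggregationTimeOut = conf.getLong(
    YarnConfiguration.LOG_AGGREGATION_WAIT_SECONDS,           // assumed key
    YarnConfiguration.DEFAULT_LOG_AGGREGATION_WAIT_SECONDS);  // assumed default
{code}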
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817692#comment-13817692 ]

Xuan Gong commented on YARN-1279:
---------------------------------

bq. this code logic can be simplified to say, if exceeds timeout period return FAILED or Timeout, otherwise return In_Progress. And so we can remove the logAggregationTimeOutDisabled boolean.

Makes sense. Deleted the logAggregationTimeOutDisabled boolean. If clients set the logAggregationTimeOut value to a negative number, the default value will be used instead.
[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-1279:
----------------------------

    Attachment: YARN-1279.9.patch
[jira] [Commented] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817586#comment-13817586 ]

Robert Kanter commented on YARN-1390:
-------------------------------------

Ultimately, what we want is a way to "tag" jobs in some way with the Oozie action ID so that we can find them in the case where the AM launcher job fails but the action's AM does not, in order to properly handle that situation. (It would also allow us to finally add a long-requested feature of Oozie being able to actually kill running actions instead of letting them finish.)

The idea of having an "applicationSource" or multiple applicationTypes was to make this more generic than an "oozieActionID" field so other projects could use this feature for their own purposes as well.

> Add applicationSource to ApplicationSubmissionContext and RMApp
> ----------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, (2) potentially adding source-specific optimizations in the future.
>
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop etc.
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817560#comment-13817560 ]

Bikas Saha commented on YARN-1222:
----------------------------------

Quick comments:

1) The new event is not following the convention we have for events. Events are grouped by the destination of the events, i.e. the handler. So all RMStateStoreEvents are handled by the state store. We now have a new class of event that is handled by the ResourceManager, so we should not overload the RMStateStoreEvents. Let's create a new type that is handled by the new handler in the ResourceManager. When HA is enabled, then on exception we should transitionToStandby() but not exit. When HA is not enabled, then we should die like we currently do.

2) I don't quite get why the ResourceManager would send a failed_store event back to the store that had sent it to the RM in the first place. From 1) above, the RM should either transitionToStandby or die when it gets that event.

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.
[jira] [Commented] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817535#comment-13817535 ]

Vinod Kumar Vavilapalli commented on YARN-1390:
-----------------------------------------------

Don't see the point of multiple appTypes. A Pig query is a Pig query, and jobs spawned for a Pig query should be of type Pig. I think the problem is that MR hardcodes the app-type. If we change that to be pluggable, then it should be enough for you?

Please change the title to reflect your requirement and not the solution. Tx.
[jira] [Commented] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817526#comment-13817526 ]

Vinod Kumar Vavilapalli commented on YARN-1023:
-----------------------------------------------

Had a quick look through the patch. I don't see a reason why AHS and RM cannot share the same web-service code (for the most part).

> [YARN-321] Webservices REST API's support for Application History
> ------------------------------------------------------------------
>
>                 Key: YARN-1023
>                 URL: https://issues.apache.org/jira/browse/YARN-1023
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-321
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>         Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch
[jira] [Commented] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History
[ https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817523#comment-13817523 ]

Hadoop QA commented on YARN-974:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612858/YARN-974.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2396//console

This message is automatically generated.

> RMContainer should collect more useful information to be recorded in Application-History
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-974
>                 URL: https://issues.apache.org/jira/browse/YARN-974
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-974.1.patch, YARN-974.2.patch, YARN-974.3.patch, YARN-974.4.patch
>
> To record the history of a container, users may be also interested in the following information:
> 1. Start Time
> 2. Stop Time
> 3. Diagnostic Information
> 4. URL to the Log File
> 5. Actually Allocated Resource
> 6. Actually Assigned Node
> These should be remembered during the RMContainer's life cycle.
[jira] [Commented] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817516#comment-13817516 ]

Wei Yan commented on YARN-1393:
-------------------------------

This only updates the README file. I manually checked the instruction steps locally.

> Add how-to-use instruction in README for Yarn Scheduler Load Simulator
> -----------------------------------------------------------------------
>
>                 Key: YARN-1393
>                 URL: https://issues.apache.org/jira/browse/YARN-1393
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: YARN-1393.patch
>
> The instructions are put in the .pdf document and site page. The README needs to include a simple instruction for users to quickly pick up.
[jira] [Commented] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817515#comment-13817515 ]

Hadoop QA commented on YARN-1393:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612854/YARN-1393.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2395//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2395//console

This message is automatically generated.
[jira] [Updated] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History
[ https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-974:
-----------------------------

    Attachment: YARN-974.4.patch

Did a minor update: adding a readLock for getLogURL as well.
[jira] [Updated] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1393:
--------------------------

    Attachment: YARN-1393.patch
[jira] [Updated] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1222:
-----------------------------------

    Attachment: yarn-1222-6.patch

Here is an updated patch that:
# Creates a new event type for failed store operations (a rough sketch of the shape follows below).
# Has the RMDispatcher handle these failed-store-op events: it transitions to standby on a fenced exception, and shuts the RM down otherwise.
# Marks VisibleForTesting methods in ZKRMStateStore @Private @Unstable.

Pending:
# Documentation in yarn-default.xml
# Manual testing on a real cluster
# Creating a JIRA to change the RMStateStore#notifyDone* methods to not take an Exception

[~bikassaha] - please take a look when you get a chance. I'll address any feedback in the next patch. Thanks.
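A sketch of the shape of item 1, following the events-grouped-by-handler convention Bikas describes: a dedicated event type handled by the ResourceManager rather than the store. The class, enum, and method names here are assumptions for illustration, not the patch code:

{code}
import org.apache.hadoop.util.ExitUtil;
import org.apache.hadoop.yarn.event.AbstractEvent;
import org.apache.hadoop.yarn.event.EventHandler;

// Hypothetical event delivered to the RM when a state-store operation fails.
enum StoreOpFailedEventType { FENCED, OP_FAILED }

class StoreOpFailedEvent extends AbstractEvent<StoreOpFailedEventType> {
  private final Exception cause;

  StoreOpFailedEvent(StoreOpFailedEventType type, Exception cause) {
    super(type);
    this.cause = cause;
  }

  Exception getCause() { return cause; }
}

// Hypothetical RM-side handler: go standby on fencing when HA is enabled,
// otherwise die, as described in the comments above.
class StoreOpFailedEventHandler implements EventHandler<StoreOpFailedEvent> {
  private final boolean haEnabled;

  StoreOpFailedEventHandler(boolean haEnabled) { this.haEnabled = haEnabled; }

  @Override
  public void handle(StoreOpFailedEvent event) {
    if (haEnabled && event.getType() == StoreOpFailedEventType.FENCED) {
      // the RM's transitionToStandby() would be invoked here (assumed)
    } else {
      ExitUtil.terminate(1, event.getCause());
    }
  }
}
{code}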
[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817148#comment-13817148 ]

Steve Loughran commented on YARN-896:
-------------------------------------

Link to YARN-1394: RM to inform AMs when a container completed due to planned/unplanned NM outage.

> Roll up for long-lived services in YARN
> ---------------------------------------
>
>                 Key: YARN-896
>                 URL: https://issues.apache.org/jira/browse/YARN-896
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Robert Joseph Evans
>
> YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to
> # discuss what is needed to support long lived processes
> # track the resulting JIRA.
[jira] [Created] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned
Steve Loughran created YARN-1394:
------------------------------------

             Summary: RM to inform AMs when a container completed due to NM going offline - planned or unplanned
                 Key: YARN-1394
                 URL: https://issues.apache.org/jira/browse/YARN-1394
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Steve Loughran

YARN-914 proposes graceful decommission of an NM, and NMs already have the right to go offline. If AMs could be told whether a container completed because its NM went offline versus being decommissioned, the AM could use that in its future blacklisting and placement policy. This matters for long-lived services, which may like to place new instances where they were placed before and track host failure rates.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817145#comment-13817145 ]

Steve Loughran commented on YARN-914:
-------------------------------------

YARN-1394 adds the need for AMs to be told of NM failure/decommission as causes for container completion.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.
>
> Currently if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
>
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.