date:20131105

[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813735#comment-13813735
 ] 

Bikas Saha commented on YARN-1197:
--

Can we make do with just change_succeeded and change_failed lists instead of four 
lists? Using the containerId, the AM can determine which changes were increases 
and which were decreases.
{noformat}
+message ChangeContainersResourceResponseProto {
+  repeated ContainerIdProto succeed_increased_containers = 1;
+  repeated ContainerIdProto succeed_decreased_containers = 2;
+  repeated ContainerIdProto failed_increased_containers = 3;
+  repeated ContainerIdProto failed_decreased_containers = 4;
+}
{noformat}
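The two-list response works because the AM already knows which containers it asked to grow and which to shrink. A minimal sketch of that AM-side bookkeeping, assuming container IDs are plain strings and that change_succeeded / change_failed each arrive as a list of IDs (ChangeResultClassifier is a hypothetical name, not part of the YARN API):

{code:java}
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AM-side bookkeeping for a two-list (succeeded/failed) response.
// Container IDs are plain strings here; a real AM would use ContainerId records.
final class ChangeResultClassifier {
  enum ChangeKind { INCREASE, DECREASE }

  // Outstanding change requests, keyed by container ID.
  private final Map<String, ChangeKind> pending = new HashMap<>();

  void requested(String containerId, ChangeKind kind) {
    pending.put(containerId, kind);
  }

  // Splits one returned list (change_succeeded or change_failed) into
  // increases and decreases, forgetting each request once it is resolved.
  Map<ChangeKind, List<String>> classify(List<String> containerIds) {
    Map<ChangeKind, List<String>> byKind = new EnumMap<>(ChangeKind.class);
    for (ChangeKind kind : ChangeKind.values()) {
      byKind.put(kind, new ArrayList<>());
    }
    for (String id : containerIds) {
      ChangeKind kind = pending.remove(id);
      if (kind == null) {
        throw new IllegalStateException("No pending change for " + id);
      }
      byKind.get(kind).add(id);
    }
    return byKind;
  }
}
{code}

The trade-off is that the AM must keep its own record of what it requested, since the response no longer says which kind of change each entry was.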

I don't think it's correct to use ResourceRequest to increase the resources of 
an allocated container. I was expecting a new optional repeated field of 
type ResourceChangeContextProto in AllocateRequest. To request an increase in 
container C's resources, the AM would add a ResourceChangeContextProto for that 
container to the next AllocateRequest. 
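The AllocateRequest-based flow suggested here could look roughly like the sketch below. AllocateRequestSketch, ResourceChangeContext and AmHeartbeatBuilder are hypothetical stand-ins for the proposed records, and ResourceRequest is reduced to a string placeholder. The rule that a newer target for the same container replaces an older, not-yet-sent one is an illustrative assumption, not something the proposal specifies.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed ResourceChangeContextProto:
// which container to change and its desired target size.
final class ResourceChangeContext {
  final String containerId;
  final int memoryMb;
  final int vcores;

  ResourceChangeContext(String containerId, int memoryMb, int vcores) {
    this.containerId = containerId;
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }
}

// Hypothetical simplified AllocateRequest: increases travel in their own
// repeated field, separate from ordinary asks (strings as placeholders).
final class AllocateRequestSketch {
  final List<String> asks;
  final List<ResourceChangeContext> increaseRequests;

  AllocateRequestSketch(List<String> asks, List<ResourceChangeContext> increaseRequests) {
    this.asks = Collections.unmodifiableList(new ArrayList<>(asks));
    this.increaseRequests = Collections.unmodifiableList(new ArrayList<>(increaseRequests));
  }
}

final class AmHeartbeatBuilder {
  // Latest target per container (assumption: newer replaces older unsent one).
  private final Map<String, ResourceChangeContext> pendingIncreases = new LinkedHashMap<>();

  void requestIncrease(String containerId, int memoryMb, int vcores) {
    pendingIncreases.put(containerId, new ResourceChangeContext(containerId, memoryMb, vcores));
  }

  // Called once per heartbeat: moves all pending increases into the next request.
  AllocateRequestSketch nextAllocateRequest(List<String> asks) {
    AllocateRequestSketch request =
        new AllocateRequestSketch(asks, new ArrayList<>(pendingIncreases.values()));
    pendingIncreases.clear();
    return request;
  }
}
{code}

Keeping increases in their own field lets the scheduler handle them separately from ordinary ResourceRequests instead of special-casing them inside request matching.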

In AllocateResponse, the type of the increased containers should be 
ResourceIncreaseContextProto, right? Without that, the AM cannot get the new 
container token for that container.

The NM changes also need to handle enforcing the new resource limits (via cgroups, 
etc.) in addition to changing the monitoring. This needs to be clarified in the 
document.
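For the enforcement point above, one possibility is for the NM to rewrite the container's cgroup control files whenever its allocation changes. A sketch that only computes the values, assuming a hypothetical <root>/<controller>/<containerId> layout, a weight of 1024 cpu.shares per vcore, and memory enforcement through memory.limit_in_bytes (none of which is specified in this thread; CgroupResize is a hypothetical name):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: control-file values an NM could write when a running
// container's allocation changes. Layout and weights are assumptions.
final class CgroupResize {
  // Assumed weight: 1024 cpu.shares per vcore.
  static final int CPU_SHARES_PER_VCORE = 1024;

  // Returns control-file path -> value to write for the container's new size.
  static Map<String, String> controlValues(String cgroupRoot, String containerId,
                                           int memoryMb, int vcores) {
    if (memoryMb <= 0 || vcores <= 0) {
      throw new IllegalArgumentException("memoryMb and vcores must be positive");
    }
    Map<String, String> values = new LinkedHashMap<>();
    // cpu controller: relative weight proportional to vcores.
    values.put(cgroupRoot + "/cpu/" + containerId + "/cpu.shares",
        Integer.toString(CPU_SHARES_PER_VCORE * vcores));
    // memory controller: hard limit in bytes (long math avoids int overflow).
    values.put(cgroupRoot + "/memory/" + containerId + "/memory.limit_in_bytes",
        Long.toString(memoryMb * 1024L * 1024L));
    return values;
  }
}
{code}

A decrease may need extra care: if the container already uses more memory than the new limit, lowering the limit may not take effect cleanly, so the NM would need a policy for that case.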

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197.pdf
>
>
> Currently, YARN cannot merge several containers on one node into a bigger 
> container. Supporting this would let us request resources incrementally, merge 
> them into a bigger container, and launch our processes. The user scenario is 
> described in the comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Updated] (YARN-1307) Rethink znode structure for RM HA

2013-11-05 Thread Tsuyoshi OZAWA (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1307:
-

Attachment: YARN-1307.4.patch

Rebased on trunk. Bikas, is the change you mentioned YARN-353?

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch, 
> YARN-1307.4.patch
>
>
> A rethink of the znode structure for RM HA has been proposed in several JIRAs 
> (YARN-659, YARN-1222). The motivation for this JIRA is quoted from Bikas' 
> comment in YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isn't 
> required anymore.
> {quote}
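The per-app hierarchy quoted above can be sketched as follows, assuming a hypothetical <root>/<appId>/<attemptId> layout (AppZnodeLayout and the root path are illustrative). ZooKeeper refuses to delete a znode that still has children, so removing an application means deleting its attempt znodes before the app znode; with everything for one app in a single subtree, that list is easy to build.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical layout: <root>/<appId>/<attemptId>.
final class AppZnodeLayout {
  private final String root;
  // appId -> attempt IDs currently stored under that app's znode
  private final Map<String, List<String>> attemptsByApp = new TreeMap<>();

  AppZnodeLayout(String root) {
    this.root = root;
  }

  String appPath(String appId) {
    return root + "/" + appId;
  }

  String attemptPath(String appId, String attemptId) {
    return appPath(appId) + "/" + attemptId;
  }

  void addAttempt(String appId, String attemptId) {
    attemptsByApp.computeIfAbsent(appId, k -> new ArrayList<>()).add(attemptId);
  }

  // Children first, parent last, since a znode with children cannot be
  // deleted. These deletes could be batched into one multi-operation.
  List<String> deletionOrder(String appId) {
    List<String> paths = new ArrayList<>();
    for (String attemptId : attemptsByApp.getOrDefault(appId, Collections.emptyList())) {
      paths.add(attemptPath(appId, attemptId));
    }
    paths.add(appPath(appId));
    return paths;
  }
}
{code}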




[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

2013-11-05 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813769#comment-13813769
 ] 

Mayank Bansal commented on YARN-979:


bq. I still have one question w.r.t. the annotations of the getter/setter of 
GetRequest/Response. Some of them are marked as @Stable, and some are 
marked as @Unstable. In addition, some setters are marked as @Private, and some 
are marked as @Public. Do you have special consideration here? Maybe we should 
mark all as @Unstable for the initial AHS?

Fixed the annotations

Thanks,
Mayank

> [YARN-321] Add more APIs related to ApplicationAttempt and Container in 
> ApplicationHistoryProtocol
> --
>
> Key: YARN-979
> URL: https://issues.apache.org/jira/browse/YARN-979
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-979-1.patch, YARN-979-3.patch, YARN-979-4.patch, 
> YARN-979-5.patch, YARN-979-6.patch, YARN-979.2.patch
>
>
> ApplicationHistoryProtocol should have the following APIs as well:
> * getApplicationAttemptReport
> * getApplicationAttempts
> * getContainerReport
> * getContainers
> The corresponding request and response classes need to be added as well.
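A simplified Java shape for the four APIs listed above, with an in-memory implementation for illustration. The real protocol would exchange the corresponding request/response records; the string IDs, SimpleAttemptReport, SimpleContainerReport, HistoryQueries and InMemoryHistory here are all hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal report types; IDs are plain strings.
final class SimpleAttemptReport {
  final String attemptId;
  final String appId;

  SimpleAttemptReport(String attemptId, String appId) {
    this.attemptId = attemptId;
    this.appId = appId;
  }
}

final class SimpleContainerReport {
  final String containerId;
  final String attemptId;

  SimpleContainerReport(String containerId, String attemptId) {
    this.containerId = containerId;
    this.attemptId = attemptId;
  }
}

// Simplified shape of the four additional queries.
interface HistoryQueries {
  SimpleAttemptReport getApplicationAttemptReport(String attemptId);
  List<SimpleAttemptReport> getApplicationAttempts(String appId);
  SimpleContainerReport getContainerReport(String containerId);
  List<SimpleContainerReport> getContainers(String attemptId);
}

// In-memory implementation for illustration only.
final class InMemoryHistory implements HistoryQueries {
  private final Map<String, SimpleAttemptReport> attempts = new HashMap<>();
  private final Map<String, SimpleContainerReport> containers = new HashMap<>();

  void add(SimpleAttemptReport r) { attempts.put(r.attemptId, r); }

  void add(SimpleContainerReport r) { containers.put(r.containerId, r); }

  @Override
  public SimpleAttemptReport getApplicationAttemptReport(String attemptId) {
    return attempts.get(attemptId);
  }

  @Override
  public List<SimpleAttemptReport> getApplicationAttempts(String appId) {
    List<SimpleAttemptReport> out = new ArrayList<>();
    for (SimpleAttemptReport r : attempts.values()) {
      if (r.appId.equals(appId)) out.add(r);
    }
    return out;
  }

  @Override
  public SimpleContainerReport getContainerReport(String containerId) {
    return containers.get(containerId);
  }

  @Override
  public List<SimpleContainerReport> getContainers(String attemptId) {
    List<SimpleContainerReport> out = new ArrayList<>();
    for (SimpleContainerReport r : containers.values()) {
      if (r.attemptId.equals(attemptId)) out.add(r);
    }
    return out;
  }
}
{code}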




[jira] [Updated] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

2013-11-05 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-979:
---

Attachment: YARN-979-6.patch

Attaching the latest patch.

Thanks,
Mayank


[jira] [Commented] (YARN-261) Ability to kill AM attempts

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813780#comment-13813780
 ] 

Hadoop QA commented on YARN-261:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612119/YARN-261--n7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning message.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2369//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2369//console

This message is automatically generated.

> Ability to kill AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Andrey Klochkov
> Attachments: YARN-261--n2.patch, YARN-261--n3.patch, 
> YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, 
> YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.




[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813787#comment-13813787
 ] 

Hadoop QA commented on YARN-955:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612123/YARN-955-2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2371//console

This message is automatically generated.

> [YARN-321] Implementation of ApplicationHistoryProtocol
> ---
>
> Key: YARN-955
> URL: https://issues.apache.org/jira/browse/YARN-955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Mayank Bansal
> Attachments: YARN-955-1.patch, YARN-955-2.patch
>
>





[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813794#comment-13813794
 ] 

Hadoop QA commented on YARN-979:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612134/YARN-979-6.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2373//console

This message is automatically generated.


[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813796#comment-13813796
 ] 

Hadoop QA commented on YARN-1307:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612126/YARN-1307.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2372//console

This message is automatically generated.


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-11-05 Thread Hou Song (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813811#comment-13813811
 ] 

Hou Song commented on YARN-90:
--

Thanks for the suggestions. I'm modifying my patch and will upload it 
soon.
However, I don't quite understand what you mean by "expose this end-to-end and not 
just metrics". We have been using the failed-disk metric in our production cluster 
for a year, and it has been good enough for rapid disk repair. Please enlighten me if 
you have a better way. 

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good again), NodeManager needs a restart. This JIRA is to improve NodeManager 
> so that it reuses disks that have become good again (even if they were bad some time back).
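One way to get this behavior, sketched under the assumption of a pluggable health check (LocalDirsTracker and its predicate are hypothetical, not the NM's actual disk-health code): on every periodic check, re-test all configured dirs, including those currently marked failed, and move each one to whichever set it now belongs in.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical tracker: failed dirs are re-tested on every check, so a
// repaired disk returns to the good set without an NM restart.
final class LocalDirsTracker {
  private final List<String> allDirs;
  private final Set<String> good = new LinkedHashSet<>();
  private final Set<String> failed = new LinkedHashSet<>();
  private final Predicate<String> isHealthy; // e.g. "can create and delete a file"

  LocalDirsTracker(List<String> dirs, Predicate<String> isHealthy) {
    this.allDirs = new ArrayList<>(dirs);
    this.isHealthy = isHealthy;
    this.good.addAll(dirs);
  }

  // Re-tests every dir; returns true if any dir changed state.
  boolean check() {
    boolean changed = false;
    for (String dir : allDirs) {
      boolean healthy = isHealthy.test(dir);
      if (healthy && failed.remove(dir)) {
        good.add(dir);      // disk became good again
        changed = true;
      } else if (!healthy && good.remove(dir)) {
        failed.add(dir);    // disk went bad
        changed = true;
      }
    }
    return changed;
  }

  Set<String> goodDirs() { return Collections.unmodifiableSet(good); }

  Set<String> failedDirs() { return Collections.unmodifiableSet(failed); }
}
{code}

A failed-disk metric like the one mentioned above could simply report failedDirs().size() after each check.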




[jira] [Commented] (YARN-1323) Set HTTPS webapp address along with other RPC addresses in HAUtil

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813848#comment-13813848
 ] 

Hudson commented on YARN-1323:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #383 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/383/])
YARN-1323. Set HTTPS webapp address along with other RPC addresses in HAUtil 
(Karthik Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538851)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java


> Set HTTPS webapp address along with other RPC addresses in HAUtil
> -
>
> Key: YARN-1323
> URL: https://issues.apache.org/jira/browse/YARN-1323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Fix For: 2.3.0
>
> Attachments: yarn-1323-1.patch
>
>
> YARN-1232 adds the ability to configure multiple RMs, but misses the 
> HTTPS web app address. We need to add that.
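A sketch of the per-RM-instance lookup this fix extends, assuming keys are suffixed with ".<rm-id>" and using a plain Map in place of Hadoop's Configuration. RmInstanceConf and its methods are hypothetical illustrations, not HAUtil's code; yarn.resourcemanager.webapp.https.address is the HTTPS web app address key.

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a Map<String, String> stands in for Configuration.
final class RmInstanceConf {
  // Assumed convention: "<key>.<rm-id>", e.g. "...webapp.https.address.rm1".
  static String addSuffix(String key, String rmId) {
    return key + "." + rmId;
  }

  // Value configured for this RM instance, or null if none is set.
  static String valueForRmInstance(String key, String rmId, Map<String, String> conf) {
    return conf.get(addSuffix(key, rmId));
  }

  // Copies each instance-specific address onto the generic key, so code that
  // reads only the generic key picks up this RM's value. The HTTPS web app
  // address has to be in addressKeys along with the RPC addresses.
  static void setAddressesForRmInstance(List<String> addressKeys, String rmId,
                                        Map<String, String> conf) {
    for (String key : addressKeys) {
      String value = valueForRmInstance(key, rmId, conf);
      if (value != null) {
        conf.put(key, value);
      }
    }
  }
}
{code}

In this sketch, leaving the HTTPS key out of addressKeys would leave the generic HTTPS address untouched for the instance, which mirrors the omission described above.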




[jira] [Commented] (YARN-1388) Fair Scheduler page always displays blank fair share

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813847#comment-13813847
 ] 

Hudson commented on YARN-1388:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #383 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/383/])
YARN-1388. Fair Scheduler page always displays blank fair share (Liyin Liang 
via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538855)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


> Fair Scheduler page always displays blank fair share
> 
>
> Key: YARN-1388
> URL: https://issues.apache.org/jira/browse/YARN-1388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.1
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Fix For: 2.2.1
>
> Attachments: yarn-1388.diff
>
>
> YARN-1044 fixed the min/max/used resource display problem on the scheduler page, 
> but "Fair Share" has the same problem and needs to be fixed too.




[jira] [Commented] (YARN-1323) Set HTTPS webapp address along with other RPC addresses in HAUtil

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813902#comment-13813902
 ] 

Hudson commented on YARN-1323:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1600 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1600/])
YARN-1323. Set HTTPS webapp address along with other RPC addresses in HAUtil 
(Karthik Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538851)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java



[jira] [Commented] (YARN-1388) Fair Scheduler page always displays blank fair share

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813901#comment-13813901
 ] 

Hudson commented on YARN-1388:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1600 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1600/])
YARN-1388. Fair Scheduler page always displays blank fair share (Liyin Liang 
via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538855)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java



[jira] [Commented] (YARN-1388) Fair Scheduler page always displays blank fair share

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813919#comment-13813919
 ] 

Hudson commented on YARN-1388:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1574 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1574/])
YARN-1388. Fair Scheduler page always displays blank fair share (Liyin Liang 
via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538855)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java



[jira] [Commented] (YARN-1323) Set HTTPS webapp address along with other RPC addresses in HAUtil

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813920#comment-13813920
 ] 

Hudson commented on YARN-1323:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1574 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1574/])
YARN-1323. Set HTTPS webapp address along with other RPC addresses in HAUtil 
(Karthik Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538851)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-11-05 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813933#comment-13813933
 ] 

Wangda Tan commented on YARN-1197:
--

[~bikassaha]
Actually, I was halfway through implementing these when I got pulled away by 
other work. :-/

Regarding putting increase requests into ResourceRequest: I agree. I spent some 
time (on a half-baked scheduler that supports increases) only to find that 
putting increase requests into ResourceRequest is NOT a good idea, even though 
you had mentioned it before :(. I originally put increase requests into 
ResourceRequest because, literally speaking, an increase request is another form 
of "resource request": it also asks for more resources, and the only difference 
is that an increase request adds a restriction to the request.
But in YARN's actual implementation, making it part of ResourceRequest is 
problematic: I would need to handle the increase cases everywhere in the RM. I 
think making it a new member of AllocateRequest is a cleaner solution, but it 
will potentially cause more interface/implementation changes (e.g. 
SchedulerApplication, YARNScheduler, etc.). I'll keep looking into it before I 
start writing code.

I also agree with your comments on improving the representation of 
ChangeContainerResourceResponse and on the missing ResourceIncreaseContextProto 
in AllocateResponse. I'll add my design proposal for handling the new resource 
in the monitoring module.

Again, your comments are really helpful; I hope to hear more of your ideas :)



[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814065#comment-13814065
 ] 

Karthik Kambatla commented on YARN-1222:


Thanks [~bikassaha] for the close review. 

We should probably examine the ACL strategy a little more. My reasoning behind 
allowing the user to configure the ACLs is to avoid security holes. For 
instance, if we just use yarn.resourcemanager.address and the RM's 
cluster-time-stamp, third parties (anything but the RM) can retrieve that 
information and mess with the store.

bq. How is the following case going to work? How can the root node acl be set 
in the conf? Upon active, we have to remove the old RM's cd-acl and set our 
cd-acl. That cannot be statically set in conf right?
The root-node ACLs are per RM instance; they need to be different for this to 
work. The documentation in yarn-default.xml explains this, but perhaps we need to 
make it even clearer?

bq. My concern is that we are only adding new ACLs every time we failover but 
never deleting them. Is it possible that we end up creating too many ACLs for 
the root znode and hit ZK issues?
I don't think that is possible. On failover, if nothing is configured, we construct 
the root node ACLs from the same initial ACLs, so they do not accumulate across 
iterations. The number of ACLs in the list is always bounded by 
(user-configured-for-store + 1). Am I missing something?

For example, if the user doesn't configure any ZK ACLs, the ACLs for the store are 
{world:anyone:rwcda} and the ACLs for the root node are always {world:anyone:rwa, 
active-rm-address:active-rm-timestamp:cd}.
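A sketch of the bounded root-ACL construction described above, using a minimal stand-in ACL type whose permission bits mirror ZooKeeper's (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). ZkAcl, RootAcls and the "principal" string format are hypothetical. Because the root ACLs are rebuilt from the store ACLs on each transition to active, the result always has exactly one more entry than the store ACLs.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a ZooKeeper ACL entry (not org.apache.zookeeper.data.ACL).
final class ZkAcl {
  static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
  static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

  final String principal; // e.g. "world:anyone"
  final int perms;

  ZkAcl(String principal, int perms) {
    this.principal = principal;
    this.perms = perms;
  }

  // Renders as "<principal>:<perm letters>" in r, w, c, d, a order.
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(principal).append(':');
    if ((perms & READ) != 0) sb.append('r');
    if ((perms & WRITE) != 0) sb.append('w');
    if ((perms & CREATE) != 0) sb.append('c');
    if ((perms & DELETE) != 0) sb.append('d');
    if ((perms & ADMIN) != 0) sb.append('a');
    return sb.toString();
  }
}

final class RootAcls {
  // Rebuilt from scratch on each transition to active, so entries never
  // accumulate across failovers: result size == storeAcls.size() + 1.
  static List<ZkAcl> forActiveRm(List<ZkAcl> storeAcls, String activeRmPrincipal) {
    List<ZkAcl> root = new ArrayList<>();
    for (ZkAcl acl : storeAcls) {
      // Configured principals keep their other rights but lose create/delete.
      root.add(new ZkAcl(acl.principal, acl.perms & ~(ZkAcl.CREATE | ZkAcl.DELETE)));
    }
    // Only the active RM may create/delete children of the root znode.
    root.add(new ZkAcl(activeRmPrincipal, ZkAcl.CREATE | ZkAcl.DELETE));
    return root;
  }
}
{code}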

bq. For both of the above, can we use well-known prefixes for the root znode 
acls (rm-admin-acl and rm-cd-acl). 
We might be able to do that, but in the current implementation the user can 
already achieve it by configuring the root ACLs to exactly those.

A completely different approach would be to use session-based authentication, 
where the active session claims create-delete. However, we might want to do that 
as a follow-up; it might need some more refactoring of the store to ensure a 
single session. 

bq. Can we move this logic into the common RMStateStore and notify it about HA 
state loss via a standard HA exception. 
I initially did that, but moved it to ZKRMStateStore because the common 
RMStateStore is oblivious to the implicit fencing mechanism in the ZKStore. Do 
you think we should make it aware of fencing - have something like a 
StoreFencedException?

bq. Will the null return make the state store crash?
It didn't crash the store in my testing. I will look at it more closely for the 
next revision of the patch.

bq. This and other similar places need an @Private
ZKRMStateStore itself is @Private @Unstable. Should we still label the methods 
@Private? 

{quote} @Private?
{code}
+  public static String getConfValueForRMInstance(String prefix,
{code}
{quote}
This was intentional; there might be merit in leaving these methods public and 
marking the class itself @Public @Unstable for now.

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.
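The fencing scheme described above can be sketched as a wrapper that brackets every batch of store operations with creating and deleting a lock znode under the root, all in one multi-operation. ZkOpSketch is a hypothetical stand-in for ZooKeeper's Op class, and the lock name is illustrative. Since only the RM granted create/delete on the root znode can create the lock child, a fenced-out RM's whole multi-operation fails atomically.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.zookeeper.Op.
final class ZkOpSketch {
  enum Kind { CREATE, SET_DATA, DELETE }

  final Kind kind;
  final String path;

  ZkOpSketch(Kind kind, String path) {
    this.kind = kind;
    this.path = path;
  }

  @Override
  public String toString() {
    return kind + " " + path;
  }
}

final class FencedMulti {
  // Wraps store operations as [create lock, ops..., delete lock]; the whole
  // list is meant to be submitted as a single atomic multi-operation.
  static List<ZkOpSketch> wrap(String rootPath, String lockName, List<ZkOpSketch> ops) {
    String lockPath = rootPath + "/" + lockName;
    List<ZkOpSketch> multi = new ArrayList<>(ops.size() + 2);
    multi.add(new ZkOpSketch(ZkOpSketch.Kind.CREATE, lockPath));
    multi.addAll(ops);
    multi.add(new ZkOpSketch(ZkOpSketch.Kind.DELETE, lockPath));
    return Collections.unmodifiableList(multi);
  }
}
{code}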




[jira] [Updated] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data

2013-11-05 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-953:
-

Attachment: YARN-953.5.patch

I uploaded a new patch, which follows the change to the writer interface. 
The basic design is unchanged: we still have an RMApplicationHistoryWriter, 
which handles write requests from the RM asynchronously on separate threads. 
Other major changes:

1. Instead of handling all the write events in one separate thread, I define 
a dispatcher vector to improve concurrency, and ensure that all events of one 
application are scheduled on the same thread. Therefore, write events of 
different applications are processed concurrently, while events of the same 
application are processed in the order in which they were scheduled (it's 
important that the events scheduled before applicationFinished are processed 
first).

2. Make sure applicationFinished is called after all 
applicationAttemptsFinished, especially in the killing case, where RMApp moves 
to the final state before RMAppAttempt.

3. Improve the test cases.

4. Fix other tests in the RM project that were broken by these changes.

Some things still need to be handled separately:

1. RMContainer needs to carry more information to fill 
ContainerHistoryData. This will be done in YARN-974.

2. Like RMStateStore, RMApplicationHistoryWriter needs to flush all pending 
events when the RM stops. We can make use of the update in YARN-1121 later.
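A minimal sketch of the "dispatcher vector" described in this comment, assuming plain java.util.concurrent single-thread executors rather than YARN's own dispatcher classes (PartitionedHistoryWriter is a hypothetical name). Each application ID hashes to one executor, so its events run in submission order, while different applications can proceed in parallel. The stop method also shows one way to flush pending events when the RM stops: stop accepting new events and wait for the queues to drain.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: N single-threaded executors; all events of one
// application go to the same executor, preserving their order.
final class PartitionedHistoryWriter {
  private final List<ExecutorService> dispatchers = new ArrayList<>();

  PartitionedHistoryWriter(int n) {
    for (int i = 0; i < n; i++) {
      dispatchers.add(Executors.newSingleThreadExecutor());
    }
  }

  private ExecutorService dispatcherFor(String appId) {
    // floorMod keeps the index non-negative even for negative hash codes.
    return dispatchers.get(Math.floorMod(appId.hashCode(), dispatchers.size()));
  }

  void handle(String appId, Runnable writeEvent) {
    dispatcherFor(appId).execute(writeEvent);
  }

  // Stops accepting events, then waits (up to timeout per dispatcher) for
  // already-queued events to finish. Returns true if every queue drained.
  boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
    for (ExecutorService d : dispatchers) {
      d.shutdown();
    }
    boolean drained = true;
    for (ExecutorService d : dispatchers) {
      drained &= d.awaitTermination(timeout, unit);
    }
    return drained;
  }
}
{code}

Hashing by application ID is what gives per-application ordering; the trade-off is that one busy application can delay others that happen to share its executor.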

> [YARN-321] Change ResourceManager to use HistoryStorage to log history data
> ---
>
> Key: YARN-953
> URL: https://issues.apache.org/jira/browse/YARN-953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: YARN-953-5.patch, YARN-953.1.patch, YARN-953.2.patch, 
> YARN-953.3.patch, YARN-953.4.patch, YARN-953.5.patch
>
>





[jira] [Updated] (YARN-953) [YARN-321] Enable ResourceManager to write history data

2013-11-05 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-953:
-

Summary: [YARN-321] Enable ResourceManager to write history data  (was: 
[YARN-321] Change ResourceManager to use HistoryStorage to log history data)

> [YARN-321] Enable ResourceManager to write history data
> ---
>
> Key: YARN-953
> URL: https://issues.apache.org/jira/browse/YARN-953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: YARN-953-5.patch, YARN-953.1.patch, YARN-953.2.patch, 
> YARN-953.3.patch, YARN-953.4.patch, YARN-953.5.patch
>
>





[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814076#comment-13814076
 ] 

Zhijie Shen commented on YARN-979:
--

+1

> [YARN-321] Add more APIs related to ApplicationAttempt and Container in 
> ApplicationHistoryProtocol
> --
>
> Key: YARN-979
> URL: https://issues.apache.org/jira/browse/YARN-979
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-979-1.patch, YARN-979-3.patch, YARN-979-4.patch, 
> YARN-979-5.patch, YARN-979-6.patch, YARN-979.2.patch
>
>
> ApplicationHistoryProtocol should have the following APIs as well:
> * getApplicationAttemptReport
> * getApplicationAttempts
> * getContainerReport
> * getContainers
> The corresponding request and response classes need to be added as well.




[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814078#comment-13814078
 ] 

Bikas Saha commented on YARN-1307:
--

YARN-891 changed a lot of stuff

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch, 
> YARN-1307.4.patch
>
>
> A rethink of the znode structure for RM HA has been proposed in some JIRAs 
> (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' 
> comment in YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isn't 
> required anymore.
> {quote}
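A sketch of the proposed hierarchy, assuming a root path of /rmstore/ZKRMStateRoot/RMAppRoot (treat the exact names as placeholders); AppZnodeLayout and its methods are hypothetical. With attempts nested under their app znode, removeApplication only touches one subtree, and the resulting deletes could be batched into a single multi-operation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout for illustration; the root path is an assumption.
class AppZnodeLayout {
  static final String APP_ROOT = "/rmstore/ZKRMStateRoot/RMAppRoot";

  // Proposed hierarchy: every znode of an app lives under its own app znode.
  static String appPath(String appId) {
    return APP_ROOT + "/" + appId;
  }

  static String attemptPath(String appId, String attemptId) {
    return appPath(appId) + "/" + attemptId;
  }

  // removeApplication: ZooKeeper only deletes znodes without children, so the
  // attempt znodes go first and the app znode last. All paths are derived
  // from the app znode; nothing has to be filtered out of a flat app root.
  static List<String> deletionOrder(String appId, List<String> attemptIds) {
    List<String> paths = new ArrayList<>();
    for (String attemptId : attemptIds) {
      paths.add(attemptPath(appId, attemptId));
    }
    paths.add(appPath(appId));
    return paths;
  }
}
```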




[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814085#comment-13814085
 ] 

Zhijie Shen commented on YARN-1266:
---

What's the relationship between this patch and "make web apps to work"? It 
seems to be related to RPC APIs instead. To make ApplicationHistoryProtocol 
work, you may need ApplicationHistoryProtocolPBClient as well.

> Adding ApplicationHistoryProtocolPBService to make web apps to work
> ---
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps to work and 
> changing yarn to run AHS as a separate process




[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814090#comment-13814090
 ] 

Vinod Kumar Vavilapalli commented on YARN-90:
-

bq.  However, I don't quite understand your saying "expose this end-to-end and 
not just metrics". We have been using the failed-disk metric in our production 
cluster for a year, and it's good enough for our rapid disk repair. Enlighten 
me if you have a better way. 
I meant that it should be part of the client-side RPC report, JMX, as well as the 
metrics. Doing only one of those is incomplete, so I was suggesting that we 
do all of that in a separate JIRA.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
> reuse good disks (which could have been bad some time back).
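A minimal sketch of re-checking failed disks, assuming a directory counts as good when it exists and is readable, writable and executable by the NodeManager process; LocalDirHealth is a hypothetical class, not the NodeManager's actual disk checker. The point is that checkDirs() re-evaluates the failed list as well, so a repaired disk moves back to the good list without a restart.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: failed directories are re-checked on every pass
// instead of staying failed forever.
class LocalDirHealth {
  private final List<String> good = new ArrayList<>();
  private final List<String> failed = new ArrayList<>();

  LocalDirHealth(List<String> dirs) {
    good.addAll(dirs); // assumed good until the first check
  }

  // Assumed health test for illustration: the directory exists and this
  // process can read, write and traverse it.
  static boolean isHealthy(String dir) {
    File f = new File(dir);
    return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
  }

  // Re-evaluate every directory in both lists, so a disk that was repaired
  // moves from failed back to good.
  synchronized void checkDirs() {
    List<String> all = new ArrayList<>(good);
    all.addAll(failed);
    good.clear();
    failed.clear();
    for (String dir : all) {
      if (isHealthy(dir)) {
        good.add(dir);
      } else {
        failed.add(dir);
      }
    }
  }

  synchronized List<String> getGoodDirs() {
    return new ArrayList<>(good);
  }

  synchronized List<String> getFailedDirs() {
    return new ArrayList<>(failed);
  }
}
```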




[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814092#comment-13814092
 ] 

Bikas Saha commented on YARN-1222:
--

bq. The root-node ACLs are per RM instance. They need to be different for it to 
work. The documentation in yarn-default.xml explains this - we might have to 
make it even more clear?
Clarifying it, possibly with an example, would be good.

bq. The number of ACLs in the list is always bounded by 
(user-configured-for-store + 1). Am I missing something?
I missed that the patch is modifying the base acl from config and not the 
actual acl from the znode. The latter would have increased the count. The 
former is fine. The current code is good.

Where is the shared rm-admin-acl being set such that both RMs have admin access 
to the root znode? This probably works because the default is world:all. But if 
that is not the case, and we are using internally generated acls, then the rm 
has to give shared admin access to the other rm when it creates the root znode, 
right?

bq. Do you think we should make it aware of fencing - have something like a 
StoreFencedException?
I think it should be aware of when the store is not available to it because it 
has been fenced out. There are/were comments in the state store error handling 
about differentiating between exceptions once we have such a differentiation. So 
we should create a Fenced exception (look at the HDFS code for an example). This 
way all state stores should be able to report this condition for identical 
handling in the upper layers. We would like to avoid requiring state store impls 
(which are technically runtime-pluggable pieces) to understand internal Hadoop 
code patterns for HA etc.
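A sketch of the suggested layering, with hypothetical names (StoreFencedException, FencingAwareStore, StoreErrorHandling): each store implementation translates its backend-specific fencing signal into one exception type, and the upper layer maps it to a single reaction. The reactions in the enum are placeholders, not decided behavior.

```java
import java.io.IOException;

// Hypothetical exception type; the comment only proposes "a Fenced exception".
class StoreFencedException extends Exception {
  StoreFencedException(String message) {
    super(message);
  }
}

interface FencingAwareStore {
  // Implementations translate their backend-specific "fenced out" signal
  // (for ZooKeeper, e.g. a NoAuth error on the root znode) into
  // StoreFencedException; other failures stay IOExceptions.
  void storeApplication(String appId) throws StoreFencedException, IOException;
}

class StoreErrorHandling {
  // Placeholder reactions; the real ones are for the HA design to decide.
  enum Action { NONE, GIVE_UP_ACTIVE_ROLE, RETRY_OR_FAIL }

  static Action store(FencingAwareStore store, String appId) {
    try {
      store.storeApplication(appId);
      return Action.NONE;
    } catch (StoreFencedException e) {
      return Action.GIVE_UP_ACTIVE_ROLE; // identical for every store impl
    } catch (IOException e) {
      return Action.RETRY_OR_FAIL;
    }
  }
}
```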

bq. ZKRMStateStore itself is @Private @Unstable. Should we still label the 
methods @Private?
At some point ZKRMStateStore will become public/stable but these methods should 
remain private for testing, right?




> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.




[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814102#comment-13814102
 ] 

Karthik Kambatla commented on YARN-1222:


bq. Where is the shared rm-admin-acl being set such that both RMs have admin 
access to the root znode?
The shared rm-admin-acl comes from ZK_RM_STATE_STORE_ACL, set by the user or 
defaulting to world:anyone, if ZK_RM_STATE_STORE_ROOT_NODE_ACL is not set. If 
ZK_RM_STATE_STORE_ROOT_NODE_ACL is set by the user, it is the user's 
responsibility to set the ACLs in a way that both RMs have admin access and 
each can claim exclusive create-delete (c-d) access.

Agree with the rest of the points. Will address them in the next revision.
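A minimal model of the default root-znode ACL as described in this exchange: take the configured store ACL, strip create/delete from every entry (keeping shared admin access), and add one entry that grants create/delete only to this RM, which is why the list is bounded by the configured count plus one. The permission bit values match ZooKeeper's ZooDefs.Perms; ACL entries are modeled as strings such as "digest:rm1", which are placeholders, whereas real code works with ZooKeeper ACL and Id objects. RootNodeAcl is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RootNodeAcl {
  // Same bit values as ZooKeeper's ZooDefs.Perms.
  static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
  static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;
  static final int CREATE_DELETE = CREATE | DELETE;

  // Root ACL = configured store ACL with create/delete stripped from every
  // entry (admin stays shared), plus create/delete for this RM only.
  // The result has at most storeAcl.size() + 1 entries.
  static Map<String, Integer> build(Map<String, Integer> storeAcl, String thisRmId) {
    Map<String, Integer> root = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : storeAcl.entrySet()) {
      root.put(e.getKey(), e.getValue() & ~CREATE_DELETE);
    }
    root.merge(thisRmId, CREATE_DELETE, (a, b) -> a | b);
    return root;
  }

  // A client can match several entries (e.g. world:anyone plus its own
  // digest id); its effective permissions are the union of the matches.
  static boolean can(Map<String, Integer> acl, List<String> clientIds, int perm) {
    int effective = 0;
    for (String id : clientIds) {
      effective |= acl.getOrDefault(id, 0);
    }
    return (effective & perm) == perm;
  }
}
```

With the default world:anyone store ACL, the other RM keeps admin access (so it can rewrite the ACL when it becomes active) but cannot create the lock child, which is what fences it.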

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.




[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814104#comment-13814104
 ] 

Hudson commented on YARN-1374:
--

FAILURE: Integrated in Hadoop-trunk-Commit #4695 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4695/])
YARN-1374. Changed ResourceManager to start the preemption policy monitors as 
active services. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539089)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/TestSchedulingMonitor.java


> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}
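The trace shows the exception thrown while CompositeService.serviceInit iterates over an unmodifiable view of its service list. A minimal plain-Java sketch of that failure mode, assuming the cause is the backing list being modified during the iteration; the class and service names are made up. The commit above resolves the actual issue differently, by starting the preemption policy monitors as active services.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Reproduces the exception type from the trace, not the actual RM code path.
class CmeDemo {
  // An unmodifiable *view* does not protect against changes made through the
  // backing list; the view's iterator is still fail-fast.
  static boolean iterateLiveView() {
    List<String> services = new ArrayList<>(Arrays.asList("dispatcher", "scheduler"));
    List<String> view = Collections.unmodifiableList(services);
    try {
      for (String s : view) {
        if (s.equals("dispatcher")) {
          services.add("monitor"); // a child registers another service mid-init
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Iterating over a snapshot avoids the exception, but the late addition is
  // not visited, so it has to be initialized some other way.
  static List<String> iterateSnapshot() {
    List<String> services = new ArrayList<>(Arrays.asList("dispatcher", "scheduler"));
    List<String> visited = new ArrayList<>();
    for (String s : new ArrayList<>(services)) {
      visited.add(s);
      if (s.equals("dispatcher")) {
        services.add("monitor");
      }
    }
    return visited;
  }
}
```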




[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService

2013-11-05 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1266:


Summary: Adding ApplicationHistoryProtocolPBService  (was: Adding 
ApplicationHistoryProtocolPBService to make web apps to work)

> Adding ApplicationHistoryProtocolPBService
> --
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps to work and 
> changing yarn to run AHS as a separate process




[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work

2013-11-05 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814107#comment-13814107
 ] 

Mayank Bansal commented on YARN-1266:
-

[~zjshen]
The description of this JIRA is not correct; fixing it. Yes, it is related to 
starting the RPC server.
The client is used in the CLI and is already covered in that patch.

Thanks,
Mayank

> Adding ApplicationHistoryProtocolPBService to make web apps to work
> ---
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps to work and 
> changing yarn to run AHS as a separate process




[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814108#comment-13814108
 ] 

Bikas Saha commented on YARN-1222:
--

Let's make that clear in the yarn-site/configuration.

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.




[jira] [Updated] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-11-05 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1320:


Attachment: YARN-1320.8.patch

> Custom log4j properties in Distributed shell does not work properly.
> 
>
> Key: YARN-1320
> URL: https://issues.apache.org/jira/browse/YARN-1320
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, 
> YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, 
> YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, 
> YARN-1320.7.patch, YARN-1320.8.patch
>
>
> Distributed shell cannot pick up custom log4j properties (specified with 
> -log_properties). It always uses default log4j properties.




[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1121:
--

Attachment: YARN-1121.6.patch

bq. There are 3 new booleans with 8 combinations possible between them
Each of them serves a different purpose; I also added comments in the code:
- drained: indicates the dispatcher queue's events have been drained and 
processed.
- drainingStopNeeded (renamed to drainEventsOnStop): a configuration boolean 
which enables or disables the drain-on-stop functionality.
- drainingStop (renamed to blockNewRequests): only used to block newly 
arriving events while draining to stop.

bq. Given that storing stuff will be over the network and slow,  why not have a 
wait notify between this thread and the draining thread?
To do that, we would need to add something like "if (queueEmpty) notify" to the 
dispatcher's runnable. That check would run on every iteration of the normal 
dispatch loop whenever the queue is empty, even when the dispatcher is not 
stopping, which may add overhead, since this AsyncDispatcher is used 
everywhere.
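A simplified sketch of the drain-on-stop behavior described above, not the actual AsyncDispatcher. It uses drainEventsOnStop and blockNewEvents (the name the follow-up comment says the patch uses); as an assumption of this sketch, "drained" is derived from a counter of queued-plus-running events instead of a separate boolean, to avoid a race between checking the queue and updating the flag. DrainingDispatcher is a hypothetical name and the 10 ms poll interval is arbitrary.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class DrainingDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger pending = new AtomicInteger(); // queued + running
  private final boolean drainEventsOnStop; // configuration switch
  private boolean blockNewEvents = false;  // guarded by "this"
  private final Thread worker;

  DrainingDispatcher(boolean drainEventsOnStop) {
    this.drainEventsOnStop = drainEventsOnStop;
    this.worker = new Thread(() -> {
      try {
        while (true) {
          Runnable event = queue.take();
          try {
            event.run();
          } catch (RuntimeException e) {
            // a real dispatcher would log or escalate; ignored in this sketch
          } finally {
            pending.decrementAndGet();
          }
        }
      } catch (InterruptedException e) {
        // stop() interrupts the worker once it is done
      }
    });
    worker.start();
  }

  // Returns false if the event was dropped because we are draining to stop.
  synchronized boolean handle(Runnable event) {
    if (blockNewEvents) {
      return false;
    }
    pending.incrementAndGet();
    queue.add(event);
    return true;
  }

  // "drained": everything accepted so far has been taken and processed.
  boolean isDrained() {
    return pending.get() == 0;
  }

  void stop() {
    if (drainEventsOnStop) {
      synchronized (this) {
        blockNewEvents = true;
      }
      // Poll from the stopping thread rather than having the dispatch loop
      // notify a waiter every time the queue becomes empty.
      while (!isDrained()) {
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    worker.interrupt();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The counter costs an atomic increment and decrement per event, so it is subject to the same overhead trade-off discussed above.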

bq. DrainEventHandler sounds misleading. 
- Renamed DrainEventHandler to DropEventHandler.

bq. The other thing we can do is take a count of the number of pending events to 
drain at service stop.
For that, we would change the new logic from blocking new events from entering 
the queue to processing a fixed number of events out of the queue; again, we may 
need one more counter to track how many events have been processed out of the queue.

Uploaded a new patch to address the comments.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-11-05 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814137#comment-13814137
 ] 

Xuan Gong commented on YARN-1320:
-

bq. I doubt if the patch is going to work if the remote file-system is HDFS. 
The propagation of the log4j properties file is via HDFS and it doesn't look 
like it is handled correctly. Please check.

I set up a three-node cluster locally and tested it, and it works. I still made a 
small change, though. I believe the custom log4j properties should be per 
application, so I changed the code to upload the file to the user/appname/appid 
folder (the same location where we upload the AppMaster.jar file) in the file 
system instead of directly under the /user folder.
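A sketch of the per-application destination described above, with a hypothetical helper that builds <home>/<appName>/<appId>/<fileName>. In the client this would be a Hadoop Path resolved against FileSystem#getHomeDirectory(); plain strings keep the sketch dependency-free.

```java
// Hypothetical helper: per-application staging location for the custom
// log4j file, mirroring where AppMaster.jar is uploaded.
class Log4jStaging {
  static String destination(String homeDir, String appName, String appId, String fileName) {
    // Drop a trailing slash so the join does not produce "//".
    String base = homeDir.endsWith("/") ? homeDir.substring(0, homeDir.length() - 1) : homeDir;
    return String.join("/", base, appName, appId, fileName);
  }
}
```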

> Custom log4j properties in Distributed shell does not work properly.
> 
>
> Key: YARN-1320
> URL: https://issues.apache.org/jira/browse/YARN-1320
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, 
> YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, 
> YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, 
> YARN-1320.7.patch, YARN-1320.8.patch
>
>
> Distributed shell cannot pick up custom log4j properties (specified with 
> -log_properties). It always uses default log4j properties.




[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814138#comment-13814138
 ] 

Jian He commented on YARN-1121:
---

bq. drainingStop(renames to blockNewRequests):
Typo: in the patch, it's actually named blockNewEvents.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814143#comment-13814143
 ] 

Vinod Kumar Vavilapalli commented on YARN-674:
--

bq. We were intentionally going through the same submitApplication() method to 
make sure that all the initialization and setup code paths are consistently 
followed in both cases by keeping the code path identical as much as possible.
I didn't mean to fork the code, but it seems like the patch is doing exactly 
that. My original intention was to make submitApplicationOnRecovery() call 
submitApplication().
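A sketch of the structure suggested here, with hypothetical names and steps: submitApplicationOnRecovery() calls submitApplication() so both paths run identical initialization and setup, and only the recovery-specific step is added on top.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recovery reuses the normal submission path instead of
// forking it. The step strings stand in for real initialization work.
class RMAppSubmitter {
  final List<String> steps = new ArrayList<>();

  void submitApplication(String appId) {
    steps.add("validate " + appId);     // shared initialization and setup
    steps.add("create RMApp " + appId);
  }

  void submitApplicationOnRecovery(String appId) {
    submitApplication(appId);           // identical path to a new submission
    steps.add("recover " + appId);      // recovery-only step
  }
}
```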

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.
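One way to keep a slow or down NameNode from tying up RPC handlers is to hand the renewal to a separate thread pool and return from the submission call immediately. A minimal sketch, assuming a fixed pool of two threads; AsyncTokenRenewer and Renewal are hypothetical names, and this is not necessarily what the attached patches do.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the RPC handler only hands the renewal to a pool, so a
// slow NameNode ties up pool threads instead of RPC handlers.
class AsyncTokenRenewer {
  interface Renewal {
    void renew() throws Exception; // e.g. a call to the NameNode
  }

  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  // Returns immediately; a real implementation would report a failed
  // renewal back to the application once the future completes.
  Future<?> submitRenewal(Renewal renewal) {
    return pool.submit(() -> {
      renewal.renew();
      return null;
    });
  }

  void shutdown() {
    pool.shutdown();
  }
}
```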




[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814150#comment-13814150
 ] 

Hadoop QA commented on YARN-1121:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612223/YARN-1121.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2375//console

This message is automatically generated.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814152#comment-13814152
 ] 

Hadoop QA commented on YARN-1320:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/1261/YARN-1320.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2374//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2374//console

This message is automatically generated.

> Custom log4j properties in Distributed shell does not work properly.
> 
>
> Key: YARN-1320
> URL: https://issues.apache.org/jira/browse/YARN-1320
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, 
> YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, 
> YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, 
> YARN-1320.7.patch, YARN-1320.8.patch
>
>
> Distributed shell cannot pick up custom log4j properties (specified with 
> -log_properties). It always uses default log4j properties.




[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1121:
--

Attachment: YARN-1121.6.patch

No idea why Jenkins is not applying the patch; submitting the same patch again.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-987) Adding History Service to use Store and converting Historydata to Report

2013-11-05 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814196#comment-13814196
 ] 

Mayank Bansal commented on YARN-987:


Thanks [~zjshen] for the review.

bq. As we're going to have cache, the abstraction of ApplicationHistoryContext 
may be necessary. However, one more question here: webUI and services are going 
to use ApplicationHistoryContext as well, right? if they are, returning report 
PB is actually not necessary for web. If they're not, webUI and services need a 
duplicate abstraction of combining cache and store, which is concise in terms 
of coding.

As discussed offline, we should be using the History context for both the client 
and the UI; however, it has one drawback: it passes proto objects to the UI. 
Otherwise, we would need separate classes for the UI, which I think is duplicate work.
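A sketch of the shared context discussed here: a read-through cache in front of the store, which both the RPC service and the web UI could use. The names are hypothetical, the Function stands in for the ApplicationHistoryStore reads, and the cache is unbounded for brevity; a real one would need eviction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one context combining cache and store, so the web UI
// and the services do not each need their own combination.
class CachingHistoryContext<K, V> {
  private final Map<K, V> cache = new HashMap<>(); // unbounded, for brevity
  private final Function<K, V> store;              // stands in for store reads
  private int storeReads = 0;

  CachingHistoryContext(Function<K, V> store) {
    this.store = store;
  }

  // Read-through: serve from the cache, fall back to the store on a miss.
  synchronized V get(K key) {
    V value = cache.get(key);
    if (value == null) {
      storeReads++;
      value = store.apply(key);
      if (value != null) {
        cache.put(key, value);
      }
    }
    return value;
  }

  synchronized int getStoreReads() {
    return storeReads;
  }
}
```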

bq. Add the config to yarn-default.xml as well. Btw, is "store.class" a bit 
better, as we have XXXApplicationHistoryStore, not XXXApplicationHistoryStorage?
Done

bq. Unnecessary code. ApplicationHistoryStore must be a service
Done.

bq. Unnecessary code. ApplicationHistoryStore must be a service
Done

bq. For ApplicationReport, you may want to get the history data of its last 
application attempt to fill the empty fields below.
Done
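A sketch of picking the last attempt, assuming "last" means the attempt with the highest attempt id; LastAttemptReport, AttemptData and the fields shown are hypothetical stand-ins for the history data used to fill the ApplicationReport.

```java
import java.util.List;

// Hypothetical sketch: the most recent attempt supplies the report fields.
class LastAttemptReport {
  static final class AttemptData {
    final int attemptId;
    final String host;
    final String trackingUrl;

    AttemptData(int attemptId, String host, String trackingUrl) {
      this.attemptId = attemptId;
      this.host = host;
      this.trackingUrl = trackingUrl;
    }
  }

  // Returns null when the application has no attempts yet.
  static AttemptData lastAttempt(List<AttemptData> attempts) {
    AttemptData last = null;
    for (AttemptData a : attempts) {
      if (last == null || a.attemptId > last.attemptId) {
        last = a;
      }
    }
    return last;
  }
}
```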

Thanks,
Mayank


> Adding History Service to use Store and converting Historydata to Report
> 
>
> Key: YARN-987
> URL: https://issues.apache.org/jira/browse/YARN-987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
> YARN-987-4.patch
>
>





[jira] [Updated] (YARN-987) Adding History Service to use Store and converting Historydata to Report

2013-11-05 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-987:
---

Attachment: YARN-987-5.patch

Attaching the latest patch.

Thanks,
Mayank

> Adding History Service to use Store and converting Historydata to Report
> 
>
> Key: YARN-987
> URL: https://issues.apache.org/jira/browse/YARN-987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
> YARN-987-4.patch, YARN-987-5.patch
>
>





[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814202#comment-13814202
 ] 

Hadoop QA commented on YARN-1121:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612228/YARN-1121.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2376//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2376//console

This message is automatically generated.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch
>
>
> On serviceStop, it should wait for all internal pending events to drain 
> before stopping.
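A minimal sketch of "drain before stopping" for a single-threaded event loop: stop first refuses new events, then waits until the worker has handled everything already queued. `DrainingEventLoop` is a hypothetical class, not the RMStateStore dispatcher; the 50 ms poll interval is an arbitrary assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical single-threaded event loop that drains pending events on stop.
class DrainingEventLoop<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Consumer<E> handler;
    private final Thread worker;
    private volatile boolean stopped = false;

    DrainingEventLoop(Consumer<E> handler) {
        this.handler = handler;
        this.worker = new Thread(this::run, "store-event-loop");
    }

    void start() {
        worker.start();
    }

    // Shares the lock used when setting the stop flag, so no event slips in after stop.
    synchronized void post(E event) {
        if (stopped) {
            throw new IllegalStateException("event loop is stopped");
        }
        queue.add(event);
    }

    private void run() {
        // Exit only once stop was requested AND nothing is left to handle.
        while (!stopped || !queue.isEmpty()) {
            try {
                E event = queue.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    handler.accept(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Refuse new events, then block until the worker has drained the queue.
    void stop() {
        synchronized (this) {
            stopped = true;
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because `post` and the stop flag share a lock, every event accepted before `stop()` is visible to the worker when it sees the flag, so nothing accepted is silently dropped.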




[jira] [Commented] (YARN-987) Adding History Service to use Store and converting Historydata to Report

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814209#comment-13814209
 ] 

Hadoop QA commented on YARN-987:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612233/YARN-987-5.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2377//console

This message is automatically generated.

> Adding History Service to use Store and converting Historydata to Report
> 
>
> Key: YARN-987
> URL: https://issues.apache.org/jira/browse/YARN-987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
> YARN-987-4.patch, YARN-987-5.patch
>
>





[jira] [Created] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp

2013-11-05 Thread Karthik Kambatla (JIRA)

Karthik Kambatla created YARN-1390:
--

 Summary: Add applicationSource to ApplicationSubmissionContext and 
RMApp
 Key: YARN-1390
 URL: https://issues.apache.org/jira/browse/YARN-1390
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla







[jira] [Updated] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp

2013-11-05 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1390:
---

 Description: 
In addition to other fields like application-type (added in YARN-563), it is 
useful to have an applicationSource field to track the source of an 
application. The application source can be useful in (1) fetching only those 
applications a user is interested in, (2) potentially adding source-specific 
optimizations in the future. 

Example of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop etc.
Target Version/s: 2.3.0
Assignee: Karthik Kambatla
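A minimal sketch of use (1) from the description, fetching only the applications from sources a user cares about. `AppSummary` and `AppSourceFilter` are hypothetical; only the field name `applicationSource` comes from the proposal. Treating an empty filter set as "no filter" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical view of an application carrying the proposed applicationSource field.
class AppSummary {
    final String appId;
    final String applicationType;    // e.g. "MAPREDUCE"
    final String applicationSource;  // e.g. "Pig", "Hive", a project name; may be null

    AppSummary(String appId, String applicationType, String applicationSource) {
        this.appId = appId;
        this.applicationType = applicationType;
        this.applicationSource = applicationSource;
    }
}

final class AppSourceFilter {
    private AppSourceFilter() {}

    // Returns apps whose source is in `sources`; an empty set means "no filter".
    static List<AppSummary> bySource(Collection<AppSummary> apps, Set<String> sources) {
        List<AppSummary> out = new ArrayList<>();
        for (AppSummary app : apps) {
            boolean matches = app.applicationSource != null
                    && sources.contains(app.applicationSource);
            if (sources.isEmpty() || matches) {
                out.add(app);
            }
        }
        return out;
    }
}
```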

> Add applicationSource to ApplicationSubmissionContext and RMApp
> ---
>
> Key: YARN-1390
> URL: https://issues.apache.org/jira/browse/YARN-1390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is 
> useful to have an applicationSource field to track the source of an 
> application. The application source can be useful in (1) fetching only those 
> applications a user is interested in, (2) potentially adding source-specific 
> optimizations in the future. 
> Example of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
> etc.




[jira] [Updated] (YARN-1390) Add applicationSource to ApplicationSubmissionContext and RMApp

2013-11-05 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1390:
---

Description: 
In addition to other fields like application-type (added in YARN-563), it is 
useful to have an applicationSource field to track the source of an 
application. The application source can be useful in (1) fetching only those 
applications a user is interested in, (2) potentially adding source-specific 
optimizations in the future. 

Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
etc.

  was:
In addition to other fields like application-type (added in YARN-563), it is 
useful to have an applicationSource field to track the source of an 
application. The application source can be useful in (1) fetching only those 
applications a user is interested in, (2) potentially adding source-specific 
optimizations in the future. 

Example of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop etc.


> Add applicationSource to ApplicationSubmissionContext and RMApp
> ---
>
> Key: YARN-1390
> URL: https://issues.apache.org/jira/browse/YARN-1390
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is 
> useful to have an applicationSource field to track the source of an 
> application. The application source can be useful in (1) fetching only those 
> applications a user is interested in, (2) potentially adding source-specific 
> optimizations in the future. 
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
> etc.




[jira] [Created] (YARN-1391) Lost node list contains many active node with different port

2013-11-05 Thread Siqi Li (JIRA)

Siqi Li created YARN-1391:
-

 Summary: Lost node list contains many active node with different 
port
 Key: YARN-1391
 URL: https://issues.apache.org/jira/browse/YARN-1391
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li







[jira] [Updated] (YARN-1391) Lost node list contains many active node with different port

2013-11-05 Thread Siqi Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Description: When the node manager is restarted, the active node list in the web UI 
contains duplicate entries: the two entries have the same host name but different 
port numbers. After the expiry interval, the older entry expires and is moved 
to the lost node list, where it stays until this node is 
restarted again.
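In YARN a node is identified by a NodeId that includes both host and port, so the symptom above can be reproduced with a toy model that keys nodes by "host:port". `ToyNodeTracker` is hypothetical and only illustrates the symptom; it is not the RM's node tracking code.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model of the symptom: nodes are tracked by "host:port", not by host alone.
class ToyNodeTracker {
    final Map<String, Long> active = new HashMap<>(); // "host:port" -> last heartbeat (ms)
    final Map<String, Long> lost = new HashMap<>();
    private final long expiryMs;

    ToyNodeTracker(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // A restarted NM on a new port shows up as a brand-new key.
    void heartbeat(String host, int port, long nowMs) {
        active.put(host + ":" + port, nowMs);
    }

    // Entries silent for longer than the expiry interval move to the lost list.
    void expire(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = active.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() > expiryMs) {
                lost.put(e.getKey(), e.getValue());
                it.remove();
            }
        }
    }
}
```

After a restart on a new port, the host briefly has two active entries, and once the old one expires it stays in `lost` even though the host is healthy, matching the report.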

> Lost node list contains many active node with different port
> 
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>
> When the node manager is restarted, the active node list in the web UI contains 
> duplicate entries: the two entries have the same host name but different 
> port numbers. After the expiry interval, the older entry expires and is 
> moved to the lost node list, where it stays until this node is restarted 
> again.




[jira] [Commented] (YARN-1391) Lost node list contains many active node with different port

2013-11-05 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814272#comment-13814272
 ] 

Sandy Ryza commented on YARN-1391:
--

This is related to YARN-1382.

> Lost node list contains many active node with different port
> 
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>
> When the node manager is restarted, the active node list in the web UI contains 
> duplicate entries: the two entries have the same host name but different 
> port numbers. After the expiry interval, the older entry expires and is 
> moved to the lost node list, where it stays until this node is restarted 
> again.




[jira] [Updated] (YARN-1391) Lost node list contains many active node with different port

2013-11-05 Thread Siqi Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Component/s: resourcemanager

> Lost node list contains many active node with different port
> 
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Siqi Li
>Assignee: Siqi Li
>
> When the node manager is restarted, the active node list in the web UI contains 
> duplicate entries: the two entries have the same host name but different 
> port numbers. After the expiry interval, the older entry expires and is 
> moved to the lost node list, where it stays until this node is restarted 
> again.




[jira] [Updated] (YARN-1391) Lost node list contains many active node with different port

2013-11-05 Thread Siqi Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Affects Version/s: 2.0.5-alpha

> Lost node list contains many active node with different port
> 
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>
> When the node manager is restarted, the active node list in the web UI contains 
> duplicate entries: the two entries have the same host name but different 
> port numbers. After the expiry interval, the older entry expires and is 
> moved to the lost node list, where it stays until this node is restarted 
> again.




[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-11-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814274#comment-13814274
 ] 

Hudson commented on YARN-311:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4696 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4696/])
YARN-311. RM/scheduler support for dynamic resource configuration. (Junping Du 
via llu) (llu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539134)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceOption.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceOptionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v1.patch, YARN-311-v10.patch, 
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, 
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, 
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we make the resource change on the RM side and will expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA contains 
> only changes in the scheduler. 
> The flow for updating a node's resource and making resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When next NM heartbeat for updating s

[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-11-05 Thread Luke Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu updated YARN-311:
-

Fix Version/s: 2.3.0
 Hadoop Flags: Reviewed

Committed to trunk and branch-2. Thanks Junping for the patch!

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.3.0
>
> Attachments: YARN-311-v1.patch, YARN-311-v10.patch, 
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, 
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, 
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we make the resource change on the RM side and will expose 
> admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA contains 
> only changes in the scheduler. 
> The flow for updating a node's resource and making resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When the next NM status-update heartbeat arrives, the RMNode's resource 
> change is detected and the delta is added to the SchedulerNode's 
> availableResource before the actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource in 
> SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent 
> JIRA: YARN-291.
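A minimal sketch of the three-step flow above, assuming a two-dimensional resource (memory in MB and vcores). `Res`, `ToyRmNode`, and `ToySchedulerNode` are hypothetical stand-ins for Resource, RMNodeImpl, and SchedulerNode, not the committed YARN-311 code. The point it shows is that only the delta between the old and new totals is applied, so already-allocated resources stay accounted for.

```java
// Hypothetical two-dimensional resource (memory in MB, virtual cores).
final class Res {
    final int memoryMb;
    final int vcores;

    Res(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    Res plus(Res o) { return new Res(memoryMb + o.memoryMb, vcores + o.vcores); }
    Res minus(Res o) { return new Res(memoryMb - o.memoryMb, vcores - o.vcores); }
    boolean fitsIn(Res o) { return memoryMb <= o.memoryMb && vcores <= o.vcores; }
    boolean sameAs(Res o) { return memoryMb == o.memoryMb && vcores == o.vcores; }
}

// Step 1: the admin update lands on the RM-side node (the RMNodeImpl role).
class ToyRmNode {
    private Res total;

    ToyRmNode(Res total) { this.total = total; }

    synchronized void updateTotal(Res newTotal) { total = newTotal; }
    synchronized Res getTotal() { return total; }
}

// Steps 2-3: the scheduler-side node picks up the delta on the next heartbeat.
class ToySchedulerNode {
    private Res total;
    private Res available;

    ToySchedulerNode(Res total) {
        this.total = total;
        this.available = total;
    }

    // Called on each NM heartbeat, before any scheduling for this node.
    void onHeartbeat(ToyRmNode rmNode) {
        Res latest = rmNode.getTotal();
        if (!latest.sameAs(total)) {
            // Apply only the delta; shrinking below current usage can make
            // `available` negative, which this sketch does not handle.
            available = available.plus(latest.minus(total));
            total = latest;
        }
    }

    boolean tryAllocate(Res ask) {
        if (!ask.fitsIn(available)) {
            return false;
        }
        available = available.minus(ask);
        return true;
    }

    Res getAvailable() { return available; }
}
```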




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814291#comment-13814291
 ] 

Omkar Vinit Joshi commented on YARN-674:


Thanks [~bikassaha] for the review
bq. We were intentionally going through the same submitApplication() method to 
make sure that all the initialization and setup code paths are consistently 
followed in both cases by keeping the code path identical as much as possible. 
The RM would submit a recovered application, in essence proxying a user 
submitting the application. It's a general pattern followed throughout the recovery 
logic: be minimally invasive to the mainline code path so that we can avoid 
functional bugs as much as possible. Separating them into 2 methods has 
resulted in code duplication in both methods without any huge benefit that I 
can see. It also leaves us susceptible to future code changes made in one code 
path and not the other.
I agree with your suggestion; reverting the changes (discussed with 
[~vinodkv] offline).

bq. Why is isSecurityEnabled() being checked at this internal level? The code 
should not even reach this point if security is not enabled. 
You have a point; fixing it.

bq. Also why is it calling 
rmContext.getDelegationTokenRenewer().addApplication(event) instead of 
DelegationTokenRenewer.this.addApplication(). Same for 
rmContext.getDelegationTokenRenewer().applicationFinished(evt);
Makes sense; fixed it.

bq. Rename DelegationTokenRenewerThread so that it does not have the misleading 
"Thread" in its name?
Fixed.

bq. Can DelegationTokenRenewerAppSubmitEvent event objects have an event type 
different from VERIFY_AND_START_APPLICATION? If not, we don't need this check, 
and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to 
not expect an event type argument. It should set 
VERIFY_AND_START_APPLICATION within the constructor.
Fixed.
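The suggestion above (let the submit event fix its own type in the constructor) can be sketched as follows. These are simplified, hypothetical stand-ins modeled on the names in the review; `FINISH_APPLICATION` and the class names are assumptions, not the actual DelegationTokenRenewer code.

```java
// Hypothetical, simplified stand-ins for the renewer's event types.
enum RenewerEventType { VERIFY_AND_START_APPLICATION, FINISH_APPLICATION }

class RenewerEvent {
    private final RenewerEventType type;

    RenewerEvent(RenewerEventType type) {
        this.type = type;
    }

    RenewerEventType getType() {
        return type;
    }
}

// The subclass pins its own type, so no caller can pass a mismatched one
// and handlers need no extra type check.
class RenewerAppSubmitEvent extends RenewerEvent {
    private final String appId;

    RenewerAppSubmitEvent(String appId) {
        super(RenewerEventType.VERIFY_AND_START_APPLICATION);
        this.appId = appId;
    }

    String getAppId() {
        return appId;
    }
}
```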

bq. @Private + @VisibleForTesting???
Fixed.


> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as the RM may run out of RPC handlers due to blocked 
> client submissions.
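A minimal sketch of the general remedy this issue points toward: the RPC handler hands token renewal to a separate pool and returns immediately, so a slow NameNode blocks only renewer threads. `AsyncRenewalSubmitter` and its callbacks are hypothetical; they are not the YARN-674 patch or the DelegationTokenRenewer API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical: move slow token renewal off the RPC handler thread.
class AsyncRenewalSubmitter {
    private final ExecutorService renewerPool;
    private final Consumer<String> renewTokens;  // may block on a slow NameNode
    private final Consumer<String> onAccepted;   // e.g. continue starting the app
    private final BiConsumer<String, RuntimeException> onRejected; // e.g. fail the app

    AsyncRenewalSubmitter(int threads, Consumer<String> renewTokens,
                          Consumer<String> onAccepted,
                          BiConsumer<String, RuntimeException> onRejected) {
        // Daemon threads so a stuck renewal cannot keep the JVM alive.
        this.renewerPool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "token-renewer");
            t.setDaemon(true);
            return t;
        });
        this.renewTokens = renewTokens;
        this.onAccepted = onAccepted;
        this.onRejected = onRejected;
    }

    // Returns immediately; the calling (handler) thread never waits for renewal.
    Future<?> submit(String appId) {
        return renewerPool.submit(() -> {
            try {
                renewTokens.accept(appId);
                onAccepted.accept(appId);
            } catch (RuntimeException e) {
                onRejected.accept(appId, e);
            }
        });
    }

    void shutdown() {
        renewerPool.shutdown();
    }
}
```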




[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-674:
---

Attachment: YARN-674.5.patch

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as the RM may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Updated] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1222:
---

Attachment: yarn-1222-5.patch

Here is a patch that moves the fencing and transition to standby logic from 
ZKRMStateStore to RMStateStore. The store implementations are expected to throw 
a {{StoreFencedException}} when they are fenced. Uploading this for any quick 
remarks.

Will post another patch to address the suggested cosmetic changes.
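A minimal sketch of the split described in this comment: implementations only signal fencing by throwing, and the shared base class reacts by transitioning to standby exactly once. `FencingAwareStore` and `storeApplicationImpl` are hypothetical, and the `StoreFencedException` below is a local stand-in for the one named in the patch, not its actual definition.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in for the StoreFencedException named in the patch description.
class StoreFencedException extends Exception {
    StoreFencedException(String message) {
        super(message);
    }
}

// Hypothetical base store: fencing handling lives here, not in each implementation.
abstract class FencingAwareStore {
    private final AtomicBoolean fenced = new AtomicBoolean(false);
    private final Runnable transitionToStandby;

    FencingAwareStore(Runnable transitionToStandby) {
        this.transitionToStandby = transitionToStandby;
    }

    // Implementations only report fencing by throwing.
    protected abstract void storeApplicationImpl(String appId) throws StoreFencedException;

    final void storeApplication(String appId) {
        if (fenced.get()) {
            return; // already fenced: drop further writes
        }
        try {
            storeApplicationImpl(appId);
        } catch (StoreFencedException e) {
            // Transition exactly once, even if several writes hit the fence.
            if (fenced.compareAndSet(false, true)) {
                transitionToStandby.run();
            }
        }
    }

    boolean isFenced() {
        return fenced.get();
    }
}
```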

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.
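A library-free toy model of the fencing scheme described above. The real ZooKeeper Java client provides atomic multi-operations through `ZooKeeper.multi(Iterable<Op>)` with `Op.create` and `Op.delete`; this sketch does not use that API. Instead, `ToyZk` and `ToyOp` are hypothetical in-memory stand-ins showing why wrapping every write between creating and deleting a lock child of the root fences an RM: once the root's create/delete permission is given to another identity, the whole multi-operation fails and nothing is written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, library-free model; not the ZooKeeper client API.
final class ToyOp {
    final String kind; // "create" or "delete"
    final String path;

    ToyOp(String kind, String path) {
        this.kind = kind;
        this.path = path;
    }
}

class ToyZk {
    final Map<String, String> nodes = new HashMap<>();
    private final String root;
    // The one identity currently allowed to create/delete children of the root.
    volatile String rootWriter;

    ToyZk(String root, String rootWriter) {
        this.root = root;
        this.rootWriter = rootWriter;
        nodes.put(root, "");
    }

    // All-or-nothing: apply to a copy, commit only if every op succeeds.
    synchronized boolean multi(String caller, List<ToyOp> ops) {
        Map<String, String> copy = new HashMap<>(nodes);
        for (ToyOp op : ops) {
            String parent = op.path.substring(0, op.path.lastIndexOf('/'));
            if (parent.equals(root) && !caller.equals(rootWriter)) {
                return false; // no create/delete permission on the root: fenced
            }
            if (op.kind.equals("create")) {
                if (copy.containsKey(op.path) || !copy.containsKey(parent)) return false;
                copy.put(op.path, "");
            } else if (op.kind.equals("delete")) {
                if (copy.remove(op.path) == null) return false;
            } else {
                return false;
            }
        }
        nodes.clear();
        nodes.putAll(copy);
        return true;
    }

    // Wrap the caller's ops between creating and deleting a lock child of the root.
    static List<ToyOp> fenced(String root, String lockName, List<ToyOp> ops) {
        String lock = root + "/" + lockName;
        List<ToyOp> out = new ArrayList<>();
        out.add(new ToyOp("create", lock));
        out.addAll(ops);
        out.add(new ToyOp("delete", lock));
        return out;
    }
}
```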




[jira] [Commented] (YARN-1195) RM may relaunch already KILLED / FAILED jobs after RM restarts

2013-11-05 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814331#comment-13814331
 ] 

Jian He commented on YARN-1195:
---

YARN-891 fixed this. Since we store the completed application info for 
FAILED/KILLED/FINISHED apps on app completion, on restart the RM just checks whether 
the application is in such a final state and, if it is, does not restart the app.

Closed this.
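A minimal sketch of the recovery check described in this comment: only applications whose stored state is not final are relaunched. `StoredAppState` and `RecoveryFilter` are hypothetical; the real RM uses its own state enums and recovery path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stored application states.
enum StoredAppState {
    RUNNING(false), FINISHED(true), FAILED(true), KILLED(true);

    final boolean isFinal;

    StoredAppState(boolean isFinal) {
        this.isFinal = isFinal;
    }
}

final class RecoveryFilter {
    private RecoveryFilter() {}

    // On restart, only apps not already in a final state are relaunched.
    static List<String> appsToRelaunch(Map<String, StoredAppState> stored) {
        List<String> relaunch = new ArrayList<>();
        for (Map.Entry<String, StoredAppState> e : stored.entrySet()) {
            if (!e.getValue().isFinal) {
                relaunch.add(e.getKey());
            }
        }
        return relaunch;
    }
}
```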

> RM may relaunch already KILLED / FAILED jobs after RM restarts
> --
>
> Key: YARN-1195
> URL: https://issues.apache.org/jira/browse/YARN-1195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> As in YARN-540, if the RM restarts after a job is killed/failed but before the 
> app state info is cleaned from the store, then the next time the RM comes back 
> it will relaunch the job.




[jira] [Resolved] (YARN-1195) RM may relaunch already KILLED / FAILED jobs after RM restarts

2013-11-05 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-1195.
---

Resolution: Fixed

> RM may relaunch already KILLED / FAILED jobs after RM restarts
> --
>
> Key: YARN-1195
> URL: https://issues.apache.org/jira/browse/YARN-1195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> As in YARN-540, if the RM restarts after a job is killed/failed but before the 
> app state info is cleaned from the store, then the next time the RM comes back 
> it will relaunch the job.




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814336#comment-13814336
 ] 

Hadoop QA commented on YARN-674:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612257/YARN-674.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2378//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2378//console

This message is automatically generated.

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as the RM may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Commented] (YARN-671) Add an interface on the RM to move NMs into a maintenance state

2013-11-05 Thread Cindy Li (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814346#comment-13814346
 ] 

Cindy Li commented on YARN-671:
---

YARN-914 is for graceful decommission, which can tolerate a longer waiting time. 
As for the resource reported to the scheduler, it seems the patch is already 
available in trunk, so we can base this on it too. 

Either graceful decommission or the draining method would result in wasted 
resources on the node. Another JIRA, MAPREDUCE-4710, deals with a similar issue: 
lost map output is tolerated, but it has the benefit of not wasting resources on 
the node during the whole rolling restart process. 



 

> Add an interface on the RM to move NMs into a maintenance state
> ---
>
> Key: YARN-671
> URL: https://issues.apache.org/jira/browse/YARN-671
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>





[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart

2013-11-05 Thread Cindy Li (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814372#comment-13814372
 ] 

Cindy Li commented on YARN-1336:


In the case of a rolling upgrade, where, e.g., a new configuration or fix is 
picked up when the node manager restarts, would that cause any issue during the 
state/work recovery process? 

> Work-preserving nodemanager restart
> ---
>
> Key: YARN-1336
> URL: https://issues.apache.org/jira/browse/YARN-1336
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>
> This serves as an umbrella ticket for tasks related to work-preserving 
> nodemanager restart.




[jira] [Reopened] (YARN-1195) RM may relaunch already KILLED / FAILED jobs after RM restarts

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened YARN-1195:
---


> RM may relaunch already KILLED / FAILED jobs after RM restarts
> --
>
> Key: YARN-1195
> URL: https://issues.apache.org/jira/browse/YARN-1195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> As in YARN-540, if the RM restarts after a job is killed/failed but before the 
> app state info is cleaned from the store, then the next time the RM comes back 
> it will relaunch the job.




[jira] [Resolved] (YARN-1195) RM may relaunch already KILLED / FAILED jobs after RM restarts

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1195.
---

Resolution: Duplicate

> RM may relaunch already KILLED / FAILED jobs after RM restarts
> --
>
> Key: YARN-1195
> URL: https://issues.apache.org/jira/browse/YARN-1195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> As in YARN-540, if the RM restarts after a job is killed/failed but before the 
> app state info is cleaned from the store, then the next time the RM comes back 
> it will relaunch the job.




[jira] [Reopened] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect

2013-11-05 Thread Sandy Ryza (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reopened YARN-1330:
--


It looks like this can lead to NPEs in the Fair Scheduler during certain 
initialization conditions.  Will upload an addendum patch.

> Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
> -
>
> Key: YARN-1330
> URL: https://issues.apache.org/jira/browse/YARN-1330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.1
>
> Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch
>
>
> The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations 
> file doesn't take effect.




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814403#comment-13814403
 ] 

Bikas Saha commented on YARN-674:
-

The assert doesn't take effect in production (assertions are disabled unless the 
JVM runs with -ea), so it won't catch anything on the cluster. Need to throw an 
exception here. If we don't want to crash the RM here, then we can log an error. 
When the attempt state machine gets the event, it will crash on the async 
dispatcher thread if the event is not handled in the current state.
{code}+assert application.getState() == RMAppState.NEW;{code}
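For the "throw an exception" option, the assert can be replaced with an explicit check that always runs, whatever the JVM flags are. This is a sketch only; `AppState` is a hypothetical stand-in for RMAppState, and `requireNew` is an illustrative helper, not part of the patch.

```java
// Hypothetical subset of RMAppState, for illustration only.
enum AppState { NEW, SUBMITTED, RUNNING }

class SubmissionCheck {
  /**
   * Unlike `assert`, which the JVM skips unless assertions are enabled (-ea),
   * this check is always executed, including on a production cluster.
   */
  static void requireNew(AppState state) {
    if (state != AppState.NEW) {
      throw new IllegalStateException("Expected application in state NEW but was " + state);
    }
  }
}
```

If crashing the RM is not wanted, the `throw` could instead be a logged error, as the comment suggests.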

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.
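The failure mode described above (RPC handler threads blocked on token renewal) is usually avoided by moving the slow call onto its own threads. The sketch below shows that general idea under stated assumptions; `AsyncRenewalSketch` and the `Supplier` standing in for the renewal call are hypothetical, and this is not the YARN-674 patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class AsyncRenewalSketch {
  // Dedicated daemon threads for renewal: a slow or down NameNode ties up these
  // threads instead of the RPC handler threads that serve client submissions.
  private final ExecutorService renewerPool = Executors.newFixedThreadPool(2, r -> {
    Thread t = new Thread(r, "token-renewer");
    t.setDaemon(true);
    return t;
  });

  /** Called on the RPC handler thread: queues the renewal and returns at once. */
  CompletableFuture<Boolean> submit(Supplier<Boolean> renewTokens) {
    return CompletableFuture.supplyAsync(renewTokens, renewerPool);
  }

  void shutdown() {
    renewerPool.shutdown();
  }
}
```

The caller would then accept or reject the application when the returned future completes, rather than waiting on the handler thread.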




[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814405#comment-13814405
 ] 

Zhijie Shen commented on YARN-1266:
---

Then it makes sense.

+1

> Adding ApplicationHistoryProtocolPBService
> --
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps work, and 
> changing YARN to run the AHS as a separate process.




[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814410#comment-13814410
 ] 

Bikas Saha commented on YARN-1121:
--

bq. To do that, we may need to add things in dispatcher's runnable like 
"if(queueEmpty) notify", and this is likely to be invoked in every normal 
execution of the dispatch while loop if the queue is empty, even when it's not 
actually in the stop phase, which may create more overhead, as this AsyncDispatcher 
is used everywhere.
Can this be enabled only when serviceStop sets the drain-events flag? In normal 
situations that flag will not be set.

Replacing the eventHandler with DropEventHandler (instead of GenericEventHandler) 
may not be enough. Someone may have already obtained a GenericEventHandler object 
and may send events through it, so new events will keep getting added to the 
queue from those cached GenericEventHandler objects. So I think keeping track of 
the number of events to drain and draining only that many events will be a more 
robust solution.
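A minimal sketch of the "drain only that many events" idea, assuming a single consumer thread: snapshot the backlog size when stop begins and handle exactly that many events, so follow-up events posted through cached handlers cannot keep the drain running. `DrainingDispatcherSketch` is hypothetical and much simpler than AsyncDispatcher.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class DrainingDispatcherSketch {
  private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private volatile Consumer<String> handler = event -> { };

  void setHandler(Consumer<String> handler) {
    this.handler = handler;
  }

  /** Any component, including one holding a stale handler reference, may call this. */
  void post(String event) {
    queue.add(event);
  }

  /**
   * Drains only the events that were pending when stop began. Events posted
   * while draining (e.g. follow-ups emitted by handlers) stay in the queue,
   * so a stream of new events cannot keep stop from finishing.
   */
  void stopAndDrain() {
    int toDrain = queue.size(); // snapshot of the backlog at stop time
    for (int i = 0; i < toDrain; i++) {
      String event = queue.poll();
      if (event == null) {
        break; // defensive: another consumer may have taken it
      }
      handler.accept(event);
    }
  }

  int remaining() {
    return queue.size();
  }
}
```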

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch
>
>
> On serviceStop, it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814425#comment-13814425
 ] 

Bikas Saha commented on YARN-1222:
--

REQUEST_BY_USER_FORCED is probably not the right choice.
{code}+  target.getProxy(getConfig(), 1000).transitionToStandby(
+  new HAServiceProtocol.StateChangeRequestInfo(
+  HAServiceProtocol.RequestSource.REQUEST_BY_USER_FORCED));
+} catch (IOException e) {
{code}

There are finally blocks that call methods like 
notifyDoneStoringApplicationAttempt(). These end up sending events to the RM 
modules, which check for the exception and then call terminate for the RM Java 
process. We probably don't want that to happen, since we simply want to 
transitionToStandby and discard all the internal state.

Thinking aloud, using HAServiceTarget in RMStateStore to transitionToStandby() 
may not be the right solution. We are effectively doing an internal RPC on an 
ACL'd protocol. Is it guaranteed to succeed? Should we think of sending an 
event to the HAProtocolService or have a reference to the HAProtocolService so 
that it can be directly notified about this situation. Then the 
HAProtocolService may transition to standby internally. The store should inform 
the higher entity about the fenced state and not take action on the higher 
entity by fencing it. Thoughts?

> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814442#comment-13814442
 ] 

Jian He commented on YARN-674:
--

I saw this was changed back to asynchronous submission on recovery. The original 
intention was to prevent the client from seeing the application as a new 
application. If recovery is asynchronous, the client can query the application 
before the recover event gets processed, meaning before the application is fully 
recovered, because some of the recovery logic runs when the app processes the 
recover event (app.FinalTransition).

{code}
  // Recover the app synchronously; otherwise a client could see the
  // application before it is actually recovered, because
  // ClientRMService is already started at this point.
  appImpl.handle(new RMAppEvent(appImpl.getApplicationId(),
      RMAppEventType.RECOVER));
{code}

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-11-05 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814478#comment-13814478
 ] 

Junping Du commented on YARN-311:
-

Thanks, Luke, for the review!

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.3.0
>
> Attachments: YARN-311-v1.patch, YARN-311-v10.patch, 
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, 
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, 
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we make the resource change on the RM side and expose admin 
> APIs (admin protocol, CLI, REST and JMX API) later. This JIRA contains only the 
> changes in the scheduler. 
> The flow to update a node's resource and make resource scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect on 
> RMNodeImpl.
> 2. When the next NM heartbeat for a status update arrives, the RMNode's resource 
> change is detected and the delta resource is added to the SchedulerNode's 
> availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource in 
> SchedulerNode.
> For more design details, please refer to the proposal and discussions in the 
> parent JIRA: YARN-291.
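Steps 2 and 3 of the flow above can be sketched with a memory-only model: the heartbeat applies the delta between the admin-configured total and the total the scheduler last saw, and allocation then draws from the updated available amount. `NodeResourceSketch` and its fields are hypothetical; the real RMNode/SchedulerNode track full Resource objects.

```java
class NodeResourceSketch {
  /** Hypothetical memory-only view of what the scheduler tracks per node. */
  static final class SchedulerNodeView {
    int totalMemoryMb;
    int availableMemoryMb;

    SchedulerNodeView(int totalMemoryMb, int availableMemoryMb) {
      this.totalMemoryMb = totalMemoryMb;
      this.availableMemoryMb = availableMemoryMb;
    }
  }

  /**
   * Step 2: on the next NM heartbeat, apply the difference between the total set
   * through the admin API (step 1) and the total the scheduler last saw.
   */
  static void onHeartbeat(SchedulerNodeView node, int configuredTotalMb) {
    int delta = configuredTotalMb - node.totalMemoryMb;
    if (delta != 0) {
      node.totalMemoryMb = configuredTotalMb;
      // In this sketch running containers keep their resources, so shrinking a
      // busy node can leave availableMemoryMb negative until containers finish.
      node.availableMemoryMb += delta;
    }
  }

  /** Step 3: the scheduler allocates only from the updated available resource. */
  static boolean tryAllocate(SchedulerNodeView node, int requestMb) {
    if (node.availableMemoryMb < requestMb) {
      return false;
    }
    node.availableMemoryMb -= requestMb;
    return true;
  }
}
```

For example, a node with 8192 MB total and 2048 MB free that is reconfigured to 10240 MB ends up with 4096 MB free.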




[jira] [Commented] (YARN-987) Adding History Service to use Store and converting Historydata to Report

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814486#comment-13814486
 ] 

Zhijie Shen commented on YARN-987:
--

* The unnecessary type casting is still there.
{code}
+  @Override
+  protected void serviceStart() throws Exception {
+LOG.info("Starting ApplicationHistory");
+if (historyStore instanceof Service) {
+  ((Service) historyStore).start();
+}
+super.serviceStart();
+  }
+
+  @Override
+  protected void serviceStop() throws Exception {
+LOG.info("Stopping ApplicationHistory");
+if (historyStore != null && historyStore instanceof Service) {
+  ((Service) historyStore).stop();
+}
+super.serviceStop();
+  }
{code}
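One way to drop the casts, assuming the store's declared type can itself extend the service lifecycle interface: the field is then statically typed and neither `instanceof` nor a cast is needed. Separately, in Java `x instanceof T` is already false when `x` is null, so the `historyStore != null &&` guard in serviceStop is redundant. The interfaces below are hypothetical stand-ins, not Hadoop's Service or ApplicationHistoryStore.

```java
// Hypothetical stand-ins; not Hadoop's Service or ApplicationHistoryStore types.
interface LifecycleService {
  void start();
  void stop();
}

/** If the store type itself extends the lifecycle interface, callers need no cast. */
interface HistoryStoreSketch extends LifecycleService { }

class HistoryManagerSketch {
  private final HistoryStoreSketch historyStore;

  HistoryManagerSketch(HistoryStoreSketch historyStore) {
    this.historyStore = historyStore;
  }

  void serviceStart() {
    historyStore.start(); // statically typed: no instanceof, no cast
  }

  void serviceStop() {
    if (historyStore != null) {
      historyStore.stop();
    }
  }

  /** instanceof yields false for null, so a separate null check before it is redundant. */
  static boolean isLifecycleService(Object candidate) {
    return candidate instanceof LifecycleService;
  }
}
```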

* lastAttempt can be null, so it needs a null check; otherwise, an NPE may be 
thrown. Also, unlike the other methods, this one is not a straightforward wrapper. 
Would it be good to write a test case for it?
{code}
+ApplicationAttemptHistoryData lastAttempt = getLastAttempt(appHistory
+.getApplicationId());
{code}
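A sketch of the requested null check when converting history data to a report. The types are hypothetical stand-ins for ApplicationAttemptHistoryData and the report class, and the "N/A" host placeholder is an arbitrary choice for this sketch.

```java
// Hypothetical stand-ins for ApplicationAttemptHistoryData and the report type.
final class AttemptHistorySketch {
  final String attemptId;
  final String host;

  AttemptHistorySketch(String attemptId, String host) {
    this.attemptId = attemptId;
    this.host = host;
  }
}

final class AppReportSketch {
  final String appId;
  final String currentAttemptId; // null when no attempt has been recorded
  final String host;

  AppReportSketch(String appId, String currentAttemptId, String host) {
    this.appId = appId;
    this.currentAttemptId = currentAttemptId;
    this.host = host;
  }
}

class ReportConverterSketch {
  /** Builds a report without assuming the application has a last attempt. */
  static AppReportSketch convert(String appId, AttemptHistorySketch lastAttempt) {
    if (lastAttempt == null) {
      // Placeholder host value chosen for this sketch only.
      return new AppReportSketch(appId, null, "N/A");
    }
    return new AppReportSketch(appId, lastAttempt.attemptId, lastAttempt.host);
  }
}
```

Both branches are natural cases for the test the comment asks about.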

> Adding History Service to use Store and converting Historydata to Report
> 
>
> Key: YARN-987
> URL: https://issues.apache.org/jira/browse/YARN-987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
> YARN-987-4.patch, YARN-987-5.patch
>
>





[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1121:
--

Attachment: YARN-1121.7.patch

bq. Replacing the eventHandler to DropEventHandler (instead of 
GenericEventHandler) may not be enough
Good catch! The new patch removes DropEventHandler and simply returns if 
blockNewEvents is true in GenericEventHandler.
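The approach in this reply (return early from the shared handler once blockNewEvents is set) can be sketched as below. Because every component goes through the same handler object, cached references are covered too. `GenericHandlerSketch` is a hypothetical stand-in, not AsyncDispatcher's GenericEventHandler.

```java
import java.util.concurrent.LinkedBlockingQueue;

class GenericHandlerSketch {
  private final LinkedBlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
  // volatile so that threads posting events see the flag as soon as stop sets it
  private volatile boolean blockNewEvents = false;

  /** Shared by every component, including those that cached this handler earlier. */
  void handle(String event) {
    if (blockNewEvents) {
      return; // stop has begun: drop the event instead of growing the queue
    }
    // An event that races with beginStop() can still get past the check above;
    // this sketch does not guard against that window.
    eventQueue.add(event);
  }

  void beginStop() {
    blockNewEvents = true;
  }

  int queuedEvents() {
    return eventQueue.size();
  }
}
```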

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch
>
>
> On serviceStop, it should wait for all internal pending events to drain before 
> stopping.




[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814496#comment-13814496
 ] 

Omkar Vinit Joshi commented on YARN-1210:
-

Thanks [~jianhe] for reviewing it.

{code}
Instead of passing running containers as parameter in 
RegisterNodeManagerRequest, is it possible to just call heartBeat immediately 
after registerCall and then unBlockNewContainerRequests ? That way we can take 
advantage of the existing heartbeat logic, cover other things like keep app 
alive for log aggregation after AM container completes.
Or at least we can send the list of ContainerStatus(including diagnostics) 
instead of just container Ids and also the list of keep-alive apps (separate 
jira)?
{code}
It makes sense to replace finishedContainers with containerStatuses. 

bq. Unnecessary import changes in DefaultContainerExecutor.java and 
LinuxContainerExecutor, ContainerLaunch, ContainersLauncher
Actually, I needed that earlier because I had created a new ExitCode.java, which I 
wanted to access from ResourceTrackerService. Now that we send the container 
status from the node manager itself, it is no longer needed. Fixed it.

bq. Finished containers may not necessarily be killed. The containers can also 
finish normally and remain in the NM cache before NM resync.
Updated the logic for cleanupContainers on the node manager side. Now we should 
have all the finishedContainer statuses as they are.

bq. wrong LOG class name.
Fixed it. :)

bq. LogFactory.getLog(RMAppImpl.class);
removed.

bq. Isn't it always the case that after this patch only the last attempt can be 
running? A new attempt will not be launched until the previous attempt reports 
back that it really exited. If this is the case, it could be a bug.
We may only need to check whether the last attempt is finished or not.
It is actually checking for any attempt in a non-running state. Do you want 
me to check only the last attempt (by comparing application attempt ids)?
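If only the last attempt can still be running, the check reduces to "find the highest attempt id and look at that one". A minimal sketch, using a plain attempt number as a hypothetical stand-in for comparing ApplicationAttemptIds:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical attempt record; real ApplicationAttemptIds carry more structure.
final class AttemptRecordSketch {
  final int attemptNumber;
  final boolean finished;

  AttemptRecordSketch(int attemptNumber, boolean finished) {
    this.attemptNumber = attemptNumber;
    this.finished = finished;
  }
}

class LastAttemptCheckSketch {
  /**
   * If a new attempt is only launched after the previous one has exited, only
   * the highest-numbered attempt can still be running, so only it is checked.
   */
  static boolean lastAttemptFinished(List<AttemptRecordSketch> attempts) {
    return attempts.stream()
        .max(Comparator.comparingInt(a -> a.attemptNumber))
        .map(a -> a.finished)
        .orElse(true); // assumption: no attempts means nothing is running
  }
}
```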

bq. should we return RUNNING or ACCEPTED for apps that are not in a final state? 
It's ok to return RUNNING in the scope of this patch because we are launching a 
new attempt anyway. Later on, in work-preserving restart, the RM can crash before 
the attempt registers, and the attempt can register with the RM after the RM comes 
back, in which case we can then move the app from ACCEPTED to RUNNING?
Yes, for now I will keep it as RUNNING. Today we don't have any information about 
whether the previous application master started and registered. Once we have 
that information, we can probably do this.

> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1210.1.patch, YARN-1210.2.patch
>
>
> When the RM recovers, it can wait for existing AMs to contact the RM back and 
> then kill them forcefully before even starting a new AM. In the worst case, the 
> RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). 
> This way we'll minimize multiple AMs racing with each other. This can help avoid 
> issues with downstream components like Pig, Hive and Oozie during RM restart.
> In the meantime, new apps will proceed as usual while existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with the RM can continue to run, and those that 
> don't are guaranteed to be killed before a new attempt starts.
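The "wait for the old AM, but no longer than the expiry interval" rule can be sketched with a latch and a timeout. The signal name and class are hypothetical, and a blocking wait is used only for illustration; inside the RM one would presumably drive this from events or a timer rather than parking a thread per application.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PreviousAttemptWaiterSketch {
  // Released once the RM learns that the previous AM really exited (for example,
  // the AM re-contacted the RM and was killed). Hypothetical signal.
  private final CountDownLatch previousAttemptExited = new CountDownLatch(1);

  void onPreviousAttemptExited() {
    previousAttemptExited.countDown();
  }

  /**
   * Waits at most expiryMs (the description suggests the 10-minute expiry
   * interval) before a new attempt may start. Returns true if the exit was
   * confirmed, false if the wait timed out or was interrupted.
   */
  boolean awaitBeforeNewAttempt(long expiryMs) {
    try {
      return previousAttemptExited.await(expiryMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```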




[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814512#comment-13814512
 ] 

Hadoop QA commented on YARN-1121:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612294/YARN-1121.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2379//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2379//console

This message is automatically generated.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.2.1
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch
>
>
> On serviceStop, it should wait for all internal pending events to drain before 
> stopping.




[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1210:


Attachment: YARN-1210.3.patch

> During RM restart, RM should start a new attempt only when previous attempt 
> exits for real
> --
>
> Key: YARN-1210
> URL: https://issues.apache.org/jira/browse/YARN-1210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch
>
>
> When the RM recovers, it can wait for existing AMs to contact the RM back and 
> then kill them forcefully before even starting a new AM. In the worst case, the 
> RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). 
> This way we'll minimize multiple AMs racing with each other. This can help avoid 
> issues with downstream components like Pig, Hive and Oozie during RM restart.
> In the meantime, new apps will proceed as usual while existing apps wait for 
> recovery.
> This can continue to be useful after work-preserving restart, so that AMs 
> which can properly sync back up with the RM can continue to run, and those that 
> don't are guaranteed to be killed before a new attempt starts.




[jira] [Resolved] (YARN-832) Update Resource javadoc to clarify units for memory

2013-11-05 Thread Sandy Ryza (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved YARN-832.
-

Resolution: Duplicate

This was fixed in YARN-976.

> Update Resource javadoc to clarify units for memory
> ---
>
> Key: YARN-832
> URL: https://issues.apache.org/jira/browse/YARN-832
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>  Labels: newbie
>
> These values are supposed to be megabytes (need to check MB vs MiB, i.e. 1000 vs 
> 1024).
>   /**
>* Get memory of the resource.
>* @return memory of the resource
>*/
>   @Public
>   @Stable
>   public abstract int getMemory();
>   
>   /**
>* Set memory of the resource.
>* @param memory memory of the resource
>*/
>   @Public
>   @Stable
>   public abstract void setMemory(int memory);
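To make the MB vs MiB question concrete, the two readings of the same `int` memory value differ as shown below. The class and method names are hypothetical; this sketch does not say which unit Resource actually uses.

```java
class MemoryUnitSketch {
  static final long BYTES_PER_MB = 1_000L * 1_000L;   // decimal megabyte
  static final long BYTES_PER_MIB = 1_024L * 1_024L;  // binary mebibyte

  /** Bytes represented by a memory value if the unit is the decimal MB. */
  static long bytesIfMegabytes(int memory) {
    return memory * BYTES_PER_MB;
  }

  /** Bytes represented by a memory value if the unit is the binary MiB. */
  static long bytesIfMebibytes(int memory) {
    return memory * BYTES_PER_MIB;
  }
}
```

For a value of 1024, the readings give 1,024,000,000 and 1,073,741,824 bytes respectively, a difference of just under 5%.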




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814541#comment-13814541
 ] 

Omkar Vinit Joshi commented on YARN-674:


Thanks [~jianhe], [~bikassaha].

bq. I saw this was changed back to asynchronous submission on recovery. The 
original intention was to prevent the client from seeing the application as a new 
application. If recovery is asynchronous, the client can query the application 
before the recover event gets processed, meaning before the application is fully 
recovered, because some of the recovery logic runs when the app processes the 
recover event (app.FinalTransition).
Fixed to make sure that it gets updated synchronously.

bq. The assert doesn't take effect in production (assertions are disabled unless 
the JVM runs with -ea), so it won't catch anything on the cluster. Need to throw 
an exception here. If we don't want to crash the RM here, then we can log an 
error. When the attempt state machine gets the event, it will crash on the async 
dispatcher thread if the event is not handled in the current state.
Discussed with Bikas offline; this is fine.

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-674:
---

Attachment: YARN-674.6.patch

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-11-05 Thread Hou Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hou Song updated YARN-90:
-

Attachment: YARN-90.patch

Now I understand, thanks. 
Please review this patch first; I will open a new JIRA for the information 
exposure soon. 

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good again), the NodeManager needs a restart. This JIRA is to improve the 
> NodeManager to reuse good disks (which may have been bad some time back).
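The improvement described above amounts to re-checking failed directories on every health check instead of excluding them permanently. A minimal sketch, with a hypothetical `Predicate` standing in for whatever disk probe the NodeManager uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class DiskHealthSketch {
  private final List<String> goodDirs = new ArrayList<>();
  private final List<String> failedDirs = new ArrayList<>();
  // Hypothetical health probe, e.g. "directory exists and is writable".
  private final Predicate<String> isHealthy;

  DiskHealthSketch(List<String> dirs, Predicate<String> isHealthy) {
    this.goodDirs.addAll(dirs);
    this.isHealthy = isHealthy;
  }

  /**
   * Re-checks every directory, including ones that failed earlier, so a disk
   * that has recovered returns to the good list without an NM restart.
   */
  void checkDirs() {
    List<String> all = new ArrayList<>(goodDirs);
    all.addAll(failedDirs);
    goodDirs.clear();
    failedDirs.clear();
    for (String dir : all) {
      if (isHealthy.test(dir)) {
        goodDirs.add(dir);
      } else {
        failedDirs.add(dir);
      }
    }
  }

  List<String> getGoodDirs() {
    return new ArrayList<>(goodDirs);
  }

  List<String> getFailedDirs() {
    return new ArrayList<>(failedDirs);
  }
}
```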




[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814569#comment-13814569
 ] 

Hadoop QA commented on YARN-674:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612310/YARN-674.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2380//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2380//console

This message is automatically generated.

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> -
>
> Key: YARN-674
> URL: https://issues.apache.org/jira/browse/YARN-674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.




[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2013-11-05 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814591#comment-13814591
 ] 

Devaraj K commented on YARN-954:


Thanks for the reminder, Mayank. I will update the patch with the changes.

> [YARN-321] History Service should create the webUI and wire it to 
> HistoryStorage
> 
>
> Key: YARN-954
> URL: https://issues.apache.org/jira/browse/YARN-954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Devaraj K
> Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, 
> YARN-954-v2.patch
>
>





[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2013-11-05 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814590#comment-13814590
 ] 

Devaraj K commented on YARN-954:


Thanks for the reminder, Mayank. I will update the patch with the changes.

> [YARN-321] History Service should create the webUI and wire it to 
> HistoryStorage
> 
>
> Key: YARN-954
> URL: https://issues.apache.org/jira/browse/YARN-954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Devaraj K
> Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, 
> YARN-954-v2.patch
>
>





[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

2013-11-05 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814596#comment-13814596
 ] 

Karthik Kambatla commented on YARN-1222:


bq. Thinking aloud, using HAServiceTarget in RMStateStore to 
transitionToStandby() may not be the right solution. 
My bad. I should have explained the choice. Post YARN-1318, I think the 
RMStateStore constructor should take RMContext. Then we should be able to 
replace the RPC approach with rmContext.getHAService().transitionToStandby().

bq. The store should inform the higher entity about the fenced state and not 
take action on the higher entity by fencing it. 
I think it is a trade-off between pushing higher-level concepts like HA down 
versus spreading the logic of handling the FencedException across multiple 
entities. If we push it all the way down to the store implementation 
(ZKRMStateStore), we can get away with handling at one location. The other 
extreme would be to handle it at every location where a store operation is 
triggered. I think handling it in RMStateStore and not an implementation is a 
good compromise. A completely different approach might be to keep 
{{handleStoreFencedException()}} in {{ResourceManager}} and have the store 
implementation call it when it realizes it got fenced. Thoughts?
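The last option (keep handleStoreFencedException() in the ResourceManager and have the store call it) can be sketched as a callback: the store reports that it was fenced and the owner decides what to do, such as transitioning to standby. All names below are hypothetical, and the `fenced` flag only simulates another RM having taken over the store.

```java
// Hypothetical callback owned by the RM, after the handleStoreFencedException idea.
interface FencingHandlerSketch {
  void handleStoreFencedException(Exception cause);
}

class StoreFencedSketchException extends Exception {
  StoreFencedSketchException(String message) {
    super(message);
  }
}

class FencingAwareStoreSketch {
  private final FencingHandlerSketch owner;

  FencingAwareStoreSketch(FencingHandlerSketch owner) {
    this.owner = owner;
  }

  /** `fenced` simulates another RM having taken over the store. */
  void storeApplication(String appId, boolean fenced) {
    try {
      write(appId, fenced);
    } catch (StoreFencedSketchException e) {
      // The store only reports the condition; the owner decides what to do,
      // e.g. transition to standby, instead of the store fencing the RM itself.
      owner.handleStoreFencedException(e);
    }
  }

  private void write(String appId, boolean fenced) throws StoreFencedSketchException {
    if (fenced) {
      throw new StoreFencedSketchException("store fenced while writing " + appId);
    }
    // A real implementation would perform the ZooKeeper operation here.
  }
}
```

This keeps HA decisions out of the store while handling the fenced case in one place.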






> Make improvements in ZKRMStateStore for fencing
> ---
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.
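The fencing scheme in the description can be modelled without a ZooKeeper dependency. In this sketch each op is a plain string standing in for an {{org.apache.zookeeper.Op}} that would be submitted through {{ZooKeeper.multi()}}; the {{FencedBatchBuilder}} class and the {{FENCING_LOCK}} znode name are hypothetical, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free model of the fencing scheme: each ZooKeeper op is written as
// a plain string instead of an org.apache.zookeeper.Op, and the lock znode
// name is hypothetical.
class FencedBatchBuilder {
    private final String lockPath;

    FencedBatchBuilder(String rootPath) {
        this.lockPath = rootPath + "/FENCING_LOCK";
    }

    // Wrap the caller's ops between a create and a delete of the lock znode.
    // Submitted as one multi-operation, the batch is all-or-nothing: if another
    // RM has removed this RM's create/delete permission on the root znode, the
    // lock create fails and none of the wrapped ops are applied.
    List<String> wrap(List<String> ops) {
        List<String> batch = new ArrayList<>();
        batch.add("create " + lockPath);
        batch.addAll(ops);
        batch.add("delete " + lockPath);
        return batch;
    }
}
```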




[jira] [Commented] (YARN-987) Adding History Service to use Store and converting Historydata to Report

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814605#comment-13814605
 ] 

Vinod Kumar Vavilapalli commented on YARN-987:
--

Quickly scanned through the patch, comments:
 - reduce the scope of methods like getLastAttempt; they don't need to be 
public.
 - ApplicationHistoryContext -> ApplicationHistoryManager and 
ApplicationHistory -> ApplicationHistoryManagerImpl. They aren't just context 
objects.

> Adding History Service to use Store and converting Historydata to Report
> 
>
> Key: YARN-987
> URL: https://issues.apache.org/jira/browse/YARN-987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
> YARN-987-4.patch, YARN-987-5.patch
>
>





[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814618#comment-13814618
 ] 

Vinod Kumar Vavilapalli commented on YARN-1266:
---

This is not enough, you need a client wrapper too.

I think we should just bite the bullet and remove ApplicationHistoryProtocol 
completely. We can merge the new APIs into ApplicationClientProtocol and take 
care of ResourceManager implementation of those APIs in a separate JIRA.

> Adding ApplicationHistoryProtocolPBService
> --
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps work and 
> changing yarn to run AHS as a separate process




[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814622#comment-13814622
 ] 

Zhijie Shen commented on YARN-955:
--

1. Not necessary. The default value can be read from yarn-default.xml. The 
problem is that you cannot specify prefix variables like that in the XML 
file, so this default URI will be a relative path based on the current directory.
{code}
+  public static final String DEFAULT_FS_HISTORY_STORE_URI = "tmp";
{code}
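To illustrate the relative-path point with an analogy: the sketch uses {{java.nio}}, not Hadoop's FileSystem API, whose resolution of relative paths depends on the file system in use. {{RelativeStoreUriDemo}} is a hypothetical helper.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Analogy using java.nio rather than Hadoop's FileSystem API: a bare relative
// value such as "tmp" is resolved against the process's working directory, so
// where the store ends up depends on where the daemon was launched.
class RelativeStoreUriDemo {
    static Path resolve(String configuredUri) {
        return Paths.get(configuredUri).toAbsolutePath();
    }
}
```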

2. Maybe just call it AHS_ADDRESS
{code}
+  public static final String AHS_HISTORY_ADDRESS = AHS_PREFIX + "address";
{code}

3. The nested class is not necessary. ApplicationHistoryClientService can 
implement ApplicationHistoryProtocol directly.
{code}
+  private class ApplicationHSClientProtocolHandler implements
+  ApplicationHistoryProtocol {
{code}

4. Unnecessary wrapper. Please place the simple statement directly in the 
callers. Same for getApplications.
{code}
+public List getApplicationAttempts(
+ApplicationId appId) throws IOException {
+  List appAttemptReports = new ArrayList(
+  history.getApplicationAttempts(appId).values());
+  return appAttemptReports;
+}
{code}
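A sketch of the suggested inlining, assuming hypothetical stand-in types: String ids and reports replace ApplicationAttemptId and ApplicationAttemptReport, and {{SketchHistoryManager}}, {{SketchAttemptsResponse}} and {{SketchHistoryClientService}} are illustrative names only. The RPC handler builds the list itself instead of calling a one-statement helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: String ids and String reports replace
// ApplicationAttemptId and ApplicationAttemptReport so the sketch is self-contained.
interface SketchHistoryManager {
    Map<String, String> getApplicationAttempts(String appId);
}

class SketchAttemptsResponse {
    List<String> attempts;
}

class SketchHistoryClientService {
    private final SketchHistoryManager history;

    SketchHistoryClientService(SketchHistoryManager history) {
        this.history = history;
    }

    // The RPC handler builds the list inline instead of delegating to a
    // one-statement private helper.
    SketchAttemptsResponse getApplicationAttempts(String appId) {
        SketchAttemptsResponse response = new SketchAttemptsResponse();
        response.attempts = new ArrayList<>(history.getApplicationAttempts(appId).values());
        return response;
    }
}
```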

5. Personally, I think returning an empty collection is a fine way to indicate 
no results. Otherwise, the caller always needs to check for null first.
{code}
+  } else {
+response.setApplicationList(null);
+  }
{code}
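A minimal sketch of the empty-collection convention, with hypothetical {{SketchApplicationsResponse}} and {{SketchApplicationsHandler}} types and String entries standing in for application reports: there is no else-branch that sets null, and the setter also normalises null so callers can iterate without a check.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical response type: the setter normalises null to an empty list so
// callers can iterate over the result without a null check.
class SketchApplicationsResponse {
    private List<String> applicationList = new ArrayList<String>();

    void setApplicationList(List<String> apps) {
        applicationList = (apps == null) ? new ArrayList<String>() : apps;
    }

    List<String> getApplicationList() {
        return applicationList;
    }
}

class SketchApplicationsHandler {
    // No special else-branch: an empty collection naturally yields an empty list.
    static SketchApplicationsResponse getApplications(Collection<String> finishedApps) {
        SketchApplicationsResponse response = new SketchApplicationsResponse();
        response.setApplicationList(new ArrayList<String>(finishedApps));
        return response;
    }
}
```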

6. Why do you want two references pointing to the same object?
{code}
+historyService = createApplicationHistory();
+historyContext = (ApplicationHistoryContext) historyService;
{code}

7. In the original design, we said we were going to make AHS a service of the RM, 
though it should be independent enough. In this patch, I can see AHS is going 
to be a completely independent process. So far, that should be OK, because AHS 
needs nothing from the RM. However, I expect some additional security work if 
AHS is a separate process, as AHS and RM will not share a common context and 
may be launched by different users. [~vinodkv], do you have any opinion on 
service vs. process?

Anyway, if we decide to make AHS a process now, this patch should also include 
the shell script to launch AHS.

> [YARN-321] Implementation of ApplicationHistoryProtocol
> ---
>
> Key: YARN-955
> URL: https://issues.apache.org/jira/browse/YARN-955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Mayank Bansal
> Attachments: YARN-955-1.patch, YARN-955-2.patch
>
>





[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService

2013-11-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814639#comment-13814639
 ] 

Zhijie Shen commented on YARN-1266:
---

bq. This is not enough, you need a client wrapper too.

I think [~mayank_bansal] has it in another Jira. It seems that the work of 
ApplicationHistoryProtocol has been split into the following Jiras: YARN-979, 
YARN-1266, YARN-955 and YARN-967, and the client wrapper is in YARN-967. 
[~mayank_bansal], please correct me if I'm wrong.

bq. I think we should just bite the bullet and remove 
ApplicationHistoryProtocol completely. We can merge the new APIs into 
ApplicationClientProtocol and take care of ResourceManager implementation of 
those APIs in a separate JIRA.

Do you mean that we would have a single RPC interface, and the server-side 
implementation would redirect queries for completed applications/attempts/containers 
to the AHS?

If so, I think it makes sense and probably simplifies the problem. However, I 
still have one concern about the independence of the AHS. Say we want the AHS 
to be a separate process like the JHS in the future (or maybe now; see my comments 
in YARN-955): when the RM is stopped, the AHS cannot be accessed via the RPC interface. 
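A sketch of the single-interface idea, under stated assumptions: {{SketchReportSource}} and {{SketchUnifiedClientService}} are hypothetical, and String reports stand in for ApplicationReport. Applications the RM still tracks are answered from RM state; anything else is forwarded to the history server.

```java
// Hypothetical source of reports; String reports stand in for ApplicationReport.
interface SketchReportSource {
    // Returns null when this source does not know the application.
    String getApplicationReport(String appId);
}

// Sketch of one client-facing RPC service: applications the RM still tracks are
// answered from RM state, anything else is redirected to the history server.
class SketchUnifiedClientService {
    private final SketchReportSource rmState;
    private final SketchReportSource historyServer;

    SketchUnifiedClientService(SketchReportSource rmState, SketchReportSource historyServer) {
        this.rmState = rmState;
        this.historyServer = historyServer;
    }

    String getApplicationReport(String appId) {
        String report = rmState.getApplicationReport(appId);
        if (report != null) {
            return report;
        }
        // Completed application: forward the query to the AHS.
        return historyServer.getApplicationReport(appId);
    }
}
```

Because this entry point would live in the RM, the forwarding only works while the RM is up. That is the independence concern raised in the comment.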

> Adding ApplicationHistoryProtocolPBService
> --
>
> Key: YARN-1266
> URL: https://issues.apache.org/jira/browse/YARN-1266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-1266-1.patch, YARN-1266-2.patch
>
>
> Adding ApplicationHistoryProtocolPBService to make web apps work and 
> changing yarn to run AHS as a separate process




[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-11-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814653#comment-13814653
 ] 

Vinod Kumar Vavilapalli commented on YARN-978:
--

Tx for the reviews Zhijie. Also Xuan for the earlier patches.

> [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
> --
>
> Key: YARN-978
> URL: https://issues.apache.org/jira/browse/YARN-978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Fix For: YARN-321
>
> Attachments: YARN-978-1.patch, YARN-978.10.patch, YARN-978.2.patch, 
> YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, 
> YARN-978.7.patch, YARN-978.8.patch, YARN-978.9.patch
>
>
> We don't have ApplicationAttemptReport and Protobuf implementation.
> Adding that.
> Thanks,
> Mayank




[jira] [Updated] (YARN-1307) Rethink znode structure for RM HA

2013-11-05 Thread Tsuyoshi OZAWA (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1307:
-

Attachment: YARN-1307.4-2.patch

> Rethink znode structure for RM HA
> -
>
> Key: YARN-1307
> URL: https://issues.apache.org/jira/browse/YARN-1307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch, 
> YARN-1307.4-2.patch, YARN-1307.4.patch
>
>
> A rethink of the znode structure for RM HA has been proposed in several JIRAs 
> (YARN-659, YARN-1222). The motivation for this JIRA is quoted from Bikas' comment in 
> YARN-1222:
> {quote}
> We should move to creating a node hierarchy for apps such that all znodes for 
> an app are stored under an app znode instead of the app root znode. This will 
> help in removeApplication and also in scaling better on ZK. The earlier code 
> was written this way to ensure create/delete happens under a root znode for 
> fencing. But given that we have moved to multi-operations globally, this isnt 
> required anymore.
> {quote}
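A sketch of the proposed per-app hierarchy, assuming a hypothetical {{SketchAppZnodeLayout}} helper and an illustrative {{/rmstore/RMAppRoot}} root (neither is taken from the patch). With every znode of an app under its own app znode, removeApplication only has to touch that subtree; because ZooKeeper refuses to delete a znode that still has children, the attempt znodes are deleted before the app znode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical path helper for the proposed layout, where every znode that
// belongs to an application lives under that application's own znode:
//   <appRoot>/<appId>/<attemptId>
class SketchAppZnodeLayout {
    private final String appRoot;

    SketchAppZnodeLayout(String appRoot) {
        this.appRoot = appRoot;
    }

    String appPath(String appId) {
        return appRoot + "/" + appId;
    }

    String attemptPath(String appId, String attemptId) {
        return appPath(appId) + "/" + attemptId;
    }

    // Deletes for removeApplication: ZooKeeper only deletes znodes without
    // children, so attempt znodes go first and the app znode last.
    List<String> removeApplicationOps(String appId, List<String> attemptIds) {
        List<String> ops = new ArrayList<>();
        for (String attemptId : attemptIds) {
            ops.add("delete " + attemptPath(appId, attemptId));
        }
        ops.add("delete " + appPath(appId));
        return ops;
    }
}
```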



