[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253237#comment-14253237
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/46/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-yarn-project/CHANGES.txt
* hadoop-project/src/site/site.xml


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253229#comment-14253229
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/46/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.
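For illustration only, a minimal Java sketch of the kind of shared-token bookkeeping the description implies: reference-count a token across the apps that use it and treat it as cancellable only once the last referencing app finishes. The class and method names are hypothetical, not the actual DelegationTokenRenewer API.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the actual DelegationTokenRenewer API.
class SharedTokenTracker {
  // token -> applications that still reference it
  private final Map<String, Set<String>> tokenToApps = new HashMap<>();

  synchronized void register(String tokenId, String appId) {
    tokenToApps.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
  }

  // Called when an application finishes. Returns true only when no other
  // running application still references the token, i.e. only then would
  // it be safe to cancel the token.
  synchronized boolean appFinished(String tokenId, String appId) {
    Set<String> apps = tokenToApps.get(tokenId);
    if (apps == null) {
      return true; // unknown token: nothing left to protect
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      tokenToApps.remove(tokenId);
      return true; // last referencing app is gone
    }
    return false;  // other apps (e.g. the main Oozie job) still need it
  }
}
{code}
With this kind of bookkeeping, a sub-job finishing does not make the token cancellable while the main job or sibling sub-jobs still hold a reference.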



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253246#comment-14253246
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #780 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/780/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253254#comment-14253254
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #780 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/780/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* hadoop-yarn-project/CHANGES.txt
* hadoop-project/src/site/site.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253440#comment-14253440
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1978 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1978/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253449#comment-14253449
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1978 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1978/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253463#comment-14253463
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #43 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/43/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253455#comment-14253455
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #43 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/43/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253502#comment-14253502
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #47 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/47/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253510#comment-14253510
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #47 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/47/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-project/src/site/site.xml


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253531#comment-14253531
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1997 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1997/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253523#comment-14253523
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1997 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1997/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-19 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2946:
-
Attachment: 0001-YARN-2946.patch

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png, 
 RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZK disconnected event.
 # While ZKRMStateStore#runWithCheck() waits via wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection through either a SyncConnected or Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state machine transition events. This causes a 
 deadlock in ZKRMStateStore.
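For illustration only, a minimal sketch of the locking pattern at issue, with hypothetical names rather than the real ZKRMStateStore code: waiting for the ZooKeeper client to come back should happen on a dedicated monitor, not while holding the lock that the state-machine transition threads also need.
{code}
// Hypothetical sketch, not the real ZKRMStateStore code.
class ZkReconnectSketch {
  private final Object connectionMonitor = new Object();
  private Object zkClient; // null while disconnected

  // Called from the ZK watcher thread on a SyncConnected/Expired event.
  void onConnectionEvent(Object newClient) {
    synchronized (connectionMonitor) {
      zkClient = newClient;
      connectionMonitor.notifyAll();
    }
  }

  // Wait on a dedicated monitor so that state-machine transition threads
  // are never blocked behind this wait on the store-wide lock.
  Object waitForClient(long zkSessionTimeoutMs) throws InterruptedException {
    synchronized (connectionMonitor) {
      long deadline = System.currentTimeMillis() + zkSessionTimeoutMs;
      while (zkClient == null) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          break; // timed out; caller handles the still-disconnected case
        }
        connectionMonitor.wait(remaining);
      }
      return zkClient;
    }
  }
}
{code}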



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253688#comment-14253688
 ] 

Rohith commented on YARN-2946:
--

I updated the patch with the following fixes:
# All token storage is handled synchronously via the state machine.
# Removed unnecessary synchronization from the method. This ensures the 1st point.

For testing, I deployed the patch in a cluster integrated with JCarder and executed the 
same scenario as in my earlier comment to check for deadlock cycles. JCarder 
did not identify any deadlock cycles.

Kindly review the patch

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png, 
 RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZK disconnected event.
 # While ZKRMStateStore#runWithCheck() waits via wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection through either a SyncConnected or Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state machine transition events. This causes a 
 deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2877) Extend YARN to support distributed scheduling

2014-12-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-2877:

Assignee: Konstantinos Karanasos

 Extend YARN to support distributed scheduling
 -

 Key: YARN-2877
 URL: https://issues.apache.org/jira/browse/YARN-2877
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Sriram Rao
Assignee: Konstantinos Karanasos

 This is an umbrella JIRA that proposes to extend YARN to support distributed 
 scheduling.  Briefly, some of the motivations for distributed scheduling are 
 the following:
 1. Improve cluster utilization by opportunistically executing tasks on otherwise 
 idle resources on individual machines.
 2. Reduce allocation latency for tasks where the scheduling time dominates 
 (i.e., the task execution time is much less than the time required to 
 obtain a container from the RM).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-12-19 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253753#comment-14253753
 ] 

Chen He commented on YARN-1680:
---

Any update on this issue? I have some free cycles recently.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Craig Welch
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because the headroom used in the reducer 
 preemption calculation includes the blacklisted nodes' memory. This makes the job hang 
 forever (the ResourceManager does not assign any new containers on blacklisted nodes, 
 but the availableResources it returns still counts the whole cluster's free memory).
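For illustration only, a minimal sketch of a headroom calculation that excludes blacklisted nodes, as the summary suggests. The types and method names (NodeInfo, computeHeadroomMb) are hypothetical, not the actual scheduler code.
{code}
import java.util.List;
import java.util.Set;

// Hypothetical sketch; NodeInfo and computeHeadroomMb are illustrative names.
class HeadroomSketch {
  static long computeHeadroomMb(List<NodeInfo> nodes, Set<String> blacklistedHosts) {
    long headroomMb = 0;
    for (NodeInfo node : nodes) {
      if (blacklistedHosts.contains(node.host)) {
        continue; // free memory on blacklisted nodes is unusable by this app
      }
      headroomMb += node.availableMb;
    }
    return headroomMb;
  }

  static class NodeInfo {
    final String host;
    final long availableMb;

    NodeInfo(String host, long availableMb) {
      this.host = host;
      this.availableMb = availableMb;
    }
  }
}
{code}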



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-12-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253765#comment-14253765
 ] 

Craig Welch commented on YARN-1680:
---

Go for it :-) I thought I was free to work on it, and as soon as we switched the 
assignment I got too busy with other things.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Craig Welch
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because the headroom used in the reducer 
 preemption calculation includes the blacklisted nodes' memory. This makes the job hang 
 forever (the ResourceManager does not assign any new containers on blacklisted nodes, 
 but the availableResources it returns still counts the whole cluster's free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-12-19 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1680:
--
Assignee: Chen He  (was: Craig Welch)

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because the headroom used in the reducer 
 preemption calculation includes the blacklisted nodes' memory. This makes the job hang 
 forever (the ResourceManager does not assign any new containers on blacklisted nodes, 
 but the availableResources it returns still counts the whole cluster's free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-12-19 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253773#comment-14253773
 ] 

Chen He commented on YARN-1680:
---

Thanks, [~cwelch].

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because the headroom used in the reducer 
 preemption calculation includes the blacklisted nodes' memory. This makes the job hang 
 forever (the ResourceManager does not assign any new containers on blacklisted nodes, 
 but the availableResources it returns still counts the whole cluster's free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253793#comment-14253793
 ] 

Hadoop QA commented on YARN-2946:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688362/0001-YARN-2946.patch
  against trunk revision 6635ccd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6156//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6156//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6156//console

This message is automatically generated.

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png, 
 RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZK disconnected event.
 # While ZKRMStateStore#runWithCheck() waits via wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection through either a SyncConnected or Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state machine transition events. This causes a 
 deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2975:
---
Attachment: yarn-2975-2.patch

Updated patch to preserve behavior of FSLeafQueue#removeApp and add 
FSLeafQueue#removeNonRunnableApp separately. 

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 
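For illustration only, a minimal sketch of the guarded-access pattern being discussed, with hypothetical names rather than the real FSLeafQueue code: mutations take the write lock, and readers get a snapshot copy under the read lock instead of the raw internal list.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the actual FSLeafQueue code.
class LockedAppListSketch<T> {
  private final List<T> runnableApps = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addApp(T app) {
    lock.writeLock().lock();
    try {
      runnableApps.add(app);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Callers receive a snapshot taken under the read lock, so they never
  // observe the underlying list while another thread is mutating it.
  List<T> copyOfRunnableApps() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(runnableApps);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}
Returning a copy trades a small allocation cost for freedom from callers forgetting to take the lock themselves.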



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254086#comment-14254086
 ] 

Hadoop QA commented on YARN-2975:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688412/yarn-2975-2.patch
  against trunk revision d9e4d67.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6157//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6157//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6157//console

This message is automatically generated.

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254095#comment-14254095
 ] 

Jian He commented on YARN-2964:
---

bq. do you think this is something we can/should fix in YARN?
I think so. RM is the designated renewer so it should renew the token every so 
often. But because there's a bug in DelegationTokenRenewer, RM just forgets the 
token and won't renew the token automatically. So we should fix this in 
DelegationTokenRenewer to keep track of the token and renew the token properly.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes. The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes. If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254110#comment-14254110
 ] 

Karthik Kambatla commented on YARN-2738:


Thanks Carlo, makes sense.

Sorry for the delay in getting to this. The latest patch looks pretty good, 
except for one nit: a spurious change in the following snippet. I can take care 
of it at commit time. 
{code}
String text = ((Text) field.getFirstChild()).getData();
{code}

However, I have some comments that might require some follow-up work:
# Should we have a default implementation of {{getAverageCapacity}} etc. in 
ReservationSchedulerConfiguration, rather than requiring separate implementations in 
CS and FS?
# Would it make sense to have a common ReservationQueueConfiguration for both 
CS and FS? 

 Add FairReservationSystem for FairScheduler
 ---

 Key: YARN-2738
 URL: https://issues.apache.org/jira/browse/YARN-2738
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
 YARN-2738.003.patch, YARN-2738.004.patch


 Need to create a FairReservationSystem that will implement ReservationSystem 
 for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2574) Add support for FairScheduler to the ReservationSystem

2014-12-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2574:
---
Issue Type: New Feature  (was: Improvement)

 Add support for FairScheduler to the ReservationSystem
 --

 Key: YARN-2574
 URL: https://issues.apache.org/jira/browse/YARN-2574
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot

 YARN-1051 introduces the ReservationSystem and the current implementation is 
 based on CapacityScheduler. This JIRA proposes adding support for 
 FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2852) WebUI & Metrics: Add disk I/O resource information to the web ui and metrics

2014-12-19 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2852:
-
Labels: metrics supportability  (was: )

 WebUI & Metrics: Add disk I/O resource information to the web ui and metrics
 

 Key: YARN-2852
 URL: https://issues.apache.org/jira/browse/YARN-2852
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
  Labels: metrics, supportability
 Attachments: YARN-2852-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254178#comment-14254178
 ] 

Karthik Kambatla commented on YARN-2675:


Given we split up all the cases of ContainerDoneTransition, do we still need 
it? 

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished states of 
 ContainerImpl.java so that killedContainer is updated.
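For illustration only, a minimal sketch of the counting change the description asks for, using hypothetical names rather than the real ContainerImpl state machine: a container that reaches its terminal state out of the KILLING state is counted as killed.
{code}
// Hypothetical sketch, not the real ContainerImpl state machine.
enum SketchContainerState { LOCALIZING, RUNNING, KILLING, DONE }

class ContainerMetricsSketch {
  private int killedContainers;
  private int failedContainers;

  // Invoked when a container reaches its terminal state.
  void onContainerFinished(SketchContainerState previousState, int exitCode) {
    if (previousState == SketchContainerState.KILLING) {
      killedContainers++;   // a kill during localization is still a kill
    } else if (exitCode != 0) {
      failedContainers++;
    }
  }
}
{code}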



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-12-19 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2423:

Attachment: YARN-2423.005.patch

005 patch fixes the test failure.  A previous test was leaking UGI settings.

[~zjshen], can you take a look at the latest patch?

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.patch, YARN-2423.patch, YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It would also 
 be good to wrap all GET APIs (both entity and domain) and deserialize the 
 JSON responses into Java POJO objects.
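For illustration only, a minimal sketch of such a GET wrapper using plain HttpURLConnection and Jackson; the class name, method, and URL layout here are hypothetical, not the actual TimelineClient API.
{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical sketch; the class name, method and URL layout are illustrative.
class TimelineGetClientSketch {
  private final String baseUrl; // e.g. "http://timelineserver:8188/ws/v1/timeline"
  private final ObjectMapper mapper = new ObjectMapper();

  TimelineGetClientSketch(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  <T> T getEntity(String entityType, String entityId, Class<T> pojoClass)
      throws Exception {
    URL url = new URL(baseUrl + "/" + entityType + "/" + entityId);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    try (InputStream in = conn.getInputStream()) {
      // Deserialize the JSON response body straight into the caller's POJO.
      return mapper.readValue(in, pojoClass);
    } finally {
      conn.disconnect();
    }
  }
}
{code}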



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254201#comment-14254201
 ] 

Karthik Kambatla commented on YARN-2655:


[~ywskycn] - the patch doesn't apply anymore. Mind updating it? 

 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2655-1.patch, screenshot-1.png, screenshot-2.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 Screenshot attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-12-19 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254202#comment-14254202
 ] 

Wei Yan commented on YARN-2655:
---

[~kasha], sure, will do it soon.

 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2655-1.patch, screenshot-1.png, screenshot-2.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 Screenshot attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2982) Use ReservationQueueConfiguration in CapacityScheduler

2014-12-19 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2982:
---

 Summary: Use ReservationQueueConfiguration in CapacityScheduler
 Key: YARN-2982
 URL: https://issues.apache.org/jira/browse/YARN-2982
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Anubhav Dhoot


ReservationQueueConfiguration is common to reservations irrespective of the 
scheduler. It would be good to have CapacityScheduler also support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2982) Use ReservationQueueConfiguration in CapacityScheduler

2014-12-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2982:

Parent Issue: YARN-2574  (was: YARN-2572)

 Use ReservationQueueConfiguration in CapacityScheduler
 --

 Key: YARN-2982
 URL: https://issues.apache.org/jira/browse/YARN-2982
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot

 ReservationQueueConfiguration is common to reservations irrespective of the 
 scheduler. It would be good to have CapacityScheduler also support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-19 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254212#comment-14254212
 ] 

Anubhav Dhoot commented on YARN-2738:
-

Re 1. This is a configuration point that will need to be implemented based on 
the configuration mechanism of each scheduler (CS or FS).
Re 2. Added YARN-2982.

Thanks for the review [~kasha]

 Add FairReservationSystem for FairScheduler
 ---

 Key: YARN-2738
 URL: https://issues.apache.org/jira/browse/YARN-2738
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
 YARN-2738.003.patch, YARN-2738.004.patch


 Need to create a FairReservationSystem that will implement ReservationSystem 
 for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254216#comment-14254216
 ] 

zhihai xu commented on YARN-2675:
-

Although we don't use it in the state machine directly, it is the base class of 
all other added classes. So we still need it.

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished states of 
 ContainerImpl.java so that killedContainer is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254223#comment-14254223
 ] 

Anubhav Dhoot commented on YARN-2975:
-

Minor comment:
The following comment might be misleading: one may assume it means the app 
will be removed regardless, and that the boolean return only indicates whether 
it happened to be non-runnable.
{noformat}
  /**
   * @return true if the app was non-runnable, false otherwise
   */
 public boolean removeNonRunnableApp(FSAppAttempt app) {
{noformat}

LGTM otherwise

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-12-19 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254225#comment-14254225
 ] 

Wei Yan commented on YARN-2655:
---

Problem already solved in YARN-1156. Closing it.

 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2655-1.patch, screenshot-1.png, screenshot-2.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 Screenshot attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254249#comment-14254249
 ] 

Karthik Kambatla commented on YARN-2738:


+1. Checking this in. 

 Add FairReservationSystem for FairScheduler
 ---

 Key: YARN-2738
 URL: https://issues.apache.org/jira/browse/YARN-2738
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
 YARN-2738.003.patch, YARN-2738.004.patch


 Need to create a FairReservationSystem that will implement ReservationSystem 
 for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()

2014-12-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254264#comment-14254264
 ] 

Hitesh Shah commented on YARN-868:
--

[~vinodkv] Mind taking a look? 

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token or the client 
 layer should expose an api that returns the service address.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254261#comment-14254261
 ] 

Hudson commented on YARN-2574:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6762 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6762/])
YARN-2738. [YARN-2574] Add FairReservationSystem for FairScheduler. (Anubhav 
Dhoot via kasha) (kasha: rev a22ffc318801698e86cd0e316b4824015f2486ac)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java


 Add support for FairScheduler to the ReservationSystem
 --

 Key: YARN-2574
 URL: https://issues.apache.org/jira/browse/YARN-2574
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot

 YARN-1051 introduces the ReservationSystem, and the current implementation is 
 based on the CapacityScheduler. This JIRA proposes adding support for the 
 FairScheduler.
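
As a rough, hedged sketch of the direction (assuming the FairScheduler-side implementation mirrors the CapacityScheduler one by extending AbstractReservationSystem; the constructor and method bodies below are illustrative only, not the committed code):
{code}
// Illustrative skeleton only; the committed FairReservationSystem may differ.
public class FairReservationSystem extends AbstractReservationSystem {

  private FairScheduler fairScheduler;

  public FairReservationSystem() {
    super(FairReservationSystem.class.getName());
  }

  @Override
  public void reinitialize(Configuration conf, RMContext rmContext)
      throws YarnException {
    // Bind to the RM's FairScheduler before running the common
    // reservation-system initialization from AbstractReservationSystem.
    this.fairScheduler = (FairScheduler) rmContext.getScheduler();
    super.reinitialize(conf, rmContext);
  }

  // Scheduler-specific hooks from AbstractReservationSystem (plan queue
  // lookup, capacities, etc.) are elided in this sketch.
}
{code}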



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254263#comment-14254263
 ] 

Hudson commented on YARN-2738:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6762 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6762/])
YARN-2738. [YARN-2574] Add FairReservationSystem for FairScheduler. (Anubhav 
Dhoot via kasha) (kasha: rev a22ffc318801698e86cd0e316b4824015f2486ac)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java


 Add FairReservationSystem for FairScheduler
 ---

 Key: YARN-2738
 URL: https://issues.apache.org/jira/browse/YARN-2738
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
 YARN-2738.003.patch, YARN-2738.004.patch


 Need to create a FairReservationSystem that will implement ReservationSystem 
 for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254275#comment-14254275
 ] 

Hadoop QA commented on YARN-2423:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688447/YARN-2423.005.patch
  against trunk revision 6f1e366.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 36 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  org.apache.hadoop.yarn.client.api.impl.TestTimelineClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6158//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6158//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6158//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6158//console

This message is automatically generated.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.patch, YARN-2423.patch, YARN-2423.patch


 TimelineClient provides Java methods to put timeline entities. It would also 
 be good to wrap all the GET APIs (both entity and domain) and deserialize the 
 JSON responses into Java POJO objects.
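
For a sense of what such a wrapper might look like (a hedged sketch; the method name, the timelineServiceAddress field, and the use of Jackson here are illustrative, not the API the patch actually adds):
{code}
// Hypothetical wrapper sketch; not the committed API.
public TimelineEntity getEntity(String entityType, String entityId)
    throws IOException {
  // Issue GET /ws/v1/timeline/{entityType}/{entityId} against the timeline
  // server and map the JSON body onto the existing TimelineEntity POJO,
  // so callers never have to handle raw JSON.
  URL url = new URL(timelineServiceAddress
      + "/ws/v1/timeline/" + entityType + "/" + entityId);
  HttpURLConnection conn = (HttpURLConnection) url.openConnection();
  try {
    return new ObjectMapper().readValue(conn.getInputStream(),
        TimelineEntity.class);
  } finally {
    conn.disconnect();
  }
}
{code}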



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254283#comment-14254283
 ] 

Robert Kanter commented on YARN-2975:
-

+1 after clarifying the comment that Anubhav pointed out

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 
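
To illustrate the pattern at issue (a sketch only, assuming the read/write lock introduced by YARN-2910; the helper shown is illustrative, not the exact patch):
{code}
private final List<FSAppAttempt> runnableApps = new ArrayList<FSAppAttempt>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock readLock = rwLock.readLock();

// Handing out the backing list lets callers iterate it with no lock held and
// race with writers that add/remove apps under the write lock. Returning a
// copy taken under the read lock (or forcing callers through locked helpers)
// avoids that.
public List<FSAppAttempt> getCopyOfRunnableApps() {
  readLock.lock();
  try {
    return new ArrayList<FSAppAttempt>(runnableApps);
  } finally {
    readLock.unlock();
  }
}
{code}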



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2975:
---
Attachment: yarn-2975-3.patch

Thanks Anubhav. Updated the comment to be clearer.

The test failures and findbugs warnings look unrelated. 

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254296#comment-14254296
 ] 

Karthik Kambatla commented on YARN-2975:


Thanks for the review, Robert. I'll go ahead and commit this if Jenkins 
doesn't complain of any new issues. 

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254302#comment-14254302
 ] 

Karthik Kambatla commented on YARN-2675:


bq. it is the base class of all other added classes
Never mind, I am not the brightest today. I forgot that the child classes call 
super.transition. 

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished handling 
 in ContainerImpl.java to update killedContainer.
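
For illustration, a hedged sketch of that idea (the helper name and exact wiring below are hypothetical; the ContainerState values are the NodeManager-internal container states and metrics is the existing NodeManagerMetrics instance):
{code}
// Sketch: when a container reaches its finished state, count a kill that
// happened while the container was still being killed (e.g. during
// localization, i.e. it was in the KILLING state) as a killed container.
private void updateMetricsOnFinish(ContainerState stateBeforeFinish) {
  switch (stateBeforeFinish) {
  case KILLING:
  case CONTAINER_CLEANEDUP_AFTER_KILL:
    metrics.killedContainer();
    break;
  case EXITED_WITH_FAILURE:
    metrics.failedContainer();
    break;
  case EXITED_WITH_SUCCESS:
    metrics.completedContainer();
    break;
  default:
    break;
  }
}
{code}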



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2675) containersKilled metrics is not updated when the container is killed during localization

2014-12-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2675:
---
Summary: containersKilled metrics is not updated when the container is 
killed during localization  (was: the containersKilled metrics is not updated 
when the container is killed during localization.)

 containersKilled metrics is not updated when the container is killed during 
 localization
 

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished handling 
 in ContainerImpl.java to update killedContainer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254307#comment-14254307
 ] 

Karthik Kambatla commented on YARN-2675:


The latest patch looks good; the findbugs warnings look unrelated.

+1. Checking this in. 

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished handling 
 in ContainerImpl.java to update killedContainer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) containersKilled metrics is not updated when the container is killed during localization

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254332#comment-14254332
 ] 

Hudson commented on YARN-2675:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6764 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6764/])
YARN-2675. containersKilled metrics is not updated when the container is killed 
during localization. (Zhihai Xu via kasha) (kasha: rev 
954fb8581ec6d7d389ac5d6f94061760a29bc309)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java


 containersKilled metrics is not updated when the container is killed during 
 localization
 

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should add the KILLING state to the finished handling 
 in ContainerImpl.java to update killedContainer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()

2014-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254352#comment-14254352
 ] 

Hadoop QA commented on YARN-868:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661447/YARN-868.patch
  against trunk revision 390a7c1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 35 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6161//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6161//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6161//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6161//console

This message is automatically generated.

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token, or the client 
 layer should expose an API that returns the service address.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254376#comment-14254376
 ] 

Jian He commented on YARN-2946:
---

[~rohithsharma], I had a quick look at the patch. One comment:
in each store/update method, instead of doing this:
{code}
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't remove RM Delegation "
        + "Token Master key.");
    return;
  }
  this.stateMachine.doTransition(RMStateStoreEventType.UPDATE_AMRM_TOKEN,
      new RMStateStoreAMRMTokenEvent(amrmTokenSecretManagerState, isUpdate,
          RMStateStoreEventType.UPDATE_AMRM_TOKEN));
{code}
we can do this:
{code}
handleStoreEvent(RMStateStoreEvent event)
{code}
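
A hedged sketch of what that could look like (assumed shape only, not the final patch):
{code}
// Centralize the fenced-state check in the common dispatch path so every
// store/update method only builds its event and calls handleStoreEvent().
protected void handleStoreEvent(RMStateStoreEvent event) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Ignoring event " + event.getType());
    return;
  }
  try {
    this.stateMachine.doTransition(event.getType(), event);
  } catch (InvalidStateTransitonException e) {
    LOG.error("Can't handle this event at current state", e);
  }
}

// e.g. the AMRM token update then reduces to:
// handleStoreEvent(new RMStateStoreAMRMTokenEvent(
//     amrmTokenSecretManagerState, isUpdate,
//     RMStateStoreEventType.UPDATE_AMRM_TOKEN));
{code}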

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png, 
 RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZK disconnected event.
 # When ZKRMStateStore#runWithCheck() waits (zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection via either a SyncConnected or an Expired 
 event, it is highly possible that another thread can obtain the lock on 
 {{ZKRMStateStore.this}} through state machine transition events. This causes 
 the deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

2014-12-19 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254382#comment-14254382
 ] 

Ming Ma commented on YARN-914:
--

[~djp], thanks for working on this.

It looks like we are going to use YARN-291 and thus the "drain the state" 
approach, instead of the more complicated "migrate the state" approach. So YARN 
will reduce the capacity of the nodes as part of the decommission process until 
all their map outputs are fetched, or until all the applications the node 
touches have completed? In addition, it will be interesting to understand how 
you handle long-running jobs.

FYI, https://issues.apache.org/jira/browse/YARN-1996 will drain containers of 
unhealthy nodes.


 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du

 When NMs are decommissioned for non-fault reasons (capacity change, etc.), 
 it's desirable to minimize the impact on running applications.
 Currently, if an NM is decommissioned, all running containers on the NM need 
 to be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
 map outputs have not been fetched by the reducers of the job, those map tasks 
 will need to be rerun as well.
 We propose to introduce a mechanism to optionally decommission a node manager 
 gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2952) Incorrect version check in RMStateStore

2014-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254395#comment-14254395
 ] 

Hudson commented on YARN-2952:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6765 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6765/])
YARN-2952. Fixed incorrect version check in StateStore. Contributed by Rohith 
Sharmaks (jianhe: rev 808cba3821d5bc4267f69d14220757f01cd55715)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt


 Incorrect version check in RMStateStore
 ---

 Key: YARN-2952
 URL: https://issues.apache.org/jira/browse/YARN-2952
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2952.patch


 In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, 
 it'll still store the version as 1.0, which is incorrect; the same thing might 
 happen to the NM store and the timeline store.
 {code}
 // if there is no version info, treat it as 1.0;
 if (loadedVersion == null) {
   loadedVersion = Version.newInstance(1, 0);
 }
 if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
   LOG.info("Storing RM state version info " + getCurrentVersion());
   storeVersion();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)