[jira] [Assigned] (YARN-381) Improve FS docs

2013-03-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reassigned YARN-381:
---

Assignee: Sandy Ryza

 Improve FS docs
 ---

 Key: YARN-381
 URL: https://issues.apache.org/jira/browse/YARN-381
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Sandy Ryza
Priority: Minor

 The MR2 FS docs could use some improvements.
 Configuration:
 - sizebasedweight - what is the size here? Total memory usage?
 Pool properties:
 - minResources - what does min amount of aggregate memory mean given that 
 this is not a reservation?
 - maxResources - is this a hard limit?
 - weight: How is this  ratio configured?  Eg base is 1 and all weights are 
 relative to that?
 - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all 
 tasks for the job are finished before launching the next job?
 There's no mention of ACLs, even though they're supported. See the CS docs 
 for comparison.
 Also there are a couple typos worth fixing while we're at it, eg finish. 
 apps to run
 Worth keeping in mind that some of these will need to be updated to reflect 
 that resource calculators are now pluggable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-381) Improve FS docs

2013-03-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600981#comment-13600981
 ] 

Sandy Ryza commented on YARN-381:
-

It's in megabytes.  Patch coming soon will include this.
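
As a point of reference, a minimal allocation-file sketch of the pool properties being discussed; the element names follow the properties listed in the issue description, and the values (along with the assumption, per the comment above, that the memory figures are in megabytes) are illustrative only, not taken from any patch:

{code}
<?xml version="1.0"?>
<allocations>
  <pool name="reports">
    <!-- aggregate memory, assumed to be in MB per the comment above -->
    <minResources>4096</minResources>
    <!-- upper bound on aggregate memory, also in MB -->
    <maxResources>16384</maxResources>
    <!-- relative share: twice that of a pool with weight 1.0 -->
    <weight>2.0</weight>
    <!-- fair or fifo -->
    <schedulingMode>fair</schedulingMode>
  </pool>
</allocations>
{code}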

 Improve FS docs
 ---

 Key: YARN-381
 URL: https://issues.apache.org/jira/browse/YARN-381
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Sandy Ryza
Priority: Minor

 The MR2 FS docs could use some improvements.
 Configuration:
 - sizebasedweight - what is the size here? Total memory usage?
 Pool properties:
 - minResources - what does min amount of aggregate memory mean given that 
 this is not a reservation?
 - maxResources - is this a hard limit?
 - weight: How is this  ratio configured?  Eg base is 1 and all weights are 
 relative to that?
 - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all 
 tasks for the job are finished before launching the next job?
 There's no mention of ACLs, even though they're supported. See the CS docs 
 for comparison.
 Also there are a couple typos worth fixing while we're at it, eg finish. 
 apps to run
 Worth keeping in mind that some of these will need to be updated to reflect 
 that resource calculators are now pluggable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601029#comment-13601029
 ] 

Hudson commented on YARN-198:
-

Integrated in Hadoop-Yarn-trunk #154 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/154/])
YARN-198. Added a link to RM pages from the NodeManager web app. 
Contributed by Jian He. (Revision 1455800)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Fix For: 2.0.5-beta

 Attachments: YARN-198.patch


 If we navigate to the NodeManager by clicking the node link in the RM, there 
 is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601102#comment-13601102
 ] 

Hudson commented on YARN-198:
-

Integrated in Hadoop-Hdfs-trunk #1343 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1343/])
YARN-198. Added a link to RM pages from the NodeManager web app. 
Contributed by Jian He. (Revision 1455800)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Fix For: 2.0.5-beta

 Attachments: YARN-198.patch


 If we navigate to the NodeManager by clicking the node link in the RM, there 
 is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601161#comment-13601161
 ] 

Hudson commented on YARN-198:
-

Integrated in Hadoop-Mapreduce-trunk #1371 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1371/])
YARN-198. Added a link to RM pages from the NodeManager web app. 
Contributed by Jian He. (Revision 1455800)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Fix For: 2.0.5-beta

 Attachments: YARN-198.patch


 If we navigate to the NodeManager by clicking the node link in the RM, there 
 is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601237#comment-13601237
 ] 

Robert Joseph Evans commented on YARN-378:
--

The patch looks good to me. The only problem I have is with how we are 
informing the AM of the maximum number of retries it has.  This should 
work, but it is going to require a lot of changes to the MR AM to use it.  
Right now the number is used in the init of MRAppMaster, but we will not get 
that information until start() is called and we register with the RM.  I would 
much rather see a new environment variable added that can hold this 
information, because it makes MAPREDUCE-5062 much simpler.  But I am OK with 
the way it currently is.
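
To make the suggestion concrete, here is a minimal sketch of what reading such a value in the AM could look like; the variable name MAX_APP_ATTEMPTS and the fallback behavior are assumptions for illustration only, not part of this patch or of MAPREDUCE-5062:

{code}
// Hypothetical helper: read a max-attempts value from the AM's launch
// environment, falling back to a caller-supplied default if absent or garbled.
public final class MaxAttemptsFromEnv {
  // Assumed variable name, purely for illustration.
  static final String MAX_APP_ATTEMPTS_ENV = "MAX_APP_ATTEMPTS";

  private MaxAttemptsFromEnv() {}

  public static int maxAttempts(int defaultValue) {
    String raw = System.getenv(MAX_APP_ATTEMPTS_ENV);
    if (raw == null) {
      return defaultValue;
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
{code}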

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-13 Thread Kendall Thrapp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245
 ] 

Kendall Thrapp commented on YARN-462:
-

Thanks for the questions and feedback.  Yes, first I should clarify what I 
intended by chargeback.  I'm looking to be able to quantify cluster resource usage 
(memory, CPU, HDFS, etc.) for every application, and then roll that up to the 
project level.  This would allow us to accurately charge the customer (i.e. 
team/project) for their grid usage (either literally or just informatively).  I 
want to provide incentive for more efficient coding, as well as make it easier 
for teams to compare their resource usage across different software versions of 
their Hadoop applications, config parameter changes, etc.

I had originally hoped that hierarchical queues could serve this purpose as 
well, but have since run into several issues with this approach.  The first is 
that it doesn't scale for clusters with large numbers of projects.  I've seen 
large clusters shared between over a hundred different projects, each with 
their own teams of users.  If I recall correctly, queues can't be assigned less 
than 1% of the total capacity, so it wouldn't be possible to give each of these 
projects its own queue.  Even if we could, I suspect this could result in too 
much overhead for the scheduler and too much fragmentation of the cluster 
resources, which could result in poorer overall utilization.

The second issue is that the project-per-queue approach conflicts with how I 
see users wanting to use our queues.  In many cases I see queues being used to 
distinguish application priorities, ensuring that high priority time-sensitive 
jobs get the resources they need to finish on time, while big but lower 
priority and less time-sensitive jobs are constrained by being in a smaller 
queue.  I'd expect a lot of pushback from our users for any chargeback-focused 
queue configuration that had a negative impact on job run times and meeting 
SLAs.  The idea of the project/chargeback parameter decouples the two.

 Project Parameter for Chargeback
 

 Key: YARN-462
 URL: https://issues.apache.org/jira/browse/YARN-462
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 Problem Summary
 For the purpose of chargeback and better understanding of grid usage, we need 
 to be able to associate applications with projects, e.g. "pipeline X", 
 "property Y".  This would allow us to aggregate on this property, thereby 
 helping us compute grid resource usage for the entire project.  Currently, 
 for a given application, two things we know about it are the user that 
 submitted it and the queue it was submitted to.  Below, I'll explain why 
 neither of these is adequate for enterprise-level chargeback and 
 understanding resource allocation needs.
 Why Not Users?
 It's not individual users that are paying the bill -- it's projects.  When one 
 of our real users submits an application on a Hadoop grid, they're presumably 
 not usually doing it for themselves.  They're doing work for some project or 
 team effort, so it's that team or project that should be charged for all its 
 users' applications.  Maintaining outside lists of associations between users 
 and projects is error-prone because it is time-sensitive and requires 
 continued ongoing maintenance.  New users join organizations, users leave and 
 users even change projects.  Furthermore, users may split their time between 
 multiple projects, making it ambiguous as to which of a user's projects a 
 given application should be charged.  Also, there can be "headless" users, 
 which can be even more difficult to link to a project and can be shared 
 between teams or projects.
 Why Not Queues?
 The purpose of queues is for scheduling.  Overloading the queues concept to 
 also mean who should be charged for an application can have a detrimental 
 effect on the primary purpose of queues.  It could be manageable in the case 
 of a very small number of projects sharing a cluster, but doesn't scale to 
 tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
 between 50 projects, creating 50 separate queues will result in inefficient 
 use of the cluster resources.  Furthermore, a given project may desire more 
 than one queue for different types or priorities of applications.  
 Proposed Solution
 Rather than relying on external tools to infer through the user and/or queue 
 who to charge for a given application, I propose a straightforward approach 
 where that information be explicitly supplied when the application is 
 submitted, just like we do with queues.  Let's use a charge card analogy: 
 when you buy something online, you don't 

[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages

2013-03-13 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601259#comment-13601259
 ] 

Thomas Graves commented on YARN-379:


I think the approach looks fine.  Did you see a way to just disable the logging 
for AbstractService for these calls rather than everything?

Minor nit: can you change the name of COMMON_LOGGING_OPTS to something more 
like YARN_CLI_NOLOG_OPTS and add a comment about what it is for?
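
As a rough illustration of the narrower suppression being asked about (not the approach taken by the patch), one could raise the level of just the AbstractService logger; this assumes log4j is the backing implementation and that the logger name matches the package shown in the truncated output quoted below:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class CliLogQuieting {
  private CliLogQuieting() {}

  // Silence only AbstractService's INFO chatter, leaving other loggers alone.
  public static void quietAbstractService() {
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
        .setLevel(Level.WARN);
  }
}
{code}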

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
  Labels: usability
 Attachments: YARN-379.patch


 Running the yarn node and yarn applications command results in annoying log 
 info messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-Id     Node-State   Node-Http-Address   Health-Status(isNodeHealthy)   Running-Containers
  foo:8041    RUNNING      foo:8042            true                           0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill <arg>     Kills the application.
  -list   Lists all the Applications from RM.
  -status <arg>   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-13 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-460:
---

Priority: Blocker  (was: Critical)

 CS user left in list of active users for the queue even when application 
 finished
 -

 Key: YARN-460
 URL: https://issues.apache.org/jira/browse/YARN-460
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
 Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
 YARN-460.patch, YARN-460.patch, YARN-460.patch


 We have seen a user get left in the queue's list of active users even though 
 the application was removed. This can cause everyone else in the queue to get 
 less resources if using the minimum user limit percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-449) HBase test failures when running against Hadoop 2

2013-03-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601356#comment-13601356
 ] 

Ted Yu commented on YARN-449:
-

Here is OS for Hadoop QA machine:
Linux asf002.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
17:42:25 UTC 2011 x86_64 GNU/Linux

Here is OS for the machine where I ran unit test manually:
Linux ygridcore.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 
11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux

 HBase test failures when running against Hadoop 2
 -

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: 7904-v5.txt, hbase-7904-v3.txt, 
 hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, 
 minimr_randomdir-branch2.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-03-13 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-473:
---

 Summary: Capacity Scheduler webpage and REST API not showing 
correct number of pending applications
 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is als showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-03-13 Thread Kendall Thrapp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kendall Thrapp updated YARN-473:


Description: 
The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is also showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.

  was:
The Capacity Scheduler REST API 
(http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
 is not returning the correct number of pending applications.  
numPendingApplications is almost always zero, even if there are dozens of 
pending apps.

In investigating this, I discovered that the Resource Manager's Scheduler 
webpage is als showing an incorrect but different number of pending 
applications.  For example, the cluster I'm looking at right now currently has 
15 applications in the ACCEPTED state, but the Cluster Metrics table near the 
top of the page says there are only 2 pending apps.  The REST API says there 
are zero pending apps.


 Capacity Scheduler webpage and REST API not showing correct number of pending 
 applications
 --

 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 The Capacity Scheduler REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
  is not returning the correct number of pending applications.  
 numPendingApplications is almost always zero, even if there are dozens of 
 pending apps.
 In investigating this, I discovered that the Resource Manager's Scheduler 
 webpage is also showing an incorrect but different number of pending 
 applications.  For example, the cluster I'm looking at right now currently 
 has 15 applications in the ACCEPTED state, but the Cluster Metrics table near 
 the top of the page says there are only 2 pending apps.  The REST API says 
 there are zero pending apps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-13 Thread Andy Rhee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601601#comment-13601601
 ] 

Andy Rhee commented on YARN-462:


Kendall - Again, another great idea!  Two things popped into my mind.

1. I wonder if we also need to verify and enforce project validity on a given 
cluster, mapped to a whitelist or blacklist in the cluster config (this might 
even be tied to an external source of truth like LDAP later), or whether we 
should decouple or delegate validation to other parts or an external process, 
e.g. queue, user, or project accounting.

2. Another interesting spin-off of your idea could be flexible enforceable 
parameters, or meta config.  Instead of modifying the code every time we 
have a great idea for a new parameter to enforce, it may be more cost-effective 
to allow admins to define enforceable parameters in the cluster config, so that 
we don't have to worry about what to name a new parameter or about changing it 
later, IMHO :)

 Project Parameter for Chargeback
 

 Key: YARN-462
 URL: https://issues.apache.org/jira/browse/YARN-462
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 Problem Summary
 For the purpose of chargeback and better understanding of grid usage, we need 
 to be able to associate applications with projects, e.g. "pipeline X", 
 "property Y".  This would allow us to aggregate on this property, thereby 
 helping us compute grid resource usage for the entire project.  Currently, 
 for a given application, two things we know about it are the user that 
 submitted it and the queue it was submitted to.  Below, I'll explain why 
 neither of these is adequate for enterprise-level chargeback and 
 understanding resource allocation needs.
 Why Not Users?
 It's not individual users that are paying the bill -- it's projects.  When one 
 of our real users submits an application on a Hadoop grid, they're presumably 
 not usually doing it for themselves.  They're doing work for some project or 
 team effort, so it's that team or project that should be charged for all its 
 users' applications.  Maintaining outside lists of associations between users 
 and projects is error-prone because it is time-sensitive and requires 
 continued ongoing maintenance.  New users join organizations, users leave and 
 users even change projects.  Furthermore, users may split their time between 
 multiple projects, making it ambiguous as to which of a user's projects a 
 given application should be charged.  Also, there can be "headless" users, 
 which can be even more difficult to link to a project and can be shared 
 between teams or projects.
 Why Not Queues?
 The purpose of queues is for scheduling.  Overloading the queues concept to 
 also mean who should be charged for an application can have a detrimental 
 effect on the primary purpose of queues.  It could be manageable in the case 
 of a very small number of projects sharing a cluster, but doesn't scale to 
 tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
 between 50 projects, creating 50 separate queues will result in inefficient 
 use of the cluster resources.  Furthermore, a given project may desire more 
 than one queue for different types or priorities of applications.  
 Proposed Solution
 Rather than relying on external tools to infer through the user and/or queue 
 who to charge for a given application, I propose a straightforward approach 
 where that information be explicitly supplied when the application is 
 submitted, just like we do with queues.  Let's use a charge card analogy: 
 when you buy something online, you don't just say who you are and how to ship 
 it, you also specify how you're paying for it.  Similarly, when submitting an 
 application in YARN, you could explicitly specify to whom its resource usage 
 should be associated (a project, team, cost center, etc).
 This new configuration parameter should default to being optional, so that 
 organizations not interested in chargeback or project-level resource tracking 
 can happily continue on as if it wasn't there.  However, it should be 
 configurable at the cluster level such that a given cluster could elect 
 to make it required, so that all applications would have an associated 
 project.  The value of this new parameter should be exposed via the Resource 
 Manager UI and Resource Manager REST API, so that users and tools can make 
 use of it for chargeback, utilization metrics, etc.
 I'm undecided on what to name the new parameter, as I like the flexibility in 
 the ways it could be used.  It is essentially just an additional party other 
 than user or queue that an application can be associated with, so its use is 
 not just limited to a chargeback 

[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601625#comment-13601625
 ] 

Bikas Saha commented on YARN-378:
-

+1 for Vinod's comments. Also, personally, I would break the following code down 
into two pieces: first, an init method that reads the global value from config, 
checks for errors, and sets a sensible default global value; once that is done, 
use the appValue and globalValue to set the actual value. The current code is 
making me think more than I need to, IMO.
{code}
+int numRMAMRetries = conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES, 
+    YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
+int numAPPAMRetries = submissionContext.getNumMaxRetries();
+if (numAPPAMRetries <= 0) {
+  if (numRMAMRetries <= 0) {
+    // AM needs to try once at least
+    this.maxRetries = 1;
+    LOG.error("AM Retries is wrongly configured. The specific AM Retries: "
+        + numAPPAMRetries + " for application: "
+        + applicationId.getId() + ", the global AM Retries: "
+        + numRMAMRetries);
+  } else {
+    this.maxRetries = numRMAMRetries;
+  }
+} else {
+  if (numAPPAMRetries <= numRMAMRetries) {
+    this.maxRetries = numAPPAMRetries;
+  } else {
+    this.maxRetries = numRMAMRetries;
+    LOG.warn("The specific AM Retries: " + numAPPAMRetries
+        + " for application: " + applicationId.getId()
+        + " is larger than the global AM Retries: " + numRMAMRetries
+        + ". Use the global AM Retries instead.");
+  }
+}
{code}
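
As a rough sketch of the split suggested above (method and field names are illustrative, not from the patch under review):

{code}
public final class AmRetryPolicy {
  private AmRetryPolicy() {}

  // Step 1: sanitize the global config value once; treat a non-positive
  // value as misconfiguration and fall back to at least one attempt.
  public static int sanitizedGlobalMaxRetries(int configuredGlobal) {
    return configuredGlobal <= 0 ? 1 : configuredGlobal;
  }

  // Step 2: combine with the per-application request; the app may lower,
  // but never raise, the global cap.
  public static int effectiveMaxRetries(int appRequested, int sanitizedGlobal) {
    return appRequested <= 0 ? sanitizedGlobal
        : Math.min(appRequested, sanitizedGlobal);
  }
}
{code}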

Secondly, IMO the use of Retry in the name is confusing, since we need a minimum 
value of 1 for the first attempt and the first attempt is not a retry. An 
alternative name could be maxAppAttempts. If we continue to use retry in the 
name, then its value should be 0 if the attempt is launched only once, since the 
number of retries = 0.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM

2013-03-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-472:


Summary: MR app master deletes staging dir when sent a reboot command from 
the RM  (was: MR Job falied if RM restarted when the job is running)

 MR app master deletes staging dir when sent a reboot command from the RM
 

 Key: YARN-472
 URL: https://issues.apache.org/jira/browse/YARN-472
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: jian he
Assignee: jian he

 If the RM is restarted when the MR job is running , the job failed because 
 the staging directory is cleaned. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM

2013-03-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-472:


Description: If the RM is restarted when the MR job is running, then it 
sends a reboot command to the job. The job ends up deleting the staging dir and 
that causes the next attempt to fail.  (was: If the RM is restarted when the MR 
job is running , the job failed because the staging directory is cleaned. )

 MR app master deletes staging dir when sent a reboot command from the RM
 

 Key: YARN-472
 URL: https://issues.apache.org/jira/browse/YARN-472
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: jian he
Assignee: jian he

 If the RM is restarted when the MR job is running, then it sends a reboot 
 command to the job. The job ends up deleting the staging dir and that causes 
 the next attempt to fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2013-03-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601652#comment-13601652
 ] 

Karthik Kambatla commented on YARN-462:
---

Fair points, Kendall. Thanks for the detailed explanation. As Arun said, the 
idea seems to be a very useful one, but we should be wary of adding new 
concepts to YARN. If we decide to go ahead with the chargeback parameter, I am 
concerned that we will end up duplicating a lot of scheduler code - ACLs, 
enforcement, etc.

I wonder if the following would satisfy your requirements while leveraging all 
queue definition/ACLs logic and not overloading the scheduler:
- Idea of a 'project' queue that goes under the leaf queues. These 'project' 
queues are transparent to the scheduler at scheduling time, but keep track of 
the actual usage.
- e.g. root.sales.seller1.sell-coconut-project and 
root.sales.seller1.sell-pineapple-project could be two queues for seller1. At 
schedule time, the scheduler views all jobs under both projects to be under 
seller1 and we hopefully won't run into the capacity < 1 issues you are mentioning. 
Neither does it increase the scheduling latency.

 Project Parameter for Chargeback
 

 Key: YARN-462
 URL: https://issues.apache.org/jira/browse/YARN-462
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 Problem Summary
 For the purpose of chargeback and better understanding of grid usage, we need 
 to be able to associate applications with projects, e.g. "pipeline X", 
 "property Y".  This would allow us to aggregate on this property, thereby 
 helping us compute grid resource usage for the entire project.  Currently, 
 for a given application, two things we know about it are the user that 
 submitted it and the queue it was submitted to.  Below, I'll explain why 
 neither of these is adequate for enterprise-level chargeback and 
 understanding resource allocation needs.
 Why Not Users?
 It's not individual users that are paying the bill -- it's projects.  When one 
 of our real users submits an application on a Hadoop grid, they're presumably 
 not usually doing it for themselves.  They're doing work for some project or 
 team effort, so it's that team or project that should be charged for all its 
 users' applications.  Maintaining outside lists of associations between users 
 and projects is error-prone because it is time-sensitive and requires 
 continued ongoing maintenance.  New users join organizations, users leave and 
 users even change projects.  Furthermore, users may split their time between 
 multiple projects, making it ambiguous as to which of a user's projects a 
 given application should be charged.  Also, there can be "headless" users, 
 which can be even more difficult to link to a project and can be shared 
 between teams or projects.
 Why Not Queues?
 The purpose of queues is for scheduling.  Overloading the queues concept to 
 also mean who should be charged for an application can have a detrimental 
 effect on the primary purpose of queues.  It could be manageable in the case 
 of a very small number of projects sharing a cluster, but doesn't scale to 
 tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
 between 50 projects, creating 50 separate queues will result in inefficient 
 use of the cluster resources.  Furthermore, a given project may desire more 
 than one queue for different types or priorities of applications.  
 Proposed Solution
 Rather than relying on external tools to infer through the user and/or queue 
 who to charge for a given application, I propose a straightforward approach 
 where that information be explicitly supplied when the application is 
 submitted, just like we do with queues.  Let's use a charge card analogy: 
 when you buy something online, you don't just say who you are and how to ship 
 it, you also specify how you're paying for it.  Similarly, when submitting an 
 application in YARN, you could explicitly specify to whom its resource usage 
 should be associated (a project, team, cost center, etc).
 This new configuration parameter should default to being optional, so that 
 organizations not interested in chargeback or project-level resource tracking 
 can happily continue on as if it wasn't there.  However, it should be 
 configurable at the cluster level such that a given cluster could elect 
 to make it required, so that all applications would have an associated 
 project.  The value of this new parameter should be exposed via the Resource 
 Manager UI and Resource Manager REST API, so that users and tools can make 
 use of it for chargeback, utilization metrics, etc.
 I'm undecided on what to name the new parameter, as I like the flexibility in 
 the ways it 

[jira] [Created] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-13 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-474:


 Summary: CapacityScheduler does not activate applications when 
configuration is refreshed
 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Submit 3 applications to a cluster where capacity scheduler limits allow only 1 
running application. Modify capacity scheduler config to increase value of 
yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. 

The 2 applications not yet in running state do not get launched even though 
limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-474:
-

Component/s: capacityscheduler

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Hitesh Shah

 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-474:
-

Target Version/s: 2.0.5-beta

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Hitesh Shah

 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601718#comment-13601718
 ] 

Zhijie Shen commented on YARN-378:
--

@Robert, if the RM is supposed to inform the AM about the number, that seems to 
happen no earlier than AM registration. Otherwise, can the launch environment of 
the AM container be set by the RM, such that the AM can get the number when it 
is constructed?

@Bikas, I like maxAppAttempts better, and the computation logic doesn't need to 
be changed (otherwise, it would be retries + 1).

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601744#comment-13601744
 ] 

Hitesh Shah commented on YARN-378:
--

How about changing the AMLauncher to add the last retry information into the 
AM's env?

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-475:


 Summary: Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it 
is no longer set in an AM's environment
 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the 
application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-476) ProcfsBasedProcessTree info message confuses users

2013-03-13 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-476:
---

 Summary: ProcfsBasedProcessTree info message confuses users
 Key: YARN-476
 URL: https://issues.apache.org/jira/browse/YARN-476
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.6
Reporter: Jason Lowe


ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as 
the following:

{noformat}
2013-03-13 12:41:51,957 INFO [communication thread] 
org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have 
finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] 
org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have 
finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] 
org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have 
finished in the interim.
{noformat}

As described in MAPREDUCE-4570, this is something that naturally occurs in the 
process of monitoring processes via procfs.  It's uninteresting at best and can 
confuse users who think it's a reason their job isn't running as expected when 
it appears in their logs.

We should either make this DEBUG or remove it entirely.
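
A minimal sketch of the DEBUG option, assuming the usual commons-logging pattern; the class and method names are illustrative, not the actual ProcfsBasedProcessTree code:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public final class InterimProcessLogging {
  private static final Log LOG = LogFactory.getLog(InterimProcessLogging.class);

  private InterimProcessLogging() {}

  // Demoted from INFO to DEBUG so routine procfs races stop alarming users.
  static void reportPossiblyFinished(String pid) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("The process " + pid + " may have finished in the interim.");
    }
  }
}
{code}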

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601770#comment-13601770
 ] 

Bikas Saha commented on YARN-378:
-

How about getting an estimate on MAPREDUCE-5062 effort before going down the 
path of env vars. Env vars are brittle and something like this should come 
clearly from the API rather than env vars IMO.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601810#comment-13601810
 ] 

Zhijie Shen commented on YARN-475:
--

ApplicationConstants.AM_APP_ATTEMPT_ID_ENV seems to be still used by unmanaged 
AM. See the following code in UnmanagedAMLauncher.

{code}
if(!setClasspath && classpath!=null) {
  envAMList.add("CLASSPATH="+classpath);
}

envAMList.add(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV + "=" + attemptId);

String[] envAM = new String[envAMList.size()];
Process amProc = Runtime.getRuntime().exec(amCmd, envAMList.toArray(envAM));
{code}

Also, it is still checked in the AM of distributed shell.

{code}
if (envs.containsKey(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV)) {
  appAttemptID = ConverterUtils.toApplicationAttemptId(envs
  .get(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV));
}
{code}

 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
 an AM's environment
 ---

 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
 the application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601813#comment-13601813
 ] 

Bikas Saha commented on YARN-378:
-

If it's too much work in the MR AM then we could set the env in addition to the 
API.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is, yarn.resourcemanager.am.max-retries should be 
 settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601838#comment-13601838
 ] 

Bikas Saha commented on YARN-475:
-

Even when we remove it - is there some helper lib/api to help AMs derive the 
app id, attempt number, etc. from the container id?

 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
 an AM's environment
 ---

 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
 the application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601883#comment-13601883
 ] 

Hitesh Shah commented on YARN-475:
--

{code}
  // get containerIdStr from environment
  ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
  ApplicationAttemptId applicationAttemptId =
  containerId.getApplicationAttemptId();
{code}
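
And, to the helper-API question above, the remaining identifiers fall out of the same object; a short continuation sketch using the same public YARN types (method name illustrative, no extra env vars needed):

{code}
static void logDerivedIds(ContainerId containerId) {
  ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
  ApplicationId appId = attemptId.getApplicationId();  // the application id
  int attemptNumber = attemptId.getAttemptId();        // attempt number, 1-based
  System.out.println(appId + " attempt #" + attemptNumber);
}
{code}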

 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
 an AM's environment
 ---

 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
 the application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601897#comment-13601897
 ] 

Eli Reisman commented on YARN-475:
--

Thanks, I was just going to ask this. So the containerId is the right place 
to get an app id from the AM's container environment? I think I'm doing this in 
my Giraph-YARN patch already, but I will check.


 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
 an AM's environment
 ---

 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
 the application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601905#comment-13601905
 ] 

Hitesh Shah commented on YARN-475:
--

Both DistributedShell and UnmanagedAM use it currently, but we should remove its 
usage as it is no longer being set in the environment by the RM's AMLauncher.

 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
 an AM's environment
 ---

 Key: YARN-475
 URL: https://issues.apache.org/jira/browse/YARN-475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
 the application attempt id from the container id. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-437) Update documentation of Writing Yarn applications to match current best practices

2013-03-13 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601913#comment-13601913
 ] 

Eli Reisman commented on YARN-437:
--

I agree. It's painful to wait, but this won't get done very often, and having used 
the old and new APIs now, I would say this is worth waiting for. An overhaul of 
that document is a must in the near future, though!


 Update documentation of Writing Yarn applications to match current best 
 practices
 -

 Key: YARN-437
 URL: https://issues.apache.org/jira/browse/YARN-437
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
  Labels: usability

 Should fix docs to point to usage of YarnClient and AMRMClient helper libs. 
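
 As a rough illustration of the direction the updated docs could take, a minimal, 
 hedged sketch of submitting an application through the YarnClient helper library 
 (the factory and method names reflect the client API as it later stabilized and 
 may differ by version; the submission-context setup is elided):

{code}
  // minimal YarnClient submission skeleton (illustrative only)
  YarnClient yarnClient = YarnClient.createYarnClient();
  yarnClient.init(new YarnConfiguration());
  yarnClient.start();

  YarnClientApplication app = yarnClient.createApplication();
  ApplicationSubmissionContext appContext =
      app.getApplicationSubmissionContext();
  // ... set the application name, AM container launch context, resource, queue ...
  ApplicationId appId = yarnClient.submitApplication(appContext);
{code}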

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1

2013-03-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601952#comment-13601952
 ] 

Siddharth Seth commented on YARN-226:
-

bq. The Giraph to YARN port assumes giraph tasks start at 2 and up as far as 
container #'s go. Is this unsafe for the future?
One scenario in which the AM does not get container id 1 is when it requires 
more resources than the minimum allocation, in which case reservations come 
into play. Depending on whether the reservation becomes the final allocation or 
whether the allocation happens elsewhere, the container id may not be 1.
Similarly, assuming that container IDs are contiguous is not valid; IDs can be 
skipped.
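
To make the pitfall concrete, a small hedged sketch of the unsafe assumption versus 
what an AM process itself can rely on (only the ContainerId/ConverterUtils calls are 
real API here; the env-based comparison is just a workaround, and the CLC/token 
change proposed in this issue would be the proper fix):

{code}
  // Unsafe: AM containers are not guaranteed id 1, and ids are not contiguous.
  boolean looksLikeAm = containerId.getId() == 1;   // don't rely on this

  // Inside the AM, comparing against the id handed to it via its own
  // environment is safer than assuming any fixed number.
  ContainerId myAmContainerId = ConverterUtils.toContainerId(
      System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV));
  boolean isAmContainer = containerId.equals(myAmContainerId);
{code}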

 Log aggregation should not assume an AppMaster will have containerId 1
 --

 Key: YARN-226
 URL: https://issues.apache.org/jira/browse/YARN-226
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth

 In case of reservations, etc - AppMasters may not get container id 1. We 
 likely need additional info in the CLC / tokens indicating whether a 
 container is an AM or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-13 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-474:
---

 Target Version/s: 0.23.7, 2.0.5-beta  (was: 2.0.5-beta)
Affects Version/s: 2.0.3-alpha
   0.23.6

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah

 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.
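
 For anyone reproducing this, a hedged sketch of the refresh step described above; 
 the property name is taken from this report, while the file location and values 
 follow the usual CapacityScheduler conventions and may differ per deployment:

{code}
<!-- capacity-scheduler.xml: raise the AM resource share, e.g. from 0.1 to 0.5 -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
{code}

 followed by "yarn rmadmin -refreshQueues". The expectation is that the two pending 
 applications then become active, which is not happening.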

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-440) Flatten RegisterNodeManagerResponse

2013-03-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-440:
--

Assignee: Xuan Gong

 Flatten RegisterNodeManagerResponse
 ---

 Key: YARN-440
 URL: https://issues.apache.org/jira/browse/YARN-440
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong

 RegisterNodeManagerResponse has another wrapper RegistrationResponse under 
 it, which can be removed.
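
 To illustrate the flattening, a small hedged before/after sketch of the 
 caller-side change (the accessor names are assumptions about the current and 
 proposed record shapes, not the final API):

{code}
  // today: the useful fields sit behind the extra RegistrationResponse wrapper
  RegistrationResponse reg = registerResponse.getRegistrationResponse();
  MasterKey masterKey = reg.getMasterKey();
  NodeAction action = reg.getNodeAction();

  // after flattening: the same fields hang directly off the response
  MasterKey masterKeyFlat = registerResponse.getMasterKey();
  NodeAction actionFlat = registerResponse.getNodeAction();
{code}

 The same shape of change applies to NodeHeartbeatResponse/HeartbeatResponse in 
 YARN-439.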

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-439) Flatten NodeHeartbeatResponse

2013-03-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-439:
--

Assignee: Xuan Gong

 Flatten NodeHeartbeatResponse
 -

 Key: YARN-439
 URL: https://issues.apache.org/jira/browse/YARN-439
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong

 NodeHeartbeatResponse has another wrapper HeartbeatResponse under it, which 
 can be removed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042
 ] 

Siddharth Seth commented on YARN-71:


Comments on the latest patch.

- timestamp can move out - so that the same ts is used across all local dirs.
- Instead of scheduling old files, then renaming the current files and 
scheduling additional deletes - this could change to just rename the current 
files, and schedule deletion once.
In the unit test
-  

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch


 We have to make sure that NodeManagers cleanup their local files on restart.
 It may already be working like that in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042
 ] 

Siddharth Seth edited comment on YARN-71 at 3/14/13 5:20 AM:
-

Comments on the latest patch.

- timestamp can move out - so that the same ts is used across all local dirs.
- Instead of scheduling old files, then renaming the current files and 
scheduling additional deletes - this could change to just rename the current 
files, and schedule deletion once.
In the unit test
- There are a couple of races: one when asserting the state as RUNNING, since 
the events may not have been processed, and a second when asserting the file 
delete, since that also happens on a separate thread.
- Also, the test should verify the correct user being used for deletion; spy on 
the deletion service.
- Minor, Use Records instead of RecordFactory

Also, can you please mention how you've tested the patch.
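
For clarity, a rough sketch of the single-pass approach suggested above; the 
directory layout, helper names and the DeletionService call shape are assumptions 
for illustration, not the actual patch:

{code}
  // take one timestamp up front so all local dirs share the same suffix
  long ts = System.currentTimeMillis();
  for (String localDir : dirsHandler.getLocalDirs()) {
    File current = new File(localDir, ContainerLocalizer.USERCACHE);
    File renamed = new File(localDir, ContainerLocalizer.USERCACHE + "_DEL_" + ts);
    // rename the live directory out of the way first...
    if (current.exists() && current.renameTo(renamed)) {
      // ...then schedule a single deletion for the renamed directory
      deletionService.delete(user, new Path(renamed.getAbsolutePath()));
    }
  }
{code}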

  was (Author: sseth):
Comments on the latest patch.

- timestamp can move out - so that the same ts is used across all local dirs.
- Instead of scheduling old files, then renaming the current files and 
scheduling additional deletes - this could change to just rename the current 
files, and schedule deletion once.
In the unit test
-  
  
 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch


 We have to make sure that NodeManagers cleanup their local files on restart.
 It may already be working like that in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602055#comment-13602055
 ] 

Vinod Kumar Vavilapalli commented on YARN-378:
--

+1 for maxAppAttempts naming.

+1 to Bobby's proposal to add it to env. We are sending across other important 
things like app-attempt-id as part of the env, so +1 for adding this info too.

bq. First in some init method that reads the global value from config, checks 
for errors and sets a sensible default global value.
Yes, this should happen somewhere in the main thread and crash the RM in case 
of invalid configs. RMApp gets created much later, so validating there would be 
too late.

bq. Env vars are brittle..
I suppose this is on Windows?
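
A small hedged sketch of the kind of up-front validation being suggested, assuming 
a Configuration object is in scope; the config key comes from this issue's 
description, and the default value and exception type are illustrative only:

{code}
  // validate the global max-retries once, early in the RM's init path
  int globalMaxAppAttempts =
      conf.getInt("yarn.resourcemanager.am.max-retries", 2);  // assumed default
  if (globalMaxAppAttempts <= 0) {
    // fail fast on an invalid global value instead of per-RMApp later
    throw new IllegalArgumentException(
        "Invalid global max attempts: " + globalMaxAppAttempts);
  }
{code}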



 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch


 We should support different clients or users having different 
 ApplicationMaster retry times. That is, yarn.resourcemanager.am.max-retries 
 should be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira