[jira] [Assigned] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-381: --- Assignee: Sandy Ryza Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Sandy Ryza Priority: Minor The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean, given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? E.g. is the base 1, and are all weights relative to that? - schedulingMode - what is the default? Is fifo pure FIFO, i.e. does it wait until all tasks for a job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple of typos worth fixing while we're at it, e.g. "finish. apps to run". Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
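[Editorial note: for orientation, a minimal allocations-file sketch touching the properties questioned above. The pool name and values are illustrative assumptions, not recommendations; the comments reflect the answer given in the follow-up comment below (megabytes) plus the generally documented FS semantics.]
{code}
<?xml version="1.0"?>
<allocations>
  <pool name="analytics">
    <!-- aggregate memory across the pool's apps, in megabytes (per the follow-up comment) -->
    <minResources>10240</minResources>
    <!-- cap on the pool's aggregate memory, in megabytes -->
    <maxResources>40960</maxResources>
    <!-- share relative to the default weight of 1 -->
    <weight>2.0</weight>
    <!-- "fair" or "fifo" -->
    <schedulingMode>fair</schedulingMode>
  </pool>
</allocations>
{code}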
[jira] [Commented] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600981#comment-13600981 ] Sandy Ryza commented on YARN-381: - It's in megabytes. Patch coming soon will include this. Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Sandy Ryza Priority: Minor The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean, given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? E.g. is the base 1, and are all weights relative to that? - schedulingMode - what is the default? Is fifo pure FIFO, i.e. does it wait until all tasks for a job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple of typos worth fixing while we're at it, e.g. "finish. apps to run". Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601029#comment-13601029 ] Hudson commented on YARN-198: - Integrated in Hadoop-Yarn-trunk #154 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/154/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601102#comment-13601102 ] Hudson commented on YARN-198: - Integrated in Hadoop-Hdfs-trunk #1343 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1343/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601161#comment-13601161 ] Hudson commented on YARN-198: - Integrated in Hadoop-Mapreduce-trunk #1371 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1371/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601237#comment-13601237 ] Robert Joseph Evans commented on YARN-378: -- The patch looks good to me. The only problem I have is with how we are informing the AM of the maximum number of retries that it has. This should work, but it is going to require a lot of changes to the MR AM to use it. Right now the number is used in the init of MRAppMaster, but we will not get that information until start() is called and we register with the RM. I would much rather see a new environment variable added that can hold this information, because it makes MAPREDUCE-5062 much simpler. But I am OK with the way it currently is. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245 ] Kendall Thrapp commented on YARN-462: - Thanks for the questions and feedback. Yes, first I should clarify what I intended by chargeback. I'm looking to be able to quantify cluster resource usage (memory, CPU, HDFS, etc.) for every application, and then roll that up to the project level. This would allow us to accurately charge the customer (i.e. team/project) for their grid usage (either literally or just informatively). I want to provide incentive for more efficient coding, as well as make it easier for teams to compare their resource usage across different software versions of their Hadoop applications, config parameter changes, etc. I had originally hoped that hierarchical queues could serve this purpose as well, but have since run into several issues with this approach. The first is that it doesn't scale for clusters with large numbers of projects. I've seen large clusters shared between over a hundred different projects, each with their own teams of users. If I recall correctly, queues can't be assigned less than 1% of the total capacity, so it wouldn't be possible to give each of these projects its own queue. Even if we could, I suspect this could result in too much overhead for the scheduler and too much fragmentation of the cluster resources, which could result in poorer overall utilization. The second issue is that the project-per-queue approach conflicts with how I see users wanting to use our queues. In many cases I see queues being used to distinguish application priorities, ensuring that high priority time-sensitive jobs get the resources they need to finish on time, while big but lower priority and less time-sensitive jobs are constrained by being in a smaller queue. I'd expect a lot of pushback from our users for any chargeback-focused queue configuration that had a negative impact on job run times and meeting SLAs. The idea of the project/chargeback parameter decouples the two. Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. 
Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. Let's use a charge card analogy: when you buy something online, you don't
[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601259#comment-13601259 ] Thomas Graves commented on YARN-379: I think the approach looks fine. Did you see a way to just disable the logging for AbstractService for these calls rather than everything? Minor nit: can you change the name of COMMON_LOGGING_OPTS to something more like YARN_CLI_NOLOG_OPTS and add a comment about what it is for. yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Abhishek Kapoor Labels: usability Attachments: YARN-379.patch Running the yarn node and yarn application commands results in annoying log info messages being printed:
$ yarn node -list
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Total Nodes:1
 Node-Id    Node-State    Node-Http-Address    Health-Status(isNodeHealthy)    Running-Containers
 foo:8041   RUNNING       foo:8042             true                            0
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
$ yarn application
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: application
 -kill <arg>     Kills the application.
 -list           Lists all the Applications from RM.
 -status <arg>   Prints the status of the application.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
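[Editorial note: one way to scope the change to just that class rather than everything is a per-logger level override. A sketch assuming the default log4j backend; this is not the attached patch.]
{code}
// Raise only AbstractService's threshold before creating the client;
// all other loggers keep their configured levels.
org.apache.log4j.Logger.getLogger(
    "org.apache.hadoop.yarn.service.AbstractService")
    .setLevel(org.apache.log4j.Level.WARN);
{code}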
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-460: --- Priority: Blocker (was: Critical) CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Blocker Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queues list of active users even though the application was removed. This can cause everyone else in the queue to get less resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) HBase test failures when running against Hadoop 2
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601356#comment-13601356 ] Ted Yu commented on YARN-449: - Here is the OS for the Hadoop QA machine: Linux asf002.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Here is the OS for the machine where I ran the unit test manually: Linux ygridcore.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux HBase test failures when running against Hadoop 2 - Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
Kendall Thrapp created YARN-473: --- Summary: Capacity Scheduler webpage and REST API not showing correct number of pending applications Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is als showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-473: Description: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. was: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is als showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. Capacity Scheduler webpage and REST API not showing correct number of pending applications -- Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601601#comment-13601601 ] Andy Rhee commented on YARN-462: Kendall - Again, another great idea! Two things popped into my mind. 1. I wonder if we need to also verify and enforce project validity on a given cluster, mapped to a whitelist or blacklist in the cluster config (this might even be tied to an external source of truth like LDAP later), or decouple or delegate validation to other parts or an external process, e.g. queue, user, or project accounting. 2. Another interesting spin-off of your idea could be flexible enforceable parameters or meta config. Instead of modifying the code every time we have a great idea for a new parameter to enforce, it may be more cost-effective to allow admins to define enforceable parameters in the cluster config, so that we don't have to worry about what to name a new parameter or changing it later, IMHO :) Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. 
Let's use a charge card analogy: when you buy something online, you don't just say who you are and how to ship it, you also specify how you're paying for it. Similarly, when submitting an application in YARN, you could explicitly specify to whom its resource usage should be associated (a project, team, cost center, etc). This new configuration parameter should default to being optional, so that organizations not interested in chargeback or project-level resource tracking can happily continue on as if it wasn't there. However, it should be configurable at the cluster level such that a given cluster could elect to make it required, so that all applications would have an associated project. The value of this new parameter should be exposed via the Resource Manager UI and Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization metrics, etc. I'm undecided on what to name the new parameter, as I like the flexibility in the ways it could be used. It is essentially just an additional party other than user or queue that an application can be associated with, so its use is not just limited to a chargeback
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601625#comment-13601625 ] Bikas Saha commented on YARN-378: - +1 for Vinod's comments. Also, personally, I would break down the following code in 2 places. First, in some init method that reads the global value from config, checks for errors and sets a sensible default global value. Once that is done, use the appValue and globalValue to set the actual value. The current code is making me think more than I need to, IMO.
{code}
+int numRMAMRetries = conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
+    YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
+int numAPPAMRetries = submissionContext.getNumMaxRetries();
+if (numAPPAMRetries <= 0) {
+  if (numRMAMRetries <= 0) {
+    // AM needs to try once at least
+    this.maxRetries = 1;
+    LOG.error("AM Retries is wrongly configured. The specific AM Retries: "
+        + numAPPAMRetries + " for application: "
+        + applicationId.getId() + ", the global AM Retries: "
+        + numRMAMRetries);
+  } else {
+    this.maxRetries = numRMAMRetries;
+  }
+} else {
+  if (numAPPAMRetries <= numRMAMRetries) {
+    this.maxRetries = numAPPAMRetries;
+  } else {
+    this.maxRetries = numRMAMRetries;
+    LOG.warn("The specific AM Retries: " + numAPPAMRetries
+        + " for application: " + applicationId.getId()
+        + " is larger than the global AM Retries: " + numRMAMRetries
+        + ". Use the global AM Retries instead.");
+  }
+}
{code}
Secondly, IMO the use of Retry in the name is confusing, since we need a minimum value of 1 for the first attempt and the first attempt is not a retry. An alternative name could be maxAppAttempts. If we continue to use retry in the name, then its value should be 0 if the attempt is launched only once, since the number of retries = 0. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
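[Editorial note: to make the two-step suggestion concrete, a sketch of the refactor; method names and the surrounding class are illustrative, not from the patch, and conf/LOG are assumed to be in scope.]
{code}
// Step 1: at init time, read and sanitize the global value once.
private int globalMaxRetries;

private void initGlobalMaxRetries(Configuration conf) {
  globalMaxRetries = conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
      YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
  if (globalMaxRetries <= 0) {
    LOG.error("yarn.resourcemanager.am.max-retries is misconfigured;"
        + " falling back to 1");
    globalMaxRetries = 1; // the AM must get at least one attempt
  }
}

// Step 2: per application, cap the requested value by the global one.
private int computeMaxRetries(int appMaxRetries) {
  if (appMaxRetries <= 0) {
    return globalMaxRetries; // app did not ask for a specific value
  }
  return Math.min(appMaxRetries, globalMaxRetries);
}
{code}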
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-472: Summary: MR app master deletes staging dir when sent a reboot command from the RM (was: MR Job falied if RM restarted when the job is running) MR app master deletes staging dir when sent a reboot command from the RM Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running , the job failed because the staging directory is cleaned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-472: Description: If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail. (was: If the RM is restarted when the MR job is running , the job failed because the staging directory is cleaned. ) MR app master deletes staging dir when sent a reboot command from the RM Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601652#comment-13601652 ] Karthik Kambatla commented on YARN-462: --- Fair points, Kendall. Thanks for the detailed explanation. As Arun said, the idea seems to be a very useful one, but we should be wary of adding new concepts to YARN. If we decide to go ahead with the chargeback parameter, I am concerned that we may end up duplicating a lot of scheduler code - ACLs, enforcement etc. I wonder if the following would satisfy your requirements while leveraging all the queue definition/ACL logic and not overloading the scheduler: - Idea of a 'project' queue that goes under the leaf queues. These 'project' queues are transparent to the scheduler at scheduling time, but keep track of the actual usage. - e.g. root.sales.seller1.sell-coconut-project and root.sales.seller1.sell-pineapple-project could be two queues for seller1. At schedule time, the scheduler views all jobs under both projects to be under seller1, and we hopefully won't run into the capacity < 1% issues you are mentioning. Neither does it increase the scheduling latency. Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. 
Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. Let's use a charge card analogy: when you buy something online, you don't just say who you are and how to ship it, you also specify how you're paying for it. Similarly, when submitting an application in YARN, you could explicitly specify to whom its resource usage should be associated (a project, team, cost center, etc). This new configuration parameter should default to being optional, so that organizations not interested in chargeback or project-level resource tracking can happily continue on as if it wasn't there. However, it should be configurable at the cluster level such that a given cluster could elect to make it required, so that all applications would have an associated project. The value of this new parameter should be exposed via the Resource Manager UI and Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization metrics, etc. I'm undecided on what to name the new parameter, as I like the flexibility in the ways it
[jira] [Created] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
Hitesh Shah created YARN-474: Summary: CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
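[Editorial note: for context, the reproduction uses the standard queue-refresh flow; the property value below is illustrative.]
{code}
# 1. In capacity-scheduler.xml, raise e.g.
#    yarn.scheduler.capacity.maximum-am-resource-percent from 0.1 to 0.5
# 2. Push the new limits to the running ResourceManager:
yarn rmadmin -refreshQueues
# Expected: previously pending applications get activated; observed: they do not.
{code}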
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-474: - Component/s: capacityscheduler CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-474: - Target Version/s: 2.0.5-beta CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601718#comment-13601718 ] Zhijie Shen commented on YARN-378: -- @Robert, if the RM is supposed to inform the AM about the number, it seems to happen no earlier than AM registration. Otherwise, can the launch environment of the AM container be set by the RM, such that the AM can get the number when it is constructed? @Bikas, I like maxAppAttempts better, and the computation logic doesn't need to be changed (i.e., otherwise, retries + 1). ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601744#comment-13601744 ] Hitesh Shah commented on YARN-378: -- How about changing the AMLauncher to add the last retry information into the AM's env? ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
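[Editorial note: a minimal sketch of that suggestion; the environment key and the surrounding variables are hypothetical, not existing constants.]
{code}
// In AMLauncher, while assembling the AM's ContainerLaunchContext:
Map<String, String> env = launchContext.getEnvironment();
// Hypothetical key; tells the AM whether this attempt is its last one.
env.put("APP_IS_LAST_ATTEMPT",
    String.valueOf(attemptId.getAttemptId() == maxAppAttempts));
{code}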
[jira] [Created] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
Hitesh Shah created YARN-475: Summary: Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-476) ProcfsBasedProcessTree info message confuses users
Jason Lowe created YARN-476: --- Summary: ProcfsBasedProcessTree info message confuses users Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.6 Reporter: Jason Lowe ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:
{noformat}
2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.
{noformat}
As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
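[Editorial note: either option is a one-line change; a sketch of the DEBUG-guard variant, with pid standing in for the process id variable at that call site.]
{code}
// Downgrade the per-process message so it only appears when debugging.
if (LOG.isDebugEnabled()) {
  LOG.debug("The process " + pid + " may have finished in the interim.");
}
{code}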
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601770#comment-13601770 ] Bikas Saha commented on YARN-378: - How about getting an estimate on the MAPREDUCE-5062 effort before going down the path of env vars? Env vars are brittle and something like this should come clearly from the API rather than env vars, IMO. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601810#comment-13601810 ] Zhijie Shen commented on YARN-475: -- ApplicationConstants.AM_APP_ATTEMPT_ID_ENV seems to be still used by the unmanaged AM. See the following code in UnmanagedAMLauncher.
{code}
if (!setClasspath && classpath != null) {
  envAMList.add("CLASSPATH=" + classpath);
}
envAMList.add(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV + "=" + attemptId);
String[] envAM = new String[envAMList.size()];
Process amProc = Runtime.getRuntime().exec(amCmd, envAMList.toArray(envAM));
{code}
Also, it is still checked in the AM of distributed shell.
{code}
if (envs.containsKey(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV)) {
  appAttemptID = ConverterUtils.toApplicationAttemptId(envs
      .get(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV));
}
{code}
Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601813#comment-13601813 ] Bikas Saha commented on YARN-378: - If it's too much work in the MR AM then we could set the env in addition to the API. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601838#comment-13601838 ] Bikas Saha commented on YARN-475: - Even when we remove it - is there some helper lib/API to help AMs derive the appid, attempt number etc. from the container_id? Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601883#comment-13601883 ] Hitesh Shah commented on YARN-475: --
{code}
// get the container id string set by the RM in the AM's environment
String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId();
{code}
Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601897#comment-13601897 ] Eli Reisman commented on YARN-475: -- Thanks, I was just going to ask this. So the containerId is the right place to get an app id from the AM's container environment? I think I'm doing this in my Giraph-YARN patch already, but I will check. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601905#comment-13601905 ] Hitesh Shah commented on YARN-475: -- Both DistributedShell and UnmanagedAM use it currently, but we should remove its usage as it is definitely not being set in the environment by the RM's AMLauncher. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-437) Update documentation of Writing Yarn applications to match current best practices
[ https://issues.apache.org/jira/browse/YARN-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601913#comment-13601913 ] Eli Reisman commented on YARN-437: -- I agree. It's painful to wait, but this won't get done very often, and having used the old and new APIs now, I would say this is worth waiting for. An overhaul of that document is a must in the near future, though! Update documentation of Writing Yarn applications to match current best practices - Key: YARN-437 URL: https://issues.apache.org/jira/browse/YARN-437 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Labels: usability Should fix docs to point to usage of YarnClient and AMRMClient helper libs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601952#comment-13601952 ] Siddharth Seth commented on YARN-226: - bq. The Giraph to YARN port assumes giraph tasks start at 2 and up as far as container #'s go. Is this unsafe for the future? One scenario in which the AM does not get container id 1 is when it requires more resources than the minimum allocation - in which case reservations come into play. Depending on whether the reservation is the final allocation or whether it happens elsewhere - the container id may not be one. Similarly, assuming the container IDs are contiguous is not valid. IDs can be skipped. Log aggregation should not assume an AppMaster will have containerId 1 -- Key: YARN-226 URL: https://issues.apache.org/jira/browse/YARN-226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth In case of reservations, etc. - AppMasters may not get container id 1. We likely need additional info in the CLC / tokens indicating whether a container is an AM or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-474: --- Target Version/s: 0.23.7, 2.0.5-beta (was: 2.0.5-beta) Affects Version/s: 2.0.3-alpha, 0.23.6 CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
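A reproduction sketch under the reported setup; the property name comes from the report, while the value and queue limits are illustrative:
{code}
# capacity-scheduler.xml: raise the share of queue capacity AMs may use, e.g.
#   <property>
#     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
#     <value>0.5</value>
#   </property>
# Then refresh the queues without restarting the RM:
yarn rmadmin -refreshQueues
# Expected: previously pending applications are activated under the new limit.
# Observed (this bug): they remain pending until the RM is restarted.
{code}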
[jira] [Assigned] (YARN-440) Flatten RegisterNodeManagerResponse
[ https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-440: -- Assignee: Xuan Gong Flatten RegisterNodeManagerResponse --- Key: YARN-440 URL: https://issues.apache.org/jira/browse/YARN-440 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-439) Flatten NodeHeartbeatResponse
[ https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-439: -- Assignee: Xuan Gong Flatten NodeHeartbeatResponse - Key: YARN-439 URL: https://issues.apache.org/jira/browse/YARN-439 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong NodeHeartbeatResponse has another wrapper HeartbeatResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
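Both flattening sub-tasks follow the same shape; a hypothetical before/after sketch (the promoted field names are illustrative of what such wrappers typically carry, not the exact protocol records):
{code}
// Hypothetical shape of the flattened record. Callers previously reached
// through an inner wrapper, e.g.
//   response.getRegistrationResponse().getMasterKey();
// After flattening, the wrapper is removed and its fields are promoted
// onto the outer response:
//   response.getMasterKey();
import org.apache.hadoop.yarn.server.api.records.MasterKey;
import org.apache.hadoop.yarn.server.api.records.NodeAction;

public interface RegisterNodeManagerResponse {
  MasterKey getMasterKey();   // formerly on the inner RegistrationResponse
  NodeAction getNodeAction(); // formerly on the inner RegistrationResponse
}
{code}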
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042 ] Siddharth Seth commented on YARN-71: Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042 ] Siddharth Seth edited comment on YARN-71 at 3/14/13 5:20 AM: - Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - There are a couple of races: one when asserting the state is RUNNING, since the events may not have been processed, and a second when asserting the file delete, since that also happens on a separate thread. - Also, the test should verify the correct user being used for deletion; spy on the deletion service. - Minor: use Records instead of RecordFactory. Also, can you please mention how you've tested the patch? was (Author: sseth): Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
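A minimal sketch of the rename-then-schedule-once flow described above, with one timestamp shared across all local dirs; the "usercache" subdirectory name is illustrative, and the DeletionService call is assumed from the surrounding NM code:
{code}
import java.io.File;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.server.nodemanager.DeletionService;

public class LocalDirCleanupSketch {
  // Rename leftover state in every local dir using ONE shared timestamp,
  // then schedule a single deletion per renamed path.
  void cleanupOnStartup(List<String> localDirs, String user,
                        DeletionService deletionService) {
    long ts = System.currentTimeMillis();  // same ts across all local dirs
    for (String localDir : localDirs) {
      File current = new File(localDir, "usercache");
      if (!current.exists()) {
        continue;
      }
      File renamed = new File(localDir, "usercache_DEL_" + ts);
      if (current.renameTo(renamed)) {
        // Deletion is scheduled once, after the rename, rather than
        // scheduling old files first and then adding more deletes for
        // the renamed ones.
        deletionService.delete(user, new Path(renamed.getAbsolutePath()));
      }
    }
  }
}
{code}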
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602055#comment-13602055 ] Vinod Kumar Vavilapalli commented on YARN-378: -- +1 for the maxAppAttempts naming. +1 to Bobby's proposal to add it to the env. We are already sending across other important things like the app-attempt-id as part of the env, so +1 for adding this info too. bq. First in some init method that reads the global value from config, checks for errors and sets a sensible default global value. Yes, this should happen somewhere in the main thread and crash the RM in case of invalid configs. RMApp gets created much later, so.. bq. Env vars are brittle.. I suppose this is on Windows? ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different clients or users having different ApplicationMaster retry counts. That is, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
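For context, a sketch of how an AM might consume the proposed value once it is exported; the env var name here is hypothetical (the JIRA only proposes adding it), and the fallback uses the 2.0-era yarn.resourcemanager.am.max-retries constants:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AttemptPolicy {
  // Hypothetical env var: the RM would export the effective max attempts
  // alongside the other per-attempt values it already sets.
  static final String MAX_APP_ATTEMPTS_ENV = "MAX_APP_ATTEMPTS";

  // On its final attempt, the AM can avoid relying on another retry.
  static boolean isLastAttempt(ApplicationAttemptId attemptId, Configuration conf) {
    String fromEnv = System.getenv(MAX_APP_ATTEMPTS_ENV);
    int maxAttempts = (fromEnv != null)
        ? Integer.parseInt(fromEnv)
        : conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
                      YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
    return attemptId.getAttemptId() >= maxAttempts;
  }
}
{code}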