[jira] [Assigned] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-381: --- Assignee: Sandy Ryza Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Sandy Ryza Priority: Minor The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean, given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? E.g. is the base 1, and are all weights relative to that? - schedulingMode - what is the default? Is fifo pure FIFO, i.e. does it wait until all tasks for a job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple of typos worth fixing while we're at it, e.g. "finish. apps to run". Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
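[Editorial note: for orientation, a minimal allocations-file sketch touching the properties questioned above. The pool name and values are illustrative assumptions, not recommendations; the comments reflect the answer given in the follow-up comment below (megabytes) plus the generally documented FS semantics.]
{code}
<?xml version="1.0"?>
<allocations>
  <pool name="analytics">
    <!-- aggregate memory across the pool's apps, in megabytes (per the follow-up comment) -->
    <minResources>10240</minResources>
    <!-- cap on the pool's aggregate memory, in megabytes -->
    <maxResources>40960</maxResources>
    <!-- share relative to the default weight of 1 -->
    <weight>2.0</weight>
    <!-- "fair" or "fifo" -->
    <schedulingMode>fair</schedulingMode>
  </pool>
</allocations>
{code}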
[jira] [Commented] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600981#comment-13600981 ] Sandy Ryza commented on YARN-381: - It's in megabytes. Patch coming soon will include this. Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Sandy Ryza Priority: Minor The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean, given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? E.g. is the base 1, and are all weights relative to that? - schedulingMode - what is the default? Is fifo pure FIFO, i.e. does it wait until all tasks for a job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple of typos worth fixing while we're at it, e.g. "finish. apps to run". Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601029#comment-13601029 ] Hudson commented on YARN-198: - Integrated in Hadoop-Yarn-trunk #154 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/154/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601102#comment-13601102 ] Hudson commented on YARN-198: - Integrated in Hadoop-Hdfs-trunk #1343 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1343/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601161#comment-13601161 ] Hudson commented on YARN-198: - Integrated in Hadoop-Mapreduce-trunk #1371 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1371/]) YARN-198. Added a link to RM pages from the NodeManager web app. Contributed by Jian He. (Revision 1455800) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455800 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: jian he Priority: Minor Labels: usability Fix For: 2.0.5-beta Attachments: YARN-198.patch If we are navigating to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good if there were a link to navigate back to the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601237#comment-13601237 ] Robert Joseph Evans commented on YARN-378: -- The patch looks good to me. The only problem I have is with how we are informing the AM of the maximum number of retries that it has. This should work, but it is going to require a lot of changes to the MR AM to use it. Right now the number is used in the init of MRAppMaster, but we will not get that information until start() is called and we register with the RM. I would much rather see a new environment variable added that can hold this information, because it makes MAPREDUCE-5062 much simpler. But I am OK with the way it currently is. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245 ] Kendall Thrapp commented on YARN-462: - Thanks for the questions and feedback. Yes, first I should clarify what I intended by chargeback. I'm looking to be able to quantify cluster resource usage (memory, CPU, HDFS, etc.) for every application, and then roll that up to the project level. This would allow us to accurately charge the customer (i.e. team/project) for their grid usage (either literally or just informatively). I want to provide incentive for more efficient coding, as well as make it easier for teams to compare their resource usage across different software versions of their Hadoop applications, config parameter changes, etc. I had originally hoped that hierarchical queues could serve this purpose as well, but have since run into several issues with this approach. The first is that it doesn't scale for clusters with large numbers of projects. I've seen large clusters shared between over a hundred different projects, each with their own teams of users. If I recall correctly, queues can't be assigned less than 1% of the total capacity, so it wouldn't be possible to give each of these projects its own queue. Even if we could, I suspect this could result in too much overhead for the scheduler and too much fragmentation of the cluster resources, which could result in poorer overall utilization. The second issue is that the project-per-queue approach conflicts with how I see users wanting to use our queues. In many cases I see queues being used to distinguish application priorities, ensuring that high priority time-sensitive jobs get the resources they need to finish on time, while big but lower priority and less time-sensitive jobs are constrained by being in a smaller queue. I'd expect a lot of pushback from our users for any chargeback-focused queue configuration that had a negative impact on job run times and meeting SLAs. The idea of the project/chargeback parameter decouples the two. Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. 
Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. Let's use a charge card analogy: when you buy something online, you don't
[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601259#comment-13601259 ] Thomas Graves commented on YARN-379: I think the approach looks fine. Did you see a way to just disable the logging for AbstractService for these calls rather than everything? Minor nit: can you change the name of COMMON_LOGGING_OPTS to something more like YARN_CLI_NOLOG_OPTS and add a comment about what it is for. yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Abhishek Kapoor Labels: usability Attachments: YARN-379.patch Running the yarn node and yarn application commands results in annoying log info messages being printed:
$ yarn node -list
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Total Nodes:1
 Node-Id    Node-State    Node-Http-Address    Health-Status(isNodeHealthy)    Running-Containers
 foo:8041   RUNNING       foo:8042             true                            0
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
$ yarn application
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: application
 -kill <arg>     Kills the application.
 -list           Lists all the Applications from RM.
 -status <arg>   Prints the status of the application.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
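[Editorial note: one way to scope the change to just that class rather than everything is a per-logger level override. A sketch assuming the default log4j backend; this is not the attached patch.]
{code}
// Raise only AbstractService's threshold before creating the client;
// all other loggers keep their configured levels.
org.apache.log4j.Logger.getLogger(
    "org.apache.hadoop.yarn.service.AbstractService")
    .setLevel(org.apache.log4j.Level.WARN);
{code}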
[jira] [Updated] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-460: --- Priority: Blocker (was: Critical) CS user left in list of active users for the queue even when application finished - Key: YARN-460 URL: https://issues.apache.org/jira/browse/YARN-460 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Thomas Graves Assignee: Thomas Graves Priority: Blocker Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch We have seen a user get left in the queues list of active users even though the application was removed. This can cause everyone else in the queue to get less resources if using the minimum user limit percent config. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-449) HBase test failures when running against Hadoop 2
[ https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601356#comment-13601356 ] Ted Yu commented on YARN-449: - Here is the OS for the Hadoop QA machine: Linux asf002.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Here is the OS for the machine where I ran the unit test manually: Linux ygridcore.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux HBase test failures when running against Hadoop 2 - Key: YARN-449 URL: https://issues.apache.org/jira/browse/YARN-449 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Priority: Blocker Attachments: 7904-v5.txt, hbase-7904-v3.txt, hbase-TestHFileOutputFormat-wip.txt, hbase-TestingUtility-wip.txt, minimr_randomdir-branch2.txt Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, maven classloader magic is messing up java.class.path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
Kendall Thrapp created YARN-473: --- Summary: Capacity Scheduler webpage and REST API not showing correct number of pending applications Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is als showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-473: Description: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. was: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is als showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. Capacity Scheduler webpage and REST API not showing correct number of pending applications -- Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601601#comment-13601601 ] Andy Rhee commented on YARN-462: Kendall - Again, another great idea! Two things popped into my mind. 1. I wonder if we need to also verify and enforce project validity on a given cluster, mapped to a whitelist or blacklist in the cluster config (this might even be tied to an external source of truth like LDAP later), or decouple or delegate validation to other parts or an external process, e.g. queue, user, or project accounting. 2. Another interesting spin-off of your idea could be flexible enforceable parameters or meta config. Instead of modifying the code every time we have a great idea for a new parameter to enforce, it may be more cost-effective to allow admins to define enforceable parameters in the cluster config, so that we don't have to worry about what to name a new parameter or changing it later, IMHO :) Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. 
Let's use a charge card analogy: when you buy something online, you don't just say who you are and how to ship it, you also specify how you're paying for it. Similarly, when submitting an application in YARN, you could explicitly specify to whom its resource usage should be associated (a project, team, cost center, etc). This new configuration parameter should default to being optional, so that organizations not interested in chargeback or project-level resource tracking can happily continue on as if it wasn't there. However, it should be configurable at the cluster level such that a given cluster could elect to make it required, so that all applications would have an associated project. The value of this new parameter should be exposed via the Resource Manager UI and Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization metrics, etc. I'm undecided on what to name the new parameter, as I like the flexibility in the ways it could be used. It is essentially just an additional party other than user or queue that an application can be associated with, so its use is not just limited to a chargeback
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601625#comment-13601625 ] Bikas Saha commented on YARN-378: - +1 for Vinod's comments. Also, personally, I would break down the following code in 2 places. First, in some init method that reads the global value from config, checks for errors and sets a sensible default global value. Once that is done, use the appValue and globalValue to set the actual value. The current code is making me think more than I need to, IMO.
{code}
+int numRMAMRetries = conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
+    YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
+int numAPPAMRetries = submissionContext.getNumMaxRetries();
+if (numAPPAMRetries <= 0) {
+  if (numRMAMRetries <= 0) {
+    // AM needs to try once at least
+    this.maxRetries = 1;
+    LOG.error("AM Retries is wrongly configured. The specific AM Retries: "
+        + numAPPAMRetries + " for application: "
+        + applicationId.getId() + ", the global AM Retries: "
+        + numRMAMRetries);
+  } else {
+    this.maxRetries = numRMAMRetries;
+  }
+} else {
+  if (numAPPAMRetries <= numRMAMRetries) {
+    this.maxRetries = numAPPAMRetries;
+  } else {
+    this.maxRetries = numRMAMRetries;
+    LOG.warn("The specific AM Retries: " + numAPPAMRetries
+        + " for application: " + applicationId.getId()
+        + " is larger than the global AM Retries: " + numRMAMRetries
+        + ". Use the global AM Retries instead.");
+  }
+}
{code}
Secondly, IMO the use of Retry in the name is confusing, since we need a minimum value of 1 for the first attempt and the first attempt is not a retry. An alternative name could be maxAppAttempts. If we continue to use retry in the name, then its value should be 0 if the attempt is launched only once, since the number of retries = 0. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
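[Editorial note: to make the two-step suggestion concrete, a sketch of the refactor; method names and the surrounding class are illustrative, not from the patch, and conf/LOG are assumed to be in scope.]
{code}
// Step 1: at init time, read and sanitize the global value once.
private int globalMaxRetries;

private void initGlobalMaxRetries(Configuration conf) {
  globalMaxRetries = conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
      YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
  if (globalMaxRetries <= 0) {
    LOG.error("yarn.resourcemanager.am.max-retries is misconfigured;"
        + " falling back to 1");
    globalMaxRetries = 1; // the AM must get at least one attempt
  }
}

// Step 2: per application, cap the requested value by the global one.
private int computeMaxRetries(int appMaxRetries) {
  if (appMaxRetries <= 0) {
    return globalMaxRetries; // app did not ask for a specific value
  }
  return Math.min(appMaxRetries, globalMaxRetries);
}
{code}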
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-472: Summary: MR app master deletes staging dir when sent a reboot command from the RM (was: MR Job falied if RM restarted when the job is running) MR app master deletes staging dir when sent a reboot command from the RM Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running , the job failed because the staging directory is cleaned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-472: Description: If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail. (was: If the RM is restarted when the MR job is running , the job failed because the staging directory is cleaned. ) MR app master deletes staging dir when sent a reboot command from the RM Key: YARN-472 URL: https://issues.apache.org/jira/browse/YARN-472 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601652#comment-13601652 ] Karthik Kambatla commented on YARN-462: --- Fair points, Kendall. Thanks for the detailed explanation. As Arun said, the idea seems to be a very useful one, but we should be wary of adding new concepts to YARN. If we decide to go ahead with the chargeback parameter, I am concerned that we may end up duplicating a lot of scheduler code - ACLs, enforcement etc. I wonder if the following would satisfy your requirements while leveraging all the queue definition/ACL logic and not overloading the scheduler: - Idea of a 'project' queue that goes under the leaf queues. These 'project' queues are transparent to the scheduler at scheduling time, but keep track of the actual usage. - e.g. root.sales.seller1.sell-coconut-project and root.sales.seller1.sell-pineapple-project could be two queues for seller1. At schedule time, the scheduler views all jobs under both projects to be under seller1, and we hopefully won't run into the capacity < 1% issues you are mentioning. Neither does it increase the scheduling latency. Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with projects, e.g. pipeline X, property Y. This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire project. Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be charged for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires continued ongoing maintenance. New users join organizations, users leave and users even change projects. Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged to. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be charged for an application can have a detrimental effect on the primary purpose of queues. It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. 
Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to charge for a given application, I propose a straightforward approach where that information is explicitly supplied when the application is submitted, just like we do with queues. Let's use a charge card analogy: when you buy something online, you don't just say who you are and how to ship it, you also specify how you're paying for it. Similarly, when submitting an application in YARN, you could explicitly specify to whom its resource usage should be associated (a project, team, cost center, etc). This new configuration parameter should default to being optional, so that organizations not interested in chargeback or project-level resource tracking can happily continue on as if it wasn't there. However, it should be configurable at the cluster level such that a given cluster could elect to make it required, so that all applications would have an associated project. The value of this new parameter should be exposed via the Resource Manager UI and Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization metrics, etc. I'm undecided on what to name the new parameter, as I like the flexibility in the ways it
[jira] [Created] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
Hitesh Shah created YARN-474: Summary: CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
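[Editorial note: for context, the reproduction uses the standard queue-refresh flow; the property value below is illustrative.]
{code}
# 1. In capacity-scheduler.xml, raise e.g.
#    yarn.scheduler.capacity.maximum-am-resource-percent from 0.1 to 0.5
# 2. Push the new limits to the running ResourceManager:
yarn rmadmin -refreshQueues
# Expected: previously pending applications get activated; observed: they do not.
{code}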
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-474: - Component/s: capacityscheduler CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-474: - Target Version/s: 2.0.5-beta CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601718#comment-13601718 ] Zhijie Shen commented on YARN-378: -- @Robert, if the RM is supposed to inform the AM about the number, it seems to happen no earlier than AM registration. Otherwise, can the launch environment of the AM container be set by the RM, such that the AM can get the number when it is constructed? @Bikas, I like maxAppAttempts better, and the computation logic doesn't need to be changed (i.e., otherwise, retries + 1). ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601744#comment-13601744 ] Hitesh Shah commented on YARN-378: -- How about changing the AMLauncher to add the last retry information into the AM's env? ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
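[Editorial note: a minimal sketch of that suggestion; the environment key and the surrounding variables are hypothetical, not existing constants.]
{code}
// In AMLauncher, while assembling the AM's ContainerLaunchContext:
Map<String, String> env = launchContext.getEnvironment();
// Hypothetical key; tells the AM whether this attempt is its last one.
env.put("APP_IS_LAST_ATTEMPT",
    String.valueOf(attemptId.getAttemptId() == maxAppAttempts));
{code}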
[jira] [Created] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
Hitesh Shah created YARN-475: Summary: Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-476) ProcfsBasedProcessTree info message confuses users
Jason Lowe created YARN-476: --- Summary: ProcfsBasedProcessTree info message confuses users Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.6 Reporter: Jason Lowe ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:
{noformat}
2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.
{noformat}
As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
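[Editorial note: either option is a one-line change; a sketch of the DEBUG-guard variant, with pid standing in for the process id variable at that call site.]
{code}
// Downgrade the per-process message so it only appears when debugging.
if (LOG.isDebugEnabled()) {
  LOG.debug("The process " + pid + " may have finished in the interim.");
}
{code}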
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601770#comment-13601770 ] Bikas Saha commented on YARN-378: - How about getting an estimate on the MAPREDUCE-5062 effort before going down the path of env vars? Env vars are brittle and something like this should come clearly from the API rather than env vars, IMO. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601810#comment-13601810 ] Zhijie Shen commented on YARN-475: -- ApplicationConstants.AM_APP_ATTEMPT_ID_ENV seems to be still used by the unmanaged AM. See the following code in UnmanagedAMLauncher.
{code}
if (!setClasspath && classpath != null) {
  envAMList.add("CLASSPATH=" + classpath);
}
envAMList.add(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV + "=" + attemptId);
String[] envAM = new String[envAMList.size()];
Process amProc = Runtime.getRuntime().exec(amCmd, envAMList.toArray(envAM));
{code}
Also, it is still checked in the AM of distributed shell.
{code}
if (envs.containsKey(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV)) {
  appAttemptID = ConverterUtils.toApplicationAttemptId(envs
      .get(ApplicationConstants.AM_APP_ATTEMPT_ID_ENV));
}
{code}
Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601813#comment-13601813 ] Bikas Saha commented on YARN-378: - If it's too much work in the MR AM then we could set the env in addition to the API. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different ApplicationMaster retry counts for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601838#comment-13601838 ] Bikas Saha commented on YARN-475: - Even when we remove it - is there some helper lib/API to help AMs derive the appid, attempt number etc. from the container_id? Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601883#comment-13601883 ] Hitesh Shah commented on YARN-475: --
{code}
// get the container id string set by the RM in the AM's environment
String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId();
{code}
Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601897#comment-13601897 ] Eli Reisman commented on YARN-475: -- Thanks, I was just going to ask this. So the containerId is the right place to get an app id from the AM's container environment? I think I'm doing this in my Giraph-YARN patch already, but I will check. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601905#comment-13601905 ] Hitesh Shah commented on YARN-475: -- Both DistributedShell and UnmanagedAM use it currently, but we should remove its usage as it is definitely not being set in the environment by the RM's AMLauncher. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-437) Update documentation of Writing Yarn applications to match current best practices
[ https://issues.apache.org/jira/browse/YARN-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601913#comment-13601913 ] Eli Reisman commented on YARN-437: -- I agree. It's painful to wait, but this won't get done very often, and having used the old and new APIs now, I would say this is worth waiting for. An overhaul of that document is a must in the near future, though! Update documentation of Writing Yarn applications to match current best practices - Key: YARN-437 URL: https://issues.apache.org/jira/browse/YARN-437 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Labels: usability Should fix docs to point to usage of YarnClient and AMRMClient helper libs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601952#comment-13601952 ] Siddharth Seth commented on YARN-226: - bq. The Giraph to YARN port assumes giraph tasks start at 2 and up as far as container #'s go. Is this unsafe for the future? One scenario in which the AM does not get container id 1 is when it requires more resources than the minimum allocation - in which case reservations come into play. Depending on whether the reservation is the final allocation or whether it happens elsewhere - the container id may not be one. Similarly, assuming the container IDs are contiguous is not valid. IDs can be skipped. Log aggregation should not assume an AppMaster will have containerId 1 -- Key: YARN-226 URL: https://issues.apache.org/jira/browse/YARN-226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth In case of reservations, etc. - AppMasters may not get container id 1. We likely need additional info in the CLC / tokens indicating whether a container is an AM or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-474: --- Target Version/s: 0.23.7, 2.0.5-beta (was: 2.0.5-beta) Affects Version/s: 2.0.3-alpha, 0.23.6 CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
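A reproduction sketch under the reported setup; the property name comes from the report, while the value and queue limits are illustrative:
{code}
# capacity-scheduler.xml: raise the share of queue capacity AMs may use, e.g.
#   <property>
#     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
#     <value>0.5</value>
#   </property>
# Then refresh the queues without restarting the RM:
yarn rmadmin -refreshQueues
# Expected: previously pending applications are activated under the new limit.
# Observed (this bug): they remain pending until the RM is restarted.
{code}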
[jira] [Assigned] (YARN-440) Flatten RegisterNodeManagerResponse
[ https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-440: -- Assignee: Xuan Gong Flatten RegisterNodeManagerResponse --- Key: YARN-440 URL: https://issues.apache.org/jira/browse/YARN-440 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-439) Flatten NodeHeartbeatResponse
[ https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-439: -- Assignee: Xuan Gong Flatten NodeHeartbeatResponse - Key: YARN-439 URL: https://issues.apache.org/jira/browse/YARN-439 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong NodeHeartbeatResponse has another wrapper HeartbeatResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
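Both flattening sub-tasks follow the same shape; a hypothetical before/after sketch (the promoted field names are illustrative of what such wrappers typically carry, not the exact protocol records):
{code}
// Hypothetical shape of the flattened record. Callers previously reached
// through an inner wrapper, e.g.
//   response.getRegistrationResponse().getMasterKey();
// After flattening, the wrapper is removed and its fields are promoted
// onto the outer response:
//   response.getMasterKey();
import org.apache.hadoop.yarn.server.api.records.MasterKey;
import org.apache.hadoop.yarn.server.api.records.NodeAction;

public interface RegisterNodeManagerResponse {
  MasterKey getMasterKey();   // formerly on the inner RegistrationResponse
  NodeAction getNodeAction(); // formerly on the inner RegistrationResponse
}
{code}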
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042 ] Siddharth Seth commented on YARN-71: Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602042#comment-13602042 ] Siddharth Seth edited comment on YARN-71 at 3/14/13 5:20 AM: - Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - There are a couple of races: one when asserting the state is RUNNING, since the events may not have been processed, and a second when asserting the file delete, since that also happens on a separate thread. - Also, the test should verify the correct user being used for deletion; spy on the deletion service. - Minor: use Records instead of RecordFactory. Also, can you please mention how you've tested the patch? was (Author: sseth): Comments on the latest patch. - timestamp can move out - so that the same ts is used across all local dirs. - Instead of scheduling old files, then renaming the current files and scheduling additional deletes - this could change to just rename the current files, and schedule deletion once. In the unit test - Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
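A minimal sketch of the rename-then-schedule-once flow described above, with one timestamp shared across all local dirs; the "usercache" subdirectory name is illustrative, and the DeletionService call is assumed from the surrounding NM code:
{code}
import java.io.File;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.server.nodemanager.DeletionService;

public class LocalDirCleanupSketch {
  // Rename leftover state in every local dir using ONE shared timestamp,
  // then schedule a single deletion per renamed path.
  void cleanupOnStartup(List<String> localDirs, String user,
                        DeletionService deletionService) {
    long ts = System.currentTimeMillis();  // same ts across all local dirs
    for (String localDir : localDirs) {
      File current = new File(localDir, "usercache");
      if (!current.exists()) {
        continue;
      }
      File renamed = new File(localDir, "usercache_DEL_" + ts);
      if (current.renameTo(renamed)) {
        // Deletion is scheduled once, after the rename, rather than
        // scheduling old files first and then adding more deletes for
        // the renamed ones.
        deletionService.delete(user, new Path(renamed.getAbsolutePath()));
      }
    }
  }
}
{code}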
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602055#comment-13602055 ] Vinod Kumar Vavilapalli commented on YARN-378: -- +1 for the maxAppAttempts naming. +1 to Bobby's proposal to add it to the env. We are already sending across other important things like the app-attempt-id as part of the env, so +1 for adding this info too. bq. First in some init method that reads the global value from config, checks for errors and sets a sensible default global value. Yes, this should happen somewhere in the main thread and crash the RM in case of invalid configs. RMApp gets created much later, so.. bq. Env vars are brittle.. I suppose this is on Windows? ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch We should support different clients or users having different ApplicationMaster retry counts. That is, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
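For context, a sketch of how an AM might consume the proposed value once it is exported; the env var name here is hypothetical (the JIRA only proposes adding it), and the fallback uses the 2.0-era yarn.resourcemanager.am.max-retries constants:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AttemptPolicy {
  // Hypothetical env var: the RM would export the effective max attempts
  // alongside the other per-attempt values it already sets.
  static final String MAX_APP_ATTEMPTS_ENV = "MAX_APP_ATTEMPTS";

  // On its final attempt, the AM can avoid relying on another retry.
  static boolean isLastAttempt(ApplicationAttemptId attemptId, Configuration conf) {
    String fromEnv = System.getenv(MAX_APP_ATTEMPTS_ENV);
    int maxAttempts = (fromEnv != null)
        ? Integer.parseInt(fromEnv)
        : conf.getInt(YarnConfiguration.RM_AM_MAX_RETRIES,
                      YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES);
    return attemptId.getAttemptId() >= maxAttempts;
  }
}
{code}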