[jira] [Created] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
Chen He created YARN-3324: - Summary: TestDockerContainerExecutor should clean test docker image from local repository after test is done Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Currently, TestDockerContainerExecutor only cleans the temp directory in the local file system but leaves the test docker image in the local docker repository. It should be cleaned up as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
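The cleanup this issue asks for amounts to force-removing the test image after the test run, e.g. via `docker rmi -f`. A minimal sketch of such a teardown step (Python for illustration; the real test is Java, and the image name and helper names here are hypothetical, not from the actual TestDockerContainerExecutor):

```python
import subprocess

def remove_docker_image_cmd(image, docker_binary="docker"):
    """Build the command that deletes a test image from the local repository."""
    return [docker_binary, "rmi", "-f", image]

def cleanup_after_test(image, runner=subprocess.run):
    # Force-remove the image; swallow failures so teardown never masks
    # the actual test result (e.g. if the image was never pulled).
    try:
        runner(remove_docker_image_cmd(image), check=True)
    except Exception:
        pass
```

In the Java test this would belong in an `@After`/`tearDown` method so the image is removed even when the test fails.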
[jira] [Commented] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354405#comment-14354405 ] Brahma Reddy Battula commented on YARN-3323: [~ajisakaa] Kindly review the attached patch.. > Task UI, sort by name doesn't work > -- > > Key: YARN-3323 > URL: https://issues.apache.org/jira/browse/YARN-3323 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.1 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > Attachments: YARN-3323.patch > > > If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the > list of tasks, then try to sort by the task name/id, it does nothing. > Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
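The underlying symptom is that task IDs such as task_..._m_000001 are compared as plain strings by the table sorter. The actual fix lives in the webapp's JavaScript table code; as a language-agnostic illustration (Python, not the patch itself), a natural-sort key that makes numeric runs compare numerically would look like:

```python
import re

def natural_key(task_id):
    """Split an ID into text and integer runs so numeric parts compare numerically."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", task_id)]

# Hypothetical task IDs, shortened for illustration.
ids = ["task_1_m_000010", "task_1_m_000002", "task_1_m_000001"]
ids.sort(key=natural_key)
```

With zero-padded IDs plain string sort happens to agree, but mixed-width numeric parts only sort correctly with a key like this.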
[jira] [Updated] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3323: --- Attachment: YARN-3323.patch > Task UI, sort by name doesn't work > -- > > Key: YARN-3323 > URL: https://issues.apache.org/jira/browse/YARN-3323 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.1 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > Attachments: YARN-3323.patch > > > If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the > list of tasks, then try to sort by the task name/id, it does nothing. > Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354394#comment-14354394 ] Rohith commented on YARN-3305: -- Updated the patch to normalize the ResourceRequest when an attempt is added. Kindly review the patch. > AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is > less than minimumAllocation > > > Key: YARN-3305 > URL: https://issues.apache.org/jira/browse/YARN-3305 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith > Attachments: 0001-YARN-3305.patch > > > For given any ResourceRequest, {{CS#allocate}} normalizes request to > minimumAllocation if requested memory is less than minimumAllocation. > But AM-used resource is updated with actual ResourceRequest made by user. > This results in AM container allocation more than Max ApplicationMaster > Resource. > This is because AM-Used is updated with actual ResourceRequest made by user > while activating the applications. But during allocation of container, > ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354393#comment-14354393 ] Rohith commented on YARN-3305: -- ResourceRequests are normalized when CS#allocate is invoked, but AM-Used is updated while activating the applications, which happens before the CS#allocate call. For the AM ResourceRequest, normalization should therefore be done while adding the attempt. > AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is > less than minimumAllocation > > > Key: YARN-3305 > URL: https://issues.apache.org/jira/browse/YARN-3305 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith > Attachments: 0001-YARN-3305.patch > > > For given any ResourceRequest, {{CS#allocate}} normalizes request to > minimumAllocation if requested memory is less than minimumAllocation. > But AM-used resource is updated with actual ResourceRequest made by user. > This results in AM container allocation more than Max ApplicationMaster > Resource. > This is because AM-Used is updated with actual ResourceRequest made by user > while activating the applications. But during allocation of container, > ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
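The normalization being discussed rounds a request up to at least the configured minimum allocation (and, in Hadoop's resource calculators, up to a multiple of the allocation increment, capped at the maximum). A sketch of that rounding logic (Python; the real code is Hadoop's ResourceCalculator in Java, and treating the increment as equal to the minimum is a common but assumed configuration):

```python
def normalize_memory(requested, minimum, maximum, increment):
    """Round a memory request up to a multiple of `increment`,
    clamped to the [minimum, maximum] allocation range."""
    value = max(requested, minimum)
    # Round up to the next multiple of the increment.
    value = ((value + increment - 1) // increment) * increment
    return min(value, maximum)
```

The bug described here is exactly a mismatch between two call sites: AM-Used is charged with the raw `requested` value at activation time, while the container is later allocated with the normalized value.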
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0001-YARN-3305.patch > AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is > less than minimumAllocation > > > Key: YARN-3305 > URL: https://issues.apache.org/jira/browse/YARN-3305 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Rohith >Assignee: Rohith > Attachments: 0001-YARN-3305.patch > > > For given any ResourceRequest, {{CS#allocate}} normalizes request to > minimumAllocation if requested memory is less than minimumAllocation. > But AM-used resource is updated with actual ResourceRequest made by user. > This results in AM container allocation more than Max ApplicationMaster > Resource. > This is because AM-Used is updated with actual ResourceRequest made by user > while activating the applications. But during allocation of container, > ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354362#comment-14354362 ] Naganarasimha G R commented on YARN-2495: - Hi [~wangda], 1) IMO the method name was not readable when it was {{setAreNodeLabelsSet}}, but I have changed it to {{setAreNodeLabelsSetInReq}}, which I feel is sufficient. setAreNodeLabelsUpdated is the same as the earlier name on which Craig had commented (which I also feel is valid): {quote} I would go with areNodeLablesSet (all "isNodeLabels" => "areNodeLabels" wherever it appears, actually) - wrt "Set" vs "Updated" - this is primarily a workaround for the null/empty ambiguity and I think this name better reflects what is really going on (am I sending a value to act on or not), but I also think that this is a better contract, the receiver (rm) shouldn't really care about the logic the nm side is using to decide whether or not to set its labels (freshness, "updatedness", whatever), so all that should be communicated in the api is whether or not the value is set, not whether it's an update/whether it's checking freshness, etc. that's a nit, but I think it's a clearer name. {quote} Yes, true; let's finalize the name this time, after which I will start working on the patch, otherwise it will be a wasted effort. 5) {quote} It will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) the NM to depend on the RM's configuration. So I suggest making these changes: In NodeManager.java: when the user doesn't configure a provider, it should be null. In your patch, you can return a null directly, and YARN-2729 will implement the logic of instantiating the provider from config. In NodeStatusUpdaterImpl: avoid using isDistributedNodeLabelsConf, since we will not have "distributedNodeLabelConf" on the NM side if you agree with the previous comment; instead, it will check whether the provider is null.
{quote} Well, the modification side is clear to me, but is it good to allow the configuration to differ between the NM and RM? In fact I wanted to discuss whether to send a shutdown during register if the NM is configured differently from the RM, but I waited for the base changes to go in before discussing new stuff. 8) ??You can add an additional comments in line 626 for this.?? OK, will add a comment in LabelProvider.getLabels. The idea is that LabelProvider is expected to give the same labels continuously until there is a change, and if null or empty is returned then no label is assumed. 10) {{updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport}}: will take care of this in the next patch. {{LOG.info(... accepted from RM, use LOG.debug and check isDebugEnabled.}}: I feel it is better to log this as "Error", as we are sending the labels only in case of a change, there has to be some way to identify the labels for a given NM, and also currently we are sending out a shutdown signal too. ??Make errorMessage clear: indicate 1# this is node labels reported from NM, and 2# it's failed to be put to RM instead of "not properly configured".?? I think I have captured the first point, but anyway I will reframe it as {{"Node Labels reported from the NM with id were rejected from RM with exception message as .}} ??Another thing we should do is, when distributed node label configuration is set, any direct modify node to labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels).?? Will work on this once 2495 and 2729 are done. Thanks [~vinodkv] & [~cwelch] for reviewing it. ??configuration.type -> configuration-type?? Will take care of this in the next patch. {quote} Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment?
{quote} Well, as Craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set, but since protoc provides an empty set by default, something is required to indicate whether labels were set as part of the request; hence areNodeLabelsSetInReq is required. ??RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario.?? Well, though I am not very familiar with rolling upgrades, I don't see any problems in the normal case, because the RM tries to read the labels from the NM's request only when it is a distributed conf, and {{areNodeLabelsSetInReq}} is false by default. But I had queries about the case where an existing setup is to be modified to a distributed conf setup: # Do we need to send a shutdown during register if the NM is configured differently from the RM? # Will the new configurations be added on the NM and RM and then the rolling upgrade be done, or do we do the rolling upgrade first and then reconfigure & restart the RMs and NMs? ??How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?? Well, I have t
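The naming debate above boils down to the null/empty ambiguity: protobuf hands the receiver an empty collection whether or not the sender populated the field, so an explicit boolean must say whether labels are actually present in the request. A sketch of that contract (Python; the field and method names mirror the discussion but the classes here are illustrative, not the actual protobuf records):

```python
class NodeHeartbeatRequest:
    """Carries an explicit flag so the RM can tell 'no update' from 'update to empty'."""
    def __init__(self, node_labels=None):
        # Protobuf would give the receiver an empty collection either way,
        # hence the separate are_node_labels_set flag.
        self.are_node_labels_set = node_labels is not None
        self.node_labels = set(node_labels) if node_labels is not None else set()

def apply_heartbeat(current_labels, request):
    """RM side: act on the labels only when the request actually carries them."""
    if request.are_node_labels_set:
        return set(request.node_labels)
    return current_labels
```

Note the middle case below: an explicitly empty label set clears the node's labels, which is exactly what an empty-by-default protobuf field cannot express on its own.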
[jira] [Updated] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3323: Summary: Task UI, sort by name doesn't work (was: MR Task UI, sort by name doesn't work) Moving to YARN project. > Task UI, sort by name doesn't work > -- > > Key: YARN-3323 > URL: https://issues.apache.org/jira/browse/YARN-3323 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.1 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > > If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the > list of tasks, then try to sort by the task name/id, it does nothing. > Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3323) MR Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA moved MAPREDUCE-6102 to YARN-3323: Component/s: (was: webapps) webapp Target Version/s: (was: 2.6.0) Affects Version/s: (was: 2.5.1) 2.5.1 Key: YARN-3323 (was: MAPREDUCE-6102) Project: Hadoop YARN (was: Hadoop Map/Reduce) > MR Task UI, sort by name doesn't work > - > > Key: YARN-3323 > URL: https://issues.apache.org/jira/browse/YARN-3323 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.1 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > > If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the > list of tasks, then try to sort by the task name/id, it does nothing. > Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354306#comment-14354306 ] Hadoop QA commented on YARN-2172: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658578/hadoop_job_suspend_resume.patch against trunk revision 47f7f18. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6899//console This message is automatically generated. > Suspend/Resume Hadoop Jobs > -- > > Key: YARN-2172 > URL: https://issues.apache.org/jira/browse/YARN-2172 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager, webapp >Affects Versions: 2.2.0 > Environment: CentOS 6.5, Hadoop 2.2.0 >Reporter: Richard Chen > Labels: hadoop, jobs, resume, suspend > Attachments: Hadoop Job Suspend Resume Design.docx, > hadoop_job_suspend_resume.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > In a multi-application cluster environment, jobs running inside Hadoop YARN > may be of lower-priority than jobs running outside Hadoop YARN like HBase. To > give way to other higher-priority jobs inside Hadoop, a user or some > cluster-level resource scheduling service should be able to suspend and/or > resume some particular jobs within Hadoop YARN. > When target jobs inside Hadoop are suspended, those already allocated and > running task containers will continue to run until their completion or active > preemption by other ways. But no more new containers would be allocated to > the target jobs. In contrast, when suspended jobs are put into resume mode, > they will continue to run from the previous job progress and have new task > containers allocated to complete the rest of the jobs. > My team has completed its implementation and our tests showed it works in a > rather solid and convenient way. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
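The suspend semantics described in YARN-2172 (running containers keep running; no new containers are granted to a suspended job) can be pictured as a filter in the scheduler's allocation loop. A sketch under that reading (Python; names hypothetical, the attached patch is the actual implementation):

```python
def allocate(pending_requests, suspended_jobs):
    """Grant new containers only to jobs that are not suspended.
    Already-running containers are deliberately untouched by suspension."""
    granted = []
    for job_id, num_containers in pending_requests:
        if job_id in suspended_jobs:
            continue  # suspended: keep existing containers, allocate nothing new
        granted.append((job_id, num_containers))
    return granted
```

On resume, a job simply drops out of `suspended_jobs` and its outstanding requests become eligible again, matching the "continue from previous progress" behavior described above.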
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-745: -- Fix Version/s: (was: 2.7.0) > Move UnmanagedAMLauncher to yarn client package > --- > > Key: YARN-745 > URL: https://issues.apache.org/jira/browse/YARN-745 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > > Its currently sitting in yarn applications project which sounds wrong. client > project sounds better since it contains the utilities/libraries that clients > use to write and debug yarn applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354279#comment-14354279 ] Rohith commented on YARN-3273: -- Thanks Jian He for your suggestion :-) The overall summary seems to be in the right direction. I am assuming that all the scheduler changes are only for CS. Are there any common scheduler changes to be done? # Headroom will be displayed on the application attempt page. This will be set to 0 once the attempt is finished. # For each leaf queue in CS, UsedAMResource, UsedUserAMResource, and 'User Limit for User' will be displayed. # In Active Users, for each user a link will be provided which redirects to an additional filtered user page containing userInfo in a table as in the sample table above. This is also applicable only for CS. # The all-active-users table won't be rendered; instead only a link will be provided for each user, i.e. step 3 above. Is my understanding correct? > Improve web UI to facilitate scheduling analysis and debugging > -- > > Key: YARN-3273 > URL: https://issues.apache.org/jira/browse/YARN-3273 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Rohith > Attachments: 0001-YARN-3273-v1.patch, > YARN-3273-am-resource-used-AND-User-limit.PNG, > YARN-3273-application-headroom.PNG > > > Job may be stuck for reasons such as: > - hitting queue capacity > - hitting user-limit, > - hitting AM-resource-percentage > The first queueCapacity is already shown on the UI. > We may surface things like: > - what is user's current usage and user-limit; > - what is the AM resource usage and limit; > - what is the application's current HeadRoom; > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Component/s: (was: test) build > Yarn project module names in POM needs to consistent acros hadoop project > - > > Key: YARN-2784 > URL: https://issues.apache.org/jira/browse/YARN-2784 > Project: Hadoop YARN > Issue Type: Improvement > Components: build >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Attachments: YARN-2784.patch > > > All yarn and mapreduce pom.xml has project name has > hadoop-mapreduce/hadoop-yarn. This can be made consistent acros Hadoop > projects build like 'Apache Hadoop Yarn ' and 'Apache Hadoop > MapReduce ". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2172: --- Fix Version/s: (was: 2.2.0) > Suspend/Resume Hadoop Jobs > -- > > Key: YARN-2172 > URL: https://issues.apache.org/jira/browse/YARN-2172 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager, webapp >Affects Versions: 2.2.0 > Environment: CentOS 6.5, Hadoop 2.2.0 >Reporter: Richard Chen > Labels: hadoop, jobs, resume, suspend > Attachments: Hadoop Job Suspend Resume Design.docx, > hadoop_job_suspend_resume.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > In a multi-application cluster environment, jobs running inside Hadoop YARN > may be of lower-priority than jobs running outside Hadoop YARN like HBase. To > give way to other higher-priority jobs inside Hadoop, a user or some > cluster-level resource scheduling service should be able to suspend and/or > resume some particular jobs within Hadoop YARN. > When target jobs inside Hadoop are suspended, those already allocated and > running task containers will continue to run until their completion or active > preemption by other ways. But no more new containers would be allocated to > the target jobs. In contrast, when suspended jobs are put into resume mode, > they will continue to run from the previous job progress and have new task > containers allocated to complete the rest of the jobs. > My team has completed its implementation and our tests showed it works in a > rather solid and convenient way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-965: -- Fix Version/s: (was: 2.7.0) > NodeManager Metrics containersRunning is not correct When localizing > container process is failed or killed > -- > > Key: YARN-965 > URL: https://issues.apache.org/jira/browse/YARN-965 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.4-alpha > Environment: suse linux >Reporter: Li Yuan > > When successfully launched a container, container state from LOCALIZED to > RUNNING, containersRunning ++. Container state from EXITED_WITH_FAILURE or > KILLING to DONE, containersRunning--. > However, state EXITED_WITH_FAILURE or KILLING could come from > LOCALIZING(LOCALIZED), not RUNNING, which caused containersRunningis less > than the actual number. Further more, Metrics is wrong, containersLaunched != > containersCompleted + containersFailed + containersKilled + containersRunning > + containersIniting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
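The bug in YARN-965 is an unbalanced counter: containersRunning is incremented only on the LOCALIZED to RUNNING transition, but decremented on any transition into DONE, including those coming from LOCALIZING via EXITED_WITH_FAILURE or KILLING. One way to keep the counter consistent is to decrement only for containers that were actually counted as running. A sketch of that guard (Python; illustrative only, the real code is the NM's Java metrics/state machine):

```python
class NodeManagerMetrics:
    """containersRunning goes up on launch and down only if it went up."""
    def __init__(self):
        self.containers_running = 0
        self._counted_running = set()

    def container_launched(self, container_id):
        # LOCALIZED -> RUNNING transition.
        self.containers_running += 1
        self._counted_running.add(container_id)

    def container_finished(self, container_id):
        # Any path into DONE (including failure/kill while still localizing).
        # Guard: decrement only if this container was ever counted as running.
        if container_id in self._counted_running:
            self._counted_running.remove(container_id)
            self.containers_running -= 1
```

With this guard, a container killed during localization leaves containersRunning untouched, preserving the invariant quoted in the issue.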
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1147: --- Fix Version/s: (was: 2.7.0) > Add end-to-end tests for HA > --- > > Key: YARN-1147 > URL: https://issues.apache.org/jira/browse/YARN-1147 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Xuan Gong > > While individual sub-tasks add tests for the code they include, it will be > handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-113: -- Fix Version/s: (was: 2.7.0) > WebAppProxyServlet must use SSLFactory for the HttpClient connections > - > > Key: YARN-113 > URL: https://issues.apache.org/jira/browse/YARN-113 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > > The HttpClient must be configured to use the SSLFactory when the web UIs are > over HTTPS, otherwise the proxy servlet fails to connect to the AM because of > unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-153) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-153: -- Fix Version/s: (was: 2.7.0) > PaaS on YARN: an YARN application to demonstrate that YARN can be used as a > PaaS > > > Key: YARN-153 > URL: https://issues.apache.org/jira/browse/YARN-153 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jacob Jaigak Song >Assignee: Jacob Jaigak Song > Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, > MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, > MAPREDUCE4393.patch > > Original Estimate: 336h > Time Spent: 336h > Remaining Estimate: 0h > > This application is to demonstrate that YARN can be used for non-mapreduce > applications. As Hadoop has already been adopted and deployed widely and its > deployment in future will be highly increased, we thought that it's a good > potential to be used as PaaS. > I have implemented a proof of concept to demonstrate that YARN can be used as > a PaaS (Platform as a Service). I have done a gap analysis against VMware's > Cloud Foundry and tried to achieve as many PaaS functionalities as possible > on YARN. > I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Fix Version/s: (was: 2.7.0) > Yarn project module names in POM needs to consistent acros hadoop project > - > > Key: YARN-2784 > URL: https://issues.apache.org/jira/browse/YARN-2784 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Attachments: YARN-2784.patch > > > All yarn and mapreduce pom.xml has project name has > hadoop-mapreduce/hadoop-yarn. This can be made consistent acros Hadoop > projects build like 'Apache Hadoop Yarn ' and 'Apache Hadoop > MapReduce ". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-314: -- Fix Version/s: (was: 2.7.0) > Schedulers should allow resource requests of different sizes at the same > priority and location > -- > > Key: YARN-314 > URL: https://issues.apache.org/jira/browse/YARN-314 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza > Attachments: yarn-314-prelim.patch > > > Currently, resource requests for the same container and locality are expected > to all be the same size. > While it it doesn't look like it's needed for apps currently, and can be > circumvented by specifying different priorities if absolutely necessary, it > seems to me that the ability to request containers with different resource > requirements at the same priority level should be there for the future and > for completeness sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2890: --- Fix Version/s: (was: 2.7.0) > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, > YARN-2890.patch > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354236#comment-14354236 ] Hudson commented on YARN-3300: -- FAILURE: Integrated in Hadoop-trunk-Commit #7293 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7293/]) YARN-3300. Outstanding_resource_requests table should not be shown in AHS. Contributed by Xuan Gong (jianhe: rev c3003eba6f9802f15699564a5eb7c6e34424cb14) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt > outstanding_resource_requests table should not be shown in AHS > -- > > Key: YARN-3300 > URL: https://issues.apache.org/jira/browse/YARN-3300 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.7.0 > > Attachments: YARN-3300.1.patch, YARN-3300.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-160: -- Fix Version/s: (was: 2.7.0) > nodemanagers should obtain cpu/memory values from underlying OS > --- > > Key: YARN-160 > URL: https://issues.apache.org/jira/browse/YARN-160 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.3-alpha >Reporter: Alejandro Abdelnur >Assignee: Varun Vasudev > Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, > apache-yarn-160.2.patch, apache-yarn-160.3.patch > > > As mentioned in YARN-2 > *NM memory and CPU configs* > Currently these values are coming from the config of the NM, we should be > able to obtain those values from the OS (ie, in the case of Linux from > /proc/meminfo & /proc/cpuinfo). As this is highly OS dependent we should have > an interface that obtains this information. In addition implementations of > this interface should be able to specify a mem/cpu offset (amount of mem/cpu > not to be avail as YARN resource), this would allow to reserve mem/cpu for > the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
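Obtaining memory from the OS as proposed, with an offset reserved for the OS and services outside YARN, can be sketched by parsing /proc/meminfo. The MemTotal field and its kB unit are standard Linux; the function and parameter names are illustrative (Python; the actual plugin would be a Java interface with per-OS implementations):

```python
def yarn_memory_mb(meminfo_text, reserved_mb=0):
    """Parse MemTotal from /proc/meminfo content and subtract a reserved offset,
    yielding the memory to advertise as a YARN resource."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])  # e.g. "MemTotal:  16326656 kB"
            return max(total_kb // 1024 - reserved_mb, 0)
    raise ValueError("MemTotal not found in meminfo")
```

In practice the text would come from `open("/proc/meminfo").read()`; passing the text in keeps the parsing testable and OS-independent, matching the issue's call for an abstraction over highly OS-dependent sources.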
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354227#comment-14354227 ] Jian He commented on YARN-3300: --- sounds good. committing > outstanding_resource_requests table should not be shown in AHS > -- > > Key: YARN-3300 > URL: https://issues.apache.org/jira/browse/YARN-3300 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3300.1.patch, YARN-3300.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1142: --- Fix Version/s: (was: 2.7.0) > MiniYARNCluster web ui does not work properly > - > > Key: YARN-1142 > URL: https://issues.apache.org/jira/browse/YARN-1142 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur > > When going to the RM http port, the NM web ui is displayed. It seems there is > a singleton somewhere that breaks things when RM & NMs run in the same > process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3200) Factor OSType out from Shell: changes in YARN
[ https://issues.apache.org/jira/browse/YARN-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3200: --- Fix Version/s: (was: 2.7.0) > Factor OSType out from Shell: changes in YARN > - > > Key: YARN-3200 > URL: https://issues.apache.org/jira/browse/YARN-3200 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2902: --- Fix Version/s: (was: 2.7.0) > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3187: --- Fix Version/s: (was: 2.6.0) > Documentation of Capacity Scheduler Queue mapping based on user or group > > > Key: YARN-3187 > URL: https://issues.apache.org/jira/browse/YARN-3187 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, documentation >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Gururaj Shetty > Labels: documentation > Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch > > > YARN-2411 exposes a very useful feature {{support simple user and group > mappings to queues}} but its not captured in the documentation. So in this > jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3111) Fix ratio problem on FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3111: --- Fix Version/s: (was: 2.7.0) > Fix ratio problem on FairScheduler page > --- > > Key: YARN-3111 > URL: https://issues.apache.org/jira/browse/YARN-3111 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Minor > Attachments: YARN-3111.1.patch, YARN-3111.png > > > Found 3 problems on the FairScheduler page: > 1. The ratio is computed from memory only, even when the queue schedulingPolicy is DRF. > 2. When min resources is configured larger than real resources, the steady > fair share ratio is so large that it runs off the page. > 3. When cluster resources are 0 (no nodemanager started), the ratio is displayed as > "NaN% used". > The attached image shows a snapshot of the above problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3322) RM/AM/JHS webservers should return HTTP.BadRequest for malformed requests and not HTTP.NotFound
[ https://issues.apache.org/jira/browse/YARN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3923 to YARN-3322: --- Component/s: (was: webapps) (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-3322 (was: MAPREDUCE-3923) Project: Hadoop YARN (was: Hadoop Map/Reduce) > RM/AM/JHS webservers should return HTTP.BadRequest for malformed requests and > not HTTP.NotFound > --- > > Key: YARN-3322 > URL: https://issues.apache.org/jira/browse/YARN-3322 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha > > Many webserver methods (e.g. > AMWebServices.getTaskAttemptFromTaskAttemptString()) return NotFound for > malformed requests instead of BadRequest. > This is inconsistent with expected HTTP behavior, so it would be good to fix > them. NotFound should be returned for valid resources which don't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354126#comment-14354126 ] Xuan Gong commented on YARN-3300: - bq. actually, after looking at the UI, on app page, there's a big blank space above the resource requests table, similarly for the attempt page. could you fix that too ? Thanks for reviewing. Right now, both the attempt status in the app block and the container status in the appattempt block are rendered as tables, and every table has a wrapper with a min-height of 302px. That is why we see a big blank space. Fixing it would need some CSS/HTML changes. Anyway, the formatting issues will be fixed in YARN-3301. > outstanding_resource_requests table should not be shown in AHS > -- > > Key: YARN-3300 > URL: https://issues.apache.org/jira/browse/YARN-3300 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3300.1.patch, YARN-3300.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3321) "Health-Report" column of NodePage should display more information.
[ https://issues.apache.org/jira/browse/YARN-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3091 to YARN-3321: --- Component/s: (was: nodemanager) (was: resourcemanager) resourcemanager nodemanager Assignee: (was: Subroto Sanyal) Affects Version/s: (was: 0.23.0) Key: YARN-3321 (was: MAPREDUCE-3091) Project: Hadoop YARN (was: Hadoop Map/Reduce) > "Health-Report" column of NodePage should display more information. > --- > > Key: YARN-3321 > URL: https://issues.apache.org/jira/browse/YARN-3321 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subroto Sanyal > Labels: javascript > > The Health-Checker script of a node can run and generate some output, > an error, and an exit code. > This information is not available in the GUI. > It is possible the Health-Checker script generates some statistics about the > node; these could be displayed to the GUI user. I suggest we display the > information in a pop-up balloon (using CSS/JavaScript). > Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3320) Support a Priority OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354035#comment-14354035 ] Craig Welch commented on YARN-3320: --- The initial intent is to bring the appropriate parts of the implementation of ApplicationPriorities from [YARN-2004] into the OrderingPolicy framework as a SchedulerComparator which can be composed with Fair and Fifo comparators to achieve Fair and Fifo behavior WITHIN priority bands > Support a Priority OrderingPolicy > - > > Key: YARN-3320 > URL: https://issues.apache.org/jira/browse/YARN-3320 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > > When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy > framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3320) Support a Priority OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3320: -- Summary: Support a Priority OrderingPolicy (was: Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering) > Support a Priority OrderingPolicy > - > > Key: YARN-3320 > URL: https://issues.apache.org/jira/browse/YARN-3320 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > > When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy > framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3320) Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering
Craig Welch created YARN-3320: - Summary: Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering Key: YARN-3320 URL: https://issues.apache.org/jira/browse/YARN-3320 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354029#comment-14354029 ] Hadoop QA commented on YARN-1884: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703534/YARN-1884.2.patch against trunk revision d6e05c5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6897//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6897//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6897//console This message is automatically generated. 
> ContainerReport should have nodeHttpAddress > --- > > Key: YARN-1884 > URL: https://issues.apache.org/jira/browse/YARN-1884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1884.1.patch, YARN-1884.2.patch > > > In web UI, we're going to show the node, which used to be to link to the NM > web page. However, on AHS web UI, and RM web UI after YARN-1809, the node > field has to be set to nodeID where the container is allocated. We need to > add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.14.patch Same as .13, except it should be possible to apply this patch after applying [YARN-3318] 's .14 patch > Implement a Fair SchedulerOrderingPolicy > > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch > > > Implement a Fair SchedulerOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.14.patch Same as .13, except it should be possible to apply with [YARN-3319] 's .14 patch > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch, YARN-3318.14.patch > > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicationAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.13.patch Attaching an initial/incomplete patch; it depends on the [YARN-3318] patch of the same index and contains just the additional logic specific to fairness. Major TODO: sizeBasedWeight. > Implement a Fair SchedulerOrderingPolicy > > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch > > > Implement a Fair SchedulerOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354006#comment-14354006 ] Craig Welch commented on YARN-3319: --- Initially this will be implemented for SchedulerApplicationAttempts in the CapacityScheduler LeafQueue (similar to the FIFO implementation in [YARN-3318]). The expectation is that this will implement the SchedulerComparator interface and will be used as a comparator within the SchedulerComparatorPolicy implementation to achieve the intended behavior. > Implement a Fair SchedulerOrderingPolicy > > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Implement a Fair SchedulerOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
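The least-current-usage-first ordering described in YARN-3319 can be sketched as a plain comparator. This is a minimal illustration: the `App` class and its `usedMemoryMb` field are hypothetical stand-ins for SchedulerApplicationAttempt state, not the actual patch code.

```java
import java.util.Comparator;

// Sketch of fair ordering: the app with the least current usage sorts first,
// so it is offered resources first. Names here are illustrative only.
class App {
    final String id;
    final long usedMemoryMb;  // current consumption; changes as containers start/finish
    App(String id, long usedMemoryMb) { this.id = id; this.usedMemoryMb = usedMemoryMb; }
}

class FairComparator implements Comparator<App> {
    @Override
    public int compare(App a, App b) {
        // Prefer the app with the least current usage; tie-break by id so the
        // ordering is total and stable.
        int byUsage = Long.compare(a.usedMemoryMb, b.usedMemoryMb);
        return byUsage != 0 ? byUsage : a.id.compareTo(b.id);
    }
}
```

Because usage changes as containers are assigned and completed, a policy built on this comparator would need to re-insert an app into its sorted structure whenever its usage changes, which is what the SchedulerProcessEvent hook in YARN-3318 is for.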
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354003#comment-14354003 ] Karthik Kambatla commented on YARN-2928: +1 to renaming. Prefer - TimelineCollector and TimelineReceiver in that order. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354005#comment-14354005 ] Jian He commented on YARN-3300: --- actually, after looking at the UI, on app page, there's a big blank space above the resource requests table, similarly for the attempt page. could you fix that too ? > outstanding_resource_requests table should not be shown in AHS > -- > > Key: YARN-3300 > URL: https://issues.apache.org/jira/browse/YARN-3300 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3300.1.patch, YARN-3300.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
Craig Welch created YARN-3319: - Summary: Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353990#comment-14353990 ] Jian He commented on YARN-3300: --- lgtm, +1 > outstanding_resource_requests table should not be shown in AHS > -- > > Key: YARN-3300 > URL: https://issues.apache.org/jira/browse/YARN-3300 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3300.1.patch, YARN-3300.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.13.patch Initial, incomplete patch with the overall framework & implementation of the SchedulerComparatorPolicy and FifoComparator; the major TODO is integrating with the capacity scheduler configuration. Also includes a CompoundComparator for chaining comparator-based policies where desired. > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch > > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicationAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353965#comment-14353965 ] Craig Welch commented on YARN-3318: --- The proposed initial implementation of the framework to support FIFO SchedulerApplicationAttempt ordering for the CapacityScheduler: A SchedulerComparatorPolicy which implements OrderingPolicy above. This implementation will take care of the common logic required for cases where the policy can be effectively implemented as a comparator (which is expected to be the case for several potential policies, including FIFO). A SchedulerComparator which is used by the SchedulerComparatorPolicy above. This is an extension of the Java Comparator interface with additional logic required by the SchedulerComparatorPolicy, initially a method to accept SchedulerProcessEvents and indicate whether they require re-ordering of the associated SchedulerProcess. > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicationAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
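The comparator extension described in this comment could look roughly like the sketch below. The `reorderOn` method name and the plain-String event representation are assumptions for illustration; the actual patch models events as SchedulerProcessEvents.

```java
import java.util.Comparator;

// Sketch of a SchedulerComparator: a Comparator plus a hook that tells the
// wrapping SchedulerComparatorPolicy whether an event can change a process's
// relative order (so it must be removed and re-inserted into the sorted set).
interface SchedulerComparator<S> extends Comparator<S> {
    boolean reorderOn(String event);
}

// Hypothetical minimal process type, standing in for SchedulerApplicationAttempt.
class Proc {
    final long serial;  // submission order
    Proc(long serial) { this.serial = serial; }
}

class FifoComparator implements SchedulerComparator<Proc> {
    @Override
    public int compare(Proc a, Proc b) {
        return Long.compare(a.serial, b.serial);
    }

    @Override
    public boolean reorderOn(String event) {
        // FIFO position is fixed at submission time, so no event forces
        // re-ordering; a fair comparator would return true for usage changes.
        return false;
    }
}
```

The `reorderOn` hook is what lets one generic policy implementation serve both FIFO (never re-order) and fair (re-order on usage change) behaviors.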
[jira] [Assigned] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch reassigned YARN-3318: - Assignee: Craig Welch > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicationAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353953#comment-14353953 ] Craig Welch commented on YARN-3318: --- Proposed elements of the framework: A SchedulerProcess interface which generalizes processes to be managed by the OrderingPolicy (initially; potentially by other Policies in the future as well). The initial implementer will be SchedulerApplicationAttempt. An OrderingPolicy interface which exposes a collection of scheduler processes which will be ordered by the policy for container assignment and preemption. The ordering policy will provide one Iterator which presents processes in the policy-specific order for container assignment and another Iterator which presents them in the proper order for preemption. It will also accept SchedulerProcessEvents which may indicate a need to re-order the associated SchedulerProcess (for example, after container completion, preemption, assignment, etc.) > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicationAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
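The two proposed interfaces and their iterator contract can be sketched as follows. Interface and method names are assumptions based on the comment above, not the committed YARN-3318 API; the `AttemptStub` class is a hypothetical stand-in for SchedulerApplicationAttempt.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Generalizes anything the policy can order; initially SchedulerApplicationAttempt.
interface SchedulerProcess {
    long getSerialId();  // e.g. submission order, used here for FIFO
}

// Exposes one iterator in assignment order and one in preemption order.
interface OrderingPolicy<S extends SchedulerProcess> {
    void addSchedulerProcess(S s);
    Iterator<S> getAssignmentIterator();  // policy-specific order for container assignment
    Iterator<S> getPreemptionIterator();  // order for preemption, typically reversed
}

// Trivial FIFO instance of the framework, kept small to illustrate the contract.
class FifoOrderingPolicy<S extends SchedulerProcess> implements OrderingPolicy<S> {
    private final TreeSet<S> processes =
        new TreeSet<>((a, b) -> Long.compare(a.getSerialId(), b.getSerialId()));

    @Override
    public void addSchedulerProcess(S s) { processes.add(s); }

    @Override
    public Iterator<S> getAssignmentIterator() { return processes.iterator(); }

    @Override
    public Iterator<S> getPreemptionIterator() { return processes.descendingIterator(); }
}

// Hypothetical concrete process for demonstration.
class AttemptStub implements SchedulerProcess {
    final long serial;
    AttemptStub(long serial) { this.serial = serial; }
    @Override public long getSerialId() { return serial; }
}
```

Using one sorted structure with a forward iterator for assignment and a descending iterator for preemption gives the "newest work is preempted first" behavior for free under FIFO.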
[jira] [Created] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
Craig Welch created YARN-3318: - Summary: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353928#comment-14353928 ] Craig Welch commented on YARN-2495: --- I understand the desire for fail-fast behavior to indicate an issue, but I wonder if this should really be a fatal case - I'm wondering if we might introduce a situation where a script error or other configuration issue could bring down an entire cluster (or even just a portion of the cluster) which would otherwise be able to remain functional. It's not clear to me that this should be thought of as a "fatal condition", esp. when the potential exists for escalating a rather minor issue to a major one. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353897#comment-14353897 ] Wangda Tan commented on YARN-2495: -- I think the two issues are identical, and we should have a consistent way to handle them. If we stop a node that reports invalid labels during registration, we should also stop it when the same issue happens on a heartbeat after registration. I think we can either allow both to keep running or stop both of them; I'm fine with either approach. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353895#comment-14353895 ] Vrushali C commented on YARN-2928: -- + 1 to renaming TimelineAggregator. TimelineReceiver is good. Some other suggestions are TimelineAccumulator or TimelineCollector. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353890#comment-14353890 ] Wangda Tan commented on YARN-3215: -- Yes, it works for no-labeled environment only, I added some details in description, please feel free to let me know your ideas. Thanks, > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3215: - Description: In existing CapacityScheduler, when computing headroom of an application, it will only consider "non-labeled" nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
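The description above proposes a label-to-available-resource map in AllocateResponse. A minimal sketch of what per-label headroom lookup could look like on the client side — the class name, the memory-in-MB representation, and the fall-back-to-default-partition behavior are all illustrative assumptions, not the actual YARN API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-label headroom holder; not the real AllocateResponse API.
public class LabelHeadroom {
    // node label -> memory (MB) still available to the application under that label
    private final Map<String, Long> headroomByLabel = new HashMap<>();

    public void setHeadroom(String label, long availableMb) {
        headroomByLabel.put(label, availableMb);
    }

    // Unknown labels fall back to the default (empty-string) partition.
    public long getHeadroom(String label) {
        return headroomByLabel.getOrDefault(label,
                headroomByLabel.getOrDefault("", 0L));
    }

    public static void main(String[] args) {
        LabelHeadroom h = new LabelHeadroom();
        h.setHeadroom("", 2048);     // non-labeled nodes
        h.setHeadroom("red", 5120);  // "5G resource available under node-label=red"
        System.out.println(h.getHeadroom("red"));   // 5120
        System.out.println(h.getHeadroom("blue"));  // 2048 (default partition)
    }
}
```

With such a map, an AM asking for node-label=red containers could compute its pending work against the red headroom instead of the non-labeled one, which is the deadlock scenario (MAPREDUCE-5928) the JIRA wants to avoid.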
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353882#comment-14353882 ] Robert Kanter commented on YARN-2928: - I agree; we're using "aggregator" for too many things. For TimelineAggregator, IIRC, [~kasha] had suggested TimelineCollector at one point, and that sounded good. TimelineReceiver also sounds fine. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353872#comment-14353872 ] Sangjin Lee commented on YARN-2928: --- A couple more comments on the plan: - I think the metrics API should be part of phase 2, since we will handle aggregation - It's a small item, but we should make the per-node aggregator a standalone daemon as part of phase 2 Speaking of "aggregator", the word "aggregation/aggregator" is now getting quite overloaded. Originally it meant "rolling up metrics to parent entities". Now it's really used in two quite different contexts. For example, the TimelineAggregator classes have little to do with that original meaning. I'm not quite sure what aggregation means in that context, although, I know, I know, I said +1 to the name TimelineAggregator. :) Should we clear up this confusion? IMO, we should stick with the original meaning of aggregation when we talk about aggregation. For TimelineAggregator, perhaps we could rename it to TimelineReceiver or another name? > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353864#comment-14353864 ] Xuan Gong commented on YARN-1884: - The new patch addressed all the comments > ContainerReport should have nodeHttpAddress > --- > > Key: YARN-1884 > URL: https://issues.apache.org/jira/browse/YARN-1884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1884.1.patch, YARN-1884.2.patch > > > In the web UI, we're going to show the node, which used to link to the NM > web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node > field has to be set to the nodeID where the container is allocated. We need to > add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353853#comment-14353853 ] Sangjin Lee commented on YARN-2928: --- I suppose the "ApplicationMaster events" refer to the ones that are written by the distributed shell AM. Correct? > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353852#comment-14353852 ] Wangda Tan commented on YARN-3298: -- [~nroberts], As you mentioned, it is mostly the same as what we have today, and I think it cannot solve the jitter problem. What I really want to say is: enforce the limit. To solve the "small amount of resource cannot be used in a queue" problem which you mentioned in https://issues.apache.org/jira/browse/YARN-3298?focusedCommentId=14353053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353053, setting the user-limit a little bit higher (like from 50 to 51) should solve that problem as well. Sounds like a plan? > User-limit should be enforced in CapacityScheduler > -- > > Key: YARN-3298 > URL: https://issues.apache.org/jira/browse/YARN-3298 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, yarn >Reporter: Wangda Tan >Assignee: Wangda Tan > > User-limit is not treated as a hard limit for now: it does not consider > required-resource (the resource of the being-allocated resource request), and > when a user's used resource equals the user-limit, allocation will still continue. This > will generate jitter issues when we have YARN-2069 (the preemption policy kills a > container under a user, and the scheduler allocates a container under the same > user soon after). > The expected behavior should be the same as the queue's capacity: > only when user.usage + required <= user-limit (1) will the queue continue to > allocate containers. > (1) The user-limit mentioned here is determined by the following computation: > {code} > current-capacity = queue.used + now-required (when queue.used > > queue.capacity) >queue.capacity (when queue.used < queue.capacity) > user-limit = min(max(current-capacity / #active-users, current-capacity * > user-limit / 100), queue-capacity * user-limit-factor) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
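The user-limit formula quoted in the issue description can be sketched with plain numbers standing in for YARN's Resource objects — a simplification for illustration, not the CapacityScheduler's actual implementation:

```java
// Sketch of the user-limit computation from the YARN-3298 description,
// using plain longs (e.g. MB of memory) in place of Resource objects.
public class UserLimit {
    static long computeUserLimit(long queueUsed, long queueCapacity, long nowRequired,
                                 int activeUsers, int userLimitPercent,
                                 float userLimitFactor) {
        // current-capacity = queue.used + now-required when over capacity,
        // else queue.capacity
        long currentCapacity = queueUsed > queueCapacity
                ? queueUsed + nowRequired
                : queueCapacity;
        // user-limit = min(max(cc / #active-users, cc * user-limit / 100),
        //                  queue-capacity * user-limit-factor)
        return Math.min(
                Math.max(currentCapacity / activeUsers,
                         currentCapacity * userLimitPercent / 100),
                (long) (queueCapacity * userLimitFactor));
    }

    // The proposed hard enforcement: allocate only while usage + required fits.
    static boolean canAllocate(long userUsage, long required, long userLimit) {
        return userUsage + required <= userLimit;
    }

    public static void main(String[] args) {
        // Queue of capacity 80 currently using 100, request of 10 pending,
        // 2 active users, minimum-user-limit-percent=50, user-limit-factor=1.5
        long limit = computeUserLimit(100, 80, 10, 2, 50, 1.5f);
        System.out.println(limit);                    // 55
        System.out.println(canAllocate(50, 10, limit)); // false: 60 > 55
        System.out.println(canAllocate(45, 10, limit)); // true:  55 <= 55
    }
}
```

The `canAllocate` check is the behavioral change being discussed: counting the pending request up front removes the allocate-then-preempt jitter with YARN-2069.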
[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1884: Attachment: YARN-1884.2.patch > ContainerReport should have nodeHttpAddress > --- > > Key: YARN-1884 > URL: https://issues.apache.org/jira/browse/YARN-1884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1884.1.patch, YARN-1884.2.patch > > > In web UI, we're going to show the node, which used to be to link to the NM > web page. However, on AHS web UI, and RM web UI after YARN-1809, the node > field has to be set to nodeID where the container is allocated. We need to > add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3317) MR-279: Modularize web framework and webapps
[ https://issues.apache.org/jira/browse/YARN-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2435 to YARN-3317: --- Tags: (was: mrv2, hamlet, module) Component/s: (was: mrv2) Key: YARN-3317 (was: MAPREDUCE-2435) Project: Hadoop YARN (was: Hadoop Map/Reduce) > MR-279: Modularize web framework and webapps > > > Key: YARN-3317 > URL: https://issues.apache.org/jira/browse/YARN-3317 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Luke Lu > > The patch moves the web framework out of yarn-common into a separate module: > yarn-web. > It also decouples webapps into separate modules/jars from their respective > server modules/jars to allow webapp updates independent of servers. Servers > use ServiceLoader to discover their webapp modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353833#comment-14353833 ] Nathan Roberts commented on YARN-3298: -- [~leftnoteasy], won't that be extremely close to what it is today? If so, then does it really solve the jitter issue you originally cited? Just to make sure I'm in-sync with your proposed direction, this is the code you're thinking about modifying, correct? {code} // Note: We aren't considering the current request since there is a fixed // overhead of the AM, but it's a > check, not a >= check, so... if (Resources .greaterThan(resourceCalculator, clusterResource, user.getConsumedResourceByLabel(label), limit)) { {code} > User-limit should be enforced in CapacityScheduler > -- > > Key: YARN-3298 > URL: https://issues.apache.org/jira/browse/YARN-3298 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, yarn >Reporter: Wangda Tan >Assignee: Wangda Tan > > User-limit is not treat as a hard-limit for now, it will not consider > required-resource (resource of being-allocated resource request). And also, > when user's used resource equals to user-limit, it will still continue. This > will generate jitter issues when we have YARN-2069 (preemption policy kills a > container under an user, and scheduler allocate a container under the same > user soon after). > The expected behavior should be as same as queue's capacity: > Only when user.usage + required <= user-limit (1), queue will continue to > allocate container. > (1), user-limit mentioned here is determined by following computing > {code} > current-capacity = queue.used + now-required (when queue.used > > queue.capacity) >queue.capacity (when queue.used < queue.capacity) > user-limit = min(max(current-capacity / #active-users, current-capacity * > user-limit / 100), queue-capacity * user-limit-factor) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
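The distinction Nathan's quoted snippet hinges on ("it's a > check, not a >= check") versus Wangda's proposal can be made concrete with a toy comparison — plain longs stand in for Resource objects, so this is an illustration of the two rules, not the scheduler's actual code:

```java
// Current rule vs. proposed rule for blocking an allocation at the user-limit.
public class LimitCheck {
    // Current behavior quoted above: only blocks once consumption already
    // EXCEEDS the limit (strict >), and ignores the pending request entirely.
    static boolean currentCheckBlocks(long consumed, long limit) {
        return consumed > limit;
    }

    // Proposed hard limit: block unless usage + required <= limit.
    static boolean proposedCheckBlocks(long consumed, long required, long limit) {
        return consumed + required > limit;
    }

    public static void main(String[] args) {
        // Usage exactly at the limit: today one more container still goes through...
        System.out.println(currentCheckBlocks(100, 100));      // false
        // ...while the proposed rule already rejects the pending request,
        // which is what prevents the preempt/re-allocate jitter.
        System.out.println(proposedCheckBlocks(100, 10, 100)); // true
    }
}
```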
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353825#comment-14353825 ] Allen Wittenauer commented on YARN-321: --- Looks like this should get closed out w/a fix ver of 2.4.0? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353815#comment-14353815 ] Jonathan Eagles commented on YARN-3287: --- Thanks, [~zjshen] > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Fix For: 2.7.0 > > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353803#comment-14353803 ] Zhijie Shen commented on YARN-3287: --- Merge it into branch-2.7 too. > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Fix For: 2.7.0 > > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353800#comment-14353800 ] Nathan Roberts commented on YARN-3215: -- Hi [~leftnoteasy]. Can you provide a summary of what this is about? Basic testing seems to show this works at least to some degree. e.g. jobs running on nodes without labels don't appear to include labeled-nodes as part of headroom (as expected). > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353787#comment-14353787 ] Jian He commented on YARN-3273: --- looks good, to distinguish scenarios like one user belongs to two queues, we probably need to add a separate queue tag too ? For the "Active Users:" field in CS queue page, it may also be useful to change that to be simply user names which links back to the user page with filtered user name. Just for implementation reference, the existing Node Labels page has some similar functionalities. thanks again for taking on this, Rohith ! > Improve web UI to facilitate scheduling analysis and debugging > -- > > Key: YARN-3273 > URL: https://issues.apache.org/jira/browse/YARN-3273 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Rohith > Attachments: 0001-YARN-3273-v1.patch, > YARN-3273-am-resource-used-AND-User-limit.PNG, > YARN-3273-application-headroom.PNG > > > Job may be stuck for reasons such as: > - hitting queue capacity > - hitting user-limit, > - hitting AM-resource-percentage > The first queueCapacity is already shown on the UI. > We may surface things like: > - what is user's current usage and user-limit; > - what is the AM resource usage and limit; > - what is the application's current HeadRoom; > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3316: --- Component/s: resourcemanager nodemanager > Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. > - > > Key: YARN-3316 > URL: https://issues.apache.org/jira/browse/YARN-3316 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Affects Versions: 3.0.0 >Reporter: praveen sripati >Priority: Minor > > Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so > that it would be easy for development. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2798 to YARN-3316: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3316 (was: MAPREDUCE-2798) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. > - > > Key: YARN-3316 > URL: https://issues.apache.org/jira/browse/YARN-3316 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: praveen sripati >Priority: Minor > > Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so > that it would be easy for development. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353704#comment-14353704 ] Jian He commented on YARN-3243: --- thanks Wangda! - ParentQueue#canAssignToThisQueue, {code} if (totalUsedCapacityRatio >= maxAvailCapacity) { canAssign = false; break; } {code} instead of comparing with ratios, I think it might be simpler to compare resource values > CapacityScheduler should pass headroom from parent to children to make sure > ParentQueue obey its capacity limits. > - > > Key: YARN-3243 > URL: https://issues.apache.org/jira/browse/YARN-3243 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3243.1.patch > > > Now CapacityScheduler has some issues making sure ParentQueue always obeys > its capacity limits, for example: > 1) When allocating a container of a parent queue, it will only check > parentQueue.usage < parentQueue.max. If a leaf queue allocates a container of size > > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max > resource limit, as in the following example: > {code} > A (usage=54, max=55) >/ \ > A1 A2 (usage=1, max=55) > (usage=53, max=53) > {code} > Queue-A2 is able to allocate a container since its usage < max, but if we do > that, A's usage can exceed A.max. > 2) When doing the continuous reservation check, the parent queue will only tell its > children "you need to unreserve *some* resource, so that I stay below my maximum > resource", but it will not tell how much resource needs to be unreserved. This > may lead to the parent queue exceeding its configured maximum capacity as well. > With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; > *here is my proposal*: > - ParentQueue will set its children's ResourceUsage.headroom, which means the > *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to be (say the parent's name is > "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's > ancestors' capacity will be enforced as well (qA.headroom is set by qA's > parent). > - {{needToUnReserve}} is not necessary; instead, children can get how much > resource needs to be unreserved to keep their parent's resource limit. > - Moreover, with this, YARN-3026 will make a clear boundary between > LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
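The headroom-propagation rule in the proposal, min(qA.headroom, qA.max - qA.used), can be sketched on the example tree from the description — again with plain longs in place of Resource objects, so this shows only the arithmetic, not CapacityScheduler internals:

```java
// Sketch of the proposed top-down headroom pass: each parent hands its
// children min(its own headroom, its max minus its usage), so every
// ancestor's max-capacity constraint reaches the leaves.
public class QueueHeadroom {
    static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
        return Math.min(parentHeadroom, parentMax - parentUsed);
    }

    public static void main(String[] args) {
        // Example from the description: A (usage=54, max=55) with child
        // A2 (usage=1, max=55).
        long rootHeadroom = Long.MAX_VALUE;                    // root unconstrained here
        long aHeadroom = childHeadroom(rootHeadroom, 55, 54);  // = 1
        // A2 may now allocate at most 1, even though its own max (55) and its
        // own usage (1) would otherwise have let it allocate much more --
        // exactly the over-allocation case 1) the JIRA describes.
        System.out.println(aHeadroom); // 1
    }
}
```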
[jira] [Moved] (YARN-3311) add location to web UI so you know where you are - cluster, node, AM, job history
[ https://issues.apache.org/jira/browse/YARN-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3074 to YARN-3311: --- Component/s: (was: mrv2) Affects Version/s: (was: 3.0.0) (was: 0.23.0) 3.0.0 Key: YARN-3311 (was: MAPREDUCE-3074) Project: Hadoop YARN (was: Hadoop Map/Reduce) > add location to web UI so you know where you are - cluster, node, AM, job > history > - > > Key: YARN-3311 > URL: https://issues.apache.org/jira/browse/YARN-3311 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Thomas Graves > > Right now if you go to any of the web UIs for resource manager, node manager, > app master, or job history, they look very similar but sometimes it hard to > tell which page you are. Adding a title or something that lets you know > would be helpful. Or somehow make them more seemless so one doesn't have to > know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3315) Fix -list-blacklisted-trackers to print the blacklisted NMs
[ https://issues.apache.org/jira/browse/YARN-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3305 to YARN-3315: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-3315 (was: MAPREDUCE-3305) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Fix -list-blacklisted-trackers to print the blacklisted NMs > --- > > Key: YARN-3315 > URL: https://issues.apache.org/jira/browse/YARN-3315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ramya Sunil > > bin/mapred job -list-blacklisted-trackers currently prints > "getBlacklistedTrackers - Not implemented yet" This is a long pending issue. > Could not find a tracking ticket, hence opening one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353677#comment-14353677 ] Nathan Roberts commented on YARN-1963: -- {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and cluster. We must support the notion of priority-labels to make this feature usable in practice. {quote} Maybe I'm missing something... Isn't it relatively easy to reason about 2<4 and therefore 2 is lower priority than 4? Unix/Linux hasn't had labels for priorities and it seems to be working pretty well there. Even if I have labels, I have to make sure that all queues and clusters define them precisely the same way or I wind up just as confused, if not even more. Just my $0.02 > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: 0001-YARN-1963-prototype.patch, YARN Application > Priorities Design.pdf, YARN Application Priorities Design_01.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3314) Write an integration test for validating MR AM restart and recovery
[ https://issues.apache.org/jira/browse/YARN-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3245 to YARN-3314: --- Component/s: (was: mrv2) (was: test) test Affects Version/s: (was: 0.23.0) Key: YARN-3314 (was: MAPREDUCE-3245) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Write an integration test for validating MR AM restart and recovery > --- > > Key: YARN-3314 > URL: https://issues.apache.org/jira/browse/YARN-3314 > Project: Hadoop YARN > Issue Type: Test > Components: test >Reporter: Vinod Kumar Vavilapalli > > This, so that we can catch bugs like MAPREDUCE-3233. > We need one with recovery disabled i.e. for only restart and one for > restart+recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3312) Web UI menu inconsistencies
[ https://issues.apache.org/jira/browse/YARN-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3075 to YARN-3312: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3312 (was: MAPREDUCE-3075) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Web UI menu inconsistencies > --- > > Key: YARN-3312 > URL: https://issues.apache.org/jira/browse/YARN-3312 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Thomas Graves > > When you go to the various web UIs, the menus on the left are inconsistent > and (at least to me) sometimes confusing. For instance, if you go to the > application master UI, one of the menus is Cluster. If you click on one of > the Cluster links, it takes you back to the RM UI and you lose the app master > UI altogether. Maybe it's just me, but that is confusing. I like having a link > back to the cluster from the AM, but the way the UI is set up I would have expected > it to just open that page in the middle div/frame and leave the AM menus > there. Perhaps a different type of link or menu could indicate this is going to > take you away from the AM page. > Also, the nodes and job history UIs don't have the Cluster menus at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3313) Write additional tests for data locality in MRv2.
[ https://issues.apache.org/jira/browse/YARN-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3093 to YARN-3313: --- Component/s: (was: mrv2) (was: test) test Assignee: (was: Mahadev konar) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3313 (was: MAPREDUCE-3093) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Write additional tests for data locality in MRv2. > - > > Key: YARN-3313 > URL: https://issues.apache.org/jira/browse/YARN-3313 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Mahadev konar > > We should add tests to make sure data locality is in place in MRv2 (with > respect to the capacity scheduler and also the matching/ask of containers in > the MR AM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3310) MR-279: Log info about the location of dist cache
[ https://issues.apache.org/jira/browse/YARN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2758 to YARN-3310: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Issue Type: Improvement (was: Bug) Key: YARN-3310 (was: MAPREDUCE-2758) Project: Hadoop YARN (was: Hadoop Map/Reduce) > MR-279: Log info about the location of dist cache > - > > Key: YARN-3310 > URL: https://issues.apache.org/jira/browse/YARN-3310 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ramya Sunil >Assignee: Siddharth Seth >Priority: Minor > > Currently, there is no log info available about the actual location of the > file/archive in dist cache being used by the task except for the "ln" command > in task.sh. We need to log this information to help in debugging esp in those > cases where there are more than one archive with the same name. > In 0.20.x, in task logs, one could find log info such as the following: > INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: location>/archive <- /archive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Attachment: YARN-3308-02.patch 02: * rebased for trunk * took in arun's comments > Improvements to CapacityScheduler documentation > --- > > Key: YARN-3308 > URL: https://issues.apache.org/jira/browse/YARN-3308 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0 >Reporter: Yoram Arnon >Priority: Minor > Labels: documentation > Attachments: MAPREDUCE-3658, MAPREDUCE-3658, YARN-3308-02.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > There are some typos and some cases of incorrect English. > Also, the descriptions of yarn.scheduler.capacity..capacity, > yarn.scheduler.capacity..maximum-capacity, > yarn.scheduler.capacity..user-limit-factor, > yarn.scheduler.capacity.maximum-applications are not very clear to the > uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353639#comment-14353639 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Ah, right. Forgot about that. Given that, it seems we have the following cases: (1) the node reports invalid labels during registration - we reject it right away; (2) the node gets successfully registered, but then the labels script starts generating invalid labels midway through. I think in case (2), we are better off ignoring the newly reported invalid labels, reporting this in the UI/NodeReport, and letting the node continue running. Thoughts? > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
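The two cases above boil down to a small decision function. This is a hedged sketch with hypothetical names (not actual ResourceManager code): invalid labels at registration reject the node, while invalid labels reported later are ignored and surfaced on the node report so the node keeps running.

```java
// Sketch of the policy proposed above (hypothetical names, not actual
// ResourceManager code): invalid labels at registration reject the node,
// while invalid labels reported later are ignored and flagged in the
// UI/NodeReport so the node continues running.
public class NodeLabelPolicy {
    public enum Action { ACCEPT, REJECT_REGISTRATION, IGNORE_AND_FLAG }

    public static Action onReportedLabels(boolean labelsValid, boolean isRegistering) {
        if (labelsValid) {
            return Action.ACCEPT;
        }
        // Case (1): bad labels at registration time - reject right away.
        // Case (2): bad labels after registration - flag, don't kill the node.
        return isRegistering ? Action.REJECT_REGISTRATION : Action.IGNORE_AND_FLAG;
    }

    public static void main(String[] args) {
        // A registered node whose label script goes bad mid-run:
        System.out.println(onReportedLabels(false, false)); // IGNORE_AND_FLAG
    }
}
```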
[jira] [Created] (YARN-3309) Capacity scheduler can wait a very long time for node locality
Nathan Roberts created YARN-3309: Summary: Capacity scheduler can wait a very long time for node locality Key: YARN-3309 URL: https://issues.apache.org/jira/browse/YARN-3309 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts The capacity scheduler will delay scheduling a container on a rack-local node in hopes that a node-local opportunity will come along (YARN-80). It does this by counting the number of missed scheduling opportunities the application has had. When the count reaches a certain threshold, the app will accept the rack-local node. The documented recommendation is to set this threshold to the #nodes in the cluster. However, there are some early-out optimizations that can lead to this delay being a very long time. Example in allocateContainersToNode(): {code} // Try to schedule more if there are no reservations to fulfill if (node.getReservedContainer() == null) { if (calculator.computeAvailableContainers(node.getAvailableResource(), minimumAllocation) > 0) { if (LOG.isDebugEnabled()) { LOG.debug("Trying to schedule on node: " + node.getNodeName() + ", available: " + node.getAvailableResource()); } root.assignContainers(clusterResource, node, false); } {code} So, in a large cluster that is completely full (AvailableResource on each node is 0), SchedulingOpportunities will only increase at the rate of container completion rate, not the heartbeat rate, which I think was the original assumption of YARN-80. On a large cluster, this can lead to an hour+ of skipped scheduling opportunities meaning the fifo'ness of a queue is ignored for a very long time. Maybe there should be a time-based limit on this delay as well as a count of missed-scheduling opportunities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
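The fix suggested above - a time-based limit alongside the missed-opportunity count - can be sketched as follows. Names are illustrative, not actual CapacityScheduler code: the point is that on a full cluster, where opportunities accrue only at the container-completion rate, a wall-clock bound keeps the delay from stretching to an hour or more.

```java
// Sketch of a time-bounded locality delay (hypothetical names, not actual
// CapacityScheduler code): accept rack-local placement once EITHER enough
// scheduling opportunities have been missed OR too much wall-clock time has
// elapsed since the resource request was created.
public class LocalityDelayPolicy {
    private final int nodeLocalityDelay;  // documented recommendation: #nodes in cluster
    private final long maxDelayMillis;    // proposed time-based cap

    public LocalityDelayPolicy(int nodeLocalityDelay, long maxDelayMillis) {
        this.nodeLocalityDelay = nodeLocalityDelay;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** True if a rack-local assignment should be accepted now. */
    public boolean acceptRackLocal(long missedOpportunities,
                                   long requestCreationTimeMs, long nowMs) {
        return missedOpportunities >= nodeLocalityDelay
            || (nowMs - requestCreationTimeMs) >= maxDelayMillis;
    }

    public static void main(String[] args) {
        LocalityDelayPolicy p = new LocalityDelayPolicy(4000, 60_000L);
        // Full cluster: only 5 missed opportunities, but 2 minutes elapsed.
        System.out.println(p.acceptRackLocal(5, 0L, 120_000L)); // true
    }
}
```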
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353597#comment-14353597 ] Hudson commented on YARN-3287: -- FAILURE: Integrated in Hadoop-trunk-Commit #7291 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7291/]) YARN-3287. Made TimelineClient put methods do as the correct login context. Contributed by Daryn Sharp and Jonathan Eagles. (zjshen: rev d6e05c5ee26feefc17267b7c9db1e2a3dbdef117) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineAuthenticationFilter.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Fix For: 2.7.0 > > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353574#comment-14353574 ] Zhijie Shen commented on YARN-3287: --- +1 for the last patch. Will commit it. > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353560#comment-14353560 ] Hadoop QA commented on YARN-3287: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703485/YARN-3287.3.patch against trunk revision 3241fc2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6896//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6896//console This message is automatically generated. > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3658 to YARN-3308: --- Component/s: (was: mrv2) documentation Assignee: (was: Yoram Arnon) Target Version/s: (was: 2.0.0-alpha, 3.0.0) Affects Version/s: (was: 0.23.0) Key: YARN-3308 (was: MAPREDUCE-3658) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Improvements to CapacityScheduler documentation > --- > > Key: YARN-3308 > URL: https://issues.apache.org/jira/browse/YARN-3308 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Yoram Arnon >Priority: Minor > Labels: documentation > Attachments: MAPREDUCE-3658, MAPREDUCE-3658 > > Original Estimate: 3h > Remaining Estimate: 3h > > There are some typos and some cases of incorrect English. > Also, the descriptions of yarn.scheduler.capacity..capacity, > yarn.scheduler.capacity..maximum-capacity, > yarn.scheduler.capacity..user-limit-factor, > yarn.scheduler.capacity.maximum-applications are not very clear to the > uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Affects Version/s: 3.0.0 > Improvements to CapacityScheduler documentation > --- > > Key: YARN-3308 > URL: https://issues.apache.org/jira/browse/YARN-3308 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0 >Reporter: Yoram Arnon >Priority: Minor > Labels: documentation > Attachments: MAPREDUCE-3658, MAPREDUCE-3658 > > Original Estimate: 3h > Remaining Estimate: 3h > > There are some typos and some cases of incorrect English. > Also, the descriptions of yarn.scheduler.capacity..capacity, > yarn.scheduler.capacity..maximum-capacity, > yarn.scheduler.capacity..user-limit-factor, > yarn.scheduler.capacity.maximum-applications are not very clear to the > uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Release Note: (was: documentation change only) > Improvements to CapacityScheduler documentation > --- > > Key: YARN-3308 > URL: https://issues.apache.org/jira/browse/YARN-3308 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Yoram Arnon >Priority: Minor > Labels: documentation > Attachments: MAPREDUCE-3658, MAPREDUCE-3658 > > Original Estimate: 3h > Remaining Estimate: 3h > > There are some typos and some cases of incorrect English. > Also, the descriptions of yarn.scheduler.capacity..capacity, > yarn.scheduler.capacity..maximum-capacity, > yarn.scheduler.capacity..user-limit-factor, > yarn.scheduler.capacity.maximum-applications are not very clear to the > uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353558#comment-14353558 ] Zhijie Shen commented on YARN-1884: --- [~xgong], thanks for the patch. Here're some comments: 1. No need to change application_history_server.proto, ApplicationHistoryManagerImpl.java, FileSystemApplicationHistoryStore.java, MemoryApplicationHistoryStore.java, ContainerFinishData.java, ContainerHistoryData.java, ContainerStartData.java, ContainerFinishDataPBImpl.java, ContainerStartDataPBImpl.java, ApplicationHistoryStoreTestUtils.java, TestFileSystemApplicationHistoryStore.java, TestMemoryApplicationHistoryStore.java, RMApplicationHistoryWriter.java, TestRMApplicationHistoryWriter.java. It's the deprecated code. 2. Why do we need conf here to compute http or https? getNodeHttpAddress() doesn't come with the prefix? If so, we need to fix it in other block, CLI and webservice too for consistency. For example, when generating the report, we should already append the http prefix. {code} 114 container.getNodeHttpAddress() == null ? "#" : WebAppUtils 115 .getHttpSchemePrefix(conf) + container.getNodeHttpAddress(), {code} 3. Is it possible if getContainer() returns null? If so, it will result in NPE. Another way is to make getNodeHttpAddress as the method of RMContainer. See how we do it for getContainerExitStatus and so on. {code} createdTime, container.getContainer().getNodeHttpAddress())); {code} > ContainerReport should have nodeHttpAddress > --- > > Key: YARN-1884 > URL: https://issues.apache.org/jira/browse/YARN-1884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1884.1.patch > > > In web UI, we're going to show the node, which used to be to link to the NM > web page. However, on AHS web UI, and RM web UI after YARN-1809, the node > field has to be set to nodeID where the container is allocated. 
We need to > add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
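Review points 2 and 3 above - append the scheme prefix exactly once, and never NPE on a missing address - can be illustrated with a small helper. This is a hypothetical sketch, not the actual YARN API:

```java
// Illustrative helper for the review comments above (hypothetical names,
// not actual YARN code): build the NM web link by prepending the scheme
// exactly once, and fall back to "#" when the address is null instead of
// risking an NPE in the web UI / CLI / webservice paths.
public class NodeHttpAddressUtil {
    public static String toLink(String schemePrefix, String nodeHttpAddress) {
        if (nodeHttpAddress == null) {
            return "#"; // dead link rather than NullPointerException
        }
        // Avoid double-prefixing when the report already carries a scheme.
        if (nodeHttpAddress.startsWith("http://")
            || nodeHttpAddress.startsWith("https://")) {
            return nodeHttpAddress;
        }
        return schemePrefix + nodeHttpAddress;
    }

    public static void main(String[] args) {
        System.out.println(toLink("http://", "nm-host:8042")); // http://nm-host:8042
        System.out.println(toLink("http://", null));           // #
    }
}
```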
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353553#comment-14353553 ] Wangda Tan commented on YARN-3298: -- Hi [~nroberts], If I understand what you meant correctly, maybe we can just relax when user.used < user.limit (instead of user.used + now_required <= user.limit), which can solve the problem you mentioned. > User-limit should be enforced in CapacityScheduler > -- > > Key: YARN-3298 > URL: https://issues.apache.org/jira/browse/YARN-3298 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, yarn >Reporter: Wangda Tan >Assignee: Wangda Tan > > User-limit is not treat as a hard-limit for now, it will not consider > required-resource (resource of being-allocated resource request). And also, > when user's used resource equals to user-limit, it will still continue. This > will generate jitter issues when we have YARN-2069 (preemption policy kills a > container under an user, and scheduler allocate a container under the same > user soon after). > The expected behavior should be as same as queue's capacity: > Only when user.usage + required <= user-limit (1), queue will continue to > allocate container. > (1), user-limit mentioned here is determined by following computing > {code} > current-capacity = queue.used + now-required (when queue.used > > queue.capacity) >queue.capacity (when queue.used < queue.capacity) > user-limit = min(max(current-capacity / #active-users, current-capacity * > user-limit / 100), queue-capacity * user-limit-factor) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
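The user-limit formula quoted in the description can be worked through numerically. This is a sketch reduced to a single resource dimension for illustration, not the actual CapacityScheduler implementation (which operates on Resource objects):

```java
// Single-dimension sketch of the user-limit formula quoted above:
//   current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
//                      queue.capacity             (otherwise)
//   user-limit = min(max(current-capacity / #active-users,
//                        current-capacity * user-limit / 100),
//                    queue-capacity * user-limit-factor)
public class UserLimit {
    public static double compute(double queueUsed, double queueCapacity,
                                 double nowRequired, int activeUsers,
                                 double userLimitPercent, double userLimitFactor) {
        double currentCapacity = (queueUsed > queueCapacity)
            ? queueUsed + nowRequired
            : queueCapacity;
        return Math.min(
            Math.max(currentCapacity / activeUsers,
                     currentCapacity * userLimitPercent / 100.0),
            queueCapacity * userLimitFactor);
    }

    public static void main(String[] args) {
        // Queue capacity 100 over-used at 120, one more container of 10
        // pending, 2 active users, user-limit 50%, user-limit-factor 2:
        System.out.println(compute(120, 100, 10, 2, 50, 2)); // 65.0
    }
}
```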
[jira] [Moved] (YARN-3307) Master-Worker Application on YARN
[ https://issues.apache.org/jira/browse/YARN-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3315 to YARN-3307: --- Affects Version/s: (was: 3.0.0) 3.0.0 Key: YARN-3307 (was: MAPREDUCE-3315) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Master-Worker Application on YARN > - > > Key: YARN-3307 > URL: https://issues.apache.org/jira/browse/YARN-3307 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Sharad Agarwal >Assignee: Sharad Agarwal > Attachments: MAPREDUCE-3315-1.patch, MAPREDUCE-3315-2.patch, > MAPREDUCE-3315-3.patch, MAPREDUCE-3315.patch > > > Currently master worker scenarios are forced fit into Map-Reduce. Now with > YARN, these can be first class and would benefit real/near realtime workloads > and be more effective in using the cluster resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3287: -- Attachment: YARN-3287.3.patch [~zjshen], trying to unwrap as before. Let me know if this is what you are intending. > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, > timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353418#comment-14353418 ] Zhijie Shen commented on YARN-3287: --- I double-checked the oozie use case. It seems that for each individual job, the oozie server will create a separate client to start the MR job. The change should be safe then. Thanks for the patch, Jon! It's almost fine to me. Just one nit. 1. In private ClientResponse doPosting(Object obj, String path), the doAs op will throw UndeclaredThrowableException; shall we capture and unwrap it as before? {code} } catch (InterruptedException ie) { throw new IOException(ie); } {code} > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
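The unwrapping nit above can be sketched in isolation. This is a hypothetical helper, not the actual TimelineClientImpl code: a checked exception thrown inside a doAs action surfaces as UndeclaredThrowableException, so the wrapper is peeled off and the real cause rethrown as an IOException.

```java
// Sketch of unwrapping the doAs exception (hypothetical helper, not actual
// TimelineClientImpl code): a checked exception thrown inside a doAs action
// surfaces as UndeclaredThrowableException; peel the wrapper off and rethrow
// the original cause as an IOException.
import java.io.IOException;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.Callable;

public class DoAsUnwrap {
    public static <T> T runUnwrapped(Callable<T> action) throws IOException {
        try {
            return action.call();
        } catch (UndeclaredThrowableException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // surface the original failure
            }
            throw new IOException(cause);
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runUnwrapped(() -> "posted")); // posted
    }
}
```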
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353397#comment-14353397 ] Craig Welch commented on YARN-2495: --- -re bq. How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? As I understand the requirements, it's necessary to handle the case where the derived set of labels changes during the lifetime of the nodemanager. For example, external libraries might be installed, or some other condition may change which affects the labels; no nodemanager re-registration is involved, and yet the changed labels need to be reflected. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353390#comment-14353390 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- {quote} As per discussion happened in YARN-2896 with Eric Payne and Wangda Tan, there is proposal to use Integer alone as priority from client and as well as in server. As per design doc, a priority label was used as wrapper for user and internally server was using corresponding integer with same. We can continue discussion on this here in parent JIRA. Looping Vinod Kumar Vavilapalli. Current idea: yarn.prority-labels = low:2, medium:4, high:6 Proposed: yarn.application.priority = 2, 3 , 4 {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and cluster. We must support the notion of priority-labels to make this feature usable in practice. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: 0001-YARN-1963-prototype.patch, YARN Application > Priorities Design.pdf, YARN Application Priorities Design_01.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353383#comment-14353383 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Quick comments - configuration.type -> configuration-type - Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? - Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment? - RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario. - How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? - Should we even accept a node's registration when it reports invalid labels? > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353353#comment-14353353 ] Wangda Tan commented on YARN-2495: -- For your comments: 1) For the name, do you think setAreNodeLabelsUpdated is a better name, since it avoids "set" occurring twice? :) (I understand this needs lots of refactoring; if you have any suggestions, we can finalize it before renaming.) 5) I made a mistake and sent an incomplete comment :-p, what I wanted to say is: It will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) NM to depend on RM's configuration. So I suggest making these changes: - In NodeManager.java: when the user doesn't configure a provider, it should be null. In your patch, you can return null directly, and YARN-2729 will implement the logic of instantiating the provider from config. - In NodeStatusUpdaterImpl: avoid using {{isDistributedNodeLabelsConf}}, since we will not have "distributedNodeLabelConf" on the NM side if you agree with the previous comment; instead, it will check whether the provider is null. Regarding your "fail-fast" concern, it shouldn't be a problem if you agree with the comment I just made. (I know there has been some back-and-forth from my side on this; I feel sorry about that, since this feature is still evolving - please feel free to let me know your ideas.) 7) My answer to 5) should address your question here. 8) You can add an additional comment at line 626 for this. 9) Took a look at TestNodeStatusUpdater; your comment makes sense to me. It's a very complex class, so you can just leave this comment alone. 10) A few comments on your added code: - updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport - {{LOG.info(... accepted from RM}}: use LOG.debug and check {{isDebugEnabled}}. - Make errorMessage clear: indicate (1) these are node labels reported from the NM, and (2) they failed to be put to the RM, instead of "not properly configured". 
In addition: Another thing we should do is, when distributed node label configuration is set, reject any direct modification of the node-to-labels mapping from RMAdminCLI (like -replaceNodeToLabels). This can be done in a separate JIRA. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, > YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, > YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or > using script suggested by [~aw] (YARN-2729) ) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3306: -- Attachment: PerQueuePolicydrivenschedulinginYARN.pdf Here's a detailed proposal doc. It's light on details of the leaf-queue policy interface - I will cover that in one of the sub-tasks. [~cwelch] is helping with most of the implementation, Tx Craig. > [Umbrella] Proposing per-queue Policy driven scheduling in YARN > --- > > Key: YARN-3306 > URL: https://issues.apache.org/jira/browse/YARN-3306 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: PerQueuePolicydrivenschedulinginYARN.pdf > > > Scheduling layout in Apache Hadoop YARN today is very coarse-grained. This > proposal aims at converting today’s rigid scheduling in YARN to a per-queue > policy driven architecture. > We propose the creation of a common policy framework and implement a common > set of policies that administrators can pick and choose per queue > - Make scheduling policies configurable per queue > - Initially, we limit ourselves to a new type of scheduling policy that > determines the ordering of applications within the leaf queue > - In the near future, we will also pursue parent queue level policies and > potential algorithm reuse through a separate type of policies that control > resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353277#comment-14353277 ] Zhijie Shen commented on YARN-3287: --- Sure, I'll take a look again. > TimelineClient kerberos authentication failure uses wrong login context. > > > Key: YARN-3287 > URL: https://issues.apache.org/jira/browse/YARN-3287 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Daryn Sharp > Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch > > > TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause > failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
Vinod Kumar Vavilapalli created YARN-3306: - Summary: [Umbrella] Proposing per-queue Policy driven scheduling in YARN Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Scheduling layout in Apache Hadoop YARN today is very coarse-grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353272#comment-14353272 ] Anubhav Dhoot commented on YARN-3304: - The intention of setting -1 was exactly for this issue (distinguishing unavailable vs actually zero). Ideally we should prevent adding the metrics to the collection until they are available. One possibility is doing it at ContainerMetrics#recordCpuUsage. I suggest investigating whether this ideal case is achievable, and if not, I am fine with making these 0 to be consistent. > ResourceCalculatorProcessTree#getCpuUsagePercent default return value is > inconsistent with other getters > > > Key: YARN-3304 > URL: https://issues.apache.org/jira/browse/YARN-3304 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Karthik Kambatla >Priority: Blocker > > Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the > unavailable case while other resource metrics return 0 in the same case, > which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
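The suggestion above - hold back readings at recording time rather than publish the sentinel - can be sketched with hypothetical names (not the actual ContainerMetrics code):

```java
// Illustrative sketch (hypothetical names, not actual ContainerMetrics
// code): skip recording CPU usage while the value is still the
// "unavailable" sentinel (-1), so the metrics collection never sees a value
// that is neither a real measurement nor a real zero.
public class CpuMetricRecorder {
    public static final float UNAVAILABLE = -1.0f;
    private float lastRecorded = Float.NaN; // nothing recorded yet

    public void recordCpuUsage(float cpuUsagePercent) {
        if (cpuUsagePercent == UNAVAILABLE) {
            return; // metric not available yet: record neither -1 nor 0
        }
        lastRecorded = cpuUsagePercent;
    }

    public float lastRecorded() {
        return lastRecorded;
    }

    public static void main(String[] args) {
        CpuMetricRecorder r = new CpuMetricRecorder();
        r.recordCpuUsage(-1.0f); // ignored: still unavailable
        r.recordCpuUsage(12.5f);
        System.out.println(r.lastRecorded()); // 12.5
    }
}
```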