[jira] [Updated] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-3024: Attachment: YARN-3024.03.patch Updated patch. I refactored {{LocalizerRunner#update()}}, separating it into the following two phases: * first, read and process resource statuses from the localizer through the heartbeat * then, find the next resource to be localized and send it through the response Also, in the original code base, there is a small problem with the response action. Now if one of the following conditions is met, the response action will be DIE: * Got at least one FETCH_FAILURE * {{findNextResource()}} returns null, and {{pending}} is empty > LocalizerRunner should give DIE action when all resources are localized > --- > > Key: YARN-3024 > URL: https://issues.apache.org/jira/browse/YARN-3024 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Attachments: YARN-3024.01.patch, YARN-3024.02.patch, > YARN-3024.03.patch > > > We have observed that {{LocalizerRunner}} always gives a LIVE action at the > end of the localization process. > The problem is that {{findNextResource()}} can return null even when {{pending}} > was not empty prior to the call. This method removes localized resources from > {{pending}}, therefore we should check the return value and give a DIE action > when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
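For illustration, a minimal sketch of the DIE/LIVE decision described in the comment above; the {{LocalizerAction}} enum and the helper parameters are simplified stand-ins for the actual NodeManager types, not the patch itself.
{code}
public class LocalizerActionSketch {
  enum LocalizerAction { LIVE, DIE }

  /**
   * Decide the heartbeat response action after all resource statuses
   * from the localizer have been processed.
   */
  static LocalizerAction decide(boolean sawFetchFailure,
      Object nextResource, boolean pendingEmpty) {
    if (sawFetchFailure) {
      return LocalizerAction.DIE;   // at least one FETCH_FAILURE reported
    }
    if (nextResource == null && pendingEmpty) {
      return LocalizerAction.DIE;   // nothing left to localize
    }
    return LocalizerAction.LIVE;    // keep going: send the next resource
  }
}
{code}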
[jira] [Commented] (YARN-3035) create a test-only backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274761#comment-14274761 ] Sunil G commented on YARN-3035: --- Hi [~sjlee0] As I understand from the design document, HBase or HDFS will be the real storage, and the focus of this JIRA is to have a test/dev storage that can emulate the real one. If you have not started working on this, I would like to take it up. Otherwise, I will gladly share my thoughts in reviews. Thank you. > create a test-only backing storage implementation for ATS writes > > > Key: YARN-3035 > URL: https://issues.apache.org/jira/browse/YARN-3035 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee > > Per design in YARN-2928, create a test-only bare-bones backing storage > implementation for ATS writes. > We could consider something like a no-op or in-memory storage strictly for > development and testing purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
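As a rough illustration of what a test-only storage could look like, here is a hypothetical in-memory writer; the actual write interface is still to be defined in YARN-3031, so the class and method shapes below are assumptions, not a posted patch.
{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical in-memory backing storage for tests. */
public class InMemoryTimelineStore {
  private final Map<String, List<Object>> entitiesByApp =
      new ConcurrentHashMap<String, List<Object>>();

  /** "Write" simply appends to a per-application list kept on the heap. */
  public void write(String appId, Object entity) {
    List<Object> list = entitiesByApp.get(appId);
    if (list == null) {
      entitiesByApp.putIfAbsent(appId, new CopyOnWriteArrayList<Object>());
      list = entitiesByApp.get(appId);
    }
    list.add(entity);
  }

  /** Tests can read everything back without any external storage. */
  public List<Object> getEntities(String appId) {
    List<Object> list = entitiesByApp.get(appId);
    return list == null ? Collections.<Object>emptyList() : list;
  }
}
{code}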
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274688#comment-14274688 ] Hadoop QA commented on YARN-2984: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691840/yarn-2984-1.patch against trunk revision 5188153. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6317//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6317//console This message is automatically generated. > Metrics for container's actual memory usage > --- > > Key: YARN-2984 > URL: https://issues.apache.org/jira/browse/YARN-2984 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-2984-1.patch, yarn-2984-prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track memory usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2984: --- Attachment: yarn-2984-1.patch Thanks for the feedback, Anubhav. Uploading a new patch with the following changes: # Rename to ContainerMetrics # Add a config guard to enable/disable container-metrics publishing # Add config and behavior to publish metrics periodically, instead of waiting until the container finishes # Add TestContainerMetrics to test the metrics publishing logic > Metrics for container's actual memory usage > --- > > Key: YARN-2984 > URL: https://issues.apache.org/jira/browse/YARN-2984 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-2984-1.patch, yarn-2984-prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track memory usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
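A hypothetical sketch of the guard-plus-periodic-publish behavior that items 2 and 3 describe; the flag and period constants stand in for whatever config keys the patch actually defines.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: periodic container-metrics publishing with a guard. */
public class PeriodicPublishSketch {
  // Assumed values; the real patch reads these from configuration.
  static final boolean PUBLISH_ENABLED = true;
  static final long PERIOD_MS = 3000L;

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(Runnable snapshotAndPublish) {
    if (!PUBLISH_ENABLED) {
      return;                       // config guard: publishing disabled
    }
    // Publish on a fixed period instead of waiting for container finish.
    scheduler.scheduleAtFixedRate(
        snapshotAndPublish, PERIOD_MS, PERIOD_MS, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();        // a final publish could happen here
  }
}
{code}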
[jira] [Assigned] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3031: -- Assignee: Varun Saxena > create backing storage write interface for ATS writers > -- > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3053) review and implement for property security in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3053: - Assignee: Zhijie Shen > review and implement for property security in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274597#comment-14274597 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- Thanks for the design summary, Sangjin. For public disclosure, a bunch of YARN community members synced offline about this design discussion - thanks to Joep Rottinghuis, Karthik Kambatla, Li Lu, Mayank Bansal, Maysam Yabandeh, Mohammad Kamrul Islam, Ram Venkatesh, Robert Kanter, Sangjin Lee, Vinod Kumar Vavilapalli, Vrushali Channapattan, Zhijie Shen, in no particular order. Overall, I'd like to push other efforts like YARN-2141 and YARN-1012 to fit into the architecture being proposed in this JIRA, so that we don't duplicate stats collection between efforts. One suggestion on the proposal: for the first cut, instead of spawning a per-AM container (Section 4.1) to represent an Application Level Aggregator (call it ALA), we can have a per-node agent which serves multiple AMs running on the same node. Nothing else changes: NMs sending data still have to discover the ALA, only the ALAs can send data to the underlying storage, etc. It's just that the ALA is not a special container to begin with. The advantage is that we can postpone the hard parts (scheduling and fault tolerance of a special ALA container) until after we wire up everything else. Even long term, for small apps in a cluster, an ALA running inside or side-by-side with the NM, with rate limits, reduces the 'heaviness' of the system. This per-node agent is very useful outside of this context too. An additional shortcut for now is to potentially embed the ALA inside the NM using, say, Aux Services (see the sketch below). Obviously the biggest problem with a single or embedded ALA per node is resource management, which we can defer for now, given it still runs system code, until we have everything else figured out. On the process side, I propose we work on a branch, with the goal of borrowing whatever code we can from the current Timeline service. Regarding timelines (pun intended), I'd like to think we can have a first alpha release of this as part of, say, 2.8. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
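The "embed the ALA inside the NM using Aux Services" shortcut can be sketched against the existing {{AuxiliaryService}} API; everything below is illustrative, not code from this JIRA, and the service's responsibilities are assumptions.
{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

/** Hypothetical per-node aggregator embedded in the NM as an aux service. */
public class PerNodeAggregatorService extends AuxiliaryService {
  public PerNodeAggregatorService() {
    super("timeline_aggregator");
  }

  @Override
  public void initializeApplication(ApplicationInitializationContext ctx) {
    // Start collecting for ctx.getApplicationId(); all AMs on this node
    // share this one agent instead of each getting a special container.
  }

  @Override
  public void stopApplication(ApplicationTerminationContext ctx) {
    // Flush and release any per-application aggregation state.
  }

  @Override
  public ByteBuffer getMetaData() {
    return ByteBuffer.allocate(0);  // nothing extra to hand back to AMs
  }
}
{code}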
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274591#comment-14274591 ] Hudson commented on YARN-2643: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6850 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6850/]) YARN-2643. Don't create a new DominantResourceCalculator on every FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 51881535e659940b1b332d0c5952ee1f9958cc7f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Fix For: 2.7.0 > > Attachments: yarn-2643-1.patch, yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
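The change is small enough to sketch from the issue title alone: hoist the calculator into a field so {{allocate}} stops constructing one per call. This is an illustrative reconstruction, not the committed diff; {{DominantResourceCalculator}} carries no mutable state, so sharing one instance is safe.
{code}
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

public class AllocateHotPathSketch {
  // Before: "new DominantResourceCalculator()" inside allocate(), creating
  // garbage on every AM heartbeat. After: one shared instance.
  private static final ResourceCalculator CALCULATOR =
      new DominantResourceCalculator();

  void allocate() {
    // ... use CALCULATOR wherever a ResourceCalculator is needed ...
  }
}
{code}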
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274581#comment-14274581 ] Robert Kanter commented on YARN-2643: - +1 > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643-1.patch, yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3051) create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3051: -- Assignee: Varun Saxena > create backing storage read interface for ATS readers > - > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3043) create ATS configuration, metadata, etc. as part of entities
[ https://issues.apache.org/jira/browse/YARN-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3043: -- Assignee: Varun Saxena > create ATS configuration, metadata, etc. as part of entities > > > Key: YARN-3043 > URL: https://issues.apache.org/jira/browse/YARN-3043 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, create APIs for configuration, metadata, etc. and > integrate them into entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3038) handle ATS writer failure scenarios
[ https://issues.apache.org/jira/browse/YARN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3038: -- Assignee: Varun Saxena > handle ATS writer failure scenarios > --- > > Key: YARN-3038 > URL: https://issues.apache.org/jira/browse/YARN-3038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, consider various ATS writer failure scenarios, and > implement proper handling. > For example, an ATS writer may fail and exit due to OOM; it should be retried a > certain number of times in that case. We also need to tie fatal ATS writer > failures (after exhausting all retries) to the application failure, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
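A sketch of the retry-then-escalate policy this description implies; the restart cap is an assumed number, not one taken from any patch.
{code}
/** Illustrative writer-restart policy; MAX_RESTARTS is an assumption. */
public class WriterRetryPolicySketch {
  static final int MAX_RESTARTS = 3;
  private int restarts = 0;

  /** Returns true while another writer restart is still allowed. */
  boolean onWriterCrash() {
    restarts++;
    // Once the cap is exhausted, the fatal failure should be escalated
    // and tied to the application's own failure handling.
    return restarts <= MAX_RESTARTS;
  }
}
{code}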
[jira] [Updated] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2928: -- Description: We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. was: We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be address. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3047: -- Assignee: Varun Saxena > set up ATS reader with basic request serving structure and lifecycle > > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, set up the ATS reader as a service and implement its > basic structure, including lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3050) implement new flow-based ATS queries in the new ATS design
[ https://issues.apache.org/jira/browse/YARN-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3050: -- Assignee: Varun Saxena > implement new flow-based ATS queries in the new ATS design > -- > > Key: YARN-3050 > URL: https://issues.apache.org/jira/browse/YARN-3050 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Implement new flow-based ATS queries in the new ATS design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3049) implement existing ATS queries in the new ATS design
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3049: -- Assignee: Varun Saxena > implement existing ATS queries in the new ATS design > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3048) handle how to set up and start/stop ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3048: -- Assignee: Varun Saxena > handle how to set up and start/stop ATS reader instances > > > Key: YARN-3048 > URL: https://issues.apache.org/jira/browse/YARN-3048 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > > Per design in YARN-2928, come up with a way to set up and start/stop ATS > reader instances. > This should allow setting up multiple instances and managing user traffic to > those instances. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3037) create HBase cluster backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3037: - Assignee: Zhijie Shen > create HBase cluster backing storage implementation for ATS writes > -- > > Key: YARN-3037 > URL: https://issues.apache.org/jira/browse/YARN-3037 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Per design in YARN-2928, create a backing storage implementation for ATS > writes based on a full HBase cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3036) create standalone HBase backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3036: - Assignee: Zhijie Shen > create standalone HBase backing storage implementation for ATS writes > - > > Key: YARN-3036 > URL: https://issues.apache.org/jira/browse/YARN-3036 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Per design in YARN-2928, create a (default) standalone HBase backing storage > implementation for ATS writes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3042) create ATS metrics API
[ https://issues.apache.org/jira/browse/YARN-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3042: - Assignee: Zhijie Shen > create ATS metrics API > -- > > Key: YARN-3042 > URL: https://issues.apache.org/jira/browse/YARN-3042 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Per design in YARN-2928, create the ATS metrics API and integrate it into the > entities. > The concept may be based on the existing hadoop metrics, but we want to make > sure we have something that would satisfy all ATS use cases. > It also needs to capture whether a metric should be aggregated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274562#comment-14274562 ] Karthik Kambatla commented on YARN-2643: I think the change is too trivial to add a new unit test. > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643-1.patch, yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274546#comment-14274546 ] Hadoop QA commented on YARN-2643: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691784/yarn-2643-1.patch against trunk revision f3507fa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6316//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6316//console This message is automatically generated. > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643-1.patch, yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3033) implement NM starting the ATS writer companion
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3033: --- Assignee: Naganarasimha G R > implement NM starting the ATS writer companion > -- > > Key: YARN-3033 > URL: https://issues.apache.org/jira/browse/YARN-3033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement node managers starting the ATS writer > companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3034) implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3034: --- Assignee: Naganarasimha G R > implement RM starting its ATS writer > > > Key: YARN-3034 > URL: https://issues.apache.org/jira/browse/YARN-3034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement resource managers starting their own ATS > writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3044) implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3044: --- Assignee: Naganarasimha G R > implement RM writing app lifecycle events to ATS > > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274538#comment-14274538 ] Robert Kanter commented on YARN-3039: - We should look into using YARN-913's Yarn Service Registry for this, assuming it doesn't require an external ZooKeeper to be set up; at the very least, we should reuse Curator/ZooKeeper code from it if possible. > implement ATS writer service discovery > -- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
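If Curator were used directly (rather than through YARN-913's registry), writer registration could look roughly like this; the znode layout, app id, and address are assumptions for illustration, and only the Curator/ZooKeeper calls are real API.
{code}
import java.nio.charset.StandardCharsets;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

/** Hypothetical ATS writer registration for discovery. */
public class WriterDiscoverySketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // An ephemeral node disappears when the writer's session dies,
    // which is what lets clients notice writer/AM failures.
    String appId = "application_1421111111111_0001";  // example id
    String address = "nm-host-42:10200";              // example endpoint
    client.create()
        .creatingParentsIfNeeded()
        .withMode(CreateMode.EPHEMERAL)
        .forPath("/ats/writers/" + appId,
            address.getBytes(StandardCharsets.UTF_8));
  }
}
{code}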
[jira] [Assigned] (YARN-3045) implement NM writing container lifecycle events and container system metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3045: --- Assignee: Naganarasimha G R > implement NM writing container lifecycle events and container system metrics > to ATS > --- > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3053) review and implement for property security in ATS v.2
Sangjin Lee created YARN-3053: - Summary: review and implement for property security in ATS v.2 Key: YARN-3053 URL: https://issues.apache.org/jira/browse/YARN-3053 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, we want to evaluate and review the system for security, and ensure proper security in the system. This includes proper authentication, token management, access control, and any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-3041: --- Assignee: Robert Kanter > create the ATS entity/event API > --- > > Key: YARN-3041 > URL: https://issues.apache.org/jira/browse/YARN-3041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > > Per design in YARN-2928, create the ATS entity and events API. > Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, > flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3046) implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-3046: --- Assignee: Robert Kanter > implement MapReduce AM writing some MR metrics to ATS > - > > Key: YARN-3046 > URL: https://issues.apache.org/jira/browse/YARN-3046 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > > Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes > written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3052) provide a very simple POC html ATS UI
Sangjin Lee created YARN-3052: - Summary: provide a very simple POC html ATS UI Key: YARN-3052 URL: https://issues.apache.org/jira/browse/YARN-3052 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee As part of accomplishing a minimum viable product, we want to be able to show some UI in html (however crude it is). This subtask calls for creating a barebones UI to do that. This should be replaced later with a better-designed and implemented proper UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3051) create backing storage read interface for ATS readers
Sangjin Lee created YARN-3051: - Summary: create backing storage read interface for ATS readers Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3039) implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-3039: --- Assignee: Robert Kanter > implement ATS writer service discovery > -- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3050) implement new flow-based ATS queries in the new ATS design
Sangjin Lee created YARN-3050: - Summary: implement new flow-based ATS queries in the new ATS design Key: YARN-3050 URL: https://issues.apache.org/jira/browse/YARN-3050 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Implement new flow-based ATS queries in the new ATS design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3049) implement existing ATS queries in the new ATS design
Sangjin Lee created YARN-3049: - Summary: implement existing ATS queries in the new ATS design Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3048) handle how to set up and start/stop ATS reader instances
Sangjin Lee created YARN-3048: - Summary: handle how to set up and start/stop ATS reader instances Key: YARN-3048 URL: https://issues.apache.org/jira/browse/YARN-3048 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, come up with a way to set up and start/stop ATS reader instances. This should allow setting up multiple instances and managing user traffic to those instances. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle
Sangjin Lee created YARN-3047: - Summary: set up ATS reader with basic request serving structure and lifecycle Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, set up the ATS reader as a service and implement its basic structure, including lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3046) implement MapReduce AM writing some MR metrics to ATS
Sangjin Lee created YARN-3046: - Summary: implement MapReduce AM writing some MR metrics to ATS Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3045) implement NM writing container lifecycle events and container system metrics to ATS
Sangjin Lee created YARN-3045: - Summary: implement NM writing container lifecycle events and container system metrics to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3040) implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3040: --- Assignee: Naganarasimha G R > implement client-side API for handling flows > > > Key: YARN-3040 > URL: https://issues.apache.org/jira/browse/YARN-3040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement client-side API for handling *flows*. > Frameworks should be able to define and pass in all attributes of flows and > flow runs to YARN, and they should be passed into ATS writers. > YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
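Since YARN tags were discussed as the vehicle, the client side could be as simple as the sketch below; the tag naming scheme is entirely an assumption, and only {{setApplicationTags}} is existing API.
{code}
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

/** Hypothetical flow attributes passed as YARN application tags. */
public class FlowTagsSketch {
  static void tagFlow(ApplicationSubmissionContext ctx) {
    Set<String> tags = new HashSet<String>();
    // The tag format below is made up for illustration only.
    tags.add("flow_name:daily-ingest");
    tags.add("flow_run:2015-01-12");
    ctx.setApplicationTags(tags);
  }
}
{code}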
[jira] [Created] (YARN-3044) implement RM writing app lifecycle events to ATS
Sangjin Lee created YARN-3044: - Summary: implement RM writing app lifecycle events to ATS Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3043) create ATS configuration, metadata, etc. as part of entities
Sangjin Lee created YARN-3043: - Summary: create ATS configuration, metadata, etc. as part of entities Key: YARN-3043 URL: https://issues.apache.org/jira/browse/YARN-3043 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create APIs for configuration, metadata, etc. and integrate them into entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274515#comment-14274515 ] Naganarasimha G R commented on YARN-2928: - Hi [~sjlee0], I have assigned YARN-3032 to myself. If you want me to take some other subtask instead, please let me know. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3042) create ATS metrics API
Sangjin Lee created YARN-3042: - Summary: create ATS metrics API Key: YARN-3042 URL: https://issues.apache.org/jira/browse/YARN-3042 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create the ATS metrics API and integrate it into the entities. The concept may be based on the existing hadoop metrics, but we want to make sure we have something that would satisfy all ATS use cases. It also needs to capture whether a metric should be aggregated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3032) implement ATS writer functionality to serve ATS readers' requests for live apps
[ https://issues.apache.org/jira/browse/YARN-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274512#comment-14274512 ] Naganarasimha G R commented on YARN-3032: - Hi [~sjlee0], I would like to work on this issue, so I have assigned it to myself. If you have already started working on it, or you want to take it up, please feel free to assign it back to yourself. > implement ATS writer functionality to serve ATS readers' requests for live > apps > --- > > Key: YARN-3032 > URL: https://issues.apache.org/jira/browse/YARN-3032 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement the functionality in ATS writer to serve > data for live apps coming from ATS readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3041) create the ATS entity/event API
Sangjin Lee created YARN-3041: - Summary: create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3032) implement ATS writer functionality to serve ATS readers' requests for live apps
[ https://issues.apache.org/jira/browse/YARN-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3032: --- Assignee: Naganarasimha G R > implement ATS writer functionality to serve ATS readers' requests for live > apps > --- > > Key: YARN-3032 > URL: https://issues.apache.org/jira/browse/YARN-3032 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Per design in YARN-2928, implement the functionality in ATS writer to serve > data for live apps coming from ATS readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3040) implement client-side API for handling flows
Sangjin Lee created YARN-3040: - Summary: implement client-side API for handling flows Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3039) implement ATS writer service discovery
Sangjin Lee created YARN-3039: - Summary: implement ATS writer service discovery Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3038) handle ATS writer failure scenarios
Sangjin Lee created YARN-3038: - Summary: handle ATS writer failure scenarios Key: YARN-3038 URL: https://issues.apache.org/jira/browse/YARN-3038 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, consider various ATS writer failure scenarios, and implement proper handling. For example, an ATS writer may fail and exit due to OOM; it should be retried a certain number of times in that case. We also need to tie fatal ATS writer failures (after exhausting all retries) to the application failure, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3037) create HBase cluster backing storage implementation for ATS writes
Sangjin Lee created YARN-3037: - Summary: create HBase cluster backing storage implementation for ATS writes Key: YARN-3037 URL: https://issues.apache.org/jira/browse/YARN-3037 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create a backing storage implementation for ATS writes based on a full HBase cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274497#comment-14274497 ] Naganarasimha G R commented on YARN-2928: - Hi Sangjin, I would like to work on this feature. Can I take up some subtask of this JIRA? > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3036) create standalone HBase backing storage implementation for ATS writes
Sangjin Lee created YARN-3036: - Summary: create standalone HBase backing storage implementation for ATS writes Key: YARN-3036 URL: https://issues.apache.org/jira/browse/YARN-3036 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create a (default) standalone HBase backing storage implementation for ATS writes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3035) create a test-only backing storage implementation for ATS writes
Sangjin Lee created YARN-3035: - Summary: create a test-only backing storage implementation for ATS writes Key: YARN-3035 URL: https://issues.apache.org/jira/browse/YARN-3035 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create a test-only bare bone backing storage implementation for ATS writes. We could consider something like a no-op or in-memory storage strictly for development and testing purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3034) implement RM starting its ATS writer
Sangjin Lee created YARN-3034: - Summary: implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3033) implement NM starting the ATS writer companion
Sangjin Lee created YARN-3033: - Summary: implement NM starting the ATS writer companion Key: YARN-3033 URL: https://issues.apache.org/jira/browse/YARN-3033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement node managers starting the ATS writer companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3032) implement ATS writer functionality to serve ATS readers' requests for live apps
Sangjin Lee created YARN-3032: - Summary: implement ATS writer functionality to serve ATS readers' requests for live apps Key: YARN-3032 URL: https://issues.apache.org/jira/browse/YARN-3032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, implement the functionality in ATS writer to serve data for live apps coming from ATS readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3031) create backing storage write interface for ATS writers
Sangjin Lee created YARN-3031: - Summary: create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
Sangjin Lee created YARN-3030: - Summary: set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
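Building on the common Hadoop service lifecycle, a skeleton for this could look like the following; the class and its responsibilities are assumed for illustration, while {{AbstractService}} and its init/start/stop hooks are existing API.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

/** Hypothetical ATS writer skeleton using the standard service lifecycle. */
public class TimelineWriterService extends AbstractService {
  public TimelineWriterService() {
    super(TimelineWriterService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Read configuration and set up the request-serving endpoint.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // Begin accepting writes from ATS clients.
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // Flush pending entities to backing storage, then shut down.
    super.serviceStop();
  }
}
{code}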
[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274448#comment-14274448 ] Wangda Tan commented on YARN-2932: -- 1. Rename isQueuePreemptable to getQueuePreemptable for getter/setter consistency in CapacitySchedulerConfiguration. 2. We should consider queue reinitialization when a queue's preemptable setting changes in configuration (see {{TestQueueParsing}}), and it's best to add a test to verify that. 3. It's better to remove the defaultVal parameter in CapacitySchedulerConfiguration.isQueuePreemptable: {code} public boolean isQueuePreemptable(String queue, boolean defaultVal) {code} The default value should be placed in CapacitySchedulerConfiguration, like other queue configuration options. I understand you are trying to move some logic from the queue to CapacitySchedulerConfiguration, but I still think it's better to keep CapacitySchedulerConfiguration simple: it should just get values from the configuration file. Thanks, > Add entry for preemption setting to queue status screen and startup/refresh > logging > --- > > Key: YARN-2932 > URL: https://issues.apache.org/jira/browse/YARN-2932 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt > > > YARN-2056 enables the ability to turn preemption on or off on a per-queue > level. This JIRA will provide the preemption status for each queue in the > {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue > refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
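To make the third point concrete, here is a sketch of the suggested shape, with the default owned by the configuration class so callers stop passing {{defaultVal}}; the key and constant names are illustrative, not the patch's actual ones.
{code}
import java.util.Properties;

/** Illustrative only; stands in for CapacitySchedulerConfiguration. */
public class PreemptableConfigSketch {
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final boolean DEFAULT_QUEUE_PREEMPTABLE = true;  // assumed default

  private final Properties props = new Properties();

  // Getter/setter naming per point 1; the default lives here (point 3),
  // so callers no longer supply their own defaultVal.
  public boolean getQueuePreemptable(String queue) {
    String v = props.getProperty(PREFIX + queue + ".preemptable");
    return v == null ? DEFAULT_QUEUE_PREEMPTABLE : Boolean.parseBoolean(v);
  }

  public void setQueuePreemptable(String queue, boolean preemptable) {
    props.setProperty(PREFIX + queue + ".preemptable",
        Boolean.toString(preemptable));
  }
}
{code}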
[jira] [Updated] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2643: --- Attachment: yarn-2643-1.patch Rebased patch. > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643-1.patch, yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274387#comment-14274387 ] Hadoop QA commented on YARN-2643: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672848/yarn-2643.patch against trunk revision f3507fa. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6315//console This message is automatically generated. > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
[ https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274373#comment-14274373 ] Robert Kanter commented on YARN-2643: - LGTM. [~kasha], can you rebase the patch? It doesn't apply cleanly anymore. > Don't create a new DominantResourceCalculator on every FairScheduler.allocate > call > -- > > Key: YARN-2643 > URL: https://issues.apache.org/jira/browse/YARN-2643 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: yarn-2643.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274369#comment-14274369 ] Ray Chiang commented on YARN-2868: -- Okay. Let me look at the changes and submit a new patch. Thanks for the comments [~kasha] and [~leftnoteasy]. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274363#comment-14274363 ] Wangda Tan commented on YARN-2868: -- Thanks for the explanation; #3 and [~kasha]'s comment make sense to me. I would prefer renaming the following fields for better accuracy: 1) Rename {{allocationRequestStart}} to {{firstAllocationRequestSentTime}} (I misunderstood in my last comment) 2) Rename {{firstContainerAllocation}} to {{firstContainerAllocatedTime}} Also rename setAllocationRequestStart to "trySet..." (or a better name), since what it does is just attempt the set. Returning boolean from setAllocationRequestStart makes the two methods consistent, and a future caller may need it. Thoughts? > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
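A sketch of the proposed try-set semantics; the field names follow the renames suggested above, the UNSET sentinel is an assumption, and {{AtomicLong.compareAndSet}} makes the set-once check atomic without explicit locking.
{code}
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative set-once timestamps; not code from any posted patch. */
public class FirstEventTimesSketch {
  private static final long UNSET = -1L;

  private final AtomicLong firstAllocationRequestSentTime =
      new AtomicLong(UNSET);
  private final AtomicLong firstContainerAllocatedTime =
      new AtomicLong(UNSET);

  /** Records the time only once; returns true for the winning call. */
  public boolean trySetFirstAllocationRequestSentTime(long now) {
    return firstAllocationRequestSentTime.compareAndSet(UNSET, now);
  }

  public boolean trySetFirstContainerAllocatedTime(long now) {
    return firstContainerAllocatedTime.compareAndSet(UNSET, now);
  }
}
{code}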
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274317#comment-14274317 ] Karthik Kambatla commented on YARN-2868:
bq. implementation of scheduler should make sure that
I would prefer thread-safety be handled at the callee, and not be deferred to the caller. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274308#comment-14274308 ] Ray Chiang commented on YARN-2868: -- RE: Comment 1) Using AtomicLong seemed like the least error-prone approach. Since we only expect this to change once, are you asking me to implement this as a lock-free comparison and only do the locking when updating the value? RE: Comment 2) I'm fine with changing names for accuracy. RE: Comment 3) The goal for this metric is to measure the functionality plus overhead allocation time (for some definition of overhead). I believe your version will differ from the current patch in the following ways: A) Some overhead in FairScheduler#allocate() will be missed (not necessarily a bad thing or a big deal) B) Your suggestion will not work for unmanaged applications. Were you making your suggestion just for managed applications? Let me know your thoughts on this. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-2716: -- Assignee: Karthik Kambatla > Refactor ZKRMStateStore retry code with Apache Curator > -- > > Key: YARN-2716 > URL: https://issues.apache.org/jira/browse/YARN-2716 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Karthik Kambatla > > Per suggestion by [~kasha] in YARN-2131, it would be nice to use Curator to > simplify the retry logic in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
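For context, a hedged sketch of what the Curator-based refactoring could look like: Curator attaches a retry policy to the client once, instead of wrapping every ZooKeeper call in hand-rolled retry loops. The connect string, path, and policy parameters below are illustrative assumptions, not the actual patch:
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorRetrySketch {
  public static void main(String[] args) throws Exception {
    // Every operation issued through this client is retried per the policy,
    // which is the logic ZKRMStateStore currently re-implements by hand.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181",                      // hypothetical ZK quorum
        new ExponentialBackoffRetry(1000, 3)); // base sleep 1s, up to 3 retries
    client.start();
    byte[] data = client.getData().forPath("/rmstore"); // retried automatically
    System.out.println(data.length + " bytes read");
    client.close();
  }
}
{code}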
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274252#comment-14274252 ] Wangda Tan commented on YARN-2868: -- Some comments: 1. It seems overkill to me to use AtomicLong to ensure the allocation request sent time will only be set once -- implementation of scheduler should make sure that. If lock-free access is required, using volatile is enough. 2. Rename {{allocationRequestStart}} to {{amContainerAllocationRequestSentTime}}? 3. It's not necessary to track the first AM-container allocation in SchedulerApplicationAttempt. When the scheduler knows the AM container is allocated, it should use AM-Container.getCreationTime() and the amAllocationRequestSent time to get the delay. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274171#comment-14274171 ] Wei Yan commented on YARN-2791: --- [~srivas], YARN-2618 is a simple solution that limits the maximum number of running containers allowed on each node. Each DataNode is configured with a maximum disk value, and YARN treats each container's disk request as 1. > Add Disk as a resource for scheduling > - > > Key: YARN-2791 > URL: https://issues.apache.org/jira/browse/YARN-2791 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Swapnil Daingade >Assignee: Yuliya Feldman > Attachments: DiskDriveAsResourceInYARN.pdf > > > Currently, the number of disks present on a node is not considered a factor > while scheduling containers on that node. Having a large amount of memory on a > node can lead to a high number of containers being launched on that node, all > of which compete for I/O bandwidth. This multiplexing of I/O across > containers can lead to slower overall progress and sub-optimal resource > utilization as containers starved for I/O bandwidth hold on to other > resources like cpu and memory. This problem can be solved by considering disk > as a resource and including it in deciding how many containers can be > concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
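A toy illustration of the accounting described above, under the stated assumption that every container is charged exactly one disk; the names and the demo main method are illustrative, not YARN-2618's code:
{code}
public class DiskSlotCheck {
  // Each node advertises a configured maximum disk value; every container
  // costs one disk, so the cap on running containers equals that value.
  static boolean canLaunchAnother(int runningContainers, int configuredDisks) {
    return runningContainers + 1 <= configuredDisks;
  }

  public static void main(String[] args) {
    System.out.println(canLaunchAnother(11, 12)); // true: one disk slot free
    System.out.println(canLaunchAnother(12, 12)); // false: node at its disk cap
  }
}
{code}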
[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274170#comment-14274170 ] Ray Chiang commented on YARN-2957: -- Thanks! This should let me start documenting any missing yarn-default.xml properties. > Create unit test to automatically compare YarnConfiguration and > yarn-default.xml > > > Key: YARN-2957 > URL: https://issues.apache.org/jira/browse/YARN-2957 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > Fix For: 2.7.0 > > Attachments: YARN-2957.001.patch > > > Create a unit test that will automatically compare the fields in > YarnConfiguration and yarn-default.xml. It should throw an error if a > property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274162#comment-14274162 ] Hadoop QA commented on YARN-2868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch against trunk revision ae7bf31. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6313//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6313//console This message is automatically generated. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274131#comment-14274131 ] Hudson commented on YARN-2957: -- FAILURE: Integrated in Hadoop-trunk-Commit #6844 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6844/]) YARN-2957. Create unit test to automatically compare YarnConfiguration and yarn-default.xml. (rchiang via rkanter) (rkanter: rev f45163191583eadcfbe0df233a3185fd1b2b78f3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/CHANGES.txt > Create unit test to automatically compare YarnConfiguration and > yarn-default.xml > > > Key: YARN-2957 > URL: https://issues.apache.org/jira/browse/YARN-2957 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > Fix For: 2.7.0 > > Attachments: YARN-2957.001.patch > > > Create a unit test that will automatically compare the fields in > YarnConfiguration and yarn-default.xml. It should throw an error if a > property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274118#comment-14274118 ] Hadoop QA commented on YARN-2932: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691699/YARN-2932.v3.txt against trunk revision ae7bf31. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6312//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6312//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6312//console This message is automatically generated. > Add entry for preemption setting to queue status screen and startup/refresh > logging > --- > > Key: YARN-2932 > URL: https://issues.apache.org/jira/browse/YARN-2932 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt > > > YARN-2056 enables the ability to turn preemption on or off on a per-queue > level. This JIRA will provide the preemption status for each queue in the > {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue > refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Enable RM work-preserving restart by default
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274102#comment-14274102 ] Hadoop QA commented on YARN-3019: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691708/YARN-3019.1.patch against trunk revision ae7bf31. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6314//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6314//console This message is automatically generated. > Enable RM work-preserving restart by default > - > > Key: YARN-3019 > URL: https://issues.apache.org/jira/browse/YARN-3019 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3019.1.patch > > > The proposal is to set > "yarn.resourcemanager.work-preserving-recovery.enabled" to true by default > to flip recovery mode to work-preserving recovery from non-work-preserving > recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274073#comment-14274073 ] Wangda Tan commented on YARN-2868: -- [~rchiang], Thanks for working on this. [~rkanter], could you wait for a moment before committing it? Thanks, > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274064#comment-14274064 ] Robert Kanter commented on YARN-2957: - +1 Not sure what's wrong with Jenkins, but it's a test-only change and passes on my machine. > Create unit test to automatically compare YarnConfiguration and > yarn-default.xml > > > Key: YARN-2957 > URL: https://issues.apache.org/jira/browse/YARN-2957 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > Attachments: YARN-2957.001.patch > > > Create a unit test that will automatically compare the fields in > YarnConfiguration and yarn-default.xml. It should throw an error if a > property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274040#comment-14274040 ] Ray Chiang commented on YARN-2868: -- Great. Thanks! > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274035#comment-14274035 ] Hadoop QA commented on YARN-3026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691682/YARN-3026.1.patch against trunk revision 5b0d060. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6310//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6310//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6310//console This message is automatically generated. > Move application-specific container allocation logic from LeafQueue to > FiCaSchedulerApp > --- > > Key: YARN-3026 > URL: https://issues.apache.org/jira/browse/YARN-3026 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3026.1.patch > > > Had a discussion with [~vinodkv] and [~jianhe]: > In the existing Capacity Scheduler, all allocation logic of and under LeafQueue > is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better > move some of it to FiCaSchedulerApp. > The ideal scope of LeafQueue: it receives some resources from its ParentQueue > (like 15% of cluster resource) and distributes them to its children apps, while > staying agnostic to the internal logic of those apps (like delayed-scheduling, > etc.). IAW, LeafQueue shouldn't decide how an application allocates containers > from the given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274029#comment-14274029 ] Wangda Tan commented on YARN-2637: -- [~cwelch], Thanks for updating. I reviewed the latest patch; some comments: *LeafQueue.java:* 1) getUserAMResourceLimit() Given how we compute user-limit in LeafQueue, should we compute user-am-resource-limit the following way? {code} user-am-resource-limit = am-resource-percent * min( queue-max-capacity * max(user-limit, 1/#active-user), queue-configured-capacity * user-limit-factor)? {code} Thoughts? *FiCaSchedulerApp.java:* 2) Is it necessary to get the scheduler instance just for the minimum allocation? Do you think it would be better to get the minimum allocation directly using: {code} minAllocMb = rmContext.getConf().getInt( YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB); {code} This would avoid creating a mocked scheduler in some of the test changes, including the changes in RMContext.java. *TestApplicationLimits:* 3) I think RMContext.getRMApps can be updated directly; there is no need to spy it. I suggest we avoid spying objects in tests as much as we can: it's hard to understand and easily causes problems when we update implementations. 4) Such spy invocations seem unnecessary: {code} FiCaSchedulerApp app_0_0 = spy(new FiCaSchedulerApp(appAttemptId_0_0, user_0, queue, {code} 5) Nobody is using this; it should be removed. {code} private FiCaSchedulerApp getMockApplication(int appId, String user) { return getMockApplication(appId, user, Resource.newInstance(0, 0)); } {code} Wangda > maximum-am-resource-percent could be respected for both LeafQueue/User when > trying to activate applications. > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Assignee: Craig Welch >Priority: Critical > Attachments: YARN-2637.0.patch, YARN-2637.1.patch, > YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, > YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, > YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, > YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, > YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, > YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, > YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, > YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.6.patch, YARN-2637.7.patch, > YARN-2637.9.patch > > > Currently, the number of AMs in a leaf queue is calculated in the following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when a new application is submitted to the RM, it will check whether the app can be > activated in the following way: > {code} > for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + 
application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the > maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the > number of AMs that can be launched is 200; if each AM actually uses 5M (> > minimum_allocation), all apps can still be activated, and they will occupy all of > the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
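A hedged sketch of the limit computation proposed in comment 1) above; the method is a direct transcription of the pseudo-formula, with illustrative names, not patch code:
{code}
public class UserAmLimit {
  // Transcription of: am-resource-percent * min(queue-max-capacity *
  // max(user-limit, 1/#active-users), queue-configured-capacity * user-limit-factor)
  static float userAmResourceLimit(float amResourcePercent,
      float queueMaxCapacity, float queueCapacity,
      float userLimit, int activeUsers, float userLimitFactor) {
    float perUserShare = Math.max(userLimit, 1.0f / activeUsers);
    return amResourcePercent
        * Math.min(queueMaxCapacity * perUserShare,
                   queueCapacity * userLimitFactor);
  }
}
{code}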
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274023#comment-14274023 ] Robert Kanter commented on YARN-2868: - Looks good to me. The last Jenkins run was a while ago and had some failures, so I've just kicked off another Jenkins run. > Add metric for initial container launch time > > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3019) Enable RM work-preserving restart by default
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3019: -- Attachment: YARN-3019.1.patch Uploaded a patch to flip "yarn.resourcemanager.work-preserving-recovery.enabled" from false to true. > Enable RM work-preserving restart by default > - > > Key: YARN-3019 > URL: https://issues.apache.org/jira/browse/YARN-3019 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3019.1.patch > > > The proposal is to set > "yarn.resourcemanager.work-preserving-recovery.enabled" to true by default > to flip recovery mode to work-preserving recovery from non-work-preserving > recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Enable RM work-preserving restart by default
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274013#comment-14274013 ] Jian He commented on YARN-3019: --- bq. Can you explain the behavior different with "yarn.resourcemanager.work-preserving-recovery.enabled" to be true while "yarn.resourcemanager.recovery.enabled" to be false? "yarn.resourcemanager.recovery.enabled" controls disabling/enabling the RM restart feature as a whole, while "yarn.resourcemanager.work-preserving-recovery.enabled" controls whether the recovery mode is non-work-preserving or work-preserving. The final goal is to support work-preserving recovery only, so the config "yarn.resourcemanager.work-preserving-recovery.enabled" will eventually no longer be needed. > Enable RM work-preserving restart by default > - > > Key: YARN-3019 > URL: https://issues.apache.org/jira/browse/YARN-3019 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > The proposal is to set > "yarn.resourcemanager.work-preserving-recovery.enabled" to true by default > to flip recovery mode to work-preserving recovery from non-work-preserving > recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
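For reference, a minimal sketch of how the two flags combine programmatically; the property names come straight from the discussion above, while the demo class itself is illustrative:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RecoveryFlags {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Master switch: enables the RM restart/recovery feature as a whole.
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    // Mode switch: work-preserving (true) vs. non-work-preserving (false);
    // YARN-3019 proposes flipping this default to true.
    conf.setBoolean("yarn.resourcemanager.work-preserving-recovery.enabled", true);
    System.out.println(conf.getBoolean(
        "yarn.resourcemanager.work-preserving-recovery.enabled", false));
  }
}
{code}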
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274007#comment-14274007 ] M. C. Srivas commented on YARN-2791: The scope in https://issues.apache.org/jira/browse/YARN-2139 is just too bloated. We have this problem immediately with YARN overprovisioning, since it doesn't take into account how performance is impacted by the number of disks on each node. We need this fix now, not later. YARN-2139 is too elaborate and is trying to do too much. On the other hand, it doesn't take into account how running DataNodes on the same spindles will impact shuffle performance. I would say get this piece of work done, and we can wait on YARN-2139 whenever it gets done. > Add Disk as a resource for scheduling > - > > Key: YARN-2791 > URL: https://issues.apache.org/jira/browse/YARN-2791 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Swapnil Daingade >Assignee: Yuliya Feldman > Attachments: DiskDriveAsResourceInYARN.pdf > > > Currently, the number of disks present on a node is not considered a factor > while scheduling containers on that node. Having a large amount of memory on a > node can lead to a high number of containers being launched on that node, all > of which compete for I/O bandwidth. This multiplexing of I/O across > containers can lead to slower overall progress and sub-optimal resource > utilization as containers starved for I/O bandwidth hold on to other > resources like cpu and memory. This problem can be solved by considering disk > as a resource and including it in deciding how many containers can be > concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274003#comment-14274003 ] Varun Saxena commented on YARN-644: --- Kindly review > Basic null check is not performed on passed in arguments before using them in > ContainerManagerImpl.startContainer > - > > Key: YARN-644 > URL: https://issues.apache.org/jira/browse/YARN-644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-644.001.patch, YARN-644.002.patch > > > I see that validation/ null check is not performed on passed in parameters. > Ex. tokenId.getContainerID().getApplicationAttemptId() inside > ContainerManagerImpl.authorizeRequest() > I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274002#comment-14274002 ] Hadoop QA commented on YARN-644: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691691/YARN-644.002.patch against trunk revision 5b0d060. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6311//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6311//console This message is automatically generated. > Basic null check is not performed on passed in arguments before using them in > ContainerManagerImpl.startContainer > - > > Key: YARN-644 > URL: https://issues.apache.org/jira/browse/YARN-644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-644.001.patch, YARN-644.002.patch > > > I see that validation/ null check is not performed on passed in parameters. > Ex. tokenId.getContainerID().getApplicationAttemptId() inside > ContainerManagerImpl.authorizeRequest() > I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
[ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273990#comment-14273990 ] Peter D Kirchner commented on YARN-3020: It looks like the bug may have come in with the code reorganization of r1494017 on 2013-06-18. I did not follow the log past this introduction of AMRMClient.java in its present form and location. In my code on my system (and, I suppose, also in yours) each addContainerRequest() is taking about a second even without a sleep. The heartbeat I set in createAMRMClientAsync() was 1000 milliseconds (1 second), so I set it to 10 seconds to rule out that addContainerRequest() was somehow synchronous with allocate(). FWIW, for 10 containers requested, I got 17 containers with a heartbeat of 10 seconds: one heartbeat call to allocate() produced 7 containers, the next call produced 10. On each heartbeat where the AMRMClient detects a change (in the number of containers the AM has "add"ed) that needs to be sent to the RM, it sends the then-current total, not the diff. Limiting the AM to ~1 container request per second is impractical, so the bug is potentially helpful at first: the application does not have to wait 2 minutes to assemble 100 containers; all it needs to do is call addContainerRequest() about 15 times, taking about 15 seconds with a 1-second heartbeat. The addContainerRequest() performance will need to be improved, or the limitation of 1 container per addContainerRequest() introduced in r1503960 on 2013-07-16 will need to be reversed. But by the time one naively requests 100 containers and gets 5,050, the bug is probably hurting application and cluster performance, maybe a lot. > n similar addContainerRequest()s produce n*(n+1)/2 containers > - > > Key: YARN-3020 > URL: https://issues.apache.org/jira/browse/YARN-3020 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2 >Reporter: Peter D Kirchner > Original Estimate: 24h > Remaining Estimate: 24h > > BUG: If the application master calls addContainerRequest() n times, but with > the same priority, I get up to 1+2+3+...+n containers = n*(n+1)/2 . The most > containers are requested when the interval between calls to > addContainerRequest() exceeds the heartbeat interval of calls to allocate() > (in AMRMClientImpl's run() method). > If the application master calls addContainerRequest() n times, but with a > unique priority each time, I get n containers (as I intended). > Analysis: > There is a logic problem in AMRMClientImpl.java. > Although AMRMClientImpl.java, allocate() does an ask.clear() , on subsequent > calls to addContainerRequest(), addResourceRequest() finds the previous > matching remoteRequest and increments the container count rather than > starting anew, and does an addResourceRequestToAsk() which defeats the > ask.clear(). > From documentation and code comments, it was hard for me to discern the > intended behavior of the API, but the inconsistency reported in this issue > suggests one case or the other is implemented incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
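To make the failure mode concrete, here is a hedged sketch of the request pattern being described, using the standard AMRMClient API; the resource size, priority, and helper method are illustrative assumptions:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RequestLoop {
  // n identical-priority requests: per this report, AMRMClientImpl keeps
  // incrementing the one matching ResourceRequest and re-sends the running
  // total on each heartbeat, so the RM can grant up to n*(n+1)/2 containers.
  static void requestContainers(AMRMClient<ContainerRequest> amrmClient, int n) {
    Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore
    Priority priority = Priority.newInstance(0);         // same priority every time
    for (int i = 0; i < n; i++) {
      amrmClient.addContainerRequest(
          new ContainerRequest(capability, null, null, priority));
    }
  }
}
{code}
Requesting with a unique priority per call sidesteps the accumulation, which matches the n-containers behavior reported in the issue description.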
[jira] [Updated] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2932: - Attachment: YARN-2932.v3.txt Upmerged and uploading new patch (v3). > Add entry for preemption setting to queue status screen and startup/refresh > logging > --- > > Key: YARN-2932 > URL: https://issues.apache.org/jira/browse/YARN-2932 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt > > > YARN-2056 enables the ability to turn preemption on or off on a per-queue > level. This JIRA will provide the preemption status for each queue in the > {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue > refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation
[ https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273943#comment-14273943 ] Hudson commented on YARN-3027: -- FAILURE: Integrated in Hadoop-trunk-Commit #6843 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6843/]) YARN-3027. Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev ae7bf31fe1c63f323ba5271e50fd0e4425a7510f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java > Scheduler should use totalAvailable resource from node instead of > availableResource for maxAllocation > - > > Key: YARN-3027 > URL: https://issues.apache.org/jira/browse/YARN-3027 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-3027.001.patch, YARN-3027.002.patch > > > YARN-2604 added support for updating the maximum allocation resource size based > on nodes, but it incorrectly uses the available resource instead of the maximum > resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation
[ https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273936#comment-14273936 ] Robert Kanter commented on YARN-3027: - +1 > Scheduler should use totalAvailable resource from node instead of > availableResource for maxAllocation > - > > Key: YARN-3027 > URL: https://issues.apache.org/jira/browse/YARN-3027 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3027.001.patch, YARN-3027.002.patch > > > YARN-2604 added support for updating the maximum allocation resource size based > on nodes, but it incorrectly uses the available resource instead of the maximum > resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273919#comment-14273919 ] Varun Saxena commented on YARN-3011: [~djp] / [~vinodkv], kindly review > NM dies because of the failure of resource localization > --- > > Key: YARN-3011 > URL: https://issues.apache.org/jira/browse/YARN-3011 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wang Hao >Assignee: Varun Saxena > Attachments: YARN-3011.001.patch > > > NM dies because of an IllegalArgumentException when localizing a resource. > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, > 1416997035456, FILE, null } > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, > 1419831474153, FILE, null } > 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.<init>(Path.java:135) > at org.apache.hadoop.fs.Path.<init>(Path.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > > at java.lang.Thread.run(Thread.java:745) > 2014-12-29 13:43:58,701 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user hadoop > 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting > connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
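A minimal reproduction of the first frame of the trace above (sketch only): {{new Path("")}} throws, and because the exception escapes on the dispatcher thread, the AsyncDispatcher exits and takes the whole NM down with it.
{code}
import org.apache.hadoop.fs.Path;

public class EmptyPathDemo {
  public static void main(String[] args) {
    // Throws IllegalArgumentException: "Can not create a Path from an empty
    // string", matching the message logged by the NM.
    new Path("");
  }
}
{code}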
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: YARN-644.002.patch > Basic null check is not performed on passed in arguments before using them in > ContainerManagerImpl.startContainer > - > > Key: YARN-644 > URL: https://issues.apache.org/jira/browse/YARN-644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-644.001.patch, YARN-644.002.patch > > > I see that validation/ null check is not performed on passed in parameters. > Ex. tokenId.getContainerID().getApplicationAttemptId() inside > ContainerManagerImpl.authorizeRequest() > I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273914#comment-14273914 ] Varun Saxena commented on YARN-2777: [~zjshen], kindly review > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0003-YARN-2693.patch Updating patch with test cases. > Priority Label Manager in RM to manage priority labels > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch > > > The focus of this JIRA is to have a centralized service to handle priority labels. > Supported operations: > * Add/Delete a priority label to a specified queue > * Manage the integer mapping associated with each priority label > * Support managing the default priority label of a given queue > * ACL support at queue level for priority labels > * Expose an interface for the RM to validate priority labels > Storage for these labels will be done in FileSystem and in Memory, similar to > NodeLabel > * FileSystem based: persistent across RM restart > * Memory based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2896: -- Attachment: 0003-YARN-2896.patch Updating patch. Kindly check the overall PB support from the server side. > Server side PB changes for Priority Label Manager and Admin CLI support > --- > > Key: YARN-2896 > URL: https://issues.apache.org/jira/browse/YARN-2896 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, > 0003-YARN-2896.patch > > > Common changes: > * PB support changes required for Admin APIs > * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3026: - Attachment: YARN-3026.1.patch Attached ver.1 patch and kicked off Jenkins. > Move application-specific container allocation logic from LeafQueue to > FiCaSchedulerApp > --- > > Key: YARN-3026 > URL: https://issues.apache.org/jira/browse/YARN-3026 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3026.1.patch > > > Had a discussion with [~vinodkv] and [~jianhe]: > In the existing Capacity Scheduler, all allocation logic of and under LeafQueue > is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better > move some of it to FiCaSchedulerApp. > The ideal scope of LeafQueue: it receives some resources from its ParentQueue > (like 15% of cluster resource) and distributes them to its children apps, while > staying agnostic to the internal logic of those apps (like delayed-scheduling, > etc.). IAW, LeafQueue shouldn't decide how an application allocates containers > from the given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273849#comment-14273849 ] Varun Saxena commented on YARN-3029: [~ste...@apache.org], kindly review > FSDownload.unpack() uses local locale for FS case conversion, may not work > everywhere > - > > Key: YARN-3029 > URL: https://issues.apache.org/jira/browse/YARN-3029 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena > Attachments: YARN-3029.001.patch > > > {{FSDownload.unpack()}} lower-cases filenames in the local locale before > looking at extensions for "tar", "zip", ... > {code} > String lowerDst = dst.getName().toLowerCase(); > {code} > it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as > a zipfile in a Turkish-locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
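To see why the locale matters here: in a Turkish locale, {{toLowerCase()}} maps 'I' to the dotless 'ı' (U+0131), so ".ZIP" never becomes ".zip" and extension matching fails. A minimal, self-contained illustration of the effect (not FSDownload's actual fix):
{code}
import java.util.Locale;

public class LowerCaseDemo {
  public static void main(String[] args) {
    String name = "ARCHIVE.ZIP";
    // Turkish locale: prints "archıve.zıp" -- the dotless i breaks matching
    System.out.println(name.toLowerCase(new Locale("tr", "TR")));
    // English locale: prints the expected "archive.zip"
    System.out.println(name.toLowerCase(Locale.ENGLISH));
    // false: the ".zip" suffix check fails under the Turkish locale
    System.out.println(name.toLowerCase(new Locale("tr", "TR")).endsWith(".zip"));
  }
}
{code}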
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273845#comment-14273845 ] Hadoop QA commented on YARN-3029: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691673/YARN-3029.001.patch against trunk revision 5b0d060. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6309//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6309//console This message is automatically generated. > FSDownload.unpack() uses local locale for FS case conversion, may not work > everywhere > - > > Key: YARN-3029 > URL: https://issues.apache.org/jira/browse/YARN-3029 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena > Attachments: YARN-3029.001.patch > > > {{FSDownload.unpack()}} lower-cases filenames in the local locale before > looking at extensions for "tar", "zip", ... > {code} > String lowerDst = dst.getName().toLowerCase(); > {code} > it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as > a zipfile in a Turkish-locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273834#comment-14273834 ] Zhijie Shen commented on YARN-3009: --- bq. but I suspect this is better served instead of key=nested_object as path/to/attribute=literal_value (or a composition of them) query. Would you please give an example of "path/to/attribute=literal_value"? Filters are allowed to be nested JSON content, though people usually don't use them in this fashion. IAC, we still chose to accept an object in order to stay general. For example, the value of a primary filter could be a composite: {{primaryFilter=name:\{"first name":"Chris", "last name" ="Wensel"\}.}} Personally, I'm a bit reluctant to modify the behavior of translating the value to a JSON object, because it may break compatibility for existing Timeline API users. On the other hand, we have a workaround to force the interpreter to treat the value as a string. > TimelineWebServices always parses primary and secondary filters as numbers if > first char is a number > > > Key: YARN-3009 > URL: https://issues.apache.org/jira/browse/YARN-3009 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Chris K Wensel >Assignee: Naganarasimha G R > Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch > > > If you pass a filter value that starts with a number (7CCA...), the filter > value will be parsed into the Number '7' causing the filter to fail the > search. > Should be noted the actual value as stored via a PUT operation is properly > parsed and stored as a String. > This manifests as a very hard to identify issue with DAGClient in Apache Tez > and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
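An illustration of the class of parsing bug being discussed (generic Java, not the Timeline parser itself): a lenient number-first tokenizer fed a value with a leading digit consumes just the digits and silently drops the rest, turning a value like "7CCA..." into the number 7.
{code}
import java.text.NumberFormat;
import java.text.ParsePosition;

public class LeadingDigitDemo {
  public static void main(String[] args) {
    String raw = "7CCA942F"; // hypothetical filter value with a leading digit
    ParsePosition pos = new ParsePosition(0);
    Number n = NumberFormat.getInstance().parse(raw, pos);
    // Prints: parsed 7, leftover "CCA942F" -- the filter value is mangled
    System.out.println("parsed " + n
        + ", leftover \"" + raw.substring(pos.getIndex()) + "\"");
  }
}
{code}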
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273802#comment-14273802 ] Wangda Tan commented on YARN-3028: -- +1 for this proposal; we should have done it like this from the beginning. Now we may need to support both syntaxes. > Better syntax for replace label CLI > --- > > Key: YARN-3028 > URL: https://issues.apache.org/jira/browse/YARN-3028 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Jian He >Assignee: Rohith > > The command to replace labels is currently: > {code} > yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 > node2:port,label1,label2] > {code} > Instead of {code} node1:port,label1,label2 {code} I think it's better to say > {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)