[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357038#comment-14357038 ] Sangjin Lee commented on YARN-3039: --- Thanks [~djp]! I'll take a look at it and add my comments. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357044#comment-14357044 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357172#comment-14357172 ] Tsuyoshi Ozawa commented on YARN-3248: -- [~vvasudev] It would be good to have new test file TestApplicationReportPBImpl under ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/. TestSerializedExceptionPBImpl.java would be helpful for you to write the test cases. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
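As a rough illustration of the kind of test Tsuyoshi suggests (modeled loosely on TestSerializedExceptionPBImpl), here is a minimal proto round-trip sketch; the fields set and the constructor usage are assumptions for illustration, not a finished test for this patch.
{code}
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationReportPBImpl;
import org.junit.Assert;
import org.junit.Test;

public class TestApplicationReportPBImpl {
  @Test
  public void testProtoRoundTrip() {
    // Build a report, convert it to its proto form and back, then compare fields.
    ApplicationReportPBImpl report = new ApplicationReportPBImpl();
    report.setUser("test-user");
    report.setQueue("default");
    // ... set the remaining fields under test, e.g. the new blacklisted-nodes count ...
    ApplicationReportPBImpl copy = new ApplicationReportPBImpl(report.getProto());
    Assert.assertEquals("test-user", copy.getUser());
    Assert.assertEquals("default", copy.getQueue());
  }
}
{code}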
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357010#comment-14357010 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
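For context, the inaccessible-fields problem described here is typically addressed by adding public accessors to the DAO classes the patch touches; the sketch below is a hedged illustration of that kind of change, and the exact accessor added by the patch is an assumption.
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "scheduler")
@XmlAccessorType(XmlAccessType.FIELD)
public class SchedulerTypeInfo {
  protected SchedulerInfo schedulerInfo;

  // A public accessor like this lets REST clients reuse the DAO class
  // after unmarshalling without falling back to reflection.
  public SchedulerInfo getSchedulerInfo() {
    return schedulerInfo;
  }
}
{code}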
[jira] [Created] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
Junping Du created YARN-3334: Summary: [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357013#comment-14357013 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357042#comment-14357042 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357039#comment-14357039 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357520#comment-14357520 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- Looks close. Can you also update the javadoc for existing APIs to say that those APIs only take effect on logs that exist at the time of application finish? Should not upload partial logs for MR jobs or other 'short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have its partial logs uploaded and then removed from the local filesystem, which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
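As a rough illustration of the javadoc note Vinod is asking for, a hedged sketch follows; the method shown is only a placeholder for whichever log-aggregation API the patch actually touches, not a confirmed signature.
{code}
/**
 * ...existing description of the log-aggregation pattern...
 * <p>
 * Note: this setting only takes effect on log files that still exist at the
 * time the application finishes; it does not trigger uploading of partial
 * logs for short-running (non-LRS) applications while they are running.
 */
public abstract void setRolledLogsIncludePattern(String includePattern);
{code}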
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357536#comment-14357536 ] Vinod Kumar Vavilapalli commented on YARN-3304: --- How about we simplify this and throw an explicit exception when we think it is unavailable, and let higher layers handle it appropriately when that happens? IAC, I'd like us to make some progress to unblock 2.7. Tx. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
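A minimal sketch of the alternative Vinod suggests, i.e. throwing instead of returning a sentinel value; the exception type and fields below are assumptions, not the committed API.
{code}
// Sketch only: make the unavailable case explicit for callers.
public float getCpuUsagePercent() {
  if (!cpuUsageAvailable) {
    // hypothetical unchecked exception; higher layers decide how to react
    throw new ResourceUnavailableException(
        "CPU usage is not yet available for this process tree");
  }
  return cpuUsagePercent;
}
{code}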
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357564#comment-14357564 ] Wangda Tan commented on YARN-3298: -- [~nroberts], I think I got your point now. Yes, as you said, if we enforce the limit (used + required <= user-limit) and don't change the user-limit computation, the queue cannot go over its configured capacity. Originally, this ticket was trying to solve the jitter problem we have with YARN-2069. However, YARN-2069 will only take effect when the queue becomes over-satisfied; at that time, CS will not give the queue more resources, so the jitter won't actually happen. Jitter will happen when we have YARN-2113 (preemption will happen to balance usage between users when the queue isn't over its capacity); at that time, user-limit enforcement should be done. Basically, I agree with your method, which is {{current_capacity = max(queue.used, queue.capacity) + now_required}}; it can solve the problem of the queue going over its configured capacity, but it seems not necessary at least for now. We can delay this change until YARN-2113 requires it. Thoughts? Thanks, Wangda User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard-limit for now: it will not consider required-resource (the resource of the resource request being allocated), and when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1) The user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)
user-limit = min(max(current-capacity / #active-users, current-capacity * user-limit / 100), queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
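For clarity, here is the user-limit formula above written out as a small self-contained Java sketch; it mirrors only the documented computation and is not the CapacityScheduler implementation (variable names are illustrative).
{code}
// Illustrative only: computes the user-limit from the formula in the description.
final class UserLimitSketch {
  static long userLimit(long queueUsed, long queueCapacity, long nowRequired,
      int activeUsers, int userLimitPercent, float userLimitFactor) {
    // current-capacity depends on whether the queue is already over its capacity
    long currentCapacity = (queueUsed > queueCapacity)
        ? queueUsed + nowRequired
        : queueCapacity;
    long perUserShare = Math.max(currentCapacity / activeUsers,
        currentCapacity * userLimitPercent / 100);
    return Math.min(perUserShare, (long) (queueCapacity * userLimitFactor));
  }
  // A user may keep allocating only while user.usage + required <= userLimit(...).
}
{code}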
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357600#comment-14357600 ] Eric Payne commented on YARN-1963: -- Thanks, [~sunilg], for your work on in-queue priorities. Along with [~nroberts], I'm confused about why priority labels are needed. As a user, I just need to know that the higher the number, the higher the priority. Then, I just need a way to see what priority each application is using and a way to set the priority of applications. To me, it just seems like labels will get in the way. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357552#comment-14357552 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703968/YARN-3267.3.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.mapred.TestMRTimelineEventHandling The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6920//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6920//console This message is automatically generated. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3034: Attachment: YARN-3034-20150312-1.patch Have attached a patch with the basic structure up. Please review, [~gtCarrera9], and please check whether the package structuring is fine. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2836) RM behaviour on token renewal failures is broken
[ https://issues.apache.org/jira/browse/YARN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2836: -- Target Version/s: 2.8.0 (was: 2.7.0) This is too late for 2.7. I'll try getting something done in 2.8. Moving it out. RM behaviour on token renewal failures is broken Key: YARN-2836 URL: https://issues.apache.org/jira/browse/YARN-2836 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Found this while reviewing YARN-2834. We now completely ignore token renewal failures. For things like Timeline tokens, which are automatically obtained whether the app needs them or not (we should fix this to be user driven), we can ignore failures. But for HDFS Tokens etc., ignoring failures is bad because it (1) wastes resources, as AMs will continue and eventually fail, and (2) the app doesn't know what happened when it eventually fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357579#comment-14357579 ] Varun Saxena commented on YARN-3047: Code related to TimelineClientServiceManager i.e. to handle YARN CLI requests, I will handle as part of another JIRA. Will raise it later. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357618#comment-14357618 ] Li Lu commented on YARN-3047: - Hi [~varun_saxena], thanks for updating this patch! I have a quick question: is the reader supposed to be a separate daemon in the server (there is a main function in TimelineReaderServer)? I think it would be very helpful if you could have a simple write-up for your current reader architecture. Thanks! [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357629#comment-14357629 ] Hadoop QA commented on YARN-3047: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704008/YARN-3047.02.patch against trunk revision 344d7cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6925//console This message is automatically generated. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3154: Attachment: YARN-3154.4.patch Should not upload partial logs for MR jobs or other 'short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have its partial logs uploaded and then removed from the local filesystem, which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357553#comment-14357553 ] Vinod Kumar Vavilapalli commented on YARN-3021: --- bq. Hi Vinod Kumar Vavilapalli and Harsh J, comments on this approach that Jian described above? Caught up with the discussion. The latest proposal seems like a reasonable approach without adding too much throw-away functionality in YARN. +1 for the approach, let's get this done. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
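As a rough sketch of the behaviour the description asks for (attempt the renewal, tolerate the failure, and only skip scheduling further renewals rather than failing the submission); scheduleRenewal() here is a placeholder, not the actual DelegationTokenRenewer method.
{code}
// Sketch only: tolerate a renewal failure at submission time.
try {
  long expiryTime = token.renew(conf);   // validate the token once
  scheduleRenewal(token, expiryTime);    // placeholder for scheduling future renewals
} catch (Exception e) {
  LOG.warn("Unable to renew token " + token
      + "; skipping automatic renewal, app submission continues", e);
  // no rethrow: matches the old 1.x JobTracker behaviour described above
}
{code}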
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357569#comment-14357569 ] Yongjun Zhang commented on YARN-3021: - Hi [~vinodkv], Thanks for the comments. We do have consensus about the approach too; I have been caught up with other critical stuff. Will try to get to this asap. Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3335) Job In Error State Will Lost Jobhistory Of Second and Later Attempts
[ https://issues.apache.org/jira/browse/YARN-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3335: --- Summary: Job In Error State Will Lost Jobhistory Of Second and Later Attempts (was: Job In Error State Will Lost Jobhistory For Second and Later Attempts) Job In Error State Will Lost Jobhistory Of Second and Later Attempts Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3335.1.patch Related to the fixed issue MAPREDUCE-6230, which causes a Job to get into error state. In that situation the Job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt in error state will copy its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The Jobhistory server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempts' history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357241#comment-14357241 ] Sunil G commented on YARN-1963: --- Thank you [~vinodkv] and [~nroberts] for the comments. Considering usability, labels will be handy. And the scheduler must be agnostic of labels and should handle only integers, like in Linux. This will add some complexity to a priority manager inside the RM which will translate label to integer and vice versa. But a call can be taken by looking at all possibilities, and the same can be standardized so that a minimal working version can be pushed in by improvising on the patches submitted (a working prototype was attached). Hoping [~leftnoteasy] and [~eepayne] will join the discussion. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
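A toy sketch of the label-to-integer translation Sunil describes above; the map contents and class name are purely illustrative assumptions, not the proposed priority-manager API.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustration only: the RM-side manager keeps a label-to-integer mapping,
// while the scheduler itself sees nothing but the integer priority.
final class PriorityLabelSketch {
  private static final Map<String, Integer> LABEL_TO_PRIORITY = new HashMap<>();
  static {
    LABEL_TO_PRIORITY.put("low", 1);
    LABEL_TO_PRIORITY.put("normal", 5);
    LABEL_TO_PRIORITY.put("high", 10);   // higher number == higher priority
  }

  static int toSchedulerPriority(String label, int defaultPriority) {
    return LABEL_TO_PRIORITY.getOrDefault(label, defaultPriority);
  }
}
{code}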
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047.02.patch [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357625#comment-14357625 ] Varun Saxena commented on YARN-3047: Yes. It is a daemon. Multiple reader instances will come in phase 2(YARN-3118). Sure, will update a write up. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357644#comment-14357644 ] Jonathan Eagles commented on YARN-3267: --- [~lichangleo], A couple more minor things with this patch:
* LeveldbTimelineStore, MemoryTimelineStore, and TimelineReader all have an extra UserGroupInformation import
* Spacing issues
** 'Check{' should be written as 'Check {'
** 'ugi=callerUGI;' should be written as 'ugi = callerUGI;'
** 'throws IOException{' should be written as 'throws IOException {'
* check logic simplification
{code}
try {
  if (!timelineACLsManager.checkAccess(
      ugi, ApplicationAccessType.VIEW_APP, entity)) {
    return false;
  }
}
{code}
might be simpler as
{code}
try {
  return timelineACLsManager.checkAccess(
      ugi, ApplicationAccessType.VIEW_APP, entity);
}
{code}
* reduce logging level
{code}
} catch (YarnException e) {
  LOG.error("Error when verifying access for user " + ugi
      + " on the events of the timeline entity "
      + new EntityIdentifier(entity.getEntityId(), entity.getEntityType()), e);
  return false;
}
{code}
this might be better suited as info level since any missing domain can trigger this scenario. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
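Putting Jonathan's two code suggestions together, a hedged sketch of how the resulting check might read; the enclosing method name and fields are assumptions, not the exact patch.
{code}
// Sketch only: simplified access check with the logging level reduced to info.
private boolean canRead(UserGroupInformation ugi, TimelineEntity entity) {
  try {
    return timelineACLsManager.checkAccess(
        ugi, ApplicationAccessType.VIEW_APP, entity);
  } catch (YarnException e) {
    LOG.info("Error when verifying access for user " + ugi
        + " on the events of the timeline entity "
        + new EntityIdentifier(entity.getEntityId(), entity.getEntityType()), e);
    return false;
  }
}
{code}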
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357684#comment-14357684 ] Hadoop QA commented on YARN-3243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703964/YARN-3243.4.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6923//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6923//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6923//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6923//console This message is automatically generated. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container in a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container with size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
          A (usage=54, max=55)
         /  \
        A1   A2
(usage=1, max=55) (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; *here is my proposal*:
- ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*.
- ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent).
- {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
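A small sketch of the headroom propagation proposed above, assuming illustrative accessor names (setHeadroom/getHeadroom/getMaxResource are not necessarily the actual CSQueue API).
{code}
// Sketch only: parent qA passes min(qA.headroom, qA.max - qA.used) down to its
// children, so every ancestor's limit constrains each child's future allocations.
Resource childLimit = Resources.min(resourceCalculator, clusterResource,
    qA.getHeadroom(),                                        // limit set by qA's own parent
    Resources.subtract(qA.getMaxResource(), qA.getUsed()));  // qA's remaining room
for (CSQueue child : qA.getChildQueues()) {
  child.setHeadroom(childLimit);
}
{code}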
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357300#comment-14357300 ] Ravindra Naik commented on YARN-3324: - Do you think that these steps will be sufficient ? 1. Stop any container that is using that docker image. 2. Delete any container that is using that docker image. 3. Delete the docker image. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
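A hedged sketch of a JUnit teardown implementing the three steps Ravindra lists above; the container and image names are placeholders for whatever the test actually uses.
{code}
// Sketch only: clean up the test container and image after the test run
// (names below are placeholders).
@After
public void removeTestDockerImage() throws Exception {
  String container = "test_container";
  String image = "test_docker_image";
  Shell.execCommand("bash", "-c", "docker stop " + container + " || true"); // 1. stop the container
  Shell.execCommand("bash", "-c", "docker rm " + container + " || true");   // 2. delete the container
  Shell.execCommand("bash", "-c", "docker rmi " + image + " || true");      // 3. delete the image
}
{code}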
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357313#comment-14357313 ] Hadoop QA commented on YARN-3324: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703955/YARN-3324-branch-2.6.0.002.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6917//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6917//console This message is automatically generated. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357353#comment-14357353 ] Brahma Reddy Battula commented on YARN-3204: Kindly review if you find time ..thanks.. Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) -- Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Blocker Attachments: YARN-3204-001.patch, YARN-3204-002.patch, YARN-3204-003.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357402#comment-14357402 ] Hadoop QA commented on YARN-3080: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703979/YARN-3080.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6921//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6921//console This message is automatically generated. The DockerContainerExecutor could not write the right pid to container pidFile -- Key: YARN-3080 URL: https://issues.apache.org/jira/browse/YARN-3080 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Beckham007 Assignee: Abin Shahab Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch The docker_container_executor_session.sh is like this: {quote} #!/usr/bin/env bash echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh {quote} The 
DockerContainerExecutor uses docker inspect before docker run, so docker inspect couldn't get the right pid for the docker container; signalContainer() and NM restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1884: Attachment: YARN-1884.4.patch Addressed all the comments. ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3243: - Attachment: YARN-3243.4.patch Addressed all comments from Jian except: bq. Do you think passing down a QueueHeadRoom compared with QueueMaxLimit may make the code simpler Some places need the available resource and some places need the limit; there should not be much difference in code effort whether we pass down headroom or limit. Attached new patch (ver.4) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container in a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container with size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
          A (usage=54, max=55)
         /  \
        A1   A2
(usage=1, max=55) (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; *here is my proposal*:
- ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*.
- ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent).
- {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357432#comment-14357432 ] Hadoop QA commented on YARN-1884: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703957/YARN-1884.4.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6918//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6918//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6918//console This message is automatically generated. ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357465#comment-14357465 ] Xuan Gong commented on YARN-1884: - Testcase failures are not related ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3324: Attachment: (was: YARN-3324-trunk.patch) TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3324: Attachment: YARN-3324-branch-2.6.0.002.patch YARN-3324-trunk.002.patch updated patches to consider the case when docker is not installed. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-branch-2.6.0.patch, YARN-3324-trunk.002.patch, YARN-3324-trunk.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357292#comment-14357292 ] Chang Li commented on YARN-3267: [~jeagles] Thanks for review. Updated my patch according to your suggestions. Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3080: -- Attachment: YARN-3080.patch Removed gitignore The DockerContainerExecutor could not write the right pid to container pidFile -- Key: YARN-3080 URL: https://issues.apache.org/jira/browse/YARN-3080 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Beckham007 Assignee: Abin Shahab Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch The docker_container_executor_session.sh is like this: {quote} #!/usr/bin/env bash echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh {quote} The DockerContainerExecutor use docker inspect before docker run, so the docker inspect couldn't get the right pid for the docker, signalContainer() and nm restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
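The ordering problem described above (running docker inspect before docker run, so the pid file is written before the container actually has a PID) can be avoided by launching the container detached and only writing the pid file once docker inspect reports a real PID. The sketch below is illustrative only and is not the attached patch; the class and method names (DockerPidRecorder, launchAndRecordPid) are made up for the example.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DockerPidRecorder {

  // Start the container detached, poll `docker inspect` until the daemon
  // reports a non-zero PID, and only then write the pid file.
  public static void launchAndRecordPid(String containerName, String image,
      String launchScript, Path pidFile) throws IOException, InterruptedException {
    new ProcessBuilder("docker", "run", "-d", "--name", containerName,
        image, "bash", launchScript).inheritIO().start().waitFor();

    String pid = "0";
    for (int i = 0; i < 30 && "0".equals(pid); i++) {
      Process inspect = new ProcessBuilder("docker", "inspect",
          "--format", "{{.State.Pid}}", containerName).start();
      inspect.waitFor();
      pid = new String(inspect.getInputStream().readAllBytes(),
          StandardCharsets.UTF_8).trim();
      if (pid.isEmpty() || "0".equals(pid)) {
        pid = "0";
        Thread.sleep(1000);
      }
    }

    // Write via a temp file and move, mirroring the .pid.tmp/.pid dance
    // in the generated session script.
    Path tmp = Paths.get(pidFile.toString() + ".tmp");
    Files.write(tmp, pid.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, pidFile, StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}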
[jira] [Updated] (YARN-3335) Job In Error State Will Lost Jobhistory For Second and Later Attempts
[ https://issues.apache.org/jira/browse/YARN-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3335: --- Attachment: YARN-3335.1.patch Job In Error State Will Lost Jobhistory For Second and Later Attempts - Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3335.1.patch Related to the fixed issue MAPREDUCE-6230, which causes a job to get into the ERROR state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the ERROR state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The job history server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempt history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357249#comment-14357249 ] Chen He commented on YARN-3324: --- Hi [~ravindra.naik], thank you for the patch. Actually, just adding a {{docker rmi testImage}} cannot guarantee deletion of the testImage. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3324: Attachment: (was: YARN-3324-branch-2.6.0.patch) TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3335) Job In Error State Will Lost Jobhistory For Second and Later Attempts
[ https://issues.apache.org/jira/browse/YARN-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357464#comment-14357464 ] Chang Li commented on YARN-3335: Another plausible solution is to always let a job in the ERROR state retry. Job In Error State Will Lost Jobhistory For Second and Later Attempts - Key: YARN-3335 URL: https://issues.apache.org/jira/browse/YARN-3335 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3335.1.patch Related to the fixed issue MAPREDUCE-6230, which causes a job to get into the ERROR state. In that situation the job's second or a later attempt could succeed, but those later attempts' history files will all be lost, because the first attempt, in the ERROR state, copies its history file to the intermediate dir while mistakenly thinking of itself as the last attempt. The job history server will later move the history file of that error attempt from the intermediate dir to the done dir while ignoring all of that job's later attempt history files in the intermediate dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
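The root cause described in this issue is that an attempt ending in the ERROR state still treats itself as the last attempt and publishes its history file to the intermediate directory. The following is a simplified decision-logic sketch of the kind of guard being discussed; it is not the attached patch, and the JobState values and the isLastAMRetry flag are stand-ins for the real MR types.
{code}
public class HistoryPublishSketch {
  enum JobState { SUCCEEDED, FAILED, KILLED, ERROR }

  // Only publish the history file to the intermediate dir when this attempt
  // finished cleanly or is genuinely the last AM retry. An attempt that ended
  // in ERROR with retries remaining should not shadow a later attempt's file.
  static boolean shouldPublishHistory(JobState finalState, boolean isLastAMRetry) {
    boolean finishedCleanly = finalState == JobState.SUCCEEDED
        || finalState == JobState.FAILED
        || finalState == JobState.KILLED;
    return finishedCleanly || isLastAMRetry;
  }

  public static void main(String[] args) {
    // First attempt hits ERROR but more retries remain: do not publish.
    System.out.println(shouldPublishHistory(JobState.ERROR, false)); // false
    // Final retry, whatever the state: publish so history is not lost entirely.
    System.out.println(shouldPublishHistory(JobState.ERROR, true));  // true
  }
}
{code}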
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN-3267.3.patch Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2828: - Attachment: (was: HADOOP-9329.004.patch) Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch, YARN-2828.004.patch The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that could be appended to URLs and enabled a periodic page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the YARN interface. It could be implemented as a page element (e.g. a drop-down or so), but I'd recommend that the page not be made more cluttered, and simply bring back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2828: - Attachment: YARN-2828.004.patch Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch, YARN-2828.004.patch The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that could be appended to URLs and enabled a periodic page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the YARN interface. It could be implemented as a page element (e.g. a drop-down or so), but I'd recommend that the page not be made more cluttered, and simply bring back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
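For illustration, the refresh-parameter idea can be expressed with the standard HTTP Refresh response header. The servlet below is only a minimal sketch of the concept, not the attached YARN-2828 patch (which goes through the YARN web framework); the class name and page body are invented for the example.
{code}
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Honor an optional refresh=N query parameter by asking the browser to
// reload the page every N seconds via the Refresh response header.
public class RefreshParamServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String refresh = req.getParameter("refresh");
    if (refresh != null) {
      try {
        int seconds = Integer.parseInt(refresh);
        if (seconds > 0) {
          resp.setIntHeader("Refresh", seconds);  // e.g. ?refresh=3
        }
      } catch (NumberFormatException ignored) {
        // malformed values are simply ignored
      }
    }
    resp.setContentType("text/html");
    resp.getWriter().println("<html><body>page body goes here</body></html>");
  }
}
{code}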
[jira] [Created] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
zhihai xu created YARN-3336: --- Summary: FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE and is never garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject:
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals compares the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals compares the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
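For illustration, a small standalone program that reproduces the behavior described above: two proxy UGIs for the same user are never equal (the subject is compared by reference), so each doAs block gets its own FileSystem instance cached under a distinct key. The user name "alice" is arbitrary; this is a sketch, not code from the patch.
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class FileSystemCacheLeakDemo {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    FileSystem fs1 = getFsAsProxy("alice", conf);
    FileSystem fs2 = getFsAsProxy("alice", conf);

    // Prints false: a new entry was added to FileSystem.CACHE both times.
    System.out.println("same cached FileSystem? " + (fs1 == fs2));
  }

  static FileSystem getFsAsProxy(String user, final Configuration conf)
      throws Exception {
    UserGroupInformation proxy = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    return proxy.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws Exception {
        return FileSystem.get(conf);
      }
    });
  }
}
{code}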
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357928#comment-14357928 ] Naganarasimha G R commented on YARN-2495: -
bq. It's just that I don't like this areNodeLabelsSetInReq flag in the protocol. Are there other ways of achieving this?
The other way out is to always send the set of labels as part of every heartbeat. We wanted to avoid this traffic, hence we initially came up with this approach when we were supporting multiple labels for a node (maybe in future we might support multiple labels again, right?).
{{I think treating invalid labels as a disaster case will be, well, a disaster.}} : liked the sentence :)
bq. How about we let the node run (just like we let an unhealthy node run) and report it in the diagnostics? I'm okay keeping that same behavior during registration too.
Yes, this is similar to the earlier behavior we had in December's patch. Additionally, to report the failure back, we had added one flag to inform the NM whether the RM accepted the labels or not, and the diagnostic message was also set appropriately. Is this approach fine?
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357724#comment-14357724 ] Hadoop QA commented on YARN-3034: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704029/YARN-3034-20150312-1.patch against trunk revision 7a346bc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6927//console This message is automatically generated. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.000.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE and is never garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject:
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals compares the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals compares the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357886#comment-14357886 ] Hadoop QA commented on YARN-2828: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704041/YARN-2828.004.patch against trunk revision 7a346bc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.security.token.delegation.web.TestWebDelegationToken org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6930//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6930//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6930//console This message is automatically generated. Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch, YARN-2828.004.patch The MR1 Job Tracker had a useful HTTP parameter of e.g. refresh=3 that could be appended to URLs which enabled a page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the the Yarn interface. Could be implemented as a page element (e.g. drop down or so), but I'd recommend that the page not be more cluttered, and simply bring back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357736#comment-14357736 ] Wangda Tan commented on YARN-3243: -- Javadoc warnings are mis-reported by Jenkins, findbugs warnings are tracked by YARN-3204, and the test failures pass locally. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example:
1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /  \
          A1    A2
(usage=1, max=55)  (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max.
2) When doing the continuous reservation check, the parent queue will only tell children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well.
With YARN-3099/YARN-3124, now we have the {{ResourceUsage}} class in each queue. *Here is my proposal*:
- ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*.
- ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent).
- {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
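A simplified illustration of the proposed headroom propagation follows, using plain longs for a single resource dimension; the real patch works on the YARN Resource type inside ParentQueue/LeafQueue, so the class and method names here are invented for the example.
{code}
public class HeadroomSketch {

  // Child headroom = min(parent's own headroom, parent's max - parent's used),
  // so every ancestor's limit is enforced on the way down the queue tree.
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    // Example from the description: A has usage=54, max=55, so even though
    // A2 (usage=53, max=53) is under its own limit, its headroom is 0.
    long aHeadroom = childHeadroom(Long.MAX_VALUE, 55, 54); // root imposes no limit here
    long a2Headroom = childHeadroom(aHeadroom, 53, 53);
    System.out.println("A headroom  = " + aHeadroom);   // 1
    System.out.println("A2 headroom = " + a2Headroom);  // 0
  }
}
{code}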
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357762#comment-14357762 ] Hadoop QA commented on YARN-1884: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703957/YARN-1884.4.patch against trunk revision 344d7cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6926//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6926//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6926//console This message is automatically generated. ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch, YARN-1884.4.patch In web UI, we're going to show the node, which used to be to link to the NM web page. However, on AHS web UI, and RM web UI after YARN-1809, the node field has to be set to nodeID where the container is allocated. We need to add nodeHttpAddress to the containerReport to link users to NM web page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on
[ https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2792: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1953 Have a public Test-only API for creating important records that ecosystem projects can depend on Key: YARN-2792 URL: https://issues.apache.org/jira/browse/YARN-2792 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Priority: Blocker From YARN-2789, {quote} Sigh. Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. I am inclined to have a separate test-only public util API that keeps compatibility for tests. Rather than opening unwanted APIs up. I'll file a separate ticket for this, we need all YARN apps/frameworks to move to that API instead of these private unstable APIs. For now, I am okay keeping a private compat for the APIs changed in YARN-2698. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357846#comment-14357846 ] Jian He commented on YARN-3243: ---
- getResourceLimitsOfChild: code comments should match the variable name
- CapacityScheduler: why is the following code moved?
{code}
// update this node to node label manager
if (labelManager != null) {
  labelManager.activateNode(nodeManager.getNodeID(),
      nodeManager.getTotalCapability());
}
{code}
- needToUnreserve is removed, so this comment is not valid any more:
{code}
// we got here by possibly ignoring parent queue capacity limits. If
// the parameter needToUnreserve i
{code}
- the checkReservedContainers flag in canAssignToThisQueue is not needed; instead, check whether resourceCouldBeUnreserved is none or not
CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example:
1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
            A (usage=54, max=55)
           /  \
          A1    A2
(usage=1, max=55)  (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max.
2) When doing the continuous reservation check, the parent queue will only tell children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well.
With YARN-3099/YARN-3124, now we have the {{ResourceUsage}} class in each queue. *Here is my proposal*:
- ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*.
- ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent).
- {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357852#comment-14357852 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- bq. Well as craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set but as by default empty set is provided by protoc, its req to inform whether labels are set as part of request hence areNodeLabelsSetInReq is required. It's just that I don't like this _areNodeLabelsSetInReq_ flag in the protocol. Are there other ways of achieving this? bq. Well i am little confused here, As per wangda's earlier comment i understand that it was your comment to send shutdown (which i felt correct in terms of maintenance) I think treating invalid labels as a disaster case will be, well, a disaster. How about we let the node run (just like we let an unhealthy node run) and report it in the diagnostics? I'm okay keeping that same behavior during registration too. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
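To make the "report it in the diagnostics" option concrete, the following is a rough, self-contained sketch of the behavior being discussed: if a node reports labels the RM does not recognize, keep the node running and surface the problem through a diagnostic message rather than ordering a shutdown. All class and method names here (LabelCheckResult, checkReportedLabels, ...) are illustrative only and are not the YARN resource-tracker API.
{code}
import java.util.Collections;
import java.util.Set;

public class NodeLabelValidationSketch {

  static class LabelCheckResult {
    final boolean accepted;
    final String diagnostics;
    LabelCheckResult(boolean accepted, String diagnostics) {
      this.accepted = accepted;
      this.diagnostics = diagnostics;
    }
  }

  static LabelCheckResult checkReportedLabels(Set<String> reported,
      Set<String> knownClusterLabels) {
    if (reported == null) {
      // Labels not included in this request; nothing to update.
      return new LabelCheckResult(true, "");
    }
    if (knownClusterLabels.containsAll(reported)) {
      return new LabelCheckResult(true, "");
    }
    // Invalid labels: do not shut the node down, just report it back so the
    // NM can log it, and keep the previously accepted labels in effect.
    return new LabelCheckResult(false,
        "Node labels " + reported + " contain labels unknown to the RM; "
            + "labels were not updated for this node.");
  }

  public static void main(String[] args) {
    System.out.println(checkReportedLabels(
        Collections.singleton("GPU"), Collections.singleton("SSD")).diagnostics);
  }
}
{code}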
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357900#comment-14357900 ] zhihai xu commented on YARN-3336: - I uploaded a patch for this issue. The fix is to call FileSystem.get(getConfig()) outside of the proxyUser.doAs, so FileSystem.get(getConfig()) runs in the current RM user context and returns a FileSystem from FileSystem.CACHE instead of creating a new FileSystem. Since the patch is straightforward and a very small change, I think we don't need a test case for this patch. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry is added to FileSystem#CACHE and is never garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject:
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals compares the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals compares the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem is created and a new entry is added to FileSystem.CACHE. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
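Based on the description of the fix in the comment above, the revised method would look roughly like the following. This is only a sketch derived from that description, not the attached YARN-3336.000.patch.
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  // Look the FileSystem up in the RM's own user context, so the instance
  // cached under the login UGI is reused instead of a new entry being added
  // to FileSystem.CACHE for every proxy user.
  final FileSystem fs = FileSystem.get(getConfig());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return fs.addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}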
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2828: - Attachment: HADOOP-9329.004.patch Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: HADOOP-9329.004.patch, YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that could be appended to URLs and enabled a periodic page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the YARN interface. It could be implemented as a page element (e.g. a drop-down or so), but I'd recommend that the page not be made more cluttered, and simply bring back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357706#comment-14357706 ] Hadoop QA commented on YARN-3154: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704007/YARN-3154.4.patch against trunk revision 344d7cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6924//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6924//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6924//console This message is automatically generated. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357802#comment-14357802 ] Hadoop QA commented on YARN-1334: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609555/YARN-1334.1.patch against trunk revision 7a346bc. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6931//console This message is automatically generated. YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Attachments: YARN-1334.1.patch Run incorrect command such as: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributedshell jar -shell_command ./test1.sh -shell_script ./ would show shell exit code exception with no useful message. It should print out sysout/syserr of containers/AM of why it is failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3337) Provide YARN chaos monkey
Steve Loughran created YARN-3337: Summary: Provide YARN chaos monkey Key: YARN-3337 URL: https://issues.apache.org/jira/browse/YARN-3337 Project: Hadoop YARN Issue Type: New Feature Components: test Affects Versions: 2.7.0 Reporter: Steve Loughran To test failure resilience today you either need custom scripts or have to implement Chaos Monkey-like logic in your application (SLIDER-202). Killing AMs and containers on a schedule and with some probability is the core activity here, one that could be handled by a CLI app/client lib that does this:
# entry point to have a startup delay before acting
# frequency of chaos wakeup/polling
# probability of AM failure generation (0-100)
# probability of non-AM container kill
# future: other operations
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
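A bare-bones sketch of such a loop is shown below: start after a delay, wake up at a fixed frequency, and kill the AM or a non-AM container with the configured probabilities. This is only an illustration of the proposal, not an existing tool; killAm() and killRandomContainer() are left as stubs, and a real implementation would go through the YARN client APIs.
{code}
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class YarnChaosMonkeySketch {
  private final Random random = new Random();

  void start(long startupDelaySec, long intervalSec,
      int amKillPercent, int containerKillPercent) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      if (random.nextInt(100) < amKillPercent) {
        killAm();
      } else if (random.nextInt(100) < containerKillPercent) {
        killRandomContainer();
      }
    }, startupDelaySec, intervalSec, TimeUnit.SECONDS);
  }

  void killAm() { /* e.g. signal/stop the AM container */ }
  void killRandomContainer() { /* e.g. stop one non-AM container */ }

  public static void main(String[] args) {
    // Wait 60s, then every 30s kill the AM with 10% probability or a
    // non-AM container with 25% probability.
    new YarnChaosMonkeySketch().start(60, 30, 10, 25);
  }
}
{code}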
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357922#comment-14357922 ] Hadoop QA commented on YARN-2890: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704036/YARN-2890.patch against trunk revision 7a346bc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat org.apache.hadoop.mapred.TestResourceMgrDelegate org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapred.TestMRIntermediateDataEncryption org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6929//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6929//console This message is automatically generated. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
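The check being asked for in this issue amounts to consulting the timeline-service flag before starting the service in the mini cluster. The sketch below only illustrates that idea using the public YarnConfiguration keys; the wrapper class is hypothetical, and the actual change belongs in MiniYARNCluster/MiniMRYarnCluster rather than in a standalone class.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineServiceToggleSketch {

  // Only start the timeline service when the configuration enables it.
  static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
    if (shouldStartTimelineService(conf)) {
      System.out.println("would start the timeline service in the mini cluster");
    }
  }
}
{code}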
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20150311-1.patch The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356462#comment-14356462 ] Hadoop QA commented on YARN-2854: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703860/YARN-2854.20150311-1.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6914//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6914//console This message is automatically generated. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1876) Document the REST APIs of timeline and generic history services
[ https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356479#comment-14356479 ] Gururaj Shetty commented on YARN-1876: -- Hi [~zjshen], I can convert this content to Markdown and append it to YARN-2854. Kindly let me know if I can handle it. Document the REST APIs of timeline and generic history services --- Key: YARN-1876 URL: https://issues.apache.org/jira/browse/YARN-1876 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentaion Attachments: YARN-1876.1.patch, YARN-1876.2.patch, YARN-1876.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356373#comment-14356373 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-trunk-Commit #7303 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7303/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Attachment: apache-yarn-3248.3.patch Uploaded a new patch which applies after YARN-1809 Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart
[ https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reopened YARN-3329: - There's no way to rebuild containers Managed by NMClientAsync If AM restart --- Key: YARN-3329 URL: https://issues.apache.org/jira/browse/YARN-3329 Project: Hadoop YARN Issue Type: Bug Components: api, applications, client Affects Versions: 2.6.0 Reporter: sandflee If work-preserving restart is enabled and the AM restarts, the AM couldn't stop containers or query the status of containers launched by the previous AM, because there's no corresponding container in NMClientAsync.containers. And there's no way to rebuild NMClientAsync.containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart
[ https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved YARN-3329. - Resolution: Duplicate Release Note: (was: the same to YARN-3328, sorry for creating twice) There's no way to rebuild containers Managed by NMClientAsync If AM restart --- Key: YARN-3329 URL: https://issues.apache.org/jira/browse/YARN-3329 Project: Hadoop YARN Issue Type: Bug Components: api, applications, client Affects Versions: 2.6.0 Reporter: sandflee If work preserving is enabled and the AM restarts, the AM can't stop containers or query the status of containers launched by the previous AM, because there's no corresponding container in NMClientAsync.containers. And there's no way to rebuild NMClientAsync.containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356394#comment-14356394 ] Gururaj Shetty commented on YARN-3187: -- Thanks [~jianhe] and [~Naganarasimha Garla] for committing and reviewing the patch. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart
[ https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356384#comment-14356384 ] sandflee commented on YARN-3328: Is it really necessary to keep container info in NMClientAsync? YARN-3327 is also caused by this. There's no way to rebuild containers Managed by NMClientAsync If AM restart --- Key: YARN-3328 URL: https://issues.apache.org/jira/browse/YARN-3328 Project: Hadoop YARN Issue Type: Bug Components: api, applications, client Affects Versions: 2.6.0 Reporter: sandflee If work preserving is enabled and the AM restarts, the AM can't stop containers launched by the previous AM, because there's no corresponding container in NMClientAsync.containers. There's no way to rebuild NMClientAsync.containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356668#comment-14356668 ] Hadoop QA commented on YARN-3248: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703863/apache-yarn-3248.3.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6915//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6915//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6915//console This message is automatically generated. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3324: Target Version/s: 2.6.0 (was: trunk-win, 2.6.0) Affects Version/s: (was: trunk-win) TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.patch, YARN-3324-trunk.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356709#comment-14356709 ] Hadoop QA commented on YARN-3324: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703871/YARN-3324-trunk.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDockerContainerExecutor org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6916//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6916//console This message is automatically generated. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.patch, YARN-3324-trunk.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Naik updated YARN-3324: Attachment: YARN-3324-trunk.patch YARN-3324-branch-2.6.0.patch TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: trunk-win, 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.patch, YARN-3324-trunk.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356857#comment-14356857 ] Varun Vasudev commented on YARN-3248: - Thanks for the review [~ozawa]. Can you help me out - when you say add a test about ApplicationReportPBImpl, can you let me know where to add the tests? I've modified/added tests in ClientRMService but I'm not sure where tests for *PBImpl go. Thanks! Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356858#comment-14356858 ] Varun Vasudev commented on YARN-3248: - Sorry that should have been TestClientRMService, and not ClientRMService. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356831#comment-14356831 ] Tsuyoshi Ozawa commented on YARN-3248: -- [~vvasudev], thank you for updating. Could you add a test for ApplicationReportPBImpl, since we sometimes run into NPEs from the *PBImpl classes? Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: All applications.png, App page.png, Screenshot.jpg, apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, apache-yarn-3248.3.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
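A minimal sketch of the kind of *PBImpl test being discussed in the YARN-3248 comments above, assuming the standard no-arg ApplicationReportPBImpl constructor and the existing ApplicationReport getters/setters; the fields exercised here (name) are illustrative only, not the new blacklisted-node count from the patch.
{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationReportPBImpl;
import org.junit.Test;

public class TestApplicationReportPBImpl {

  @Test
  public void testGettersOnEmptyReportDoNotThrow() {
    // A report backed by an empty proto should return null/defaults, not NPE.
    ApplicationReportPBImpl report = new ApplicationReportPBImpl();
    assertNotNull(report.getProto());
    report.getApplicationId();
    report.getName();
  }

  @Test
  public void testSetGetRoundTrip() {
    ApplicationReportPBImpl report = new ApplicationReportPBImpl();
    report.setName("test-app");
    assertEquals("test-app", report.getName());
  }
}
{code}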
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356758#comment-14356758 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/129/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java * hadoop-yarn-project/CHANGES.txt Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356763#comment-14356763 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/129/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356761#comment-14356761 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/129/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356766#comment-14356766 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Yarn-trunk #863 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/863/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356774#comment-14356774 ] Junping Du commented on YARN-3225: -- bq. Here what would happen to the decommissioning node if the RMAdmin issued refreshNodeGracefully() and gets terminated(exited) before issuing the 'refreshNode forcefully'? This can be done by doing Ctrl+C on the command prompt. The Node will be in decommissioning state forever and becomes unusable for new containers allocation. Per the v3 version of the proposal in the umbrella JIRA, if the CLI gets interrupted it won't keep tracking the timeout to forcefully decommission the remaining nodes. However, nodes in "DECOMMISSIONING" will still get terminated later (after their running apps finish) unless the admin explicitly recommissions those nodes via the CLI. A node in decommissioning state is terminated on one of two trigger events: the timeout, or all applications on that node finishing (to be covered in YARN-3212). We can document what it means if the user hits Ctrl+C on the graceful decommissioning CLI. If an application (like an LRS) never ends in this situation, then the user needs to refresh those nodes forcefully (or gracefully with a timeout, without interrupting). Make sense? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
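To make the flow described in the YARN-3225 comment above concrete, here is a rough client-side sketch. It is only an illustration under assumptions: refreshNodesGracefully, refreshNodesForcefully, and getDecommissioningNodes are hypothetical stand-ins for whatever admin API this JIRA finally adds, not existing calls.
{code}
import java.util.List;

/** Hypothetical admin facade; the method names stand in for whatever RMAdmin API this JIRA adds. */
interface RMAdmin {
  void refreshNodesGracefully() throws Exception;      // move excluded nodes to DECOMMISSIONING
  void refreshNodesForcefully() throws Exception;      // terminate nodes still decommissioning
  List<String> getDecommissioningNodes() throws Exception;
}

class GracefulDecommissionCli {
  /** Issue a graceful refresh, poll until nodes drain, then force-refresh on timeout. */
  static void run(RMAdmin admin, long timeoutMs) throws Exception {
    admin.refreshNodesGracefully();
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (admin.getDecommissioningNodes().isEmpty()) {
        return;                 // all apps on those nodes have finished
      }
      Thread.sleep(5000);       // interrupting this loop (Ctrl+C) is the case discussed above
    }
    // Timeout reached: forcefully decommission whatever is still draining.
    admin.refreshNodesForcefully();
  }
}
{code}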
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356769#comment-14356769 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Yarn-trunk #863 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/863/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356771#comment-14356771 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Yarn-trunk #863 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/863/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358072#comment-14358072 ] Naganarasimha G R commented on YARN-3334: - Is the scope of this jira different from YARN-3045? If it is the same, I was planning to work on this part once YARN-3039 is up. [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2890: Attachment: YARN-2890.patch Attaching the updated patch MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357716#comment-14357716 ] Hadoop QA commented on YARN-2828: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704032/HADOOP-9329.004.patch against trunk revision 7a346bc. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6928//console This message is automatically generated. Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: HADOOP-9329.004.patch, YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch The MR1 Job Tracker had a useful HTTP parameter of e.g. refresh=3 that could be appended to URLs which enabled a page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the the Yarn interface. Could be implemented as a page element (e.g. drop down or so), but I'd recommend that the page not be more cluttered, and simply bring back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-1551: - Target Version/s: 2.8.0 (was: 2.4.0) Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, YARN-1551.v03.patch, YARN-1551.v04.patch, YARN-1551.v05.patch, YARN-1551.v06.patch, YARN-1551.v06.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357917#comment-14357917 ] Steve Loughran commented on YARN-3337: -- The Slider chaos monkey is pretty sophisticated: it is configured/deployed from within the AM itself and can trigger both container failure and AM failure. I'm proposing something more minimal here, a CLI tool/class that can look up an app by ID or (user, type, name) and repeatedly sleep, decide whether or not to act, and act. Initial actions: kill the AM container, kill other containers. I don't think the current client API lets me do this, as the tool would need # ability to kill a specific container of an application, ideally forcing the exit code # ability to identify which container the AM is currently running in (so as not to kill it in a worker container kill operation, but do kill it in an AM kill operation) In SLIDER-202 we are doing this in-AM, which has the data operations; this is a visible change to the code which could instead be handled in a re-usable lib/CLI tool for better YARN app testing Provide YARN chaos monkey - Key: YARN-3337 URL: https://issues.apache.org/jira/browse/YARN-3337 Project: Hadoop YARN Issue Type: New Feature Components: test Affects Versions: 2.7.0 Reporter: Steve Loughran To test failure resilience today you either need custom scripts or implement Chaos Monkey-like logic in your application (SLIDER-202). Killing AMs and containers on a schedule/probability is the core activity here, one that could be handled by a CLI app/client lib that does this. # entry point to have a startup delay before acting # frequency of chaos wakeup/polling # probability of AM failure generation (0-100) # probability of non-AM container kill # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
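A minimal sketch of the CLI loop proposed in the YARN-3337 comment above, assuming the existing YarnClient lookup calls; the container-kill step is exactly the capability noted as missing from the client API today, so it is modeled as a hypothetical hook rather than a real call.
{code}
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class YarnChaosMonkey {

  /** Hypothetical hook: the current client API does not expose container kills. */
  public interface ContainerKiller {
    void killAmContainer(ApplicationId appId) throws Exception;
    void killWorkerContainer(ApplicationId appId) throws Exception;
  }

  public static void run(ApplicationId appId, long startupDelayMs, long intervalMs,
      int amKillPercent, int workerKillPercent, ContainerKiller killer) throws Exception {
    Random random = new Random();
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      Thread.sleep(startupDelayMs);                    // startup delay before acting
      while (true) {
        ApplicationReport report = client.getApplicationReport(appId);
        switch (report.getYarnApplicationState()) {
          case FINISHED:
          case FAILED:
          case KILLED:
            return;                                    // app is done; nothing left to break
          default:
            break;
        }
        if (random.nextInt(100) < amKillPercent) {
          killer.killAmContainer(appId);               // AM failure injection
        } else if (random.nextInt(100) < workerKillPercent) {
          killer.killWorkerContainer(appId);           // non-AM container kill
        }
        Thread.sleep(intervalMs);                      // chaos wakeup/polling frequency
      }
    } finally {
      client.stop();
    }
  }
}
{code}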
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Description: FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user, final Credentials credentials)
    throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
    @Override
    public Token<?>[] run() throws Exception {
      return FileSystem.get(getConfig()).addDelegationTokens(
          UserGroupInformation.getLoginUser().getUserName(), credentials);
    }
  });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser always creates a new Subject. The calling sequence is FileSystem.get(getConfig()) -> FileSystem.get(getDefaultUri(conf), conf) -> FileSystem.CACHE.get(uri, conf) -> FileSystem.CACHE.getInternal(uri, conf, key) -> FileSystem.CACHE.map.get(key) -> createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user, UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. was: FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected.
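One possible mitigation for the leak described above, sketched only to make the issue concrete and not necessarily the fix this JIRA will adopt: bypass FileSystem.CACHE with FileSystem.newInstance and close it when done, and/or drop any entries cached under the throwaway proxy UGI with FileSystem.closeAllForUGI once the tokens have been obtained.
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenFetchSketch {

  static Token<?>[] obtainSystemTokensForUser(String user, final Credentials credentials,
      final Configuration conf) throws IOException, InterruptedException {
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
    try {
      return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          // newInstance bypasses FileSystem.CACHE, so nothing keyed on the
          // throwaway proxy UGI is left behind once the instance is closed.
          FileSystem fs = FileSystem.newInstance(conf);
          try {
            return fs.addDelegationTokens(
                UserGroupInformation.getLoginUser().getUserName(), credentials);
          } finally {
            fs.close();
          }
        }
      });
    } finally {
      // Also release anything that did end up cached under the proxy UGI.
      FileSystem.closeAllForUGI(proxyUser);
    }
  }
}
{code}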
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356943#comment-14356943 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2061/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356948#comment-14356948 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2061/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356946#comment-14356946 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2061/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md * hadoop-yarn-project/CHANGES.txt Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356966#comment-14356966 ] Hudson commented on YARN-3187: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #120 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/120/]) YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or group. Contributed by Gururaj Shetty (jianhe: rev a380643d2044a4974e379965f65066df2055d003) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.7.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, YARN-3187.4.patch YARN-2411 exposes a very useful feature {{support simple user and group mappings to queues}} but its not captured in the documentation. So in this jira we plan to document this feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion
[ https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356968#comment-14356968 ] Hudson commented on YARN-3295: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #120 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/120/]) YARN-3295. Fix documentation nits found in markdown conversion. Contributed by Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md Fix documentation nits found in markdown conversion --- Key: YARN-3295 URL: https://issues.apache.org/jira/browse/YARN-3295 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3295.001.patch * In ResourceManagerRestart page - Inside the Notes, the _e{epoch}_ , was highlighted before but not now. * yarn container command {noformat} list ApplicationId (should be Application Attempt ID ?) Lists containers for the application attempt. {noformat} * yarn application attempt command {noformat} list ApplicationId Lists applications attempts from the RM (should be Lists applications attempts for the given application) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356963#comment-14356963 ] Hudson commented on YARN-2280: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #120 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/120/]) YARN-2280. Resource manager web service fields are not accessible (Krisztian Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Trivial Fix For: 3.0.0 Attachments: YARN-2280.patch Using the resource manager's rest api (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some rest call returns a class where the fields after the unmarshal cannot be accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same classes on client side these fields only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356418#comment-14356418 ] Karthik Kambatla commented on YARN-3332: bq. the machine level big picture is fragmented between YARN and HDFS (and HBase etc) What constitutes the machine level big picture? Isn't this just the overall node's resource usage? YARN, at least as of today, doesn't need to know about the usage stats of HDFS or HBase. I have nothing against going the server route, except the additional daemon one might end up having to run. bq. I anyways needed a service to expose an API for both admins/users as well as external systems beyond HDFS too - I can imagine tools being built on top of this. It is not as clear to me. Let us say an admin and a user want usage stats about their YARN containers. The service can only provide the usage stats, while YARN will be able to provide other container metadata. Also, we should consider privacy of usage information. Will auth against this new service be additional overhead? bq. That said, it doesn't need to be service or library. I can think of a library that wires into the exposed API, though I haven't found uses for that yet. Sorry, didn't get that. Can you clarify/ elaborate? [Umbrella] Unified Resource Statistics Collection per node -- Key: YARN-3332 URL: https://issues.apache.org/jira/browse/YARN-3332 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Design - UnifiedResourceStatisticsCollection.pdf Today in YARN, NodeManager collects statistics like per container resource usage and overall physical resources available on the machine. Currently this is used internally in YARN by the NodeManager for only a limited usage: automatically determining the capacity of resources on node and enforcing memory usage to what is reserved per container. This proposal is to extend the existing architecture and collect statistics for usage beyond the existing usecases. Proposal attached in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356431#comment-14356431 ] Naganarasimha G R commented on YARN-2854: - Hi [~zjshen], thanks for reviewing the patch. I have attached a new patch with your comments addressed. Please review. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3338: -- Attachment: YARN-3338.1.patch Create a patch to fix the issue. Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)