[jira] [Updated] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3091: - Issue Type: Task (was: Improvement) [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3091: - Summary: [Umbrella] Improve and fix locks of RM scheduler (was: [Umbrella] Improve locks of RM scheduler) [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
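To make the read/write-lock direction above concrete, here is a minimal sketch (not taken from any patch on this JIRA; the class and method names are invented) of how a frequently read field such as clusterResource could be guarded with a ReentrantReadWriteLock so that read-mostly callers no longer serialize behind each other the way coarse synchronized methods do:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: a queue-like class whose read-mostly getters no longer
// contend with writers, which is the pattern proposed for scheduler/queue/app objects.
public class RwLockedQueueExample {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Resource clusterResource = Resources.createResource(0);

  // Frequent, cheap reads (e.g. from the web UI or client RPCs) take the read lock only.
  public Resource getClusterResource() {
    lock.readLock().lock();
    try {
      return clusterResource;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Infrequent updates (e.g. node added/removed) take the exclusive write lock.
  public void setClusterResource(Resource newResource) {
    lock.writeLock().lock();
    try {
      this.clusterResource = newResource;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}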
[jira] [Created] (YARN-3095) Enable DockerContainerExecutor to update Docker image
Chen He created YARN-3095: - Summary: Enable DockerContainerExecutor to update Docker image Key: YARN-3095 URL: https://issues.apache.org/jira/browse/YARN-3095 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Chen He Assignee: Chen He This JIRA allows DCE to check and update the Docker image before running a container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289567#comment-14289567 ] Zhijie Shen commented on YARN-3087: --- It's known issue for long time. I found the ticket that reports this problem before: YARN-1142 the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289569#comment-14289569 ] Zhijie Shen commented on YARN-1142: --- Hi [~tucu00], did you have a chance to find where the exact singleton is? MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.7.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289591#comment-14289591 ] Ray Chiang commented on YARN-2868: -- Okay, my bad. I'll put it back the way it was. Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289619#comment-14289619 ] Sunil G commented on YARN-1963: --- As per the discussion in YARN-2896 with [~eepayne] and [~leftnoteasy], there is a proposal to use an integer alone as the priority, on both the client and the server. As per the design doc, a priority label was used as a wrapper for the user, and the server internally used the corresponding integer. We can continue the discussion here in the parent JIRA. Looping in [~vinodkv]. Current idea: {noformat} yarn.priority-labels = low:2, medium:4, high:6 {noformat} Proposed: {noformat} yarn.application.priority = 2, 3, 4 {noformat} Thank you for sharing your thoughts. I will now upload the scheduler changes, which can be reviewed in the meantime. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289546#comment-14289546 ] Robert Metzger commented on YARN-3086: -- It seems that even on trunk tests are failing in the hadoop-yarn-server-resourcemanager package. Looks like its pretty hard to verify if my change is breaking anything. I'm uploading an updated patch in a few hours... {code} Failed tests: TestAMRestart.testRMAppAttemptFailuresValidityInterval:630 AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry:405 AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort:316-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort:327-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort:305-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestRMRestart.testQueueMetricsOnRMRestart:1812-assertQueueMetrics:1837 expected:2 but was:1 TestRMRestart.testRMRestartGetApplicationList:965 Wanted but not invoked: rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:965) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers:252-amRestartTests:393 Unexcpected MemorySeconds value expected:-1456158548889 but was:3265 Tests in error: TestClientRMTokens.testShortCircuitRenewCancel:285-checkShortCircuitRenewCancel:353 » NullPointer TestClientRMTokens.testShortCircuitRenewCancelWildcardAddress:294-checkShortCircuitRenewCancel:353 » NullPointer TestAMAuthorization.testUnauthorizedAccess:273 » UnknownHost Invalid host name... TestAMAuthorization.testUnauthorizedAccess:273 » UnknownHost Invalid host name... {code} Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
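For context, here is a hedged sketch of what a test setup could look like once NodeManager memory is configurable; it assumes the eventual patch makes MiniYARNCluster honor yarn.nodemanager.resource.memory-mb (YarnConfiguration.NM_PMEM_MB) rather than the currently hardcoded 4 GB, which is exactly the change this JIRA proposes, not behavior that works today:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterMemoryExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Assumption: with the proposed patch, MiniYARNCluster would honor this
    // property instead of hardcoding 4 GB per NodeManager.
    conf.setInt(YarnConfiguration.NM_PMEM_MB, 768);

    MiniYARNCluster cluster = new MiniYARNCluster("flink-client-test", 2, 1, 1);
    cluster.init(conf);
    cluster.start();
    // ... run the test that over-requests containers, then:
    cluster.stop();
  }
}
{code}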
[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289552#comment-14289552 ] Wangda Tan commented on YARN-3091: -- Thanks for jumping in and providing your thoughts, [~gtCarrera], [~sunilg], [~ozawa], [~rohithsharma], [~varun_saxena]. I've just updated the title of this JIRA a little according to suggestions from [~gtCarrera]. I think it's better to put the improvements and fixes together in this ticket, since they share a lot of background work. And +1 to fixing bugs prior to improvements, but it is possible we can address both in some places. I agree to run JCarder first to pinpoint problems; with that, we can get some valid inputs. But I'm not sure what the plan for HADOOP-9213 is; if it will take more time, we can do some work on our side in parallel. [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289603#comment-14289603 ] Wangda Tan commented on YARN-2800: -- Thanks for the review from [~vinodkv] and the commit by [~ozawa]. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: a user may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (we store them in memory), and the RM cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let users specify whether they want this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289779#comment-14289779 ] Sangjin Lee commented on YARN-3087: --- Thanks for that Zhijie. the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Summary: Add metric for initial container launch time to FairScheduler (was: Add metric for initial container launch time) Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289835#comment-14289835 ] Ray Chiang commented on YARN-2868: -- [~rohithsharma], it looks like CapacityScheduler/AbstractYarnScheduler is missing a couple of things needed to record container launch time. The Clock stuff is easy to add, but the queue related stuff looks like it could get complicated. I think I'd rather wait for YARN-2986 before this JIRA is implemented in CapacityScheduler. I can open a new JIRA for that once this one is done. Does that sound reasonable? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()
[ https://issues.apache.org/jira/browse/YARN-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289773#comment-14289773 ] Tsuyoshi OZAWA commented on YARN-3081: -- Thank you, Ted! Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache() --- Key: YARN-3081 URL: https://issues.apache.org/jira/browse/YARN-3081 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: yarn-3081-001.patch {code} if (!removedProxy) { // all of the proxies are currently in use and already scheduled // for removal, so we need to wait until at least one of them closes try { this.wait(); {code} The above code can wait for a condition that has already been satisfied, leading to indefinite wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
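The quoted snippet calls wait() once without re-checking its condition, so a notification that arrives first (or a spurious wakeup) can leave the thread blocked forever. Below is a minimal, self-contained sketch of the guarded-wait idiom that fixes this class of bug; the names are invented, and this is not the actual ContainerManagementProtocolProxy code:
{code}
// Illustrative only: the guarded-wait idiom that avoids waiting on a condition
// that is already satisfied. Names below are invented for the example.
public class GuardedWaitExample {
  private int availableProxies = 0;

  public synchronized void waitForFreeProxy() throws InterruptedException {
    // Always re-check the predicate in a loop; wait() can return spuriously,
    // and the condition may already hold before we ever call wait().
    while (availableProxies == 0) {
      wait();
    }
    availableProxies--;
  }

  public synchronized void releaseProxy() {
    availableProxies++;
    notifyAll();   // wake up any waiter so it can re-test the predicate
  }
}
{code}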
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289832#comment-14289832 ] Wangda Tan commented on YARN-1963: -- Thanks for the summary, [~sunilg]. I think priority should be a range instead of a set of numbers; maybe we can refer to how Linux does it: the range \[-N, +N], with 0 as the default priority. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289846#comment-14289846 ] Wangda Tan commented on YARN-3028: -- Patch LGTM, will commit this week. Thanks, Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289769#comment-14289769 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694177/YARN-3021.001.patch against trunk revision 24aa462. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.TestLargeSort org.apache.hadoop.conf.TestJobConf org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6398//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6398//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6398//console This message is automatically generated. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
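A minimal sketch of the behavior proposed in the description: validate the token once at submission, but treat a renewal failure as "skip automatic renewal" rather than failing the application. This is an illustration only; it is not the real DelegationTokenRenewer change, and the class and method names are invented:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

// Illustrative only: the "go easy on renewal failure" behavior described above.
public class LenientTokenRenewalExample {

  /** Returns true if the token should be scheduled for automatic renewal. */
  public static boolean tryInitialRenew(Token<?> token, Configuration conf) {
    try {
      token.renew(conf);              // validate the token once, as the RM does today
      return true;                    // renewable: keep scheduling future renewals
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (IOException e) {
      // Cross-realm trust (realm A's RM renewing realm B's token) can fail here.
      // Instead of failing app submission, log and simply skip auto-renewal.
      System.err.println("Skipping auto-renewal for " + token.getKind() + ": " + e);
      return false;
    }
  }
}
{code}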
[jira] [Updated] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3088: - Attachment: YARN-3088.v1.txt [~jlowe], would you please take a look at this patch? LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null, then the code can NPE trying to build a log message for the error. It blindly dereferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
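A minimal sketch of the null-safe message construction the description calls for (illustrative only, not the attached patch; names are invented):
{code}
import org.apache.hadoop.fs.Path;

// Illustrative only: build the error message without blindly dereferencing dir,
// mirroring the null handling that already exists just above in deleteAsUser().
public class NullSafeDeleteLogExample {
  public static String buildDeleteErrorMessage(String user, Path dir, int exitCode) {
    String target = (dir == null)
        ? "all user directories"          // dir == null means "delete everything for the user"
        : dir.toString();
    return "DeleteAsUser for " + target + " as user " + user
        + " returned with exit code: " + exitCode;
  }

  public static void main(String[] args) {
    System.out.println(buildDeleteErrorMessage("alice", null, 255));
    System.out.println(buildDeleteErrorMessage("alice", new Path("/tmp/nm-local-dir"), 255));
  }
}
{code}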
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-650: -- Fix Version/s: (was: 2.7.0) User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask back resources. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Attachment: YARN-2868.007.patch - Move metric to QueueMetrics parent class (to be compatible with CapacityScheduler later) - Remove initialized boolean variable - Restore AtomicLong to SchedulerApplicationAttempt Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289974#comment-14289974 ] Zhijie Shen commented on YARN-3030: --- I'm not sure if we will have a quick solution for YARN-3087. I'm okay if we want to work around it by putting the web service module in the existing webapp. I think we can make the per-node aggregator a singleton because an NM will just have one. In this way, we can easily refer to it in different places in the NM. Thoughts? set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3030.001.patch, YARN-3030.002.patch, YARN-3030.003.patch Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
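A hedged sketch of what "make the per-node aggregator a singleton" could look like inside the NM process; the class name and methods are invented for illustration and are not the actual YARN-3030 code:
{code}
// Illustrative only: one aggregator instance per NodeManager process, reachable
// from any NM component. Class and method names are invented for this sketch.
public final class PerNodeTimelineAggregatorHolder {
  private static volatile PerNodeTimelineAggregatorHolder instance;

  private PerNodeTimelineAggregatorHolder() { }

  public static PerNodeTimelineAggregatorHolder getInstance() {
    if (instance == null) {
      synchronized (PerNodeTimelineAggregatorHolder.class) {
        if (instance == null) {
          instance = new PerNodeTimelineAggregatorHolder();
        }
      }
    }
    return instance;
  }

  // e.g. the NM web app and the container manager would both call methods here
  public void putEntity(String entityJson) {
    // forward to the real writer/storage in the actual implementation
  }
}
{code}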
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290042#comment-14290042 ] Anubhav Dhoot commented on YARN-3079: - Should we add a couple more combinations to the test to ensure coverage? Update node2 so that it increases resources but the increase is less than the current max, and verify there is no change. Update node2 so that it decreases resources but the original and new values are both less than the current max, and verify there is no change. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also, an RMNodeReconnectEvent from ResourceTrackerService#registerNodeManager will trigger AbstractYarnScheduler#updateNodeResource to be called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3092: - Attachment: YARN-3092.1.patch Updated ver.1 patch. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
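A minimal sketch of the kind of common per-label usage class the description proposes: a map from label to a usage record, guarded by its own ReadWriteLock so callers do not need to lock the whole queue. This is not the attached patch; field and method names are invented:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: per-label usage tracking with fine-grained locking.
public class ResourceUsageSketch {
  private static class UsageByLabel {
    Resource used = Resources.createResource(0);
    Resource pending = Resources.createResource(0);
    Resource reserved = Resources.createResource(0);
    Resource amUsed = Resources.createResource(0);
  }

  private final Map<String, UsageByLabel> usages = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public Resource getUsed(String label) {
    lock.readLock().lock();
    try {
      UsageByLabel u = usages.get(label);
      return u == null ? Resources.none() : u.used;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void incUsed(String label, Resource delta) {
    lock.writeLock().lock();
    try {
      UsageByLabel u = usages.get(label);
      if (u == null) {
        u = new UsageByLabel();
        usages.put(label, u);
      }
      Resources.addTo(u.used, delta);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}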
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290163#comment-14290163 ] Jian He commented on YARN-3011: --- lgtm overall, IIUC, if {{yarn.dispatcher.exit-on-error}} is set to false, NM will not crash in this case? one nit on the patch: {{next.getResource().getFile()}} , I feel using {{ConverterUtils#getPathFromYarnURL}} to print the full URL will be more debuggable. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
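A minimal sketch of the defensive check and the more debuggable logging suggested above (illustrative only; not the attached patch):
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Illustrative only: validate the resource URL before building a Path, so one bad
// request fails that localization instead of killing the AsyncDispatcher thread,
// and include the full resource URL in the error for easier debugging.
public class LocalizationPathCheckExample {
  public static Path toVerifiedPath(LocalResource rsrc) throws Exception {
    String file = rsrc.getResource().getFile();
    if (file == null || file.isEmpty()) {
      throw new IllegalArgumentException(
          "Invalid (empty) file in local resource: " + rsrc.getResource());
    }
    // ConverterUtils#getPathFromYarnURL yields the full hdfs://... path,
    // which is the more debuggable form suggested in the review above.
    return ConverterUtils.getPathFromYarnURL(rsrc.getResource());
  }
}
{code}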
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290208#comment-14290208 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694258/YARN-3092.1.patch against trunk revision 5c93ca2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6399//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6399//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289200#comment-14289200 ] Hadoop QA commented on YARN-3094: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694133/YARN-3094.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6397//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6397//console This message is automatically generated. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290363#comment-14290363 ] Chun Chen commented on YARN-3077: - Thanks for reviewing the patch, [~jianhe]. Uploaded a new patch addressing your comments. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290384#comment-14290384 ] Hadoop QA commented on YARN-3086: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694155/YARN-3086.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6404//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6404//console This message is automatically generated. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290232#comment-14290232 ] Jian He commented on YARN-3077: --- [~chenchun], thanks for working on this. The newly added test passes without the patch change. Mind taking a deeper look? For the test case: I suggest changing ZK_RM_STATE_STORE_PARENT_PATH to use /foo/bar for the existing test cases, instead of adding a new unit test to test the util method. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
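For illustration, a hedged sketch of what a recursive parent-path creation helper could look like against the plain ZooKeeper API; this is not the actual ZKRMStateStore patch, and the method name only mirrors the helper discussed in this review thread:
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: create every component of a parent path such as
// /rmstore/cluster1, ignoring components that already exist.
public class ZkRecursiveCreateExample {
  public static void createRootDirRecursively(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    StringBuilder current = new StringBuilder();
    for (String segment : path.split("/")) {
      if (segment.isEmpty()) {
        continue;                       // skip the leading empty component
      }
      current.append("/").append(segment);
      try {
        zk.create(current.toString(), new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      } catch (KeeperException.NodeExistsException e) {
        // fine: another cluster (or an earlier run) already created this component
      }
    }
  }
}
{code}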
[jira] [Updated] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3092: - Attachment: YARN-3092.2.patch Thanks review from [~jianhe], updated patch addressed all comments. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290300#comment-14290300 ] Eric Payne commented on YARN-1963: -- +1 on using numbers and not labels. It seems that the use of labels adds more complexity in mapping, sending via PB, and converting back to numbers, and does not seem to add much clarity. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290406#comment-14290406 ] Hadoop QA commented on YARN-3077: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694322/YARN-3077.2.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6405//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6405//console This message is automatically generated. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290420#comment-14290420 ] Anubhav Dhoot commented on YARN-2868: - Adding it to ClusterMetrics will only give you a single value for the entire cluster which is pretty much useless if you want to investigate queue related issues. Adding it to a per queue metrics will give you more granular data. If you only care about the cluster wide metrics you still get that by looking at the root queue metrics. Hence we need to keep it per queue. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290235#comment-14290235 ] Jian He commented on YARN-3077: --- doing above, the createRootDirRecursively visibility can be changed to private RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290242#comment-14290242 ] Jian He commented on YARN-3092: --- looks good overall. - minor optimization, return the reference directly if not existing {code} if (!usages.containsKey(label)) { usages.put(label, new UsageByLabel(label)); } return usages.get(label); {code} - demand - pending? - test case, just throw exception for better readability. {code} NoSuchMethodException, SecurityException, IllegalAccessException, IllegalArgumentException, InvocationTargetException {code} - the new class can be inside scheduler package Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
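A tiny sketch of the reviewer's "return the reference directly" suggestion, which avoids the containsKey/put/get triple lookup; the types here are stand-ins, not the actual patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: UsageByLabel stands in for the real per-label record type.
class UsageByLabel { }

class UsageLookupExample {
  private final Map<String, UsageByLabel> usages = new HashMap<>();

  UsageByLabel getOrCreate(String label) {
    UsageByLabel u = usages.get(label);
    if (u == null) {
      u = new UsageByLabel();
      usages.put(label, u);
    }
    return u;       // single lookup on the hit path instead of containsKey + put + get
  }
}
{code}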
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290272#comment-14290272 ] Wangda Tan commented on YARN-2868: -- Hmm, I think it may not be good enough to put this in QueueMetrics (I just noticed this). Every new app will overwrite this value, which is confusing to me and also to end users. When you look at the metrics fields in QueueMetrics, all of them are generic metrics of a queue, but this field seems not so generic to me. Is there any must-have reason or use case to add it to QueueMetrics, or alternatively could you add an application-metrics class and add it there? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290286#comment-14290286 ] Ray Chiang commented on YARN-2868: -- I had it previously in FSQueueMetrics, then moved it to QueueMetrics based on Rohith's feedback, and then determined that updating CapacityScheduler with all the matching queue stuff right now would potentially conflict with YARN-2986. I could push the metric back to FSQueueMetrics. Since this is a MutableRate, the metric shouldn't get clobbered with each app, but will get averaged in (unless I'm misunderstanding something). I'm going to wait on further feedback before I do more editing. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
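To illustrate the MutableRate point: a rate metric accumulates a sample count and running average rather than being overwritten per application. The sketch below is illustrative only, with invented names; it is not the actual QueueMetrics/FSQueueMetrics change:
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative only: each app's sample is folded into the queue's running average.
public class FirstContainerLatencyMetricExample {
  private final MetricsRegistry registry = new MetricsRegistry("QueueMetricsSketch");
  private final MutableRate firstContainerAllocationDelay =
      registry.newRate("firstContainerAllocationDelay",
          "Latency from allocation start to first allocated container (ms)");

  public void recordFirstContainerLatency(long startMillis, long allocatedMillis) {
    // MutableRate.add() updates count and average; it does not overwrite the value.
    firstContainerAllocationDelay.add(allocatedMillis - startMillis);
  }
}
{code}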
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-3077: Attachment: YARN-3077.2.patch RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290250#comment-14290250 ] Hadoop QA commented on YARN-2868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694276/YARN-2868.007.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6400//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6400//console This message is automatically generated. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290246#comment-14290246 ] Hadoop QA commented on YARN-3088: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694287/YARN-3088.v1.txt against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6401//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6401//console This message is automatically generated. LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null then the code can NPE trying to build a log message for the error. It blindly deferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
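To illustrate the NPE scenario described above, the fix amounts to not dereferencing dir when it can legitimately be null. A hedged sketch (class and method names here are hypothetical; only the null-guard pattern is the point):
{code}
import java.io.File;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class DeleteAsUserLogging {
  private static final Log LOG = LogFactory.getLog(DeleteAsUserLogging.class);

  // dir may be null when the caller asked the native executor to clean up
  // all local directories for the user, so guard before dereferencing it.
  void logDeleteFailure(String user, File dir, int exitCode) {
    String target = (dir == null) ? "all local directories" : dir.getAbsolutePath();
    LOG.error("deleteAsUser for user " + user + " on " + target
        + " failed with exit code " + exitCode);
  }
}
{code}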
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290247#comment-14290247 ] Wangda Tan commented on YARN-3075: -- {code} I had put it inside the node.labels != null condition earlier, but this leads to test case failures. If you look at the code in getNodeLabels, you will find that we get host.labels if nodeId doesn't have specific labels associated with it. I, on the other hand, am storing whatever is required right from the beginning, so there is no need to make this decision at the time of the call to getLabelsToNodes. So it's just a difference in approach and doesn't lead to any functional issues. Let me know your opinion on this. {code} The problem is not only a functional issue. I think the two parts of the code need to be consistent, mostly to avoid misunderstanding and to make debugging easier. - In NodeLabelsManager, when trying to get the labels on a node, if node.label does not exist (null), return host.label. And if the node and host have the same label, we should set node.label = null to keep the structure as simple as possible. - In NodeLabel, I think we should have similar logic: if a node's label is the same as its host's, we should only store the host in NodeLabel. Sounds good? Thanks, NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
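A minimal sketch of the fallback described in the first bullet above (illustrative names only, not the actual CommonNodeLabelsManager code): a node without its own labels inherits its host's labels.
{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class NodeLabelLookup {
  private final Map<String, Set<String>> hostLabels;  // host -> labels
  private final Map<String, Set<String>> nodeLabels;  // host:port -> labels

  NodeLabelLookup(Map<String, Set<String>> hostLabels,
      Map<String, Set<String>> nodeLabels) {
    this.hostLabels = hostLabels;
    this.nodeLabels = nodeLabels;
  }

  // If the node has no labels of its own (null), fall back to the host's labels.
  Set<String> getLabelsOnNode(String host, int port) {
    Set<String> labels = nodeLabels.get(host + ":" + port);
    if (labels != null) {
      return labels;
    }
    Set<String> inherited = hostLabels.get(host);
    return inherited != null ? inherited : Collections.<String>emptySet();
  }
}
{code}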
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290289#comment-14290289 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694258/YARN-3092.1.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6402//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6402//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
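To make the fine-grained locking benefit mentioned in this issue concrete, here is a minimal sketch of a per-label usage tracker guarded by a read/write lock. It is illustrative only; the committed ResourceUsage class may look quite different.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class LabeledResourceUsage {
  private final Map<String, Resource> usedByLabel = new HashMap<String, Resource>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Readers of used resource only take the read lock, so they do not need
  // to lock the whole queue object.
  public Resource getUsed(String label) {
    lock.readLock().lock();
    try {
      Resource r = usedByLabel.get(label);
      return r == null ? Resources.createResource(0, 0) : r;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void incUsed(String label, Resource delta) {
    lock.writeLock().lock();
    try {
      Resource r = usedByLabel.get(label);
      if (r == null) {
        r = Resources.createResource(0, 0);
        usedByLabel.put(label, r);
      }
      Resources.addTo(r, delta);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}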
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290291#comment-14290291 ] Wangda Tan commented on YARN-2868: -- Just checked the code; would it be good to put it in ClusterMetrics? aMRegisterDelay/amLaunchDelay seem more related to such initial container allocation time, and the name of the metric could be "App first container allocation delay". Sounds good, [~rohithsharma]? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3022) Expose Container resource information from NodeManager for monitoring
[ https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290316#comment-14290316 ] Robert Kanter commented on YARN-3022: - LGTM, just one minor thing: - In {{ContainerMetrics}}, can you create a {{public static final String}} for {{pMemUsage}} like you did for the others? Expose Container resource information from NodeManager for monitoring - Key: YARN-3022 URL: https://issues.apache.org/jira/browse/YARN-3022 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3022.001.patch, YARN-3022.002.patch Along with exposing the resource consumption of each container (such as in YARN-2141), it's worth exposing the actual resource limit associated with them to get better insight into YARN allocation and consumption -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3093) Support load command from admin [Helps to load big set of labels]
[ https://issues.apache.org/jira/browse/YARN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290324#comment-14290324 ] Wangda Tan commented on YARN-3093: -- That will be helpful, +1 for the proposal too. I think we can make it compatible with the syntax in YARN-3028. In addition, do you think it is possible to automatically add the labels that appear in the conf file to clusterNodeLabels? That would make the config file simpler. Support load command from admin [Helps to load big set of labels] - Key: YARN-3093 URL: https://issues.apache.org/jira/browse/YARN-3093 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Proposing: yarn rmadmin -load -nodelabels filename. nodelabels can be one such option here, and this can be generalized by adding other options later. The advantage of this command is easier configuration. Assume an admin needs to load labels onto more than 20 nodes; the current command is a little difficult. If this configuration can be preloaded in a file and then uploaded to the RM, the same can be achieved with the existing parsing and update logic. I am showing a simpler proposed config file. {noformat} rm1 $ cat node_label.conf add [ label1,label2,label3,label4,label11,label12,label13,label14,abel21,label22,label23,label24 ] replace[ node1:port=label1,label2,label23,label24 node2:port=label4,abel11,label12,label13,label14,label21 node3:port=label2,label3,label4,label11,label12,label13,label14 node4:port=label14,label21,label22,label23,label24 node5:port=label14,label21,label22,label23,label24 node6:port=label4,label11,label12,label13,label14,label21,label22,label23,label24 ] {noformat} A restriction on file size can be kept to avoid uploading very large files. Please share your opinion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290377#comment-14290377 ] Jun Gong commented on YARN-3094: [~rohithsharma] Thanks for your review. I will add a test case if needed. {quote} How many RUNNING applications are running in cluster? {quote} Just several hundred apps running. The slow recovery might be caused by a lot of exceptions when storing RMApps' data using RMApplicationHistoryWriter. We will investigate further. {quote} What is the AM liveliness timeout configured in cluster? {quote} 3 mins. That way we can find it earlier if an AM crashes. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290411#comment-14290411 ] Rohith commented on YARN-2868: -- I had thought the metric could be common to all schedulers; if there are any complexities now, it can be added later. I also had a specific doubt that this metric is at the application level rather than the scheduler level, which I mentioned in the 2nd point of my previous comment. I was in a dilemma about where exactly to place it. Now I see ClusterMetrics already has some metrics related to AMs. +1 for keeping it in ClusterMetrics and for the metric name. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290371#comment-14290371 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694303/YARN-3092.2.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6403//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6403//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290395#comment-14290395 ] Chun Chen commented on YARN-3094: - Since the RM can't receive pings from AMs until ApplicationMasterService starts, I think it is more accurate to reset the timers in the AMLivelinessMonitor service after ApplicationMasterService starts. I suggest initializing the AMLivelinessMonitor service after ApplicationMasterService in RMActiveServices#serviceInit. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
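A rough sketch of the idea being discussed here (not the attached patch; names are illustrative): after recovery finishes and the AM protocol service is up, reset each recovered attempt's liveness timestamp so time spent recovering does not count against the expiry interval.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AmLivenessTimers {
  private final ConcurrentMap<String, Long> lastHeardFrom =
      new ConcurrentHashMap<String, Long>();

  void register(String attemptId) {
    lastHeardFrom.put(attemptId, System.currentTimeMillis());
  }

  void receivedPing(String attemptId) {
    lastHeardFrom.put(attemptId, System.currentTimeMillis());
  }

  // Called once recovery is complete and ApplicationMasterService is started,
  // so every recovered attempt starts its timeout from "now".
  void resetAllTimers() {
    long now = System.currentTimeMillis();
    for (String attemptId : lastHeardFrom.keySet()) {
      lastHeardFrom.put(attemptId, now);
    }
  }

  boolean isExpired(String attemptId, long expireIntervalMs) {
    Long last = lastHeardFrom.get(attemptId);
    return last != null && System.currentTimeMillis() - last > expireIntervalMs;
  }
}
{code}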
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290415#comment-14290415 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], regarding your comment below. bq. In NodeLabel, I think we should have similar logic: if a node's label is the same as its host's, we should only store the host in NodeLabel. Right now, when we call {{getLabelsToNodes}} we simply query {{labelCollections}}. If we change it as above, we will have to query {{nodeCollections}} as well to find out which nodes are associated with the stored host. Are you fine with doing that? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288979#comment-14288979 ] Hadoop QA commented on YARN-2800: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693958/YARN-2800-20150122-1.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6395//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6395//console This message is automatically generated. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289117#comment-14289117 ] Tsuyoshi OZAWA commented on YARN-2800: -- Committing this to trunk and branch-2. Thanks [~leftnoteasy] for your contribution and thanks [~vinodkv] for the review! Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289120#comment-14289120 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-trunk-Commit #6917 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6917/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: 
YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289141#comment-14289141 ] Rohith commented on YARN-3094: -- Thanks [~hex108] for reporting the issue and for your contributions. The patch looks good to me. Can you add tests for this? And could you give some general information, like: # How many RUNNING applications are running in cluster? # What is the AM liveliness timeout configured in cluster? reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289109#comment-14289109 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Yarn-trunk #816 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/816/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/CHANGES.txt Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3085) Application summary should include the application type
[ https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289119#comment-14289119 ] Hadoop QA commented on YARN-3085: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694131/0001-YARN-3085.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6396//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6396//console This message is automatically generated. Application summary should include the application type --- Key: YARN-3085 URL: https://issues.apache.org/jira/browse/YARN-3085 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3085.patch Adding the application type to the RM application summary log makes it easier to audit the number of applications from various app frameworks that are running on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289123#comment-14289123 ] Varun Saxena commented on YARN-3075: bq. 4) In add(remove/replace)NodeToLabels, such null check is not necessary: if (label != null). It will be checked in check... methods in CommonsNodeLabelsManager. That's true. Will remove the additional null check. bq. 1) When op (add/remove/replace) is on a host nodeId.getPort() == WILDCARD_PORT, (of course you need update label-host), you only need update label-Nodes when check node.labels != null is true. I had put it inside the node.labels != null condition earlier, but this leads to test case failures. If you look at the code in {{getNodeLabels}}, you will find that we get host.labels if nodeId doesn't have specific labels associated with it. I, on the other hand, am storing whatever is required right from the beginning, so there is no need to make this decision at the time of the call to {{getLabelsToNodes}}. So it's just a difference in approach and doesn't lead to any functional issues. Let me know your opinion on this. bq. 3.3 When a label contains (nodeId.port = WILDCARD_PORT), you should add Nodes in the host if (node.labels == null). It is possible a. admin specify host1.label = x; b. nm1 on host1 activated. You should get nm1 when you inquire nodes of label=x. You may need to add a test to TestRMNodeLabelsManager. You can take a look at testNodeActiveDeactiveUpdate Thanks for the input. Yes, activating and deactivating a node needs to delete the node from labelCollections as well. Will do so. I will modify {{testNodeActiveDeactiveUpdate}} accordingly. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289130#comment-14289130 ] Varun Saxena commented on YARN-3075: bq. 2) When op is on a node, as mentioned by Sunil, replace opertions not correct, it should be remove and then add. bq. That is what I am doing. Removing and add. Sunil G meant that we can refactor replaceNodeForLabels and not reduplicate add already present in removeNodeFromLabels and addNodeForLabels function. Did you mean something else ? Typing mistake. I meant That is what I am doing. Removing and add. Sunil G meant that we can refactor replaceNodeForLabels and not reduplicate code already present in removeNodeFromLabels and addNodeForLabels function. Did you mean something else ? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: YARN-3011.002.patch NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290434#comment-14290434 ] Varun Saxena commented on YARN-3011: bq. IIUC, if yarn.dispatcher.exit-on-error is set to false, NM will not crash in this case? Yes, you are correct. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
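For illustration of the failure above: new Path("") throws IllegalArgumentException, and when that escapes into the AsyncDispatcher thread the NM exits. A defensive sketch (not the attached patch; the helper name is hypothetical) validates the string first so the caller can fail only the offending resource.
{code}
import org.apache.hadoop.fs.Path;

final class LocalizationPathGuard {
  // Returns null instead of letting new Path("") throw, so the caller can
  // mark just this resource as failed rather than killing the dispatcher.
  static Path toPathOrNull(String rawPath) {
    if (rawPath == null || rawPath.trim().isEmpty()) {
      return null;
    }
    return new Path(rawPath);
  }

  private LocalizationPathGuard() {
  }
}
{code}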
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: YARN-3011.002.patch NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: (was: YARN-3011.002.patch) NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290465#comment-14290465 ] Hadoop QA commented on YARN-3079: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694331/YARN-3079.002.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6406//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6406//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6406//console This message is automatically generated. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290466#comment-14290466 ] Hadoop QA commented on YARN-3011: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694335/YARN-3011.002.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6407//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6407//console This message is automatically generated. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user 
hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290431#comment-14290431 ] Rohith commented on YARN-2868: -- bq. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those IIUC, be informed that this would break compatibility. ClusterMetrics are exposed to users. It would be better to keep the current metrics as they are and only work on adding new metrics. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290444#comment-14290444 ] zhihai xu commented on YARN-3079: - Thanks to [~leftnoteasy] and [~adhoot] for the review; I addressed the comment from [~adhoot] in the new patch YARN-3079.002.patch. About the comments from [~leftnoteasy]: bq. 1) Suggest to change signature of updateMaximumAllocation(SchedulerNode, bool) to updateMaximumAllocation(Resource nodeResource, bool), since we only uses nodeResource here. This is debatable. I prefer to keep the current signature because it is more flexible and more meaningful for the other parameter (added node or removed node). Two nodes can have the same nodeResource, and you can access more information from SchedulerNode. bq. 2) Change resource for a NM is equivalent to {{updateMaximumAllocation(oldNodeResource, false)}} and {{updateMaximumAllocation(newNoderesource, true)}}. We can avoid some duplicated logic. I think it is not completely equivalent, because when you call {{updateMaximumAllocation(oldNodeResource, false)}}, it is assumed that the node has already been removed from the HashMap nodes, based on both the implementation of updateMaximumAllocation and its callers. But in the context of updateNodeResource, the node whose resource is to be changed is still in the HashMap nodes. bq. 3) Suggest rename updateMaximumAllocation(void) to refreshMaximumAllocation() or other name reflects the behavior: scan all cluster nodes and get maximum allocation. Good suggestion; refreshMaximumAllocation is a very good name. Addressed this comment in the new patch YARN-3079.002.patch. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
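A rough sketch of the refresh behaviour discussed above (illustrative only, not the attached patch): scan every node's resource and take the component-wise maximum.
{code}
import java.util.Collection;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class MaximumAllocationTracker {
  private Resource maximumAllocation = Resources.createResource(0, 0);

  // "Refresh": recompute from scratch by scanning all node resources. The
  // incremental updateMaximumAllocation(node, added/removed) path is separate.
  synchronized void refreshMaximumAllocation(Collection<Resource> nodeResources) {
    Resource max = Resources.createResource(0, 0);
    for (Resource r : nodeResources) {
      if (r.getMemory() > max.getMemory()) {
        max.setMemory(r.getMemory());
      }
      if (r.getVirtualCores() > max.getVirtualCores()) {
        max.setVirtualCores(r.getVirtualCores());
      }
    }
    maximumAllocation = max;
  }

  synchronized Resource getMaximumAllocation() {
    return maximumAllocation;
  }
}
{code}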
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: (was: YARN-3079.002.patch) Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290490#comment-14290490 ] Tsuyoshi OZAWA commented on YARN-3086: -- [~rmetzger] Please check whether your tests fail without your patch. If they fail without your patch, it may be an environment-dependent problem. In that case, you can submit the patch regardless of the failures. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290493#comment-14290493 ] Yongjun Zhang commented on YARN-3021: - I reran the failed tests locally. The following tests
{quote}
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
org.apache.hadoop.mapred.TestJobConf
{quote}
were successful. The following test failed:
{quote}
org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem
TestJobConf.testNegativeValueForTaskVmem:111 expected:<1024> but was:<-1>
{quote}
and it was already reported as MAPREDUCE-6223. TestLargeSort failed in https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/ YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1456) IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289052#comment-14289052 ] Steve Loughran commented on YARN-1456: -- marking as a duplicate of YARN-888; I've not seen it for a while IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager - Key: YARN-1456 URL: https://issues.apache.org/jira/browse/YARN-1456 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Environment: IntelliJ IDEA 12.x 13.x beta Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1456-001.patch When IntelliJ IDEA imports the hadoop POMs into the IDE, somehow it fails to pick up all the transitive dependencies of the yarn-client, and so can't resolve commons logging, com.google.* classes and the like. While this is probably an IDEA bug, it does stop you building Hadoop from inside the IDE, making debugging significantly harder -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3093) Support load command from admin [Helps to load big set of labels]
[ https://issues.apache.org/jira/browse/YARN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289053#comment-14289053 ] Rohith commented on YARN-3093: -- +1 for the proposal. This is very useful for very large clusters. Support load command from admin [Helps to load big set of labels] - Key: YARN-3093 URL: https://issues.apache.org/jira/browse/YARN-3093 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Proposing: yarn rmadmin -load -nodelabels filename. Here nodelabels is one such option, and this can be generalized by adding other options later. The advantage of this command is easier configuration. Assume an admin needs to load labels for more than 20 nodes; the current command is a little difficult. If the configuration can be prepared in a file, it can then be uploaded to the RM, and the same result can be achieved with the existing parsing and update logic. I am showing a simpler proposed config file.
{noformat}
rm1 $ cat node_label.conf
add [ label1,label2,label3,label4,label11,label12,label13,label14,label21,label22,label23,label24 ]
replace [
 node1:port=label1,label2,label23,label24
 node2:port=label4,label11,label12,label13,label14,label21
 node3:port=label2,label3,label4,label11,label12,label13,label14
 node4:port=label14,label21,label22,label23,label24
 node5:port=label14,label21,label22,label23,label24
 node6:port=label4,label11,label12,label13,label14,label21,label22,label23,label24
]
{noformat}
A restriction on file size can be kept to avoid uploading very huge files. Please share your opinion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289062#comment-14289062 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], thanks for the review. Kindly find my replies below: bq. 2) When op is on a node, as mentioned by Sunil, replace operations not correct, it should be remove and then add. That is what I am doing: removing and then adding. [~sunilg] meant that we can refactor replaceNodeForLabels and not duplicate the logic already present in the removeNodeFromLabels and addNodeForLabels functions. Did you mean something else? bq. 3.1 Two loops seem duplicated, you can set labels = labelCollections.entrySet when (labels == null or empty). That's a good suggestion. Will make the change. bq. 3.2 When labels == null or empty, it will return nodes for all labels. You need to add javadocs to describe this behavior and you need to remove the empty label from labelCollections like we did in getClusterNodeLabels. The empty label exists because we need to track non-labeled nodes on the scheduler side, but it shouldn't be seen by users. Didn't know about that. Will remove the empty label. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
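Reading between the lines of the review, the retrieval method under discussion treats a null or empty label set as "all labels" while hiding the internal empty label used for unlabeled nodes. A rough sketch of that shape follows; labelCollections, NO_LABEL, and the method name are assumptions about the manager's internals, not the actual YARN-3075 code.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Rough sketch of the "null/empty means all labels" behaviour discussed above.
class LabelsToNodesSketch {
  static final String NO_LABEL = "";                         // assumed internal marker for unlabeled nodes
  final Map<String, Set<NodeId>> labelCollections = new HashMap<>();   // assumed label -> nodes map

  Map<String, Set<NodeId>> getLabelsToNodes(Set<String> labels) {
    // Null/empty input means "all labels", but the internal empty label stays hidden.
    if (labels == null || labels.isEmpty()) {
      labels = new HashSet<>(labelCollections.keySet());
      labels.remove(NO_LABEL);
    }
    Map<String, Set<NodeId>> labelsToNodes = new HashMap<>();
    for (String label : labels) {
      Set<NodeId> nodes = labelCollections.get(label);
      if (nodes != null && !nodes.isEmpty()) {
        labelsToNodes.put(label, new HashSet<>(nodes));
      }
    }
    return labelsToNodes;
  }
}
{code}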
[jira] [Updated] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3094: --- Attachment: YARN-3094.patch reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When the RM restarts, it will recover RMAppAttempts and register them to the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289097#comment-14289097 ] Tsuyoshi OZAWA commented on YARN-2800: -- I also confirmed that the test failure of TestRMWebServicesAppsModification is not related to the patch. It passes locally. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
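As background for the enable/disable behaviour described in this issue, the guard typically boils down to reading a boolean configuration flag once and throwing on any label operation when it is off. A small sketch follows; the configuration key name here is an assumption for illustration, and the committed patch defines the real key in YarnConfiguration.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Sketch of the enable/disable guard described above; the key name is an assumed placeholder.
class NodeLabelsGuardSketch {
  private static final String NODE_LABELS_ENABLED = "yarn.node-labels.enabled";
  private final boolean nodeLabelsEnabled;

  NodeLabelsGuardSketch(Configuration conf) {
    this.nodeLabelsEnabled = conf.getBoolean(NODE_LABELS_ENABLED, false);
  }

  /** Called at the start of every add/remove/replace label operation. */
  void checkNodeLabelsEnabled() throws IOException {
    if (!nodeLabelsEnabled) {
      throw new IOException("Node-label feature is not enabled; set "
          + NODE_LABELS_ENABLED + " to true and restart the RM to use it.");
    }
  }
}
{code}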
[jira] [Updated] (YARN-3085) Application summary should include the application type
[ https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3085: - Attachment: 0001-YARN-3085.patch Application summary should include the application type --- Key: YARN-3085 URL: https://issues.apache.org/jira/browse/YARN-3085 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3085.patch Adding the application type to the RM application summary log makes it easier to audit the number of applications from various app frameworks that are running on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289093#comment-14289093 ] Tsuyoshi OZAWA commented on YARN-2800: -- Thanks for the update. I rechecked the code. RMNodeLabelsManager, FileSystemNodeLabelsStore, and RMAdminCLI can access the variable nodeLabelsEnabled, but I agree with you that we don't need to make nodeLabelsEnabled volatile since there is no problem in the code path. I'll update the following comments to follow the javadoc format. After that I'll commit it.
{code}
+  /*
+   * Following are options for node labels
{code}
{code}
+  /*
+   * Error messages
+   */
{code}
Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()
[ https://issues.apache.org/jira/browse/YARN-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289103#comment-14289103 ] Tsuyoshi OZAWA commented on YARN-3081: -- [~ted_yu], thanks for reporting this. I checked the code path, and the current code looks correct. If tryCloseProxy() succeeds, the sleeping threads will be woken up and try to register the proxy instance as a cache entry. If wait() had a timeout value, the sleeping threads would be woken up before cmProxy.size() is updated, which just wastes CPU since nothing changes between before and after the sleep. What do you think? Please let me know if I'm missing something. Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache() --- Key: YARN-3081 URL: https://issues.apache.org/jira/browse/YARN-3081 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: yarn-3081-001.patch
{code}
      if (!removedProxy) {
        // all of the proxies are currently in use and already scheduled
        // for removal, so we need to wait until at least one of them closes
        try {
          this.wait();
{code}
The above code can wait for a condition that has already been satisfied, leading to an indefinite wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
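The discussion above is the classic guarded-wait problem: wait() should sit inside a loop that re-checks the condition, and every state change that could satisfy the condition should notify the waiters. The generic sketch below shows the pattern; it is deliberately not the ContainerManagementProtocolProxy code itself.
{code}
import java.util.HashMap;
import java.util.Map;

// Generic guarded-wait sketch mirroring the addProxyToCache() discussion above.
class BoundedCacheSketch<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final int maxEntries;

  BoundedCacheSketch(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  synchronized void put(K key, V value) throws InterruptedException {
    // Re-check the condition after every wake-up; a plain if-check can miss updates.
    while (cache.size() >= maxEntries) {
      wait();
    }
    cache.put(key, value);
  }

  synchronized V remove(K key) {
    V removed = cache.remove(key);
    if (removed != null) {
      notifyAll();   // wake any threads waiting for space
    }
    return removed;
  }
}
{code}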
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289073#comment-14289073 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #82 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/82/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3094) reset timer for liveness monitors after RM recovery
Jun Gong created YARN-3094: -- Summary: reset timer for liveness monitors after RM recovery Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong When the RM restarts, it will recover RMAppAttempts and register them to the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
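The fix direction implied by the description is to restart each monitored attempt's expiry clock once recovery finishes, so that time spent recovering does not count against the AM. A heavily simplified monitor sketch with such a reset hook is below; the names are illustrative and do not claim to match the YARN-3094 patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified liveness-monitor sketch showing a resetTimer()-style hook after RM recovery.
class LivenessMonitorSketch<O> {
  private final Map<O, Long> running = new ConcurrentHashMap<>();
  private final long expireIntervalMs;

  LivenessMonitorSketch(long expireIntervalMs) {
    this.expireIntervalMs = expireIntervalMs;
  }

  void register(O ob) {
    running.put(ob, System.currentTimeMillis());
  }

  void receivedPing(O ob) {
    running.replace(ob, System.currentTimeMillis());
  }

  /** Called once recovery is done so recovery time does not count towards expiry. */
  void resetTimer() {
    long now = System.currentTimeMillis();
    for (O ob : running.keySet()) {
      running.put(ob, now);
    }
  }

  boolean isExpired(O ob) {
    Long last = running.get(ob);
    return last != null && System.currentTimeMillis() - last > expireIntervalMs;
  }
}
{code}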
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289286#comment-14289286 ] Jason Lowe commented on YARN-914: - bq. The first step I was thinking to keep NM running in a low resource mode after graceful decommissioned I think it could be useful to leave the NM process up after the graceful decommission completes. That allows automated decommissioning tools to know the process completed by querying the NM directly. If the NM exits then the tool may have difficulty distinguishing between the NM crashing just before decommissioning completed vs. successful completion. The RM will be tracking this state as well, so it may not be critical to do it one way or the other if the tool is querying the RM rather than the NM directly. bq. However, I am not sure if they can handle state migration to new node ahead of predictable node lost here, or be stateless more or less make more sense here? I agree with Ming that it would be nice if the graceful decommission process could give the AMs a heads up about what's going on. The simplest way to accomplish that is to leverage the already existing preemption framework to tell the AM that YARN is about to take the resources away. The StrictPreemptionContract portion of the PreemptionMessage can be used to list exact resources that YARN will be reclaiming and give the AM a chance to react to that before the containers are reclaimed. It's then up to the AM if it wants to do anything special or just let the containers get killed after a timeout. bq. These notification may still be necessary, so AM won't add these nodes into blacklist if container get killed afterwards. Thoughts? I thought we could leverage the updated nodes list of the AllocateResponse to let AMs know when nodes are entering the decommissioning state or at least when the decommission state completes (and containers are killed). Although if the AM adds the node to the blacklist, that's not such a bad thing either since the RM should never allocate new containers on a decommissioning node anyway. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output is not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
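To make the preemption-framework suggestion concrete: an AM already receives a PreemptionMessage in its AllocateResponse, and the strict contract lists the containers YARN intends to reclaim. The sketch below shows how an AM might react during a graceful decommission; it uses the public YARN API as best I recall it, so treat the exact calls as a best-effort illustration rather than verified code.
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

// Sketch: how an AM might act on the strict preemption contract before containers are reclaimed.
class PreemptionAwareAmSketch {

  void onAllocateResponse(AllocateResponse response) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null || msg.getStrictContract() == null) {
      return;
    }
    for (PreemptionContainer c : msg.getStrictContract().getContainers()) {
      ContainerId id = c.getId();
      // Checkpoint or re-schedule the work running in this container before YARN takes it away.
      checkpointAndReschedule(id);
    }
  }

  void checkpointAndReschedule(ContainerId id) {
    // Application-specific reaction; intentionally left as a stub.
  }
}
{code}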
[jira] [Updated] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated YARN-3086: - Attachment: YARN-3086.patch Wow .. this is a special moment: the first patch I'm submitting to Hadoop ;) Sadly, I was not able to run any of the tests because the tests in trunk don't seem to pass. Let's hope the CI tools here are able to verify my patch. But out of curiosity: is it common that your trunk is not building? Am I supposed to develop against a version-specific branch? Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289281#comment-14289281 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #79 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/79/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289349#comment-14289349 ] Tsuyoshi OZAWA commented on YARN-3086: --
{code}
I was not able to run any of the tests because the tests in trunk don't seem to pass
{code}
Intermittent test failures can currently be observed for a few reasons - port conflicts, shortage of resources, and timing issues. We should fix them, but the problems are still there. I recommend running the tests under the related directory - in this case, under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager instead of the root directory.
{code}
Am I supposed to develop against a version-specific branch?
{code}
The development branch is trunk, so I recommend developing on trunk unless the problem is branch-specific. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.001.patch YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289356#comment-14289356 ] Tsuyoshi OZAWA commented on YARN-3086: --
{quote}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager Oops, maybe hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager is better.
{quote}
These lines were a typo. Please ignore them. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289355#comment-14289355 ] Tsuyoshi OZAWA commented on YARN-3086: --
{code}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
{code}
Oops, maybe hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager is better.
{code}
cd hadoop                        # change directory to the source tree
mvn clean install -DskipTests    # compile and install related jars into the local repository
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
mvn test                         # launch tests for hadoop-yarn-server-resourcemanager
{code}
It can take an hour or more. If the wait is too long, you can skip running the tests, since I'll help you submit your patch to Jenkins CI. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289360#comment-14289360 ] Robert Metzger commented on YARN-3086: -- Thank you for all the help! I've updated the code... and I'm now trying to execute the tests with your instructions. If that works out, I'll upload an updated version of the patch. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289338#comment-14289338 ] Tsuyoshi OZAWA commented on YARN-3086: -- [~rmetzger] Great! First, I'd like to comment on your patch: how about making the default value 4 * 1024? That way we can remove the if statement and simplify the code path.
{code}
DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB = -1;
{code}
Could you update that? Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
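The suggestion above amounts to defaulting the new setting to the previously hard-coded 4 GB so that no special "-1" branch is needed. A sketch of that shape follows; the configuration key and constant names are hypothetical stand-ins for whatever the final YARN-3086 patch uses.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the suggested default handling; key/constant names are hypothetical.
class MiniClusterNmMemorySketch {
  static final String YARN_MINICLUSTER_NM_PMEM_MB =
      "yarn.minicluster.yarn.nodemanager.resource.memory-mb";
  static final int DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB = 4 * 1024;   // old hard-coded value

  static void configureNodeManagerMemory(Configuration conf) {
    // With a real default there is no need for an "if (value == -1)" branch.
    int nmMemoryMb =
        conf.getInt(YARN_MINICLUSTER_NM_PMEM_MB, DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB);
    conf.setInt(YarnConfiguration.NM_PMEM_MB, nmMemoryMb);
  }
}
{code}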
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289363#comment-14289363 ] Varun Saxena commented on YARN-3011: Could someone kindly review this one? NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of an IllegalArgumentException when localizing a resource.
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null }
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null }
2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
    at org.apache.hadoop.fs.Path.<init>(Path.java:135)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header...
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
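The stack trace above shows the dispatcher thread dying because a Path is built from an empty string during localization. The defensive sketch below illustrates the obvious guard: validate the requested location and fail only that resource rather than the whole NodeManager. The class and method names are invented, and this is not the YARN-3011 patch.
{code}
import org.apache.hadoop.fs.Path;

// Defensive sketch: a malformed resource request should fail that one resource,
// not escape into the AsyncDispatcher and bring down the whole NodeManager.
class LocalizationGuardSketch {

  /** Returns the local path for the resource, or null if the request is malformed. */
  Path tryGetPathForLocalization(String localDir, String remoteLocation) {
    try {
      String name = new Path(remoteLocation).getName();
      if (name.isEmpty()) {
        return null;   // e.g. a bare root URL yields no file name to localize to
      }
      return new Path(localDir, name);
    } catch (IllegalArgumentException e) {
      // "Can not create a Path from an empty string" and similar errors end up here.
      return null;
    }
  }
}
{code}
A caller seeing null would then mark that resource's localization as failed and keep the dispatcher thread alive.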
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289381#comment-14289381 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: 
YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289379#comment-14289379 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289432#comment-14289432 ] Sunil G commented on YARN-3075: --- Hi Varun
{code}
+    removeNodeFromLabels(nodeId, labels);
     host.labels.removeAll(labels);
+    for (Entry<NodeId, Node> nmEntry : host.nms.entrySet()) {
+      Node node = nmEntry.getValue();
       if (node.labels != null) {
         node.labels.removeAll(labels);
       }
+      removeNodeFromLabels(nmEntry.getKey(), labels);
     }
{code}
I think the first call to removeNodeFromLabels can be removed. The loop alone should be enough. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289439#comment-14289439 ] Yongjun Zhang commented on YARN-3021: - Hi [~qwertymaniac], [~vinodkv], [~adhoot], thanks for the earlier discussion and input. I uploaded patch rev 001, which introduces a new job configuration property, mapreduce.job.skip.rm.token.renewal; passing -Dmapreduce.job.skip.rm.token.renewal=true to DistCp (to instruct the ResourceManager to skip token renewal) solves the problem. I tested in the environment Harsh helped to set up - thanks, Harsh. Would you please help take a look at the patch? Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
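For readers following along, the new property would make the RM skip the synchronous renewDelegationToken call that fails across one-way trust boundaries. Below is a control-flow sketch only; the property name comes from the comment above, while the class and helper names are invented for illustration and do not claim to match the YARN-3021 patch.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Control-flow sketch; helper names are hypothetical, the property name is from the JIRA comment.
class TokenRenewalSketch {
  static final String SKIP_RM_TOKEN_RENEWAL = "mapreduce.job.skip.rm.token.renewal";

  void handleAppSubmission(Configuration appConf, DelegationTokenToRenew dttr)
      throws IOException {
    if (appConf.getBoolean(SKIP_RM_TOKEN_RENEWAL, false)) {
      // The trust setup prevents the RM from renewing this token; keep it but skip renewal.
      return;
    }
    renewToken(dttr);        // may fail across one-way trust boundaries
    scheduleRenewal(dttr);
  }

  // Stubs standing in for the real DelegationTokenRenewer internals.
  static class DelegationTokenToRenew { }
  void renewToken(DelegationTokenToRenew t) throws IOException { }
  void scheduleRenewal(DelegationTokenToRenew t) { }
}
{code}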
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289435#comment-14289435 ] Varun Saxena commented on YARN-3075: bq. I think the first call to removeNodeFromLabels can be removed. The loop alone should be enough. host.nms won't have an entry for host:0, so the loop alone won't be enough. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289465#comment-14289465 ] Sunil G commented on YARN-3075: --- Thank you [~varun_saxena] for clarifying. As we discussed, you are saving hosts including port 0, hence my confusion. If possible, please try to keep the same storage structure; it will be easier to manage later. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289344#comment-14289344 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #83 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/83/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289346#comment-14289346 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #83 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/83/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/CHANGES.txt Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 
Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289187#comment-14289187 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2014 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2014/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)