[jira] [Commented] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259942#comment-14259942 ]
Rohith commented on YARN-2991:
I am able to reproduce this randomly in Eclipse; looking into the root cause.
TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
Key: YARN-2991
URL: https://issues.apache.org/jira/browse/YARN-2991
Project: Hadoop YARN
Issue Type: Test
Reporter: Zhijie Shen
Assignee: Rohith
Priority: Blocker
{code}
Error Message
test timed out after 6 milliseconds
Stacktrace
java.lang.Exception: test timed out after 6 milliseconds
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1106)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testDecomissionedNMsMetricsOnRMRestart(TestRMRestart.java:1873)
{code}
It happened twice this month:
https://builds.apache.org/job/PreCommit-YARN-Build/6096/
https://builds.apache.org/job/PreCommit-YARN-Build/6182/
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2991:
Attachment: 0001-YARN-2991.patch
[jira] [Commented] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260035#comment-14260035 ]
Rohith commented on YARN-2991:
In serviceStop(), eventHandlingThread is interrupted and then joined so that it can complete. The test case uses DrainDispatcher, which creates its own thread. The real cause of the randomness is that calling Thread.interrupt() does not necessarily stop the thread unless the thread is blocked. So there should be a mechanism to exit the thread by checking a boolean flag in the while loop. I have updated the patch to handle this. I ran the test many times and it completes without hanging. Kindly review the patch.
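A minimal Java sketch of the flag-plus-interrupt shutdown pattern described in the comment above. It is not the content of 0001-YARN-2991.patch and not the actual AsyncDispatcher or DrainDispatcher code; the class StoppableDispatcher, its stopped field, and the demo() helper are hypothetical names used only for illustration. The idea is that the loop exits either because the interrupt wakes a blocked take(), or because the volatile flag is seen on the next iteration even when the interrupt alone did not end the loop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a dispatcher with its own event-handling thread.
class StoppableDispatcher {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    // volatile so the write in stop() is visible to the event-handling thread
    private volatile boolean stopped = false;
    private final Thread eventHandlingThread =
        new Thread(this::runLoop, "event-handling-thread");

    private void runLoop() {
        // The flag, not the interrupt status, decides when the loop ends.
        while (!stopped) {
            try {
                eventQueue.take().run();
            } catch (InterruptedException e) {
                // Woken from a blocking take(); the loop condition re-checks the flag.
            }
        }
    }

    void start() {
        eventHandlingThread.start();
    }

    void dispatch(Runnable event) {
        eventQueue.add(event);
    }

    void stop() throws InterruptedException {
        stopped = true;                  // set the flag first
        eventHandlingThread.interrupt(); // then wake the thread if it is blocked
        eventHandlingThread.join();      // returns once the loop has exited
    }

    boolean isAlive() {
        return eventHandlingThread.isAlive();
    }

    // Runs one event, stops the dispatcher, and reports whether the thread ended.
    static boolean demo() {
        StoppableDispatcher dispatcher = new StoppableDispatcher();
        CountDownLatch handled = new CountDownLatch(1);
        dispatcher.start();
        try {
            dispatcher.dispatch(handled::countDown);
            boolean ran = handled.await(5, TimeUnit.SECONDS);
            dispatcher.stop();
            return ran && !dispatcher.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Setting the flag before calling interrupt() matters in this sketch: a thread woken by the interrupt always sees stopped == true when it re-checks the loop condition.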
[jira] [Updated] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2991:
Target Version/s: 2.7.0
[jira] [Commented] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260179#comment-14260179 ]
Hadoop QA commented on YARN-2991:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689320/0001-YARN-2991.patch against trunk revision 1454efe.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6202//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6202//console
This message is automatically generated.
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260263#comment-14260263 ]
Zhijie Shen commented on YARN-2936:
bq. Maybe a simply way is to do this:
If either write() is called before getProto() or vice versa, the builder object is set twice. Would it be better to restore the overridden setters/getters and implement them properly, along with getProto(), similar to what we have done in a PBImpl?
YARNDelegationTokenIdentifier doesn't set proto.builder now
Key: YARN-2936
URL: https://issues.apache.org/jira/browse/YARN-2936
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch
After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() on it, we will just get an empty proto object.
It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generate the password.
I think the setters were removed to avoid setting the fields a second time when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly on its own. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager, so it is vulnerable if something is changed in secretManager. For example, in the test case of YARN-2837, I spent time figuring out that we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store.
[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260268#comment-14260268 ]
Zhijie Shen commented on YARN-2938:
+1, will commit the patch.
Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
Key: YARN-2938
URL: https://issues.apache.org/jira/browse/YARN-2938
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.7.0
Attachments: FindBugs Report.html, YARN-2938.001.patch, YARN-2938.002.patch, YARN-2938.003.patch
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260270#comment-14260270 ]
Karthik Kambatla commented on YARN-2797:
FIFO and Capacity schedulers share a lot of code. Given that and the long duration of tests, I felt it was okay to not include FIFO. This was the main reason, rest of the tests don't have FIFO configs.
In any case, we should move some of these tests to a different profile (or module) so that running the unit tests doesn't take as long. Once we do that, maybe we can just add FIFO to the list?
TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
Key: YARN-2797
URL: https://issues.apache.org/jira/browse/YARN-2797
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
Attachments: yarn-2797-1.patch
[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260285#comment-14260285 ]
Hudson commented on YARN-2938:
FAILURE: Integrated in Hadoop-trunk-Commit #6793 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6793/])
YARN-2938. Fixed new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice. Contributed by Varun Saxena. (zjshen: rev 241d3b3a50c6af92f023d8b2c24598f4813f4674)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260321#comment-14260321 ]
Zhijie Shen commented on YARN-2936:
Or, to keep it simple, for
{code}
builder.setOwner(getOwner().toString());
builder.setRenewer(getRenewer().toString());
builder.setRealUser(getRealUser().toString());
builder.setIssueDate(getIssueDate());
builder.setMaxDate(getMaxDate());
builder.setSequenceNumber(getSequenceNumber());
builder.setMasterKeyId(getMasterKeyId());
{code}
can we do something like
{code}
if (!builder.getOwner().equals(getOwner().toString())) {
  builder.setOwner(getOwner().toString());
}
{code}
so that the builder is only set when the value has been updated?
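A self-contained Java sketch of the "set the builder only when the value changed" idea from the comment above. It is an illustration under stated assumptions, not the code in any YARN-2936 patch: TokenBuilderSketch stands in for the generated protobuf builder, TokenIdentifierSketch for the identifier, only the owner field is shown, and the setCalls counter exists purely so the effect can be observed.

```java
import java.util.Objects;

// Hypothetical stand-in for the generated protobuf builder (owner field only).
class TokenBuilderSketch {
    private String owner = "";
    private int setCalls = 0; // counts writes, for illustration only

    String getOwner() {
        return owner;
    }

    void setOwner(String owner) {
        this.owner = owner;
        setCalls++;
    }

    int getSetCalls() {
        return setCalls;
    }
}

// Hypothetical stand-in for the token identifier that owns the builder.
class TokenIdentifierSketch {
    private final TokenBuilderSketch builder = new TokenBuilderSketch();
    private String owner;

    TokenIdentifierSketch(String owner) {
        this.owner = owner;
    }

    void setOwner(String owner) {
        this.owner = owner;
    }

    // Copies the field into the builder only if the two values differ.
    void syncBuilder() {
        if (!Objects.equals(builder.getOwner(), owner)) {
            builder.setOwner(owner);
        }
    }

    // Plays the role of getProto()/write(): always sync before handing out state.
    TokenBuilderSketch getBuilder() {
        syncBuilder();
        return builder;
    }
}
```

With this shape, calling the sync path from both a write()-like and a getProto()-like method is harmless: the second call finds the values equal and does not touch the builder again.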
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260329#comment-14260329 ]
Jian He commented on YARN-2936:
bq. Can we do something like
Thanks, Zhijie! +1 for this.
[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
[ https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260344#comment-14260344 ]
Jian He commented on YARN-2062:
Is this still happening often, given that we clean all the RMNodes in the context on failover?
Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
Key: YARN-2062
URL: https://issues.apache.org/jira/browse/YARN-2062
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: yarn-2062-1.patch
On busy clusters, we see several {{org.apache.hadoop.yarn.state.InvalidStateTransitonException}} for events invoked against NEW nodes.
[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
[ https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260346#comment-14260346 ]
Karthik Kambatla commented on YARN-2062:
I haven't checked it recently. Did we make the change you mention after I reported this? If yes, I'll be happy to close this as Not a problem.
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260347#comment-14260347 ]
Karthik Kambatla commented on YARN-2797:
By the way, TestRMRestart passes locally.
[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
[ https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260360#comment-14260360 ]
Jian He commented on YARN-2062:
That was before this was reported. Do you still remember exactly which invalid event happened? I'm trying to understand how this happened.
[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Saxena updated YARN-2936:
Attachment: YARN-2936.004.patch
[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
[ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260427#comment-14260427 ]
Karthik Kambatla commented on YARN-2716:
We kind of need CURATOR-111 for this. Posting a patch for that.
Refactor ZKRMStateStore retry code with Apache Curator
Key: YARN-2716
URL: https://issues.apache.org/jira/browse/YARN-2716
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Robert Kanter
Per the suggestion by [~kasha] in YARN-2131, it would be nice to use Curator to simplify the retry logic in ZKRMStateStore.
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260430#comment-14260430 ]
Varun Saxena commented on YARN-2987:
[~jianhe] / [~zjshen], kindly review.
ClientRMService#getQueueInfo doesn't check app ACLs
Key: YARN-2987
URL: https://issues.apache.org/jira/browse/YARN-2987
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
Attachments: YARN-2987.001.patch
ClientRMService#getQueueInfo can return a list of applications belonging to the queue, but doesn't actually check if the user has the permission to view the applications.
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260445#comment-14260445 ]
Hadoop QA commented on YARN-2936:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689373/YARN-2936.004.patch against trunk revision 241d3b3.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6203//console
This message is automatically generated.
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260457#comment-14260457 ]
Jian He commented on YARN-2987:
Looks good overall. Could you add a test case verifying that a non-authorized user is not able to get the application report?
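A rough Java sketch of the behaviour such a test would check: the list of applications returned for a queue is filtered by a per-application view permission. Everything here is hypothetical and self-contained (AppSketch, QueueInfoSketch, canView, visibleAppIds); it does not use or reproduce ClientRMService, the YARN ACL classes, or YARN-2987.001.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical application record: an owner plus extra users allowed to view it.
class AppSketch {
    final String id;
    final String owner;
    final Set<String> viewers;

    AppSketch(String id, String owner, Set<String> viewers) {
        this.id = id;
        this.owner = owner;
        this.viewers = viewers;
    }
}

// Hypothetical queue-info helper that applies the view check per application.
class QueueInfoSketch {
    // A user may view an app if they own it or are listed as a viewer.
    static boolean canView(String user, AppSketch app) {
        return user.equals(app.owner) || app.viewers.contains(user);
    }

    // Returns only the ids of the queue's apps that the caller may view.
    static List<String> visibleAppIds(String user, List<AppSketch> queueApps) {
        List<String> visible = new ArrayList<>();
        for (AppSketch app : queueApps) {
            if (canView(user, app)) {
                visible.add(app.id);
            }
        }
        return visible;
    }
}
```

A test in this style would submit an application as one user, query the queue as another user without view access, and assert that the returned list does not contain that application.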
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260463#comment-14260463 ]
Jian He commented on YARN-2936:
Looks good. One nit: in {{builder.getOwner().toString()}}, getOwner() already returns a String, so the toString() is unnecessary. The same applies to getRenewer and getUser.
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.3.patch The previous patch didn't apply on the latest trunk; attached an updated patch. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch Now that we have node labels in the system, there is no convenient way to get information such as: how many active NMs are assigned to a given label? How much total resource is there for a given label? For a given label, which queues can access it? It would be better to add a node-labels page to the RM web UI so that users/admins have a centralized view of such information.
[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2936: --- Attachment: YARN-2936.005.patch YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2936: --- Attachment: (was: YARN-2936.005.patch) YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260477#comment-14260477 ] Jian He commented on YARN-2936: --- Just one more thing: the newly added test passes without the core change. Could you update the test so that it passes with the core change but fails without it? YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260491#comment-14260491 ] Anubhav Dhoot commented on YARN-2881: - Hi [~subru], thanks for your review. bq. Are you assuming that parent queue names are unique in FS? I am assuming the names are all fully qualified, both when clients refer to a queue name while managing reservations and in the implementation of the FairScheduler's reservation portion. This is in contrast to the CapacityScheduler's reservation portion. bq. run() need not be synchronized. I know this is from previous code but it would be good to clean it up since we are refactoring the code. AbstractPlanFollower::plans is modified from multiple places, and the synchronization seems to be the only protection for it. bq. getChildReservationQueues() could be implemented by the AbstractSchedulerPlanFollower using Queue::getQueueInfo ? That will only give us QueueInfos for the child queues. The rest of the code deals in Queue (e.g. getPlanQueue), so I would prefer leaving this as is. bq. I think we can add a getResourceCalculator to YarnScheduler as it makes sense. Then we need not override calculateTargetCapacity() and isPlanResourcesLessThanReservations(). Done. bq. Minor: spurious white lines in imports of CapacitySchedulerPlanFollower and FairSchedulerPlanFollower. Done. Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2881.001.patch, YARN-2881.prelim.patch
[jira] [Updated] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2881: Attachment: YARN-2881.002.patch Addressing [~subru]'s comments and FindBugs Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.prelim.patch
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260496#comment-14260496 ] Hadoop QA commented on YARN-2936: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689373/YARN-2936.004.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6204//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6204//console This message is automatically generated. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260497#comment-14260497 ] Hadoop QA commented on YARN-2943: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689380/YARN-2943.3.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6205//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6205//console This message is automatically generated. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260502#comment-14260502 ] Hadoop QA commented on YARN-2936: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689383/YARN-2936.005.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6206//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6206//console This message is automatically generated. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260519#comment-14260519 ] Varun Saxena commented on YARN-2936: eclipse:eclipse is failing due to some problem in Jenkins. The following message appears: /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build@2/dev-support/test-patch.sh: line 692: /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build@2/../patchprocess/patchEclipseOutput.txt: No such file or directory YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260525#comment-14260525 ] Varun Saxena commented on YARN-2936: eclipse:eclipse passes in my local build YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Updated] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs
[ https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2748: --- Attachment: YARN-2748.002.patch Upload logs in the sub-folders under the local log dir when aggregating logs Key: YARN-2748 URL: https://issues.apache.org/jira/browse/YARN-2748 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2748.001.patch, YARN-2748.002.patch YARN-2734 has a temporary fix that skips sub-folders to avoid an exception. Ideally, if the app creates a sub-folder and puts its rolling logs there, we need to upload these logs as well.
[jira] [Commented] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs
[ https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260555#comment-14260555 ] Varun Saxena commented on YARN-2748: bq. To differentiate log files which may have same file names(due to subfolders), I think we can write file path relative to container log directory instead. Your views on this. bq. Given Log Root Dir/sub-dir1/sub-dir2/.../.log, we can use the relative path sub-dir1/sub-dir2/.../.log to uniquely identify a log. [~zjshen], the latest patch uses the relative path to identify each log in the aggregated log file. Kindly review. Upload logs in the sub-folders under the local log dir when aggregating logs Key: YARN-2748 URL: https://issues.apache.org/jira/browse/YARN-2748 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2748.001.patch, YARN-2748.002.patch
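The relative-path idea discussed for YARN-2748 can be sketched as follows. This is a minimal illustration, assuming the key is simply the log file's path relative to the container log directory; `RelativeLogKeySketch` and `relativeLogKey` are hypothetical names and are not taken from the patch.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not taken from the YARN-2748 patch.
final class RelativeLogKeySketch {
    // Derives a key for a log file from its path relative to the
    // container log directory, so that equally named files in
    // different sub-folders get different keys.
    static String relativeLogKey(Path containerLogDir, Path logFile) {
        Path root = containerLogDir.toAbsolutePath().normalize();
        Path file = logFile.toAbsolutePath().normalize();
        if (!file.startsWith(root) || file.equals(root)) {
            throw new IllegalArgumentException(
                logFile + " is not a file under " + containerLogDir);
        }
        // Use '/' on every platform so the key is stable.
        return root.relativize(file).toString()
                   .replace(File.separatorChar, '/');
    }

    public static void main(String[] args) {
        Path root = Paths.get("logs", "container_01");
        System.out.println(relativeLogKey(root, root.resolve("stdout")));
        System.out.println(
            relativeLogKey(root, root.resolve("sub-dir1").resolve("stdout")));
    }
}
```

The two calls in main use the same file name in different directories and produce different keys, which is the property the comment relies on.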
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260568#comment-14260568 ] Jian He commented on YARN-2936: --- bq. the newly added test is passing without the core change. could you update the test to pass with the core change but fail without the change ? Thanks for updating. Could you check whether my last comment makes sense? Thanks. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260574#comment-14260574 ] Hadoop QA commented on YARN-2881: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689386/YARN-2881.002.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6207//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6207//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6207//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6207//console This message is automatically generated. 
Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.prelim.patch
[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2987: --- Attachment: YARN-2987.002.patch ClientRMService#getQueueInfo doesn't check app ACLs --- Key: YARN-2987 URL: https://issues.apache.org/jira/browse/YARN-2987 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2987.001.patch, YARN-2987.002.patch
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260576#comment-14260576 ] Varun Saxena commented on YARN-2987: bq. could you add a test case that a non-authorized user not able to get the application report ? [~jianhe], I have added the test case. Kindly review. ClientRMService#getQueueInfo doesn't check app ACLs --- Key: YARN-2987 URL: https://issues.apache.org/jira/browse/YARN-2987 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2987.001.patch, YARN-2987.002.patch
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260594#comment-14260594 ] Wangda Tan commented on YARN-2933: -- Hi [~mayank_bansal], the overall approach looks good to me, thanks for the update. Some comments about implementation details: 1) You can use {{clusterResource = rmContext.getNodeLabelManager().getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource);}} instead of computing {{clusterResource = clusterResource - all-labeled-resource}}. 2) {{lm.getNodeLabels();}} copies the node-to-labels map, so it is expensive to call it for every container considered for preemption. I suggest getting the node-to-labels map once *at the beginning of {{editSchedule}}*; this presumes that node-to-labels does not change during the preemption policy execution, which I think is reasonable since we already presume that queue resources do not change during that execution. In addition, {{isLabeledContainer}} can look up the map instead of looping over every entry. Regarding the test, I think it covers one case, which is _do not preempt containers from NMs with labels_. Another case that I think needs to be covered is verifying that ideal_allocation changes according to this patch. An example is: {code} cluster.no_label.resource = 100 cluster.label-x.resource = 100 root.A.capacity = 40 root.A.label-x.capacity = 50 root.A.no_label.used = 40 root.A.label-x.used = 50 root.B.capacity = 40 root.B.label-x.capacity = 50 root.B.no_label.used = 50 root.B.label-x.used = 0 root.C.capacity = 20 root.C.pending = 10 root.C.used = 10 root.C should preempt 10 from B instead of from A. Even though A's total used resource = 90, A's no-label used resource is still within its guaranteed no-label resource. {code} Does this make sense to you?
Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is to calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels; currently, the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for Capacity Scheduler, which is our final target.
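The example in the comment above can be checked with a small calculation restricted to the no-label partition. This is only a sketch under stated assumptions: it treats each queue's capacity as a percentage of the no-label cluster resource and reports how far each queue is above that guarantee; `NoLabelPreemptionSketch` and `surplusOverGuarantee` are hypothetical names, and this is not the preemption policy's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the CapacityScheduler preemption policy code.
final class NoLabelPreemptionSketch {
    // For each queue, returns how much no-label resource it uses beyond
    // its guaranteed share (capacity percent of the no-label total).
    // Usage on labeled nodes is deliberately ignored.
    static Map<String, Integer> surplusOverGuarantee(
            int noLabelTotal,
            Map<String, Integer> capacityPercent,
            Map<String, Integer> noLabelUsed) {
        Map<String, Integer> surplus = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : capacityPercent.entrySet()) {
            int guaranteed = noLabelTotal * e.getValue() / 100;
            int used = noLabelUsed.getOrDefault(e.getKey(), 0);
            surplus.put(e.getKey(), Math.max(0, used - guaranteed));
        }
        return surplus;
    }

    public static void main(String[] args) {
        Map<String, Integer> capacity = new LinkedHashMap<>();
        capacity.put("A", 40);
        capacity.put("B", 40);
        capacity.put("C", 20);
        Map<String, Integer> used = new LinkedHashMap<>();
        used.put("A", 40);
        used.put("B", 50);
        used.put("C", 10);
        System.out.println(surplusOverGuarantee(100, capacity, used));
    }
}
```

With the numbers from the comment, only queue B is above its no-label guarantee, by the 10 units that queue C has pending, even though A uses more resource in total once label-x is counted.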
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.20.patch maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
When a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();

  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }

  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200. If each AM uses 5M (> minimum_allocation), all 200 apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
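The arithmetic in the description above can be reproduced directly. This is a minimal sketch, assuming 1G is treated as 1000M as the description's round numbers imply; the class and method names are hypothetical and are not part of LeafQueue.

```java
class AmLimitExample {
    /** max_am_resource = queue_max_capacity * maximum_am_resource_percent */
    static long maxAmResource(long queueMaxCapacityMb, double maxAmResourcePercent) {
        return Math.round(queueMaxCapacityMb * maxAmResourcePercent);
    }

    /** #max_am_number = max_am_resource / minimum_allocation */
    static long maxAmNumber(long maxAmResourceMb, long minimumAllocationMb) {
        return maxAmResourceMb / minimumAllocationMb;
    }

    /** Resource actually consumed when every activated AM uses perAmMb. */
    static long resourceUsedByAms(long amCount, long perAmMb) {
        return amCount * perAmMb;
    }

    public static void main(String[] args) {
        long queueCapacityMb = 1000;   // "capacity = 1G", taken as 1000M (assumption)
        long minimumAllocationMb = 1;  // minimum_allocation = 1M
        long perAmMb = 5;              // each AM actually uses 5M

        long amBudget = maxAmResource(queueCapacityMb, 0.2);
        long amCount = maxAmNumber(amBudget, minimumAllocationMb);
        long used = resourceUsedByAms(amCount, perAmMb);

        System.out.println("AM resource budget (M): " + amBudget);
        System.out.println("AMs allowed by the count-based check: " + amCount);
        System.out.println("Resource the AMs actually use (M): " + used);
    }
}
```

Because the limit is enforced as a count derived from minimum_allocation rather than from the resource each AM really requests, the AMs in this example end up using the whole queue instead of the intended 20 percent.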
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2217: --- Attachment: YARN-2217-trunk-v5.patch [~kasha] V5 attached. 1. Removed isSCMAvailable logic (moving it to MR layer). 2. Surface exceptions through the api. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260598#comment-14260598 ] Jian He commented on YARN-2943: --- Looks good overall, a few minor comments: - getNActiveNMs -> getNumActiveNMs - Label class and RMNodeLabelInfo class can be consolidated into one - Probably add a common method like addNode in Label class to update numActiveNMs and resource together. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch Now we have node labels in the system, but there is no very convenient way to get information like "how many active NM(s) are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260602#comment-14260602 ] Jian He commented on YARN-2987: --- looks good, +1 ClientRMService#getQueueInfo doesn't check app ACLs --- Key: YARN-2987 URL: https://issues.apache.org/jira/browse/YARN-2987 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2987.001.patch, YARN-2987.002.patch ClientRMService#getQueueInfo can return a list of applications belonging to the queue, but doesn't actually check if the user has the permission to view the applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.4.patch Thanks for the comments from [~jianhe]; all are addressed in the new patch, please kindly review. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch Now we have node labels in the system, but there is no very convenient way to get information like "how many active NM(s) are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.5.patch Added the missing Apache license header to the new file. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, YARN-2943.5.patch Now we have node labels in the system, but there is no very convenient way to get information like "how many active NM(s) are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260662#comment-14260662 ] Hadoop QA commented on YARN-2987: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689406/YARN-2987.002.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6209//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6209//console This message is automatically generated. 
ClientRMService#getQueueInfo doesn't check app ACLs --- Key: YARN-2987 URL: https://issues.apache.org/jira/browse/YARN-2987 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2987.001.patch, YARN-2987.002.patch ClientRMService#getQueueInfo can return a list of applications belonging to the queue, but doesn't actually check if the user has the permission to view the applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260665#comment-14260665 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689408/YARN-2637.20.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6211//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6211//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6211//console This message is automatically generated. maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
When a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();

  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }

  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200. If each AM uses 5M (> minimum_allocation), all 200 apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260666#comment-14260666 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689409/YARN-2217-trunk-v5.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 10 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6210//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6210//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6210//console This message is automatically generated. 
Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260701#comment-14260701 ] Hadoop QA commented on YARN-2943: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689419/YARN-2943.5.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6212//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6212//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6212//console This message is automatically generated. 
Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, YARN-2943.5.patch Now we have node labels in the system, but there is no very convenient way to get information like "how many active NM(s) are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2477) DockerContainerExecutor must support secure mode
[ https://issues.apache.org/jira/browse/YARN-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260745#comment-14260745 ] Eron Wright commented on YARN-2477: A key question here is whether it is necessary for the container to be capable of Kerberos authentication. Considering that tasks primarily use delegation tokens rather than Kerberos auth, the ability might not be important. A valid scenario might be appmasters with Kerberized endpoints. By running in a container, the application loses access to two relevant files on the host filesystem: a) the /etc/krb5.conf file, and b) the installed JCE policy files (which Abin alludes to). Those files may vary by environment and are typically managed by Ambari/Cloudera Manager. On a), one solution is for the DockerContainerExecutor to share /etc/krb5.conf into the container. On b), I think it is acceptable to defer the JCE issue and assume that the image will contain the needed policy. I believe that the steps to install a JCE policy vary by Linux distribution (some use 'alternatives'). DockerContainerExecutor must support secure mode Key: YARN-2477 URL: https://issues.apache.org/jira/browse/YARN-2477 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Labels: security DockerContainerExecutor (patch in YARN-1964) does not support Kerberized Hadoop clusters yet, as a Kerberized Hadoop cluster has a strict dependency on the LinuxContainerExecutor. For Docker containers to be used in a production environment, they must support secure Hadoop. Issues regarding Java's AES encryption library in a containerized environment also need to be worked out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
Yi Liu created YARN-2996: Summary: Refine some fs operations in FileSystemRMStateStore to improve performance Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places that invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call:
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is not good either: it writes the file to _output\_file_.tmp, then renames it to _output\_file_.new, then renames it to _output\_file_; we can remove one rename operation. Also, there is one unnecessary import that we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
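Point 1 above, replacing an exists check followed by a status lookup with a single lookup, can be sketched as follows. This is an illustration only: it uses java.nio.file as a stand-in so that it runs without Hadoop on the classpath, and the class and method names are hypothetical. With Hadoop's FileSystem the analogous step would be to call getFileStatus directly and handle the FileNotFoundException it throws for a missing path; the sketch is not the YARN-2996 patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

class SingleStatExample {
    /**
     * Returns the file attributes, or null if the path does not exist,
     * using one metadata lookup instead of exists() followed by a second call.
     */
    static BasicFileAttributes statOrNull(Path path) {
        try {
            return Files.readAttributes(path, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            return null; // a missing file is an expected outcome, not an error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Exercises statOrNull on a temp file before and after deleting it. */
    static boolean selfCheck() {
        try {
            Path file = Files.createTempFile("version", ".node");
            boolean foundWhilePresent = statOrNull(file) != null;
            Files.delete(file);
            boolean nullWhenMissing = statOrNull(file) == null;
            return foundWhilePresent && nullWhenMissing;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("single-lookup check passed: " + selfCheck());
    }
}
```

The design choice is to treat "not found" as a normal return value from the single call, which removes the extra round trip and also avoids the window in which the file could disappear between the two calls.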
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.001.patch Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places that invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call:
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is not good either: it writes the file to _output\_file_.tmp, then renames it to _output\_file_.new, then renames it to _output\_file_; we can remove one rename operation. Also, there is one unnecessary import that we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu created YARN-2997: --- Summary: NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {quote}getRMContainer{quote} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.6.patch Addressed the findbugs warning. The failed test does not seem to be related to this patch. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch, YARN-2943.3.patch, YARN-2943.4.patch, YARN-2943.5.patch, YARN-2943.6.patch Now we have node labels in the system, but there is no very convenient way to get information like "how many active NM(s) are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Description: We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {getRMContainer} returns null. was: We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {quote}getRMContainer{quote} returns null. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {getRMContainer} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Description: We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. was: We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {getRMContainer} returns null. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Attachment: YARN-2997.patch Report to RM only once by not calling {{containerStatuses.add(containerStatus);}} from the second time on. Tested on a real cluster and it works well. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Attachments: YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
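The "report only once" idea described above can be illustrated with a small stand-alone sketch. Everything here is hypothetical: container IDs are plain strings, the class and method names are invented for illustration, and this is not the code in YARN-2997.patch or the NodeManager's status updater.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReportOnceExample {
    // IDs of completed containers that were already sent in an earlier heartbeat.
    private final Set<String> alreadyReported = new HashSet<>();

    /**
     * Returns the completed containers to include in the next heartbeat,
     * skipping any that were included in a previous one.
     */
    List<String> statusesForHeartbeat(List<String> completedContainerIds) {
        List<String> toReport = new ArrayList<>();
        for (String id : completedContainerIds) {
            // Set.add returns false when the ID was already present.
            if (alreadyReported.add(id)) {
                toReport.add(id);
            }
        }
        return toReport;
    }

    public static void main(String[] args) {
        ReportOnceExample nm = new ReportOnceExample();
        System.out.println(nm.statusesForHeartbeat(List.of("c1", "c2")));
        System.out.println(nm.statusesForHeartbeat(List.of("c1", "c2", "c3")));
    }
}
```

A real implementation would also need to clear the remembered IDs when the application finishes, and to decide what happens if a heartbeat is lost before the RM sees the status; the sketch leaves both out.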
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260882#comment-14260882 ] Hadoop QA commented on YARN-2997: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689447/YARN-2997.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6215//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6215//console This message is automatically generated. 
NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Attachments: YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260891#comment-14260891 ] Hadoop QA commented on YARN-2996: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689441/YARN-2996.001.patch against trunk revision 249cc90. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6213//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6213//console This message is automatically generated. 
Refine some fs operations in FileSystemRMStateStore to improve performance

Key: YARN-2996
URL: https://issues.apache.org/jira/browse/YARN-2996
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
Attachments: YARN-2996.001.patch

In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance:

*1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge the two calls to save one RPC call.
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}

*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
{{updateFile}} is also inefficient: it writes the file to _output\_file_.tmp, renames it to _output\_file_.new, and then renames it to _output\_file_. We can eliminate one rename operation.

There is also one unnecessary import, which we can remove.
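
As a rough illustration of point *1.*, the "check, then stat" pair can be collapsed into a single metadata lookup that treats "not found" as the absent case. The sketch below is not the attached patch. It uses {{java.nio.file}} as a stand-in so that it runs without a Hadoop cluster, and {{StatOnce}} and {{statIfPresent}} are hypothetical names. The analogous Hadoop approach would be to call {{fs.getFileStatus}} directly and catch the {{FileNotFoundException}} it throws for a missing path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

// Hypothetical stand-in for the pattern; not code from FileSystemRMStateStore.
final class StatOnce {

  // One metadata lookup instead of exists() followed by a second stat call.
  // A missing file is reported as Optional.empty() rather than as an error.
  static Optional<BasicFileAttributes> statIfPresent(Path path) throws IOException {
    try {
      return Optional.of(Files.readAttributes(path, BasicFileAttributes.class));
    } catch (NoSuchFileException e) {
      // Same outcome as exists() returning false, but without the extra call.
      return Optional.empty();
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("version", ".node");
    System.out.println("present: " + statIfPresent(tmp).isPresent());
    Files.delete(tmp);
    System.out.println("present after delete: " + statIfPresent(tmp).isPresent());
  }
}
```

Against a remote file system each lookup is a round trip, so the saving is one RPC per call site. Folding the two calls into one also closes the window in which the file could be deleted between the existence check and the stat.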