[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
[ https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006845#comment-14006845 ] Hadoop QA commented on YARN-2088:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3794//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3794//console

This message is automatically generated.

Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
----------------------------------------------------------------

Key: YARN-2088
URL: https://issues.apache.org/jira/browse/YARN-2088
Project: Hadoop YARN
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Attachments: YARN-2088.v1.patch

Some fields (set, list) are added to the proto builder multiple times; we need to clear those fields before adding, otherwise the resulting proto contains duplicated contents.

--
This message was sent by Atlassian JIRA (v6.2#6252)
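The duplication described above follows from protobuf builder semantics: addAll on a repeated field appends. Below is a minimal, self-contained sketch of the buggy and fixed merge patterns, with illustrative names only; it is not the actual GetApplicationsRequestPBImpl code.

{code}
import java.util.ArrayList;
import java.util.List;

// Stand-in for a generated proto builder with one repeated field.
class FakeBuilder {
  final List<String> applicationTypes = new ArrayList<>();
  void addAllApplicationTypes(List<String> types) { applicationTypes.addAll(types); }
  void clearApplicationTypes() { applicationTypes.clear(); }
}

public class MergeDemo {
  static final List<String> LOCAL = List.of("MAPREDUCE", "SPARK");

  // Buggy pattern: each merge appends, so repeated merges duplicate entries.
  static void mergeWithoutClear(FakeBuilder b) {
    b.addAllApplicationTypes(LOCAL);
  }

  // Fixed pattern: clear before re-adding, so merging is idempotent.
  static void mergeWithClear(FakeBuilder b) {
    b.clearApplicationTypes();
    b.addAllApplicationTypes(LOCAL);
  }

  public static void main(String[] args) {
    FakeBuilder buggy = new FakeBuilder();
    mergeWithoutClear(buggy);
    mergeWithoutClear(buggy);
    System.out.println(buggy.applicationTypes); // [MAPREDUCE, SPARK, MAPREDUCE, SPARK]

    FakeBuilder fixed = new FakeBuilder();
    mergeWithClear(fixed);
    mergeWithClear(fixed);
    System.out.println(fixed.applicationTypes); // [MAPREDUCE, SPARK]
  }
}
{code}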
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006858#comment-14006858 ] Hadoop QA commented on YARN-2030:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645932/YARN-2030.v2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3793//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3793//console

This message is automatically generated.

Use StateMachine to simplify handleStoreEvent() in RMStateStore
---------------------------------------------------------------

Key: YARN-2030
URL: https://issues.apache.org/jira/browse/YARN-2030
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch

Now the logic to handle different store events in handleStoreEvent() is as follows:

{code}
if (event.getType().equals(RMStateStoreEventType.STORE_APP)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
    ...
  } else {
    ...
  }
  ...
  try {
    if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
      ...
    } else {
      ...
    }
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
  ...
} else {
  ...
}
{code}

This not only confuses people but also easily leads to mistakes. We may leverage a state machine to simplify this, even if there are no real state transitions.

--
This message was sent by Atlassian JIRA (v6.2#6252)
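One way to read the proposal: replace the chain of equals() checks with a per-event dispatch table. The sketch below shows the dispatch idea with a plain EnumMap; it is not YARN's StateMachineFactory API, and the enum and handler bodies are illustrative.

{code}
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

enum StoreEventType { STORE_APP, UPDATE_APP, STORE_APP_ATTEMPT, UPDATE_APP_ATTEMPT, REMOVE_APP }

public class StoreEventDispatch {
  // One handler per event type replaces the nested if/else-equals chain.
  private final Map<StoreEventType, Consumer<String>> handlers =
      new EnumMap<>(StoreEventType.class);

  public StoreEventDispatch() {
    handlers.put(StoreEventType.STORE_APP,  id -> System.out.println("store app " + id));
    handlers.put(StoreEventType.UPDATE_APP, id -> System.out.println("update app " + id));
    handlers.put(StoreEventType.REMOVE_APP, id -> System.out.println("remove app " + id));
    // ... one entry each for the attempt events
  }

  public void handleStoreEvent(StoreEventType type, String payload) {
    Consumer<String> handler = handlers.get(type);
    if (handler == null) {
      throw new IllegalStateException("Unknown event type: " + type);
    }
    handler.accept(payload);
  }

  public static void main(String[] args) {
    new StoreEventDispatch().handleStoreEvent(StoreEventType.STORE_APP, "app_0001");
  }
}
{code}

Adding a new event type then means adding one map entry rather than extending several nested branches.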
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006871#comment-14006871 ] Hudson commented on YARN-1962:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java

Timeline server is enabled by default
-------------------------------------

Key: YARN-1962
URL: https://issues.apache.org/jira/browse/YARN-1962
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Fix For: 2.4.1
Attachments: YARN-1962.1.patch, YARN-1962.2.patch

Since the Timeline Server is not mature and secured yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions related to connection-refused errors in the distributed shell example. By the way, we didn't run the TS because it is not secured yet, although it is possible to explicitly turn it off through the yarn-site config. In my opinion, this extra change for this new service is not worth it at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this.

{noformat}
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.in
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006870#comment-14006870 ] Hudson commented on YARN-2017: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java *
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006866#comment-14006866 ] Hudson commented on YARN-2081: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006874#comment-14006874 ] Hudson commented on YARN-1938: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006867#comment-14006867 ] Hudson commented on YARN-2050: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-2050-2.patch, YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006869#comment-14006869 ] Hudson commented on YARN-2089: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations --- Key: YARN-2089 URL: https://issues.apache.org/jira/browse/YARN-2089 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.0 Reporter: Anubhav Dhoot Assignee: zhihai xu Labels: newbie Fix For: 2.5.0 Attachments: yarn-2089.patch We should mark QueuePlacementPolicy and QueuePlacementRule with audience annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006893#comment-14006893 ] Hudson commented on YARN-1962:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java

Timeline server is enabled by default
-------------------------------------

Key: YARN-1962
URL: https://issues.apache.org/jira/browse/YARN-1962
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Fix For: 2.4.1
Attachments: YARN-1962.1.patch, YARN-1962.2.patch

Since the Timeline Server is not mature and secured yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions related to connection-refused errors in the distributed shell example. By the way, we didn't run the TS because it is not secured yet, although it is possible to explicitly turn it off through the yarn-site config. In my opinion, this extra change for this new service is not worth it at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this.

{noformat}
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.in
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
    at
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006891#comment-14006891 ] Hudson commented on YARN-2089: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations --- Key: YARN-2089 URL: https://issues.apache.org/jira/browse/YARN-2089 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.0 Reporter: Anubhav Dhoot Assignee: zhihai xu Labels: newbie Fix For: 2.5.0 Attachments: yarn-2089.patch We should mark QueuePlacementPolicy and QueuePlacementRule with audience annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006888#comment-14006888 ] Hudson commented on YARN-2081: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006889#comment-14006889 ] Hudson commented on YARN-2050: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-2050-2.patch, YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006896#comment-14006896 ] Hudson commented on YARN-1938: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006892#comment-14006892 ] Hudson commented on YARN-2017: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java *
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006908#comment-14006908 ] Sunil G commented on YARN-1408:
-------------------------------

bq. we may change it to decrement the resource request only when the container is pulled by the AM?

As [~jianhe] mentioned, this can create problems with subsequent NM heartbeats. I also agree that the ALLOCATED state is the best place for a container to be preempted, but this race condition can occur there. The CapacityScheduler raises a KILL event for the RMContainer (for preemption). So a solution may be to recreate the resource request, if the RMContainer state is ALLOCATED/ACQUIRED at that point.

Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
----------------------------------------------------------------------------------------------

Key: YARN-1408
URL: https://issues.apache.org/jira/browse/YARN-1408
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.patch

Capacity preemption is enabled as follows:
* yarn.resourcemanager.scheduler.monitor.enable = true
* yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Queues = a, b. Capacity of queue a = 80%; capacity of queue b = 20%.

Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
Step 2: Submit a jobB to queue b, which would use less than 20% of cluster capacity.

A jobA task which uses queue b's capacity is preempted and killed. This causes the problem below:
1. A new container got allocated for jobA in queue a as per a node update from an NM.
2. This container was preempted and killed immediately. Here the "ACQUIRED at KILLED" invalid-state exception came when the next AM heartbeat reached the RM.

ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED

This also caused the task to time out for 30 minutes, as this container was already killed by preemption:
attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2096) testQueueMetricsOnRMRestart has race condition
Anubhav Dhoot created YARN-2096:
-----------------------------------

Summary: testQueueMetricsOnRMRestart has race condition
Key: YARN-2096
URL: https://issues.apache.org/jira/browse/YARN-2096
Project: Hadoop YARN
Issue Type: Bug
Reporter: Anubhav Dhoot

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM; the metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2096) testQueueMetricsOnRMRestart has race condition
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2096:
-----------------------------------

Assignee: Anubhav Dhoot

testQueueMetricsOnRMRestart has race condition
----------------------------------------------

Key: YARN-2096
URL: https://issues.apache.org/jira/browse/YARN-2096
Project: Hadoop YARN
Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM; the metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2096) testQueueMetricsOnRMRestart has race condition
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2096:
--------------------------------

Attachment: YARN-2096.patch

Fixed two race conditions: first, by waiting for the appropriate transitions before checking metrics; second, by resetting metrics before the events are triggered.

testQueueMetricsOnRMRestart has race condition
----------------------------------------------

Key: YARN-2096
URL: https://issues.apache.org/jira/browse/YARN-2096
Project: Hadoop YARN
Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2096.patch

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM; the metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions.

--
This message was sent by Atlassian JIRA (v6.2#6252)
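A sketch of the first fix: poll for the expected metric value with a deadline instead of asserting immediately after dispatching asynchronous events. The helper below is illustrative only, not the actual test code; the second fix is purely an ordering change (reset metrics before triggering the events being counted).

{code}
import java.util.function.IntSupplier;

public final class MetricsTestUtil {
  // Poll until the asynchronously updated metric reaches the expected value,
  // failing only after a deadline; avoids asserting mid-transition.
  public static void waitForMetric(IntSupplier metric, int expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (metric.getAsInt() != expected) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError(
            "expected " + expected + " but was " + metric.getAsInt());
      }
      Thread.sleep(50); // give dispatcher transitions time to finish
    }
  }
}
{code}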
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006938#comment-14006938 ] Zhijie Shen commented on YARN-1937:
-----------------------------------

bq. A meta comment - may be this isn't a RESTy way of rejecting requests?

The situation here is that we may not deny the whole request; only some of the entities may fail to be put. Otherwise, we could simply return an HTTP 403. However, in this case we have to return a customized response, don't we?

bq. We should also make this a public enum so that users know what system-filters exist
bq. Do we really need TimelinePutError.SYSTEM_FILTER_CONFLICT? Similarly injectOwnerInfo. Or is it better to simply ignore the overriding filters? Not sure, thinking aloud.

I intentionally don't allow users to set or modify the system filters, to prevent them from affecting the system logic. For example, if user1 posts an entity with ENTITY_OWNER = user2, the posted entity will never be accessible by user1. Therefore, the enums don't need to be visible to users. However, in the documentation, we can explicitly tell users which filter names are reserved by the timeline service; users shouldn't use them.

bq. Agree with Varun about admins. You should simply start respecting YarnConfiguration.YARN_ADMIN_ACL. See ApplicationACLsManager for e.g and reuse AdminACLsManager here itself.

Sure. I already filed a ticket about adding admin ACLs. How about working on that issue separately?

Add entity-level access control of the timeline data for owners only
---------------------------------------------------------------------

Key: YARN-1937
URL: https://issues.apache.org/jira/browse/YARN-1937
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
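A sketch of the per-entity rejection being discussed, with hypothetical names (the actual timeline put path and error codes may differ): entities that try to set a reserved system filter are recorded as per-entity errors while the rest of the request is still processed.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class SystemFilterGuard {
  // Filter names reserved by the timeline service (illustrative value).
  static final Set<String> RESERVED = Set.of("ENTITY_OWNER");

  static final class PutError {
    final String entityId;
    final String reason;
    PutError(String entityId, String reason) {
      this.entityId = entityId;
      this.reason = reason;
    }
  }

  // Reject only the offending entities; everything else is still put.
  static List<PutError> checkEntities(Map<String, Set<String>> filtersByEntity) {
    List<PutError> errors = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e : filtersByEntity.entrySet()) {
      for (String filterName : e.getValue()) {
        if (RESERVED.contains(filterName)) {
          errors.add(new PutError(e.getKey(), "SYSTEM_FILTER_CONFLICT"));
        }
      }
    }
    return errors; // returned in the response rather than failing with HTTP 403
  }
}
{code}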
[jira] [Updated] (YARN-2059) Extend access control for admin acls
[ https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2059: -- Summary: Extend access control for admin acls (was: Extend access control for admin and configured user/group list) Extend access control for admin acls Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2059) Extend access control for admin acls
[ https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2059: -- Target Version/s: 2.5.0 Extend access control for admin acls Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2083) In fair scheduler, a queue should not be assigned more containers when its usedResource has reached the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Tian updated YARN-2083:
--------------------------

Attachment: YARN-2083.patch

Added a test case for this issue.

In fair scheduler, a queue should not be assigned more containers when its usedResource has reached the maxResource limit
--------------------------------------------------------------------------------------------------------------------------

Key: YARN-2083
URL: https://issues.apache.org/jira/browse/YARN-2083
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
Labels: assignContainer, fair, scheduler
Fix For: 2.3.0
Attachments: YARN-2083.patch

In the fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guarantee the queue is not over its limit. But the fitsIn function in Resource.java does not return false when the usedResource equals the maxResource. I think we should create a new function, fitsInWithoutEqual, to use instead of fitsIn in this case.

--
This message was sent by Atlassian JIRA (v6.2#6252)
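A minimal sketch of the proposed strict check; the method name comes from the description above, but the real Resource API takes Resource objects rather than the raw longs used here.

{code}
public final class StrictFit {
  // fitsIn-style check that treats "used == max" as full: a queue at its
  // maxResource limit should not be assigned any more containers.
  static boolean fitsInWithoutEqual(long usedMemory, long usedVcores,
                                    long maxMemory, long maxVcores) {
    return usedMemory < maxMemory && usedVcores < maxVcores;
  }

  public static void main(String[] args) {
    System.out.println(fitsInWithoutEqual(8192, 8, 8192, 8)); // false: queue is full
    System.out.println(fitsInWithoutEqual(4096, 4, 8192, 8)); // true: room remains
  }
}
{code}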
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007024#comment-14007024 ] Rohith commented on YARN-1365:
------------------------------

Hi Anubhav, one comment on the patch:
* Notifying the scheduler of APP_ATTEMPT_ADDED from RMApp leads to an InvalidStateTransition exception for RMAppAttempt. Can this be handled in RMAppAttemptImpl#AttemptRecoveredTransition? Since during recovery of the RMApp all attempts are recovered synchronously, the RMAppAttempt state is moved to LAUNCHED before the scheduler is notified.

{noformat}
// Let scheduler know about this attempt so it can allow AM to register
boolean disableTransferState = false;
app.handler.handle(new AppAttemptAddedSchedulerEvent(app.currentAttempt
    .getAppAttemptId(), disableTransferState));
{noformat}

ApplicationMasterService to allow Register and Unregister of an app that was running before restart
---------------------------------------------------------------------------------------------------

Key: YARN-1365
URL: https://issues.apache.org/jira/browse/YARN-1365
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.initial.patch

For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed, and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete, since the RM may have died after saving completion in the store but before notifying the AM that it is free to exit.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366:
-------------------------

Attachment: YARN-1366.3.patch

I updated the patch with the changes below.

bq. Pending releases - AM forgets about a request to release once its made. We will have to reissue a release request after RM restart
FIXED

bq. Blacklisting has logic in ignoreBlacklisting to ignore it if we cross a threshold.
FIXED

bq. There a few places where the line exceeds 80 chars
Even after formatting, these lines do not come down below 80 characters, e.g. line 209 in RMContainerRequestor and line 267 in AMRMClientImpl.

Apart from the fixes above, the other changes are:
* AMRMClient
** AMRMClient maintains blacklisted nodes; these will be sent back to the RM on resync.
** Added a test checking this functionality.
* MapReduce
** Added a test that applies the YARN-1365 patch. Running this test requires the patch for YARN-1365.

Please review the patch.

ApplicationMasterService should Resync with the AM upon allocate call after restart
------------------------------------------------------------------------------------

Key: YARN-1366
URL: https://issues.apache.org/jira/browse/YARN-1366
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch

The ApplicationMasterService currently sends a resync response, to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0; the AM should then send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM, then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once.

--
This message was sent by Atlassian JIRA (v6.2#6252)
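A sketch of the AM-side resync behavior the description and the patch notes imply. Field names are hypothetical, not the actual AMRMClientImpl members: reset the allocate sequence number and re-send everything the AM still cares about, since the restarted RM has no memory of earlier allocate calls.

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AmResyncSketch {
  int lastResponseId;                                  // allocate RPC sequence number
  final Set<String> outstandingRequests = new HashSet<>();
  final Set<String> pendingReleases = new HashSet<>(); // releases issued pre-restart
  final Set<String> blacklistedNodes = new HashSet<>();

  final List<String> ask = new ArrayList<>();
  final List<String> release = new ArrayList<>();
  final List<String> blacklistAdditions = new ArrayList<>();

  // On resync: the sequence restarts at 0 and the full outstanding state is
  // re-sent in the next allocate call.
  void onResync() {
    lastResponseId = 0;
    ask.addAll(outstandingRequests);
    release.addAll(pendingReleases);
    blacklistAdditions.addAll(blacklistedNodes);
  }
}
{code}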
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007148#comment-14007148 ] Wangda Tan commented on YARN-1408:
----------------------------------

I think a container should be preemptable when it's in the ALLOCATED state. As for the race condition mentioned by [~jianhe], can we add a resource request to RMContainer? When an allocate/kill happens within one AM heartbeat, we can add the resource request back.

Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
----------------------------------------------------------------------------------------------

Key: YARN-1408
URL: https://issues.apache.org/jira/browse/YARN-1408
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.patch

Capacity preemption is enabled as follows:
* yarn.resourcemanager.scheduler.monitor.enable = true
* yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Queues = a, b. Capacity of queue a = 80%; capacity of queue b = 20%.

Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
Step 2: Submit a jobB to queue b, which would use less than 20% of cluster capacity.

A jobA task which uses queue b's capacity is preempted and killed. This causes the problem below:
1. A new container got allocated for jobA in queue a as per a node update from an NM.
2. This container was preempted and killed immediately. Here the "ACQUIRED at KILLED" invalid-state exception came when the next AM heartbeat reached the RM.

ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED

This also caused the task to time out for 30 minutes, as this container was already killed by preemption:
attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs

--
This message was sent by Atlassian JIRA (v6.2#6252)
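A sketch of the recovery idea in these comments, using hypothetical interfaces standing in for RMContainer and the app's scheduling info: if the kill raced with allocation and the AM never pulled the container, hand the original ask back to the scheduler instead of dropping it.

{code}
enum ContainerState { NEW, ALLOCATED, ACQUIRED, RUNNING, KILLED }

// Hypothetical shapes; the real RMContainer/AppSchedulingInfo APIs differ.
interface KilledContainer {
  ContainerState stateBeforeKill();
  Object originalResourceRequest(); // the request kept on the container
}

interface AppSchedulingInfoLike {
  void reAddResourceRequest(Object request);
}

final class PreemptionRecovery {
  // Re-queue the ask only for containers the AM never saw running; containers
  // the AM already pulled and ran are handled by the normal completion path.
  static void recoverOnKill(KilledContainer c, AppSchedulingInfoLike app) {
    if (c.stateBeforeKill() == ContainerState.ALLOCATED
        || c.stateBeforeKill() == ContainerState.ACQUIRED) {
      app.reAddResourceRequest(c.originalResourceRequest());
    }
  }
}
{code}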
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007269#comment-14007269 ] Steve Loughran commented on YARN-2092:
--------------------------------------

I think this should be a wontfix, as the underlying problem was trying to get an older version of Jackson onto the classpath. Admittedly, this probably worked on 2.2-2.4, but that is because the code was pushing up the same version of Jackson that was already there. If Tez weren't trying to push up any Jackson JARs, but instead took what was there, it would work (which is essentially what has been done). Unless/until we can isolate YARN apps from everything on the classpath other than org.apache.hadoop.*, this problem will arise, which implies we need OSGi support in YARN.

Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
-----------------------------------------------------------------------------------------

Key: YARN-2092
URL: https://issues.apache.org/jira/browse/YARN-2092
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah

Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007281#comment-14007281 ] Sangjin Lee commented on YARN-2092: --- +1 on exploring the OSGi option. Until/unless it happens, one option might be to consider an expanded use for the mapreduce.job.classloader config? Currently it only works within a MR app (AM and tasks). However, one could argue that the functionality it provides is a generic one in that it creates a somewhat isolated class space. Perhaps we could rename the job classloader to something like an app classloader, and make it available for any place wherever user code needs to run in isolation. Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT Key: YARN-2092 URL: https://issues.apache.org/jira/browse/YARN-2092 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath. -- This message was sent by Atlassian JIRA (v6.2#6252)
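The isolation described here boils down to a child-first (parent-last) classloader over the app's own jars. Below is a generic sketch of that idea only; the real job classloader in Hadoop also honors a configurable list of "system classes" that must always come from the parent, which is omitted here.

{code}
import java.net.URL;
import java.net.URLClassLoader;

// Child-first delegation: the app's own jars win over the framework's,
// except for core classes that must stay shared with the parent.
class ChildFirstClassLoader extends URLClassLoader {
  ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      if (name.startsWith("java.")) {
        return super.loadClass(name, resolve); // core classes always from parent
      }
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          c = findClass(name);                 // try the app's jars first...
        } catch (ClassNotFoundException e) {
          c = super.loadClass(name, resolve);  // ...then fall back to the parent
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
{code}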
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007284#comment-14007284 ] Sangjin Lee commented on YARN-2092: --- ... user code or any code that needs to run in isolation. Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT Key: YARN-2092 URL: https://issues.apache.org/jira/browse/YARN-2092 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007291#comment-14007291 ] Hitesh Shah commented on YARN-2092:
-----------------------------------

Isolating apps is just one aspect. The bigger issue is the provisioning of thinner client-api jars so that the Hadoop internals and their dependencies do not need to be pulled into the classpath of an app.

Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
-----------------------------------------------------------------------------------------

Key: YARN-2092
URL: https://issues.apache.org/jira/browse/YARN-2092
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah

Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2096:
-----------------------------------

Summary: Race in TestRMRestart#testQueueMetricsOnRMRestart (was: testQueueMetricsOnRMRestart has race condition)

Race in TestRMRestart#testQueueMetricsOnRMRestart
-------------------------------------------------

Key: YARN-2096
URL: https://issues.apache.org/jira/browse/YARN-2096
Project: Hadoop YARN
Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2096.patch

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM; the metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-596: - Attachment: YARN-596.patch Updated the patch to address Sandy's suggestions. In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested at. This means that an application can shield itself from preemption by requesting its containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over its fair share, is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
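To make the described flaw concrete, a hedged sketch of the selection order the issue refers to (the comparator and method name are illustrative; the actual FairScheduler code differs in detail):
{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

// Illustrative only: candidates from all over-share queues in one list,
// ordered purely by the priority each container was requested at, so an app
// requesting at higher priorities sinks to the back of the preemption line.
void sortPreemptionCandidates(List<RMContainer> candidates) {
  Collections.sort(candidates, new Comparator<RMContainer>() {
    @Override
    public int compare(RMContainer c1, RMContainer c2) {
      return c1.getContainer().getPriority().compareTo(
          c2.getContainer().getPriority());
    }
  });
}
{code}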
[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007319#comment-14007319 ] Tsuyoshi OZAWA commented on YARN-2096: -- Thank you for taking this JIRA, Anubhav. I also faced this problem when reviewing YARN-1365. I'll try to run the tests again and again with your patch. Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007338#comment-14007338 ] Karthik Kambatla commented on YARN-2096: Looks good to me. +1. Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007338#comment-14007338 ] Karthik Kambatla edited comment on YARN-2096 at 5/23/14 4:41 PM: - Looks good to me. +1 pending Jenkins. was (Author: kkambatl): Looks good to me. +1. Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007358#comment-14007358 ] Tsuyoshi OZAWA commented on YARN-1365: -- Hi [~rohithsharma], can you clarify the case in which the InvalidStateTransition exception is caused? IIUC, the recovery path is as follows: 1. RMAppManager#recoverApplication() is invoked. 2. RMAppEvent(appId, RMAppEventType.RECOVER) is handled and RMAppRecoveredTransition() is invoked. 3. AppAttemptAddedSchedulerEvent() is handled and APP_ATTEMPT_ADDED is processed. I thought this path works well and that the test case included in the patch covers it. Please correct me if I'm wrong. Thanks. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
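To make step 2 of that recovery path concrete, a hedged sketch of the event dispatch (RMAppEventType.RECOVER is the real event type; the surrounding RM wiring shown here is simplified and may differ from the actual code):
{code}
// Simplified sketch of the dispatch in step 2 above: the recovered app is
// handed an asynchronous RECOVER event, which drives
// RMAppRecoveredTransition in the RMApp state machine.
rmContext.getDispatcher().getEventHandler().handle(
    new RMAppEvent(applicationId, RMAppEventType.RECOVER));
{code}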
[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007363#comment-14007363 ] Tsuyoshi OZAWA commented on YARN-2096: -- The change looks good to me too (non-binding). Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007385#comment-14007385 ] Vinod Kumar Vavilapalli commented on YARN-2049: --- +1, looks good. Checking this in. Delegation token stuff for the timeline sever - Key: YARN-2049 URL: https://issues.apache.org/jira/browse/YARN-2049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007397#comment-14007397 ] Varun Vasudev commented on YARN-2049: - +1, patch looks good. Delegation token stuff for the timeline sever - Key: YARN-2049 URL: https://issues.apache.org/jira/browse/YARN-2049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007414#comment-14007414 ] Varun Vasudev commented on YARN-1937: - +1 on the latest patch. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007419#comment-14007419 ] Hadoop QA commented on YARN-2096: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646464/YARN-2096.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3795//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3795//console This message is automatically generated. Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007426#comment-14007426 ] Vinod Kumar Vavilapalli commented on YARN-1936: --- +1, this looks good. Will check this in if Jenkins says okay. Secured timeline client --- Key: YARN-1936 URL: https://issues.apache.org/jira/browse/YARN-1936 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch TimelineClient should be able to talk to the timeline server with kerberos authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007460#comment-14007460 ] Hadoop QA commented on YARN-1936: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646445/YARN-1936.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.client.TestRMAdminCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3796//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3796//console This message is automatically generated. Secured timeline client --- Key: YARN-1936 URL: https://issues.apache.org/jira/browse/YARN-1936 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch TimelineClient should be able to talk to the timeline server with kerberos authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007472#comment-14007472 ] Zhijie Shen commented on YARN-1936: --- Again, the test failure is not related. Secured timeline client --- Key: YARN-1936 URL: https://issues.apache.org/jira/browse/YARN-1936 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch TimelineClient should be able to talk to the timeline server with kerberos authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007485#comment-14007485 ] Ivan Mitic commented on YARN-1063: -- I have already reviewed a version of the initial patch (YARN-1063.patch). Copy-pasting the full list of comments for documentation purposes on the Jira. First round: 1. winutils.h: You have a duplicate EnablePrivilege declaration. Please remove the one with BOOL. 2. winutils.h: Convention in the file is to use CamelCased function names. Please name your functions appropriately. Also, a nit: no space between '(' and function args. Same comment across the board. 3. libwinutils.c#EnablePrivilege:
{code}
{
  ReportErrorCode(LLookupPrivilegeValue, GetLastError());
  CloseHandle(hToken);
  return GetLastError();
}
{code}
You shouldn't be calling GetLastError() twice above. CloseHandle() might reset it to 0 or some other value. Can you change all error code paths in this function to first assign GetLastError() to a local variable, and then log it and do other things. E.g.
{code}
{
  dwErrorCode = GetLastError();
  ReportErrorCode(LLookupPrivilegeValue, dwErrorCode);
  CloseHandle(hToken);
  return dwErrorCode;
}
{code}
3. Something is wrong with this comment
{code}
// Function: assignLsaString
//
// Description:
//   fills in values of LSA_STRING struct to point to a string buffer
{code}
4. void assignLsaString(__in LSA_STRING * target, __in const char *strBuf): is target an __inout parameter? 5. libwinutils.c: Mixed tabs and spaces. Please use 2-space indent across the board. 6. libwinutils.c: authentication pacakage - typo. 7. libwinutils.c: Should the constant be named MICROSOFT_KERBEROS_NAME, or is there something else more appropriate? 8. libwinutils.c: GetNameFromLogonToken: You can assert that the first GetTokenInformation returns false. Don't do assert(GetTokenInformation() == FALSE) :) 9. I don't believe that calloc() sets the last error. If the return value is NULL, you should assume/error with ERROR_NOT_ENOUGH_MEMORY. Applies to all places. 10. libwinutils.c: LookupAccountSid: Do you need to allocate userNameSize+1 and domainNameSize+1 buffer sizes, or is this already accounted for? 11. task.c: You are not checking the result of GetCurrentDirectory(). If you expect the method to always succeed, you can assert that the result != 0. 12. task.c: Please keep the CreateProcessAsUser and old CreateProcess codepaths separate. I don't think it is trivial to prove that the new code with CreateProcessAsUser has exactly the same semantics as the old code. 13. task.c: I don't understand the need to TerminateJobObject when CreateProcessAsUser failed. Why wouldn't the regular return code path exit the process with a non-zero code? 14. Consider using dwErrorCode in your functions to track the status of the win error codes. 15. task.c: createTaskAsUser: Do you need to check if lsaHandle is valid to be sent to unregisterWithLsa? You can initialize it to INVALID_HANDLE_VALUE. 16. task.c: Task: Can you please rename size to cmdLineSize and argLen to crtArgLen. Btw, is size needed? 17. task.c: Task: Can you please rename ARGC_GROUPID to ARGC_JOBOBJECTNAME or something alike. 18. task.c: TaskUsage: Your command line format does not seem to be valid. You are missing jobobject and pidfile. 19.
task.c:
{code}
cmdLine = argv[ARGC_COMMAND];
if (argc > ARGC_COMMAND_ARGS) {
  crtArg = ARGC_COMMAND;
  insertHere = buffer;
  while (crtArg < argc) {
    argLen = wcslen(argv[crtArg]);
    wcscat(insertHere, argv[crtArg]);
    insertHere += argLen;
    size -= argLen;
    insertHere[0] = L' ';
    insertHere += 1;
    size -= 1;
    insertHere[0] = 0;
    ++crtArg;
  }
  cmdLine = buffer;
}
{code}
Do you mind adding a short comment on what you're doing? 20. bq. add PIDFILE argument to task createAsUser. Pidfile must be created by the task controller just before launching the process. Can you comment on the rationale? 21. bq. accept arbitrary arguments to pass to the process launched. We use the launcher for both the container and the localizer, and that requires variable arguments. How does this work today? Is the localizer using something else? 22. We haven't written a whole lot of unit tests for winutils so far (check testwinutils.java). I'll let you make the call on whether this could/should be unit tested and if yes, what is appropriate. I will sign off anyhow. Second round: 1. I still see GetLastError() being called twice in EnablePrivilege. Can you please use dwErrorCode instead of the second call:
{code}
dwErrCode = GetLastError();
ReportErrorCode(LOpenProcessToken, GetLastError());
{code}
2. You missed changing the SAL annotation in the header file
{code}
void AssignLsaString(__inout LSA_STRING * target, __in const char *strBuf)
{code}
3.
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007490#comment-14007490 ] Ivan Mitic commented on YARN-1063: -- Thanks Remus for addressing all comments in the latest patch (YARN-1063.4.patch). Looks good to me, +1. I will give others a couple of days to provide additional feedback and then commit the patch. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows users seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative methods considered: h2. Process rights limited by security token restriction: On Windows, access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security token of a sandboxed process launched by a web browser. Typically the launched process will have a fully restricted token and needs to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched user without granting rights to other processes launched on the same machine, but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of Windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT-based executables. This method was ruled out due to the lack of official support for standard Windows APIs. At some point in the future Windows may support functionality similar to BSD jails or Linux containers; at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub-command was added to the set of task commands. Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this, join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine.
These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop, so it will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new environment for the new logon. # Launch the new process in a job with the task name specified and using the created logon. # Wait for the JOB to exit. h2. Future work: The following work was scoped out of this check-in: * Support for non-domain users or machines that are not domain joined. * Support for privilege isolation by running the task launcher in a high-privilege service with access over an ACLed named pipe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007534#comment-14007534 ] Zhijie Shen commented on YARN-1936: --- Also did some validation for the new client patch on my local secured cluster. It still works correctly. Secured timeline client --- Key: YARN-1936 URL: https://issues.apache.org/jira/browse/YARN-1936 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch TimelineClient should be able to talk to the timeline server with kerberos authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007555#comment-14007555 ] Hadoop QA commented on YARN-1937: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646469/YARN-1937.4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3797//console This message is automatically generated. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007558#comment-14007558 ] Vinod Kumar Vavilapalli commented on YARN-2070: --- Given that, should we consider dropping this patch completely? DistributedShell publishes unfriendly user information to the timeline server - Key: YARN-2070 URL: https://issues.apache.org/jira/browse/YARN-2070 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Priority: Minor Labels: newbie Attachments: YARN-2070.patch Below is the code using the string of the current user object as the user value.
{code}
entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
    .toString());
{code}
When we use Kerberos authentication, it's going to output the full name, such as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for searching by the primary filters. It's better to use shortUserName instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
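For reference, a hedged sketch of the change the description suggests (UserGroupInformation.getShortUserName() is the real API; the surrounding entity setup is elided):
{code}
// Minimal sketch of the suggested fix: publish the short user name
// ("zjshen") rather than the full Kerberos principal string.
entity.addPrimaryFilter("user",
    UserGroupInformation.getCurrentUser().getShortUserName());
{code}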
[jira] [Created] (YARN-2097) Documentation: health check return status
Allen Wittenauer created YARN-2097: -- Summary: Documentation: health check return status Key: YARN-2097 URL: https://issues.apache.org/jira/browse/YARN-2097 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer We need to document that the output of the health check script is ignored on non-0 exit status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism
[ https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007562#comment-14007562 ] Ming Ma commented on YARN-2082: --- Folks, thanks for the feedback and the pointers to other jiras; quite useful. We have been discussing internally how to mitigate log aggregation's impact on the NN for some time. Here is more context, along with comments. 1. We discussed Vinod's suggestion of post-processing before. The issue is that the write RPC hit on the NN has already happened; even more, post-processing introduces additional hits on the NN. 2. Agreed that HDFS needs to be more scalable. Some improvements have been done; some are still being worked on. My opinion is we should do both if possible: improve HDFS and improve how applications use HDFS. 3. Reducing the impact on the NN in a large cluster is our primary motivation, similar to what Jason mentioned in YARN-1440. We use the approach mentioned by [~jira.shegalov] and [~ctrezzo] in YARN-221 to mitigate the issue. 4. YARN-1440 also suggested making it pluggable, but it seems the primary motivation there is to make it easy for other tools to integrate with yarn logs. If that is the case, we have two requirements for making log aggregation pluggable: easy integration with other tools, and reduced pressure on the NN. 5. We discussed writing logs to a key-value store before. At that point, we didn't go with that approach as it makes YARN depend on an external component like HBase. Based on recent discussion with [~jira.shegalov] and [~ctrezzo], it now sounds like a reasonable approach, given that a) the timeline store already has a dependency on HBase and b) the size of the logs is small and well suited to the HBase scenario. 6. Regarding Zhijie's suggestion of using the timeline store: that sounds like an interesting idea, if the timeline store is highly available. 7. Regarding Steve's comment on long-running job support: it wasn't our primary goal; we just want to make sure that if we do end up changing log aggregation, the framework supports that scenario as well. If there is a long-running container and we rotate the logs, is there a plan to aggregate them before the container finishes? Support for alternative log aggregation mechanism - Key: YARN-2082 URL: https://issues.apache.org/jira/browse/YARN-2082 Project: Hadoop YARN Issue Type: New Feature Reporter: Ming Ma I will post a more detailed design later. Here is the brief summary; I would like to get early feedback. Problem Statement: The current implementation of log aggregation creates one HDFS file for each {application, nodemanager} pair. These files are relatively small, in the range of 1-2 MB. In a large cluster with lots of applications and many nodemanagers, it ends up creating lots of small files in HDFS. This creates pressure on the HDFS NN in the following ways. 1. It increases NN memory size. This is mitigated by having the history server delete old log files in HDFS. 2. Runtime RPC hit on HDFS. Each log aggregation file introduces several NN RPCs such as create, getAdditionalBlock, complete, and rename. When the cluster is busy, such RPC hits have an impact on NN performance. In addition, to support non-MR applications on YARN, we might need to support aggregation for long-running applications. Design choices: 1. Don't aggregate all the logs, as in YARN-221. 2. Create a dedicated HDFS namespace used only for log aggregation. 3. Write logs to some key-value store like HBase; HBase's RPC hit on the NN will be much lower. 4. Decentralize application-level log aggregation to the NMs.
All logs for a given application are aggregated first by a dedicated NM before they are pushed to HDFS. 5. Have the NM aggregate logs on a regular basis; each of these log files will have data from different applications, and there needs to be some index for quick lookup. Proposal: 1. Make yarn log aggregation pluggable for both the read and write paths. Note that Hadoop FileSystem provides an abstraction, and we could ask alternative log aggregators to implement a compatible FileSystem, but that seems to be overkill. 2. Provide a log aggregation plugin that writes to HBase. The schema design needs to support efficient reads on a per-application as well as a per-application+container basis; in addition, it shouldn't create hotspots in a cluster where certain users might create more jobs than others. For example, we can use hash($user + $applicationId) + containerId as the row key. -- This message was sent by Atlassian JIRA (v6.2#6252)
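To make the row-key idea above concrete, a hedged sketch (the hashing scheme is only what the proposal describes; the helper name and separator are illustrative, not an actual YARN or HBase API):
{code}
import java.nio.charset.StandardCharsets;

// Illustrative only: spread writes across regions by prefixing with a hash
// of user + applicationId, then keep containerId for per-container reads.
static byte[] logRowKey(String user, String applicationId, String containerId) {
  String prefix = Integer.toHexString((user + applicationId).hashCode());
  return (prefix + "!" + applicationId + "!" + containerId)
      .getBytes(StandardCharsets.UTF_8);
}
{code}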
[jira] [Updated] (YARN-2097) Documentation: health check return status
[ https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2097: --- Component/s: nodemanager Documentation: health check return status - Key: YARN-2097 URL: https://issues.apache.org/jira/browse/YARN-2097 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Allen Wittenauer Labels: newbie We need to document that the output of the health check script is ignored on non-0 exit status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2075) TestRMAdminCLI consistently fail on trunk
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2075: -- Affects Version/s: 2.5.0 3.0.0 TestRMAdminCLI consistently fail on trunk - Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Attachment: YARN-1937.5.patch Fix the conflicts Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch, YARN-1937.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007592#comment-14007592 ] Hadoop QA commented on YARN-596: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646537/YARN-596.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3798//console This message is automatically generated. In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested at. This means that an application can shield itself from preemption by requesting its containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over its fair share, is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2090) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/YARN-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007603#comment-14007603 ] Victor Kim commented on YARN-2090: -- I figured out when/where it happens. If you use kerberized user impersonation when submitting an MR job, we actually submit as a user (let's say john) via yarn (or whatever Kerberos principal, whose user is also present on the box). During the shuffle phase, in public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt), ShuffleHandler tries to read/write the map output file as john, who ran the MR job. This is not going to work in the case of user impersonation, when john is not present on the local box (e.g. he comes from Active Directory), because the file from the local FS cannot be read. The fix is to use the JVM process owner instead of the user, i.e. System.getProperty("user.name"), in two methods: populateHeaders and sendMapOutput.
{code:title=ShuffleHandler.java|borderStyle=solid}
protected void populateHeaders(List<String> mapIds, String outputBaseStr,
    String user, int reduce, HttpRequest request, HttpResponse response,
    boolean keepAliveParam, Map<String, MapOutputInfo> mapOutputInfoMap)
    throws IOException {
  // Some code here..
  String processOwner = System.getProperty("user.name");
  MapOutputInfo outputInfo = getMapOutputInfo(base, mapId, reduce, processOwner);
  // Some code here..
}
{code}
{code:title=ShuffleHandler.java|borderStyle=solid}
protected ChannelFuture sendMapOutput(ChannelHandlerContext ctx, Channel ch,
    String user, String mapId, int reduce, MapOutputInfo mapOutputInfo)
    throws IOException {
  // Some code here..
  String processOwner = System.getProperty("user.name");
  spill = SecureIOUtils.openForRandomRead(spillfile, "r", processOwner, null);
  // Some code here..
}
{code}
If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase Key: YARN-2090 URL: https://issues.apache.org/jira/browse/YARN-2090 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.4.0 Environment: hadoop: 2.4.0.2.1.2.0 Reporter: Victor Kim Priority: Critical I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers; Kerberos is enabled, with hdfs, yarn, and mapred principals/keytabs. The ResourceManager and NodeManagers run under the yarn user, using the yarn Kerberos principal. Use case 1: WordCount, submit job using the yarn UGI (i.e. the superuser, the one having a Kerberos principal on all boxes). Result: job successfully completed. Use case 2: WordCount, submit job using LDAP user impersonation via the yarn UGI. Result: Map tasks are completed SUCCESSfully; the Reduce task fails with ShuffleError Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below). The use case with user impersonation used to work on earlier versions, without YARN (with JT/TT). I found a similar issue with Kerberos AUTH involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled. The exception trace from the YarnChild JVM: 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323) at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245) at
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007605#comment-14007605 ] Hudson commented on YARN-2081: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007607#comment-14007607 ] Hudson commented on YARN-2089: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations --- Key: YARN-2089 URL: https://issues.apache.org/jira/browse/YARN-2089 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.0 Reporter: Anubhav Dhoot Assignee: zhihai xu Labels: newbie Fix For: 2.5.0 Attachments: yarn-2089.patch We should mark QueuePlacementPolicy and QueuePlacementRule with audience annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007612#comment-14007612 ] Hudson commented on YARN-1936: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-1936. Added security support for the Timeline Client. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1597153) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineDelegationTokenSecretManagerService.java Secured timeline client --- Key: YARN-1936 URL: https://issues.apache.org/jira/browse/YARN-1936 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch TimelineClient should be able to talk to the timeline server with kerberos authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007609#comment-14007609 ] Hudson commented on YARN-1962: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Fix For: 2.4.1 Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since the Timeline Server is not yet mature and secured, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions for the distributed shell example related to connection-refused errors. Btw, we didn't run the TS because it is not secured yet. Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is an agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.in
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the
response from the timeline server. com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at
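For reference, a hedged sketch of the explicit opt-out the description above mentions (yarn.timeline-service.enabled is the real configuration key; the surrounding setup is illustrative):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Minimal sketch: turn the timeline service off explicitly via
// yarn-site / Configuration, the workaround the description refers to.
YarnConfiguration conf = new YarnConfiguration();
conf.setBoolean("yarn.timeline-service.enabled", false);
{code}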
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007610#comment-14007610 ] Hudson commented on YARN-2049: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-2049. Added delegation-token support for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1597130) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineDelegationTokenResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineAuthenticator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientTimelineSecurityInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineAuthenticationConsts.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenOperation.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenSelector.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.SecurityInfo * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineAuthenticationFilter.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineAuthenticationFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineClientAuthenticationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineDelegationTokenSecretManagerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java *
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007608#comment-14007608 ] Hudson commented on YARN-2017: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java *
[jira] [Updated] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2026: - Attachment: YARN-2026-v1.txt Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios -- Key: YARN-2026 URL: https://issues.apache.org/jira/browse/YARN-2026 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2026-v1.txt While using hierarchical queues in fair scheduler,there are few scenarios where we have seen a leaf queue with least fair share can take majority of the cluster and starve a sibling parent queue which has greater weight/fair share and preemption doesn’t kick in to reclaim resources. The root cause seems to be that fair share of a parent queue is distributed to all its children irrespective of whether its an active or an inactive(no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and if it has demands greater than that. When there are many queues under a parent queue(with high fair share),the child queue’s fair share becomes really low. As a result when only few of these child queues have apps running,they reach their *tiny* fair share quickly and preemption doesn’t happen even if other leaf queues(non-sibling) are hogging the cluster. This can be solved by dividing fair share of parent queue only to active child queues. Here is an example describing the problem and proposed solution: root.lowPriorityQueue is a leaf queue with weight 2 root.HighPriorityQueue is parent queue with weight 8 root.HighPriorityQueue has 10 child leaf queues : root.HighPriorityQueue.childQ(1..10) Above config,results in root.HighPriorityQueue having 80% fair share and each of its ten child queue would have 8% fair share. Preemption would happen only if the child queue is 4% (0.5*8=4). Lets say at the moment no apps are running in any of the root.HighPriorityQueue.childQ(1..10) and few apps are running in root.lowPriorityQueue which is taking up 95% of the cluster. Up till this point,the behavior of FS is correct. Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% of the cluster. It would get only the available 5% in the cluster and preemption wouldn't kick in since its above 4%(half fair share).This is bad considering childQ1 is under a highPriority parent queue which has *80% fair share*. Until root.lowPriorityQueue starts relinquishing containers,we would see the following allocation on the scheduler page: *root.lowPriorityQueue = 95%* *root.HighPriorityQueue.childQ1=5%* This can be solved by distributing a parent’s fair share only to active queues. So in the example above,since childQ1 is the only active queue under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 80%. This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds. Also note that similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck at 5%,until childQ2 starts relinquishing containers. We would like each of childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie 40%,which would ensure childQ1 gets upto 40% resource if needed through preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007613#comment-14007613 ] Hudson commented on YARN-1938: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007635#comment-14007635 ] Ashwin Shankar commented on YARN-2026: -- Attached patch addresses this issue by setting fair share of inactive parent and leaf queues(queues which have no running apps) to zero. Patch contains unit tests to illustrate the behavior. I manually tested in pseudo distributed cluster and verified that fair share are distributed to only active queues and found that preemption behavior/fairness is much better. Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios -- Key: YARN-2026 URL: https://issues.apache.org/jira/browse/YARN-2026 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2026-v1.txt While using hierarchical queues in fair scheduler,there are few scenarios where we have seen a leaf queue with least fair share can take majority of the cluster and starve a sibling parent queue which has greater weight/fair share and preemption doesn’t kick in to reclaim resources. The root cause seems to be that fair share of a parent queue is distributed to all its children irrespective of whether its an active or an inactive(no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and if it has demands greater than that. When there are many queues under a parent queue(with high fair share),the child queue’s fair share becomes really low. As a result when only few of these child queues have apps running,they reach their *tiny* fair share quickly and preemption doesn’t happen even if other leaf queues(non-sibling) are hogging the cluster. This can be solved by dividing fair share of parent queue only to active child queues. Here is an example describing the problem and proposed solution: root.lowPriorityQueue is a leaf queue with weight 2 root.HighPriorityQueue is parent queue with weight 8 root.HighPriorityQueue has 10 child leaf queues : root.HighPriorityQueue.childQ(1..10) Above config,results in root.HighPriorityQueue having 80% fair share and each of its ten child queue would have 8% fair share. Preemption would happen only if the child queue is 4% (0.5*8=4). Lets say at the moment no apps are running in any of the root.HighPriorityQueue.childQ(1..10) and few apps are running in root.lowPriorityQueue which is taking up 95% of the cluster. Up till this point,the behavior of FS is correct. Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% of the cluster. It would get only the available 5% in the cluster and preemption wouldn't kick in since its above 4%(half fair share).This is bad considering childQ1 is under a highPriority parent queue which has *80% fair share*. Until root.lowPriorityQueue starts relinquishing containers,we would see the following allocation on the scheduler page: *root.lowPriorityQueue = 95%* *root.HighPriorityQueue.childQ1=5%* This can be solved by distributing a parent’s fair share only to active queues. So in the example above,since childQ1 is the only active queue under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 80%. This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds. 
Also note that similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck at 5%,until childQ2 starts relinquishing containers. We would like each of childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie 40%,which would ensure childQ1 gets upto 40% resource if needed through preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
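A minimal sketch of the proposed distribution, using hypothetical names (splitFairShare, weights, active); it is illustrative only and not the attached YARN-2026-v1.txt patch:
{code}
// Illustrative sketch (not the attached patch): split a parent's fair share
// among its active children in proportion to their weights; inactive queues
// (no running apps) receive a fair share of zero.
static double[] splitFairShare(double parentShare, double[] weights,
    boolean[] active) {
  double activeWeightSum = 0;
  for (int i = 0; i < weights.length; i++) {
    if (active[i]) {
      activeWeightSum += weights[i];
    }
  }
  double[] shares = new double[weights.length];
  for (int i = 0; i < weights.length; i++) {
    shares[i] = (active[i] && activeWeightSum > 0)
        ? parentShare * weights[i] / activeWeightSum
        : 0;
  }
  return shares;
}
{code}
With the example above, if childQ1 is the only active child of root.HighPriorityQueue it would receive the full 80%, and if childQ1 and childQ2 are both active they would receive 40% each, which matches the preemption behavior the description asks for.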
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007640#comment-14007640 ] Zhijie Shen commented on YARN-2070: --- The situation here is that users are free to define their own filters, even if they duplicate the system info. The system info should be invisible to users, such that users won't see, for example, ENTITY_OWNER in the response timeline data. Then, if users want to show the user information, they need to add it somewhere in the timeline entity/event, and this is what distributed shell does. The problem here is that the complete UGI name is used as the user in the distributed shell. Therefore, we will see zjshen/localhost@LOCALHOST (auth.KERBEROS) or zjshen (auth.SIMPLE) in the user field. The authentication details are not very useful, IMO, and they will trouble users when they want to query the timeline entities by filtering on user. DistributedShell publishes unfriendly user information to the timeline server - Key: YARN-2070 URL: https://issues.apache.org/jira/browse/YARN-2070 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Priority: Minor Labels: newbie Attachments: YARN-2070.patch Below is the code that uses the string of the current user object as the user value. {code} entity.addPrimaryFilter(user, UserGroupInformation.getCurrentUser().toString()); {code} When we use Kerberos authentication, it's going to output the full name, such as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for searching by the primary filters. It's better to use shortUserName instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
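A one-line sketch of the suggested fix, reusing the snippet quoted above; the only change is publishing the short user name instead of the full UGI string (illustrative, not the attached YARN-2070.patch):
{code}
// Sketch of the suggestion above: publish only the short user name, so the
// primary filter value is e.g. zjshen rather than
// zjshen/localhost@LOCALHOST (auth.KERBEROS). "user" here stands for the same
// primary-filter key used in the snippet quoted in the description.
entity.addPrimaryFilter(user,
    UserGroupInformation.getCurrentUser().getShortUserName());
{code}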
[jira] [Commented] (YARN-2059) Extend access control for admin acls
[ https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007670#comment-14007670 ] Hadoop QA commented on YARN-2059: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646475/YARN-2059.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3804//console This message is automatically generated. Extend access control for admin acls Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2059.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007666#comment-14007666 ] Hadoop QA commented on YARN-2073: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3799//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3799//console This message is automatically generated. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, yarn-2073-3.patch, yarn-2073-4.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
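As a rough illustration of the intent, and not the attached yarn-2073-4.patch, preemption would first net out whatever is still free on the cluster before killing containers; the variable names (clusterResource, usedResource, resourcesToPreempt, calculator) are assumptions:
{code}
// Rough sketch of the intent (assumed names, not the attached patch): only
// preempt the part of the deficit that cannot be met by free cluster capacity.
Resource free = Resources.subtract(clusterResource, usedResource);
Resource stillNeeded = Resources.subtract(resourcesToPreempt, free);
if (Resources.greaterThan(calculator, clusterResource,
    stillNeeded, Resources.none())) {
  // pick containers to preempt, up to stillNeeded
}
{code}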
[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007669#comment-14007669 ] Hadoop QA commented on YARN-2083: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646479/YARN-2083.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3800//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3800//console This message is automatically generated. In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: assignContainer, fair, scheduler Fix For: 2.3.0 Attachments: YARN-2083.patch In fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guaranty this queue is not over its limit. But the fitsIn function in Resource.java did not return false when the usedResource equals the maxResource. I think we should create a new Function fitsInWithoutEqual instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
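A minimal sketch of the strict check the report proposes; it is illustrative only (the real comparison may need to go through a ResourceCalculator), and the method name fitsInWithoutEqual comes from the description above:
{code}
// Illustrative sketch: unlike fitsIn, return false once usage has reached
// (not merely exceeded) the queue's maxResource limit.
static boolean fitsInWithoutEqual(Resource usage, Resource max) {
  return usage.getMemory() < max.getMemory()
      && usage.getVirtualCores() < max.getVirtualCores();
}
{code}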
[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
[ https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007676#comment-14007676 ] Hadoop QA commented on YARN-2088: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3802//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3802//console This message is automatically generated. Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder Key: YARN-2088 URL: https://issues.apache.org/jira/browse/YARN-2088 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2088.v1.patch Some fields(set,list) are added to proto builders many times, we need to clear those fields before add, otherwise the result proto contains more contents. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007685#comment-14007685 ] Hadoop QA commented on YARN-1474: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646330/YARN-1474.16.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1279 javac compiler warnings (more than the trunk's current 1276 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 17 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3801//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3801//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3801//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3801//console This message is automatically generated. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
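For context, fitting a scheduler into the YARN service model roughly means extending AbstractService and moving initialization and thread management into the lifecycle hooks; the sketch below is illustrative and is not the attached YARN-1474.16.patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative only: the shape of a scheduler expressed as a YARN service.
public class SketchScheduler extends AbstractService {
  public SketchScheduler() {
    super(SketchScheduler.class.getName());
  }
  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // one-time configuration, replacing part of today's reinitialize()
    super.serviceInit(conf);
  }
  @Override
  protected void serviceStart() throws Exception {
    // start update / continuous-scheduling threads here
    super.serviceStart();
  }
  @Override
  protected void serviceStop() throws Exception {
    // stop threads and release resources here
    super.serviceStop();
  }
}
{code}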
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007686#comment-14007686 ] Hadoop QA commented on YARN-1937: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646580/YARN-1937.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3803//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3803//console This message is automatically generated. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch, YARN-1937.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2098) App priority support in Fair Scheduler
Ashwin Shankar created YARN-2098: Summary: App priority support in Fair Scheduler Key: YARN-2098 URL: https://issues.apache.org/jira/browse/YARN-2098 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Umbrella jira to track tasks needed for supporting app priorities in fair scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007689#comment-14007689 ] Hadoop QA commented on YARN-596: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646537/YARN-596.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3806//console This message is automatically generated. In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested in. This means that an application can shield itself from preemption by requesting it's containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over it's fair share is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
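To make the described ordering concrete, here is a sketch of the selection step (illustrative only; candidateContainers is an assumed name for the list of containers gathered from over-share queues):
{code}
// Illustrative sketch of the ordering described above: candidates are sorted
// by the priority the container was requested at, so an app can shield its
// containers simply by requesting them at a higher priority, regardless of
// whether the app itself is over its fair share.
Collections.sort(candidateContainers, new Comparator<RMContainer>() {
  @Override
  public int compare(RMContainer a, RMContainer b) {
    return a.getContainer().getPriority().compareTo(
        b.getContainer().getPriority());
  }
});
{code}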
[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2098: - Description: This jira is for supporting app priorities in fair schduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. was: Umbrella jira to track tasks needed for supporting app priorities in fair scheduler. App priority support in Fair Scheduler -- Key: YARN-2098 URL: https://issues.apache.org/jira/browse/YARN-2098 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar This jira is for supporting app priorities in fair schduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2098: - Description: This jira is created for supporting app priorities in fair scheduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. was: This jira is for supporting app priorities in fair scheduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. App priority support in Fair Scheduler -- Key: YARN-2098 URL: https://issues.apache.org/jira/browse/YARN-2098 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar This jira is created for supporting app priorities in fair scheduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2098: - Description: This jira is for supporting app priorities in fair scheduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. was: This jira is for supporting app priorities in fair schduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. App priority support in Fair Scheduler -- Key: YARN-2098 URL: https://issues.apache.org/jira/browse/YARN-2098 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar This jira is for supporting app priorities in fair scheduler. AppSchedulable hard codes priority of apps to 1,we should change this to get priority from ApplicationSubmissionContext. -- This message was sent by Atlassian JIRA (v6.2#6252)
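A minimal sketch of the direction described here, assuming a hypothetical rmApp handle; it is not an actual patch:
{code}
// Hypothetical sketch: read the app's priority from its submission context
// instead of hard-coding it to 1, falling back to 1 when none was set.
ApplicationSubmissionContext ctx = rmApp.getApplicationSubmissionContext();
Priority priority = (ctx.getPriority() != null)
    ? ctx.getPriority()
    : Priority.newInstance(1);
{code}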
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007698#comment-14007698 ] Jian Fang commented on YARN-796: I'd like to add a use case to this JIRA. In a cloud environment, Hadoop could run on heterogeneous groups of instances. Take Amazon EMR as an example: an EMR Hadoop cluster usually runs with master, core, and task groups, where the task group could be spot instances that can go away at any time. As a result, we would like to have a tag capability on each node. That is to say, when a node manager starts up, it will load the tags from the configuration file. Then, the resource manager could refine the scheduling results based on the tags. One good example is that we don't want an application master to be assigned to any spot instance in a task group because that instance could be taken away by EC2 at any time. If Hadoop resources could support a tag capability, then we could extend the current scheduling algorithm to add constraints so that the application master is not assigned to a task node. We don't really need any admin capability for the tags (though it would still be good to have) since the tags are static and can be specified in a configuration file, for example yarn-site.xml. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2099) Preemption in fair scheduler should consider app priorities
Ashwin Shankar created YARN-2099: Summary: Preemption in fair scheduler should consider app priorities Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.5.0 Reporter: Ashwin Shankar Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007703#comment-14007703 ] Anubhav Dhoot commented on YARN-1366: - Looks good overall; some minor comments below. In AMRMClientImpl, populatePendingReleaseRequests could be renamed to removePendingReleaseRequests, as it is removing them. We can comment on why we need blacklistedNodes in addition to blacklistAdditions and removals. In testRMContainerOnResync there is an unused assignment to assigned. Also, it might be a good idea to rename the test to indicate the condition and the expected result, say testRMContainerResendsRequestsOnRestart? Also it would be good to test the pendingRelease in TestRMContainerAllocator, maybe add ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2012: - Attachment: YARN-2012-v3.txt Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute - Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt Currently 'default' rule in queue placement policy,if applied,puts the app in root.default queue. It would be great if we can make 'default' rule optionally point to a different queue as default queue . This default queue can be a leaf queue or it can also be an parent queue if the 'default' rule is nested inside nestedUserQueue rule(YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007715#comment-14007715 ] Vinod Kumar Vavilapalli commented on YARN-1937: --- +1, looks good. Checking this in. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch, YARN-1937.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007744#comment-14007744 ] Hadoop QA commented on YARN-2073: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3805//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3805//console This message is automatically generated. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, yarn-2073-3.patch, yarn-2073-4.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2097) Documentation: health check return status
[ https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rekha Joshi updated YARN-2097: -- Attachment: YARN-2097.1.patch Attached patch doc as per NodeHealthMonitorExecutor.reportStatus(). Thanks Documentation: health check return status - Key: YARN-2097 URL: https://issues.apache.org/jira/browse/YARN-2097 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.4.0 Reporter: Allen Wittenauer Labels: newbie Attachments: YARN-2097.1.patch We need to document that the output of the health check script is ignored on non-0 exit status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007759#comment-14007759 ] Hadoop QA commented on YARN-1937: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646580/YARN-1937.5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3810//console This message is automatically generated. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch, YARN-1937.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rekha Joshi updated YARN-1427: -- Attachment: YARN-1427.1.patch Attached patch.Thanks yarn-env.cmd should have the analog comments that are in yarn-env.sh Key: YARN-1427 URL: https://issues.apache.org/jira/browse/YARN-1427 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Labels: newbie, windows Attachments: YARN-1427.1.patch There're the paragraphs of about RM/NM env vars (probably AHS as well soon) in yarn-env.sh. Should the windows version script provide the similar comments? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007757#comment-14007757 ] Hadoop QA commented on YARN-2083: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646479/YARN-2083.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3807//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3807//console This message is automatically generated. In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: assignContainer, fair, scheduler Fix For: 2.3.0 Attachments: YARN-2083.patch In fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guaranty this queue is not over its limit. But the fitsIn function in Resource.java did not return false when the usedResource equals the maxResource. I think we should create a new Function fitsInWithoutEqual instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007768#comment-14007768 ] Anubhav Dhoot commented on YARN-1365: - The error is that RMAppRecoveredTransition leaves it in LAUNCHED and then the scheduler executes ATTEMPT_ADDED. I see Jian fixed it in a certain way in YARN-1368, but that only addresses it if it is in LAUNCHED. If the state reaches RUNNING before that, we still get the error. The option I see is to pass in a flag to AppAttemptAddedSchedulerEvent that tells the scheduler not to issue ATTEMPT_ADDED. This will be set in RMAppRecoveredTransition. Let me know what you think. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
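To make the option concrete, here is a sketch of what passing such a flag could look like; the boolean parameter is hypothetical and not the existing AppAttemptAddedSchedulerEvent constructor:
{code}
// Hypothetical sketch: RMAppRecoveredTransition constructs the event with a
// flag so the scheduler can register the recovered attempt without firing
// ATTEMPT_ADDED back at an attempt that is already LAUNCHED or RUNNING.
handler.handle(new AppAttemptAddedSchedulerEvent(
    appAttemptId, true /* isAttemptRecovering, hypothetical flag */));
{code}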
[jira] [Commented] (YARN-1216) Deprecate/ mark private few methods from YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007766#comment-14007766 ] Rekha Joshi commented on YARN-1216: --- This needs to be closed. The existing o.a.h.yarn.webapp.util.WebAppUtils (2.4.0) covers this. Thanks. Deprecate/ mark private few methods from YarnConfiguration -- Key: YARN-1216 URL: https://issues.apache.org/jira/browse/YARN-1216 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we have a few methods in YarnConfiguration which should ideally be moved to some utility class. [related comment | https://issues.apache.org/jira/browse/YARN-1203?focusedCommentId=13771281page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13771281 ] * getRMWebAppURL * getRMWebAppHostAndPort * getProxyHostAndPort -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-596: - Attachment: YARN-596.patch In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested in. This means that an application can shield itself from preemption by requesting it's containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over it's fair share is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007778#comment-14007778 ] Hadoop QA commented on YARN-1474: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646330/YARN-1474.16.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1279 javac compiler warnings (more than the trunk's current 1276 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 17 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3808//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3808//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3808//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3808//console This message is automatically generated. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2095) Large MapReduce Job stops responding
[ https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007779#comment-14007779 ] Clay McDonald commented on YARN-2095: - Vinod, could you read Eric's email below? Would you agree that there should be a log entry from YARN in this case? Clay McDonald Cell: 202.560.4101 Direct: 202.747.5962 From: Eric Mizell [mailto:emiz...@hortonworks.com] Sent: Friday, May 23, 2014 4:18 PM To: Clay McDonald Subject: Re: [jira] [Created] (YARN-2095) Large MapReduce Job stops responding Clay, What I noticed is that your reducers were overloaded and were on the brink of running out of memory. The Java heaps were running at 99% and continuously GC’ing while the app was reading from disk. So it was trying its best to process the job with limited resources. I agree with you that it would be helpful if the container could put out a log message that there were GC issues to help with debugging. Thanks, Eric Mizell Director Solution Engineering, Hortonworks Mobile: 678-761-7623 Email: emiz...@hortonworks.com Website: http://www.hortonworks.com/ Large MapReduce Job stops responding Key: YARN-2095 URL: https://issues.apache.org/jira/browse/YARN-2095 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6 Reporter: Clay McDonald Priority: Blocker Very large jobs (7,455 Mappers and 999 Reducers) hang. Jobs run well, but logging to container logs stops after running 33 hours. The job appears to be hung. The status of the job is RUNNING. No error messages found in logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2059) Extend access control for admin acls
[ https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007781#comment-14007781 ] Hadoop QA commented on YARN-2059: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646475/YARN-2059.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3811//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3811//console This message is automatically generated. Extend access control for admin acls Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2059.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2100) Refactor the code of Kerberos + DT authentication
Zhijie Shen created YARN-2100: - Summary: Refactor the code of Kerberos + DT authentication Key: YARN-2100 URL: https://issues.apache.org/jira/browse/YARN-2100 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen The customized Kerberos + DT authentication of the timeline server largely follows that of HttpFS; therefore, there is a fair amount of duplicated code. We should think about refactoring the code if necessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2083) In fair scheduler, a queue should not be assigned more containers when its usedResource has reached the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007786#comment-14007786 ] Sandy Ryza commented on YARN-2083: -- I think it would be better to solve this by making sure the queue cannot actually exceed the maxResources limit, i.e. to also check the constraint *after* picking a container that would be assigned. In fair scheduler, a queue should not be assigned more containers when its usedResource has reached the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: assignContainer, fair, scheduler Fix For: 2.3.0 Attachments: YARN-2083.patch In the fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guarantee the queue is not over its limit. But the fitsIn function in Resource.java does not return false when the usedResource equals the maxResource. I think we should create a new function, fitsInWithoutEqual, to use instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
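To make the proposal concrete, a minimal sketch of such a strict check against the Resource records API; the method name follows the description above and does not exist in the Resources utilities today:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class StrictResourceCheck {
  /**
   * Hypothetical strict variant of the fitsIn check: true only when
   * "smaller" is strictly below "bigger" in every dimension, so a queue
   * sitting exactly at maxResources fails the pre-check and is not
   * assigned further containers.
   */
  public static boolean fitsInWithoutEqual(Resource smaller, Resource bigger) {
    return smaller.getMemory() < bigger.getMemory()
        && smaller.getVirtualCores() < bigger.getVirtualCores();
  }
}
{code}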
[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007787#comment-14007787 ] Hadoop QA commented on YARN-2026: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646588/YARN-2026-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3809//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3809//console This message is automatically generated. Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios -- Key: YARN-2026 URL: https://issues.apache.org/jira/browse/YARN-2026 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2026-v1.txt While using hierarchical queues in the fair scheduler, there are a few scenarios where we have seen a leaf queue with the least fair share take the majority of the cluster and starve a sibling parent queue that has a greater weight/fair share, with preemption never kicking in to reclaim resources. The root cause seems to be that the fair share of a parent queue is distributed to all its children irrespective of whether each is an active or an inactive (no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and it has demand greater than that. When there are many queues under a parent queue (with a high fair share), each child queue's fair share becomes really low. As a result, when only a few of these child queues have apps running, they reach their *tiny* fair share quickly, and preemption doesn't happen even if other (non-sibling) leaf queues are hogging the cluster. This can be solved by dividing the parent queue's fair share only among active child queues. Here is an example describing the problem and the proposed solution: root.lowPriorityQueue is a leaf queue with weight 2; root.HighPriorityQueue is a parent queue with weight 8; root.HighPriorityQueue has 10 child leaf queues: root.HighPriorityQueue.childQ(1..10). The above config results in root.HighPriorityQueue having an 80% fair share, and each of its ten child queues would have an 8% fair share. Preemption would kick in for a child queue only if its usage fell below 4% (0.5*8=4). Let's say at the moment no apps are running in any of root.HighPriorityQueue.childQ(1..10), and a few apps are running in root.lowPriorityQueue, which is taking up 95% of the cluster. Up till this point, the behavior of FS is correct. 
Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% of the cluster. It would get only the available 5% of the cluster, and preemption wouldn't kick in since it is above 4% (half its fair share). This is bad considering childQ1 is under a high-priority parent queue which has an *80% fair share*. Until root.lowPriorityQueue starts relinquishing containers, we would see the following allocation on the scheduler page: *root.lowPriorityQueue = 95%*, *root.HighPriorityQueue.childQ1 = 5%*. This can be solved by distributing a parent's fair share only to active queues (a sketch follows below). So in the example above, since childQ1 is the only active queue under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 80%. This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds. Also note that a similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2 hogs the cluster: childQ2 can take up 95% of the cluster and childQ1 would be stuck at 5% until childQ2 starts relinquishing containers.
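To make the proposed computation concrete, a minimal sketch that divides a parent's fair share only among its active children by weight; the class and field names are hypothetical, not the actual FSQueue API:
{code}
import java.util.List;

class ChildQueue {
  String name;
  double weight;
  boolean active;    // has at least one running app
  double fairShare;  // computed below
}

class ActiveFairShares {
  /** Divide parentShare only among active children, proportionally to weight. */
  static void distribute(double parentShare, List<ChildQueue> children) {
    double activeWeight = 0;
    for (ChildQueue q : children) {
      if (q.active) {
        activeWeight += q.weight;
      }
    }
    for (ChildQueue q : children) {
      // Inactive queues get no share until they become active again.
      q.fairShare = (q.active && activeWeight > 0)
          ? parentShare * q.weight / activeWeight : 0;
    }
  }
}
{code}
With the example weights above, childQ1 as the only active child receives the full 80%, so its preemption trigger rises from 4% to 40% and the reclaim from root.lowPriorityQueue can actually happen.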
[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities
[ https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007789#comment-14007789 ] Wei Yan commented on YARN-2099: --- Hi, Ashwin. We discussed priority and preemption a little in YARN-596. The old preemption implementation is based on the container's priority, and YARN-596 starts to consider fair share. One question here: suppose all queues use the fair share policy. When taking app priorities into account, if we want to preempt a container, which option do we use? (1) Collect all running apps from over-fair-share queues and select the one with the lowest priority as the candidate, or (2) first choose the queue that is most over its fair share, then select the app with the lowest priority inside that queue as the candidate. Preemption in fair scheduler should consider app priorities --- Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.5.0 Reporter: Ashwin Shankar Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
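To make option (2) concrete, a rough sketch over hypothetical queue/app types (not the actual FairScheduler classes), assuming a lower priority value means lower priority:
{code}
import java.util.List;

class SchedQueue { double usage; double fairShare; List<SchedApp> apps; }
class SchedApp { int priority; } // assumption: lower value = lower priority

class PreemptionCandidate {
  /** Option (2): first pick the queue most over its fair share,
      then the lowest-priority running app inside that queue. */
  static SchedApp pick(List<SchedQueue> queues) {
    SchedQueue worst = null;
    for (SchedQueue q : queues) {
      if (q.usage > q.fairShare
          && (worst == null
              || q.usage - q.fairShare > worst.usage - worst.fairShare)) {
        worst = q;
      }
    }
    if (worst == null) {
      return null; // no queue is over its fair share; nothing to preempt
    }
    SchedApp victim = null;
    for (SchedApp a : worst.apps) {
      if (victim == null || a.priority < victim.priority) {
        victim = a;
      }
    }
    return victim;
  }
}
{code}
Option (1) would instead flatten the search across all over-fair-share queues before comparing priorities, which trades queue-level fairness for a globally lowest-priority victim.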
[jira] [Created] (YARN-2101) Document the system filters of the timeline entity
Zhijie Shen created YARN-2101: - Summary: Document the system filters of the timeline entity Key: YARN-2101 URL: https://issues.apache.org/jira/browse/YARN-2101 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-1937, to support ACLs, we reserved a filter name for the timeline server to use, which should not be used by users. We need to document this system filter explicitly to tell users not to use it. -- This message was sent by Atlassian JIRA (v6.2#6252)
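As a rough illustration of how the server might reject user-supplied filters that collide with the reserved name; the constant below is a placeholder, since the real reserved name is defined in YARN-1937 and not reproduced here:
{code}
import java.util.Map;

public class SystemFilterGuard {
  // Placeholder; the actual reserved filter name comes from YARN-1937.
  static final String RESERVED_FILTER_NAME = "SYSTEM_FILTER";

  /** Reject entities whose user-supplied primary filters use the reserved name. */
  public static void validatePrimaryFilters(Map<String, Object> primaryFilters) {
    if (primaryFilters != null && primaryFilters.containsKey(RESERVED_FILTER_NAME)) {
      throw new IllegalArgumentException("Primary filter '"
          + RESERVED_FILTER_NAME + "' is reserved for the timeline server");
    }
  }
}
{code}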
[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007797#comment-14007797 ] Sandy Ryza commented on YARN-2012: -- +1 Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute - Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we could make the 'default' rule optionally point to a different queue as the default queue. This default queue can be a leaf queue, or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
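For illustration, the proposed rule might be configured along these lines in the allocation file; treat the exact attribute syntax as an assumption based on the description above, and the queue name as purely illustrative:
{code}
<!-- fair-scheduler.xml: send unmatched apps to a specific queue instead of root.default -->
<queuePlacementPolicy>
  <rule name="specified"/>
  <rule name="default" queue="root.adhoc"/>
</queuePlacementPolicy>
{code}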
[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007800#comment-14007800 ] Zhijie Shen commented on YARN-1937: --- Tested the timeline security stack end-to-end on my local cluster so far. It seems to work fine: authentication works as expected, and only the owner can view his posted timeline data. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, YARN-1937.4.patch, YARN-1937.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2012) Fair Scheduler: allow default queue placement rule to take an arbitrary queue
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2012: - Summary: Fair Scheduler: allow default queue placement rule to take an arbitrary queue (was: Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute) Fair Scheduler: allow default queue placement rule to take an arbitrary queue - Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we could make the 'default' rule optionally point to a different queue as the default queue. This default queue can be a leaf queue, or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)