[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034902#comment-17034902 ]

Aihua Xu commented on YARN-6492:

[~maniraj...@gmail.com] Do you have an update on this Jira?

> Generate queue metrics for each partition
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Jonathan Hung
> Assignee: Manikandan R
> Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, partition_metrics.txt
>
> We are interested in having queue metrics for all partitions. Right now each queue has one QueueMetrics object, which captures metrics either in the default partition or across all partitions. (After YARN-6467 it will be in the default partition.)
> But having the partition metrics would be very useful.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10126) Use threadPool to handle async scheduling threads
[ https://issues.apache.org/jira/browse/YARN-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated YARN-10126:
    Parent: YARN-5139
    Issue Type: Sub-task (was: Improvement)

> Use threadPool to handle async scheduling threads
>
> Key: YARN-10126
> URL: https://issues.apache.org/jira/browse/YARN-10126
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.9.1
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
>
> Currently, async scheduling launches individual threads to handle scheduling requests. If any issue occurs in such a thread, the thread exits and no new thread is relaunched. Eventually all the threads die and no new jobs get scheduled.
[jira] [Created] (YARN-10126) Use threadPool to handle async scheduling threads
Aihua Xu created YARN-10126:

Summary: Use threadPool to handle async scheduling threads
Key: YARN-10126
URL: https://issues.apache.org/jira/browse/YARN-10126
Project: Hadoop YARN
Issue Type: Improvement
Components: capacity scheduler
Affects Versions: 2.9.1
Reporter: Aihua Xu
Assignee: Aihua Xu

Currently, async scheduling launches individual threads to handle scheduling requests. If any issue occurs in such a thread, the thread exits and no new thread is relaunched. Eventually all the threads die and no new jobs get scheduled.
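The idea in YARN-10126 can be illustrated with a small, self-contained sketch. It is not the CapacityScheduler code or the eventual patch: the class name `ResilientSchedulingPool`, the `schedulingPass` callback, and the failure counter are all hypothetical. Each scheduling pass runs as a pool task that resubmits itself, so a pass that throws does not permanently reduce the number of workers.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; names are illustrative, not Hadoop APIs. */
class ResilientSchedulingPool {
  private final ExecutorService pool;
  private final Runnable schedulingPass;   // one unit of scheduling work (assumed)
  private final int workers;
  private final AtomicInteger failures = new AtomicInteger();
  private volatile boolean running = false;

  ResilientSchedulingPool(int workers, Runnable schedulingPass) {
    this.workers = workers;
    this.schedulingPass = schedulingPass;
    this.pool = Executors.newFixedThreadPool(workers);
  }

  /** Submits one self-resubmitting task per worker. */
  void start() {
    running = true;
    for (int i = 0; i < workers; i++) {
      pool.execute(this::runOnePass);
    }
  }

  private void runOnePass() {
    try {
      schedulingPass.run();
    } catch (Throwable t) {
      // A failed pass is recorded instead of silently ending the worker.
      failures.incrementAndGet();
    } finally {
      if (running) {
        try {
          pool.execute(this::runOnePass);   // keep the worker slot alive
        } catch (RejectedExecutionException e) {
          // The pool was shut down concurrently; nothing left to do.
        }
      }
    }
  }

  int failureCount() {
    return failures.get();
  }

  void stop() throws InterruptedException {
    running = false;
    pool.shutdownNow();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

A real scheduler would also log the failure and probably back off between passes; both are omitted here to keep the sketch short.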
[jira] [Commented] (YARN-7649) RMContainer state transition exception after container update
[ https://issues.apache.org/jira/browse/YARN-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031994#comment-17031994 ]

Aihua Xu commented on YARN-7649:

[~asuresh] Any update on this task?

> RMContainer state transition exception after container update
>
> Key: YARN-7649
> URL: https://issues.apache.org/jira/browse/YARN-7649
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.9.0
> Reporter: Weiwei Yang
> Assignee: Arun Suresh
> Priority: Major
>
> I've seen this in a cluster deployment as well as in UT. Running {{TestAMRMClient#testAMRMClientWithContainerPromotion}} reproduces it: the test case doesn't fail, but the following error message shows up in the log
> {noformat}
> 2017-12-13 19:41:31,817 ERROR rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(480)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: RELEASED at ALLOCATED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:478)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:675)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1586)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:155)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-12-13 19:41:31,817 ERROR rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(481)) - Invalid event RELEASED on container container_1513165290804_0001_01_03
> {noformat}
> This seems to be related to YARN-6251.
[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024679#comment-17024679 ]

Aihua Xu commented on YARN-10015:

[~yufeigu] Can you help review and commit the patch? It's a simple doc change. Thanks.

> Correct the sample command in SLS README file
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Trivial
> Attachments: YARN-10015.patch
>
> The sample command in SLS README {{bin/slsrun.sh —-input-rumen=sample-data/2jobs2min-rumen-jh.json —-output-dir=sample-output}} contains a dash from a different encoding. The command fails with the following error:
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json
[jira] [Updated] (YARN-10016) NPE is thrown when accessing SLS web portal
[ https://issues.apache.org/jira/browse/YARN-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated YARN-10016:
    Parent: YARN-5065
    Issue Type: Sub-task (was: Bug)

> NPE is thrown when accessing SLS web portal
>
> Key: YARN-10016
> URL: https://issues.apache.org/jira/browse/YARN-10016
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.3.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
>
> The following NPE is thrown when running SLS and accessing http://$HOST:10001/simulate
> {noformat}
> java.lang.NullPointerException
>     at org.eclipse.jetty.server.ResourceService.doGet(ResourceService.java:235)
>     at org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:256)
>     at org.apache.hadoop.yarn.sls.web.SLSWebApp$1.handle(SLSWebApp.java:159)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>     at org.eclipse.jetty.server.Server.handle(Server.java:494)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>     at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-10015: Parent: YARN-5065 Issue Type: Sub-task (was: Bug) > Correct the sample command in SLS README file > - > > Key: YARN-10015 > URL: https://issues.apache.org/jira/browse/YARN-10015 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: YARN-10015.patch > > > The sample command in SLS README {{bin/slsrun.sh > —-input-rumen=sample-data/2jobs2min-rumen-jh.json > —-output-dir=sample-output}} contains a dash from different encoding. The > command will give the following exception. > ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10016) NPE is thrown when accessing SLS web portal
Aihua Xu created YARN-10016:

Summary: NPE is thrown when accessing SLS web portal
Key: YARN-10016
URL: https://issues.apache.org/jira/browse/YARN-10016
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Aihua Xu
Assignee: Aihua Xu

The following NPE is thrown when running SLS and accessing http://$HOST:10001/simulate
{noformat}
java.lang.NullPointerException
    at org.eclipse.jetty.server.ResourceService.doGet(ResourceService.java:235)
    at org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:256)
    at org.apache.hadoop.yarn.sls.web.SLSWebApp$1.handle(SLSWebApp.java:159)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:494)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-10015: Summary: Correct the sample command in SLS README file (was: Correct SLS README sample command) > Correct the sample command in SLS README file > - > > Key: YARN-10015 > URL: https://issues.apache.org/jira/browse/YARN-10015 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: YARN-10015.patch > > > The sample command in SLS README {{bin/slsrun.sh > —-input-rumen=sample-data/2jobs2min-rumen-jh.json > —-output-dir=sample-output}} contains a dash from different encoding. The > command will give the following exception. > ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10015) Correct SLS README sample command
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990007#comment-16990007 ]

Aihua Xu commented on YARN-10015:

It's a simple fix. Just replace the non-ASCII dash with a normal ASCII hyphen.

> Correct SLS README sample command
>
> Key: YARN-10015
> URL: https://issues.apache.org/jira/browse/YARN-10015
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Trivial
> Attachments: YARN-10015.patch
>
> The sample command in SLS README {{bin/slsrun.sh —-input-rumen=sample-data/2jobs2min-rumen-jh.json —-output-dir=sample-output}} contains a dash from a different encoding. The command fails with the following error:
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json
[jira] [Updated] (YARN-10015) Correct SLS README sample command
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-10015: Attachment: YARN-10015.patch > Correct SLS README sample command > - > > Key: YARN-10015 > URL: https://issues.apache.org/jira/browse/YARN-10015 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: YARN-10015.patch > > > The sample command in SLS README {{bin/slsrun.sh > —-input-rumen=sample-data/2jobs2min-rumen-jh.json > —-output-dir=sample-output}} contains a dash from different encoding. The > command will give the following exception. > ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10015) Correct SLS README sample command
Aihua Xu created YARN-10015:

Summary: Correct SLS README sample command
Key: YARN-10015
URL: https://issues.apache.org/jira/browse/YARN-10015
Project: Hadoop YARN
Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu

The sample command in SLS README {{bin/slsrun.sh —-input-rumen=sample-data/2jobs2min-rumen-jh.json —-output-dir=sample-output}} contains a dash from a different encoding. The command fails with the following error:
ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json
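To illustrate why the option is rejected, here is a minimal sketch of detecting and normalizing dash-like characters that are not the ASCII hyphen. It is not part of SLS or of the attached patch; the class `DashCheck` and its methods are hypothetical helpers that rely only on the standard Unicode category lookup in `java.lang.Character`.

```java
/** Hypothetical helper; not part of Hadoop or the YARN-10015 patch. */
class DashCheck {
  /** True for any Unicode dash punctuation other than the ASCII hyphen-minus. */
  private static boolean isNonAsciiDash(int cp) {
    return cp != '-' && Character.getType(cp) == Character.DASH_PUNCTUATION;
  }

  /** Returns the index of the first non-ASCII dash in arg, or -1 if there is none. */
  static int firstNonAsciiDash(String arg) {
    for (int i = 0; i < arg.length(); ) {
      int cp = arg.codePointAt(i);
      if (isNonAsciiDash(cp)) {
        return i;
      }
      i += Character.charCount(cp);
    }
    return -1;
  }

  /** Replaces every non-ASCII dash with an ASCII hyphen-minus. */
  static String normalizeDashes(String arg) {
    StringBuilder sb = new StringBuilder(arg.length());
    arg.codePoints().forEach(cp -> {
      if (isNonAsciiDash(cp)) {
        sb.append('-');
      } else {
        sb.appendCodePoint(cp);
      }
    });
    return sb.toString();
  }
}
```

A check like this could be run over a README or a shell argument list to flag options that an option parser expecting `--name=value` would refuse.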
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938133#comment-16938133 ]

Aihua Xu commented on YARN-9615:

This seems to be an important feature, since we sometimes see a large queue. The generic approach looks promising and could be adopted for other queues as well.

> Add dispatcher metrics to RM
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
> Attachments: YARN-9615.poc.patch, screenshot-1.png
>
> It'd be good to have counts/processing times for each event type in RM async dispatcher and scheduler async dispatcher.
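The request for per-event-type counts and processing times can be sketched generically. This is an assumption-laden illustration, not the YARN-9615 proof-of-concept patch: `EventTypeMetrics`, `SampleEventType`, and the `time` wrapper are hypothetical names, and a real dispatcher would publish these values through Hadoop's metrics system instead of plain getters.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical event types, used only for this sketch. */
enum SampleEventType { NODE_UPDATE, APP_ADDED }

/** Hypothetical per-event-type counters; not the RM dispatcher implementation. */
class EventTypeMetrics<T extends Enum<T>> {
  private final Map<T, LongAdder> counts;
  private final Map<T, LongAdder> nanos;

  EventTypeMetrics(Class<T> type) {
    counts = new EnumMap<>(type);
    nanos = new EnumMap<>(type);
    // Pre-populate so the maps are never modified after construction,
    // which keeps concurrent lookups safe.
    for (T t : type.getEnumConstants()) {
      counts.put(t, new LongAdder());
      nanos.put(t, new LongAdder());
    }
  }

  /** Runs the handler and records one event plus its processing time. */
  void time(T type, Runnable handler) {
    long start = System.nanoTime();
    try {
      handler.run();
    } finally {
      counts.get(type).increment();
      nanos.get(type).add(System.nanoTime() - start);
    }
  }

  long count(T type) {
    return counts.get(type).sum();
  }

  long totalNanos(T type) {
    return nanos.get(type).sum();
  }
}
```

Because the wrapper takes any enum, the same class could serve both the RM dispatcher and the scheduler dispatcher, which matches the "generic approach" mentioned in the comment.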
[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932598#comment-16932598 ] Aihua Xu commented on YARN-2255: Thanks [~cheersyang] > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Aihua Xu >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-2255.1.patch, YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928711#comment-16928711 ]

Aihua Xu commented on YARN-2255:

[~cheersyang] Thanks Weiwei. Yes. I tested locally and it works well. Here is the sample output from the test. Can you help commit the change?

aihuaxu-C02WW0RCHTDG:logs aihuaxu$ cat rm-audit.log
2019-09-12 09:31:18,871 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	IP=127.0.0.1	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1568248909028_0001	QUEUENAME=default
2019-09-12 09:31:19,480 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_01	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:31,191 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	IP=127.0.0.1	OPERATION=Register App Master	TARGET=ApplicationMasterService	RESULT=SUCCESS	APPID=application_1568248909028_0001	APPATTEMPTID=appattempt_1568248909028_0001_01
2019-09-12 09:31:31,480 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_02	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:32,489 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_03	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:44,326 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_03	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:44,331 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_02	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:44,788 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_01	RESOURCE=	QUEUENAME=default
2019-09-12 09:31:44,813 INFO resourcemanager.RMAuditLogger: USER=aihuaxu	OPERATION=Application Finished - Succeeded	TARGET=RMAppManager	RESULT=SUCCESS	APPID=application_1568248909028_0001

aihuaxu-C02WW0RCHTDG:logs aihuaxu$ cat nm-audit.log
2019-09-12 09:31:20,263 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01	IP=127.0.0.1	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_01
2019-09-12 09:31:31,626 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01	IP=127.0.0.1	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_02
2019-09-12 09:31:32,787 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01	IP=127.0.0.1	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_03
2019-09-12 09:31:44,305 INFO nodemanager.NMAuditLogger: USER=aihuaxu	OPERATION=Container Finished - Succeeded	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_03
2019-09-12 09:31:44,311 INFO nodemanager.NMAuditLogger: USER=aihuaxu	OPERATION=Container Finished - Succeeded	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_02
2019-09-12 09:31:44,777 INFO nodemanager.NMAuditLogger: USER=aihuaxu	OPERATION=Container Finished - Succeeded	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_01
2019-09-12 09:31:44,828 INFO nodemanager.NMAuditLogger: USER=appattempt_1568248909028_0001_01	IP=127.0.0.1	OPERATION=Stop Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1568248909028_0001	CONTAINERID=container_1568248909028_0001_01_01

> YARN Audit logging not added to log4j.properties
>
[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927239#comment-16927239 ] Aihua Xu commented on YARN-2255: Assign to myself. [~wangda], [~cheersyang] Can you help review the change? It's logging configuration to be consistent with hdfs audit log. Thanks. > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-2255.1.patch, YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned YARN-2255: -- Assignee: Aihua Xu (was: Ying Zhang) > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-2255.1.patch, YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-2255: --- Attachment: YARN-2255.1.patch > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Ying Zhang >Priority: Major > Attachments: YARN-2255.1.patch, YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927114#comment-16927114 ] Aihua Xu commented on YARN-2255: [~Ying Zhang] I think it's a good idea to have a separate audit log file as hdfs file. I can rebase and try to get it committed if you are not working on it. > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Ying Zhang >Priority: Major > Attachments: YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties
[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927090#comment-16927090 ] Aihua Xu commented on YARN-2255: [~Ying Zhang] Wondering why it's never getting committed. > YARN Audit logging not added to log4j.properties > > > Key: YARN-2255 > URL: https://issues.apache.org/jira/browse/YARN-2255 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Ying Zhang >Priority: Major > Attachments: YARN-2255.patch > > > log4j.properties file which is part of the hadoop package, doesnt have YARN > Audit logging tied to it. This leads to audit logs getting generated in > normal log files. Audit logs should be generated in a separate log file -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874316#comment-16874316 ]

Aihua Xu commented on YARN-6629:

[~cheersyang] For this particular issue, since the fix is already in 2.10, I think we don't need an additional backport, because 2.10 will be the next release on branch-2. Is that correct?

> NPE occurred when container allocation proposal is applied but its resource requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.0.0-alpha2
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
> I wrote a test case to reproduce another problem for branch-2 and found a new NPE. Log:
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
>     at org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
>     at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
>     at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
>     at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this error, in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1:
>    ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests
>    Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request:
>    ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests --> AppSchedulingInfo#addToPlacementSets --> AppSchedulingInfo#updatePendingResources
>    Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal:
>    CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> AppSchedulingInfo#allocate
>    Throws NPE when calling schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, type, node);

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.
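The four reproduction steps in the YARN-6629 description can be modelled in isolation. The following is only a minimal sketch under stated assumptions, not the actual YARN-6629 patch: `SchedulingInfoSketch`, `updateRequest`, and the string keys are hypothetical stand-ins for AppSchedulingInfo and schedulerKeyToPlacementSets. It shows one way to make the "apply" step tolerate a request that was removed after the proposal was accepted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for AppSchedulingInfo; all names are illustrative.
class SchedulingInfoSketch {
    // Maps a scheduler request key to its number of pending containers.
    private final Map<String, Integer> pendingByKey = new ConcurrentHashMap<>();

    // Steps 1 and 3: the AM adds, updates, or withdraws a request.
    void updateRequest(String key, int pending) {
        if (pending <= 0) {
            pendingByKey.remove(key);
        } else {
            pendingByKey.put(key, pending);
        }
    }

    // Step 4: apply a previously accepted proposal.
    // Returns false instead of throwing when the request no longer exists.
    boolean allocate(String key) {
        boolean[] applied = {false};
        // computeIfPresent runs atomically and only if the key is still mapped;
        // returning null from the lambda removes the mapping.
        pendingByKey.computeIfPresent(key, (k, pending) -> {
            applied[0] = true;
            return pending > 1 ? pending - 1 : null;
        });
        return applied[0];
    }
}
```

With this shape, a caller in the position of tryCommit could treat a `false` result as a rejected proposal rather than letting an NPE reach the event dispatcher.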
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872902#comment-16872902 ]

Aihua Xu commented on YARN-6629:

Thanks. I just noticed that it's not included in 2.9.2, but it is in 2.10.
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872701#comment-16872701 ]

Aihua Xu commented on YARN-6629:

Can we also backport this change to branch-2? It's critical since it's causing ResourceManager to crash.
[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872665#comment-16872665 ]

Aihua Xu commented on YARN-8193:

Can we also push this patch to the 2.9.x branch? It's causing the RM to crash.

> YARN RM hangs abruptly (stops allocating resources) when running successive applications.
> ---
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2-001.patch, YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
> When running massive queries successively, at some point the RM just hangs and stops allocating resources. At the point where the RM hangs, YARN throws a NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue; the RM didn't report any node as unhealthy). There is no fixed trigger for this (query or operation).
> This problem goes away on restarting the ResourceManager. No NM restart is required.
[jira] [Created] (YARN-9478) Add timeout for renew delegation thread pool
Aihua Xu created YARN-9478:
--

Summary: Add timeout for renew delegation thread pool
Key: YARN-9478
URL: https://issues.apache.org/jira/browse/YARN-9478
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Aihua Xu
Assignee: Aihua Xu

By default, YARN creates a thread pool with 50 threads to handle all token renewals for running jobs. Currently the threads have no timeout, so if one type of application is slow to renew its tokens, YARN can eventually reach a state where all the threads are busy renewing tokens for applications of that type, and the whole cluster can't handle new applications.

I propose adding a timeout to the threads in the thread pool so that they get killed after a certain time.
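The proposal in YARN-9478 can be illustrated with the standard java.util.concurrent API. This is a rough sketch under assumptions, not the ResourceManager's renewer code: `RenewalTimeoutSketch` and `renewWithTimeout` are hypothetical names, and the renewal itself is modelled as an arbitrary `Callable`. It bounds how long a caller waits for one renewal and cancels the task when the bound is exceeded.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Hadoop.
class RenewalTimeoutSketch {
    // Submits one renewal task and waits at most timeoutMillis for it.
    // Returns true if the task completed in time, false if it timed out,
    // failed, or the waiting thread was interrupted.
    static boolean renewWithTimeout(ExecutorService pool,
                                    Callable<Void> renewal,
                                    long timeoutMillis) {
        Future<Void> future = pool.submit(renewal);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the worker so the pool thread can be reused.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the renewal itself threw an exception
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

Note that `Future.cancel(true)` only delivers an interrupt. A renewal that blocks in code which ignores interruption would still occupy its thread, so a real implementation would likely also need timeouts on the underlying RPC.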
[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814573#comment-16814573 ]

Aihua Xu commented on YARN-9463:

Thanks a lot [~cheersyang] for your quick code review and submission.

> Add queueName info when failing with queue capacity sanity check
> ---
>
> Key: YARN-9463
> URL: https://issues.apache.org/jira/browse/YARN-9463
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 2.9.1
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Trivial
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9463.1.patch
>
> In the queue sanity check in CSQueueUtils.java, we throw "Illegal queue capacity setting, (abs-capacity=0.00160782) > (abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add the queue name so the admin can identify the problematic queue.
[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813645#comment-16813645 ]

Aihua Xu commented on YARN-9463:

Simple fix: the error will print out queue info as well now.
[jira] [Updated] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated YARN-9463:

Attachment: YARN-9463.1.patch
[jira] [Created] (YARN-9463) Add queueName info when failing with queue capacity sanity check
Aihua Xu created YARN-9463:
--

Summary: Add queueName info when failing with queue capacity sanity check
Key: YARN-9463
URL: https://issues.apache.org/jira/browse/YARN-9463
Project: Hadoop YARN
Issue Type: Improvement
Components: capacity scheduler
Affects Versions: 2.9.1
Reporter: Aihua Xu
Assignee: Aihua Xu

In the queue sanity check in CSQueueUtils.java, we throw "Illegal queue capacity setting, (abs-capacity=0.00160782) > (abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add the queue name so the admin can identify the problematic queue.
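The kind of change requested in YARN-9463 might look like the sketch below. It is an illustration under assumptions, not the code in CSQueueUtils.java or the attached patch: the class name, method signature, and exact wording of the extended message are hypothetical. Only the quoted "Illegal queue capacity setting" text comes from the issue.

```java
// Hypothetical sanity check; names and message layout are illustrative.
class CapacityCheckSketch {
    // Throws if a queue's absolute capacity exceeds its absolute maximum
    // capacity, and names the offending queue in the message.
    static void check(String queueName, String label,
                      float absCapacity, float absMaxCapacity) {
        if (absCapacity > absMaxCapacity) {
            throw new IllegalArgumentException(
                "Illegal queue capacity setting, (abs-capacity=" + absCapacity
                + ") > (abs-maximum-capacity=" + absMaxCapacity
                + "). When label=[" + label + "], queue=[" + queueName + "]");
        }
    }
}
```

Calling `CapacityCheckSketch.check("root.a", "", 0.00160782f, 0.0016027201f)` would then fail with a message that contains `root.a`, where `root.a` is a made-up queue path.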
[jira] [Resolved] (YARN-9297) Renaming RM could cause application to crash
[ https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu resolved YARN-9297.

Resolution: Duplicate

> Renaming RM could cause application to crash
> ---
>
> Key: YARN-9297
> URL: https://issues.apache.org/jira/browse/YARN-9297
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.6.0
> Reporter: Aihua Xu
> Priority: Major
>
> At this line, we throw UnknownHostException when any RM host can't be resolved to an IP address:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448
> In some cases one RM needs to be renamed or mapped to a different IP address. This then crashes the application even though the other RMs are running fine.
[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash
[ https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766618#comment-16766618 ]

Aihua Xu commented on YARN-9297:

Yes. You are right. I will resolve as dup. Thanks [~jojochuang]
[jira] [Created] (YARN-9297) Renaming RM could cause application to crash
Aihua Xu created YARN-9297:
--

Summary: Renaming RM could cause application to crash
Key: YARN-9297
URL: https://issues.apache.org/jira/browse/YARN-9297
Project: Hadoop YARN
Issue Type: Improvement
Components: security
Affects Versions: 2.6.0
Reporter: Aihua Xu

At this line, we throw UnknownHostException when any RM host can't be resolved to an IP address:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448

In some cases one RM needs to be renamed or mapped to a different IP address. This then crashes the application even though the other RMs are running fine.
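The behaviour asked for in YARN-9297, tolerating one unresolvable RM host while others resolve, can be sketched as follows. This is an assumption-laden illustration and not the code in SecurityUtil.java: `RmResolveSketch`, its `Resolver` interface, and the choice to fail only when no host resolves are all hypothetical. The issue was closed as a duplicate, so the actual fix may take a different approach.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Hadoop.
class RmResolveSketch {
    // Pluggable lookup so the sketch can run without real DNS.
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    // Resolves every host it can and skips the ones that fail.
    // Fails only when none of the hosts can be resolved.
    static List<InetAddress> resolveAvailable(List<String> hosts, Resolver resolver) {
        List<InetAddress> resolved = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            try {
                resolved.add(resolver.resolve(host));
            } catch (UnknownHostException e) {
                failed.add(host); // remember the host, keep trying the others
            }
        }
        if (resolved.isEmpty()) {
            throw new IllegalStateException("No RM host could be resolved: " + failed);
        }
        return resolved;
    }
}
```

In production code the resolver would typically be `InetAddress::getByName`. Passing it in as a parameter is a design choice made here so the behaviour can be exercised with fake host names.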
[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757740#comment-16757740 ]

Aihua Xu commented on YARN-9200:

[~sunilg] Since you worked on absolute resource configuration, I'd like to hear from you as well. Thanks.

> Enable resource configuration of queue capacity for different resources independently
> ---
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9200.draft
>
> The capacity scheduler currently supports two kinds of resource allocation:
> 1. Percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types.
> 2. Absolute values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these two cases, and it would also be very confusing to mix them in one cluster. The second case is actually targeted more toward cloud environments.
> In a non-cloud environment, the ability to configure each resource independently is also useful, but a percentage is preferable to an absolute value. One thought here is to add a percentage configuration for each resource type on the queue. That would allow us to configure memory-bound queues or CPU-bound queues. We can also stay backward compatible: each resource type just gets the same percentage if no percentage is configured for the individual resource type.
[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755286#comment-16755286 ]

Aihua Xu commented on YARN-9200:

[~leftnoteasy], [~rohithsharma] and [~cheersyang] I'm trying to add support for a separate percentage value for each resource (i.e., an array instead of a single float value for each capacity). It seems to require many more changes than I originally thought, especially in QueueCapacities and the related classes. Before I move forward with such massive changes, I have attached the draft and want to check with you folks whether there is a better way.

What I have done and am planning to do: CapacitySchedulerConfiguration supports both "45" and "memory=80,vCores=20" and internally keeps an array, one value per resource; QueueCapacities maps each label to an array of capacities.

Your feedback is appreciated.
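The two configuration forms mentioned in the YARN-9200 comment, "45" and "memory=80,vCores=20", could be parsed along these lines. This is a sketch under assumptions and not the attached draft: `CapacityParseSketch` is a hypothetical name, a map is used where the comment describes an array, and rejecting unknown resource names is a choice made here rather than something the issue states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser; not part of CapacitySchedulerConfiguration.
class CapacityParseSketch {
    // Returns one percentage per resource type.
    // "45" applies the same value to every type (the backward-compatible form);
    // "memory=80,vCores=20" sets each listed type individually.
    static Map<String, Float> parse(String value, List<String> resourceTypes) {
        Map<String, Float> capacities = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (!trimmed.contains("=")) {
            float percent = Float.parseFloat(trimmed);
            for (String type : resourceTypes) {
                capacities.put(type, percent);
            }
            return capacities;
        }
        for (String part : trimmed.split(",")) {
            String[] kv = part.split("=", 2);
            if (kv.length != 2 || !resourceTypes.contains(kv[0].trim())) {
                throw new IllegalArgumentException("Bad capacity entry: " + part);
            }
            capacities.put(kv[0].trim(), Float.parseFloat(kv[1].trim()));
        }
        return capacities;
    }
}
```

What should happen to a resource type that is omitted from the per-resource form is left open in this sketch; such a type simply has no entry in the returned map.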
[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated YARN-9200:

Attachment: YARN-9200.draft
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751380#comment-16751380 ]

Aihua Xu commented on YARN-9116:

Thanks [~cheersyang] and [~leftnoteasy] for your help.

> Capacity Scheduler: implements queue level maximum-allocation inheritance
> ---
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, YARN-9116.4.patch, YARN-9116.5.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues).
> To allow larger containers, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those settings with the desired values on the queues, since a queue's configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so that the leaf queues get such values by default.
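The inheritance idea named in the YARN-9116 title can be sketched as a lookup that walks up the queue path. This is a hypothetical illustration, not the committed patch: `MaxAllocationSketch`, the dotted queue paths, and the use of megabytes only are assumptions made for the example. A queue without its own maximum-allocation-mb takes the nearest ancestor's value and falls back to the cluster-wide maximum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-queue maximum-allocation-mb with inheritance.
class MaxAllocationSketch {
    private final int clusterMaxMb;
    // Queue path (e.g. "root.a") -> explicitly configured maximum, in MB.
    private final Map<String, Integer> configuredMb = new HashMap<>();

    MaxAllocationSketch(int clusterMaxMb) {
        this.clusterMaxMb = clusterMaxMb;
    }

    // A queue's maximum may not exceed the cluster-wide maximum.
    void set(String queuePath, int maxMb) {
        if (maxMb > clusterMaxMb) {
            throw new IllegalArgumentException(
                "Queue " + queuePath + " exceeds cluster maximum " + clusterMaxMb);
        }
        configuredMb.put(queuePath, maxMb);
    }

    // Uses the queue's own value if set, else the nearest ancestor's,
    // else the cluster-wide maximum.
    int effectiveMaxMb(String queuePath) {
        String path = queuePath;
        while (path != null) {
            Integer maxMb = configuredMb.get(path);
            if (maxMb != null) {
                return maxMb;
            }
            int dot = path.lastIndexOf('.');
            path = dot < 0 ? null : path.substring(0, dot);
        }
        return clusterMaxMb;
    }
}
```

Under this model, setting 16384 on `root` and 122880 on a dedicated `root.large` queue means a newly added `root.a.b` gets 16384 without any extra configuration, which matches the problem the description wants to avoid.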
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751359#comment-16751359 ]

Aihua Xu commented on YARN-9116:

The post-commit build is randomly failing with the following exception. There are already a couple of infra JIRAs for this: INFRA-13506 and INFRA-17015.

Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750490#comment-16750490 ]

Aihua Xu commented on YARN-9116:

Thanks [~cheersyang] for your valuable comment.
[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.5.patch > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, > YARN-9116.4.patch, YARN-9116.5.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750291#comment-16750291 ] Aihua Xu commented on YARN-9116: patch-5: minor changes to address Weiwei's checkstyle issue. The UT failure passes locally and it's not related. > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, > YARN-9116.4.patch, YARN-9116.5.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750275#comment-16750275 ] Aihua Xu commented on YARN-9200: [~leftnoteasy] and [~rohithsharma] I will spend time to take a look at the change. Let me know if you already worked on that. I couldn't find similar jira, btw. > Enable resource configuration of queue capacity for different resources > independently > - > > Key: YARN-9200 > URL: https://issues.apache.org/jira/browse/YARN-9200 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > In capacity scheduler, currently two resource allocations are supported. 1. > percentage allocation for child queues - the child queue gets a defined > percentage of the resources for all the resource types; 2. absolute values > (YARN-5881) - each resource is configured an absolute values. > Right now we can't mix these case together and it would also very confusing > to mix them in one cluster. The second case actually is more targeting toward > cloud env. > In a non-cloud env, the ability to configure each resource independently is > also useful, but percentage is preferable over absolute value. One thought > here is to add the percentage configuration for each resource type on the > queue. That would allow us to configure memory bounded queues, or CPU bounded > queues. We can also keep backward compatible: each resource type just gets > the same percentage if no percentage is configured for individual resource > type. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749056#comment-16749056 ] Aihua Xu commented on YARN-9116: patch-4: to address the comments from [~cheersyang]. I didn't try to correct checkstyle issues not related to the patch. > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, > YARN-9116.4.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.4.patch > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch, > YARN-9116.4.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748929#comment-16748929 ] Aihua Xu commented on YARN-9116: [~cheersyang] Thanks for the feedback. Those make sense. I will upload a new patch. > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9211) The yarn resourcemanager project keeps failing with " There was a timeout or other error in the fork" error
[ https://issues.apache.org/jira/browse/YARN-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9211: --- Description: Recently notice that the build keeps failing in resourcemanager project with " There was a timeout or other error in the fork". Here is the part of the log, but I don't see any UT failures. {noformat} [WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7 [INFO] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:32 h [INFO] Finished at: 2019-01-18T11:30:20+00:00 [INFO] Final Memory: 23M/773M [INFO] [WARNING] The requested profile "parallel-tests" could not be activated because it does not exist. [WARNING] The requested profile "native" could not be activated because it does not exist. [WARNING] The requested profile "yarn-ui" could not be activated because it does not exist. [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-yarn-server-resourcemanager: There was a timeout or other error in the fork -> [Help 1] {noformat} Here is a job https://builds.apache.org/job/PreCommit-YARN-Build/23107/console was: Recently notice that the build keeps failing in resourcemanager project with " There was a timeout or other error in the fork". Here is the part of the log, but I don't see any UT failures. {noformat} [WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7 [INFO] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:32 h [INFO] Finished at: 2019-01-18T11:30:20+00:00 [INFO] Final Memory: 23M/773M [INFO] [WARNING] The requested profile "parallel-tests" could not be activated because it does not exist. [WARNING] The requested profile "native" could not be activated because it does not exist. [WARNING] The requested profile "yarn-ui" could not be activated because it does not exist. 
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-yarn-server-resourcemanager: There was a timeout or other error in the fork -> [Help 1] {noformat} > The yarn resourcemanager project keeps failing with " There was a timeout or > other error in the fork" error > --- > > Key: YARN-9211 > URL: https://issues.apache.org/jira/browse/YARN-9211 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.1.2 >Reporter: Aihua Xu >Priority: Major > > Recently notice that the build keeps failing in resourcemanager project with > " There was a timeout or other error in the fork". > Here is the part of the log, but I don't see any UT failures. > {noformat} > [WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7 > [INFO] > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 01:32 h > [INFO] Finished at: 2019-01-18T11:30:20+00:00 > [INFO] Final Memory: 23M/773M > [INFO] > > [WARNING] The requested profile "parallel-tests" could not be activated > because it does not exist. > [WARNING] The requested profile "native" could not be activated because it > does not exist. > [WARNING] The requested profile "yarn-ui" could not be activated because it > does not exist. > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) > on project hadoop-yarn-server-resourcemanager: There was a timeout or other > error in the fork -> [Help 1] > {noformat} > Here is a job https://builds.apache.org/job/PreCommit-YARN-Build/23107/console -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9211) The yarn resourcemanager project keeps failing with " There was a timeout or other error in the fork" error
Aihua Xu created YARN-9211:
--
Summary: The yarn resourcemanager project keeps failing with " There was a timeout or other error in the fork" error
Key: YARN-9211
URL: https://issues.apache.org/jira/browse/YARN-9211
Project: Hadoop YARN
Issue Type: Test
Components: test
Affects Versions: 3.1.2
Reporter: Aihua Xu

Recently notice that the build keeps failing in resourcemanager project with " There was a timeout or other error in the fork".
Here is the part of the log, but I don't see any UT failures.
{noformat}
[WARNING] Tests run: 2445, Failures: 0, Errors: 0, Skipped: 7
[INFO]
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 01:32 h
[INFO] Finished at: 2019-01-18T11:30:20+00:00
[INFO] Final Memory: 23M/773M
[INFO]
[WARNING] The requested profile "parallel-tests" could not be activated because it does not exist.
[WARNING] The requested profile "native" could not be activated because it does not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-yarn-server-resourcemanager: There was a timeout or other error in the fork -> [Help 1]
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746530#comment-16746530 ] Aihua Xu commented on YARN-9116: [~cheersyang] The timeout issue seems not related since I'm seeing the failure from other build like https://builds.apache.org/job/PreCommit-YARN-Build/23108/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt or https://builds.apache.org/job/PreCommit-YARN-Build/23107/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Can you take another look at the latest patch? I will file a separate jira to fix the build issue. Right now I haven't found out why it keeps failing. Let me know if you have any thoughts. > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. 
> We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9116) Capacity Scheduler: implements queue level maximum-allocation inheritance
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Summary: Capacity Scheduler: implements queue level maximum-allocation inheritance (was: Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues) > Capacity Scheduler: implements queue level maximum-allocation inheritance > - > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
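One way to read "queue level maximum-allocation inheritance" is that a queue without its own setting takes the value of its nearest configured ancestor, and only falls back to the cluster-wide maximum when nothing on its path is configured. The sketch below illustrates that reading only; it is not taken from the YARN-9116 patches. The class name, the `resolveMaxAllocationMb` helper, the queue paths, and the plain `Map` standing in for the scheduler configuration are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; does not use any Hadoop classes.
final class MaxAllocationInheritance {

    // Returns the value configured on the queue itself or on its nearest ancestor.
    // Falls back to the cluster-wide maximum when no queue on the path sets one.
    static long resolveMaxAllocationMb(String queuePath,
                                       Map<String, Long> configured,
                                       long clusterMaxMb) {
        String path = queuePath;
        while (path != null) {
            Long value = configured.get(path);
            if (value != null) {
                // A queue setting can never exceed the cluster limit.
                return Math.min(value, clusterMaxMb);
            }
            int dot = path.lastIndexOf('.');
            path = dot < 0 ? null : path.substring(0, dot);
        }
        return clusterMaxMb;
    }

    public static void main(String[] args) {
        Map<String, Long> configured = new HashMap<>();
        configured.put("root", 16384L);           // 16G for ordinary queues
        configured.put("root.bigjobs", 122880L);  // dedicated large-container queue
        long clusterMaxMb = 122880L;              // 120G cluster-wide maximum

        // A newly added queue with no setting inherits from root.
        System.out.println(resolveMaxAllocationMb("root.adhoc.new", configured, clusterMaxMb));
        // A child of the dedicated queue inherits the larger limit.
        System.out.println(resolveMaxAllocationMb("root.bigjobs.etl", configured, clusterMaxMb));
    }
}
```

Under this reading, forgetting to configure a new queue leaves it with the ordinary 16G limit rather than the 120G cluster maximum, which is the failure mode the description wants to avoid.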
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.3.patch > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch, YARN-9116.3.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744537#comment-16744537 ] Aihua Xu commented on YARN-9116: Checked the UT failure but don't see explicit test case failures but the build indicates a timeout. [~cheersyang] Can you tell which one is causing the timeout from the log? Otherwise, I can attach a new patch to test out. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744457#comment-16744457 ] Aihua Xu commented on YARN-9116: I will take a look at UT failure [~cheersyang] I was debating if I should make it backward compatible for that fairly new feature. I will add that. Another thing: in this patch, I just silently call {{maximumAllocation = Resources.componentwiseMin(queueMax, clusterMax);}} to get queue level maximum-allocation. Should we fail with some exception if queueMax > clusterMax and let the admin fix the configuration? That seems to be what it was. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. 
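The choice raised in the comment above (silently take the component-wise minimum, or fail when queueMax > clusterMax) can be compared in a small self-contained sketch. It does not use Hadoop's `Resource` or `Resources` classes. The `Alloc` type, the `clamp` and `validateOrFail` helpers, and the sample values are hypothetical and only mirror the behaviour the comment describes.

```java
// Hypothetical stand-in for a (memory, vcores) pair; not Hadoop's Resource class.
final class Alloc {
    final long memoryMb;
    final int vcores;

    Alloc(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    @Override
    public String toString() {
        return memoryMb + "MB/" + vcores + "vcores";
    }
}

final class MaxAllocationCheck {

    // Option 1: silently clamp each component to the cluster maximum.
    static Alloc clamp(Alloc queueMax, Alloc clusterMax) {
        return new Alloc(Math.min(queueMax.memoryMb, clusterMax.memoryMb),
                         Math.min(queueMax.vcores, clusterMax.vcores));
    }

    // Option 2: reject the configuration so the admin has to fix it.
    static Alloc validateOrFail(Alloc queueMax, Alloc clusterMax) {
        if (queueMax.memoryMb > clusterMax.memoryMb || queueMax.vcores > clusterMax.vcores) {
            throw new IllegalArgumentException(
                "Queue maximum allocation " + queueMax
                    + " exceeds cluster maximum " + clusterMax);
        }
        return queueMax;
    }

    public static void main(String[] args) {
        Alloc cluster = new Alloc(122880, 256); // 120G/256, as in the issue description
        Alloc queue = new Alloc(131072, 8);     // memory set above the cluster limit

        System.out.println("clamped: " + clamp(queue, cluster));
        try {
            validateOrFail(queue, cluster);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Clamping keeps the scheduler running with a value the admin did not ask for, while failing surfaces the misconfiguration immediately; which is preferable is the open question in the comment.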
[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743284#comment-16743284 ] Aihua Xu commented on YARN-9200: Thanks [~leftnoteasy] Good to know it's the right direction. :) [~rohithsharma] Let me know if you are actively working on this. > Enable resource configuration of queue capacity for different resources > independently > - > > Key: YARN-9200 > URL: https://issues.apache.org/jira/browse/YARN-9200 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > In capacity scheduler, currently two resource allocations are supported. 1. > percentage allocation for child queues - the child queue gets a defined > percentage of the resources for all the resource types; 2. absolute values > (YARN-5881) - each resource is configured an absolute values. > Right now we can't mix these case together and it would also very confusing > to mix them in one cluster. The second case actually is more targeting toward > cloud env. > In a non-cloud env, the ability to configure each resource independently is > also useful, but percentage is preferable over absolute value. One thought > here is to add the percentage configuration for each resource type on the > queue. That would allow us to configure memory bounded queues, or CPU bounded > queues. We can also keep backward compatible: each resource type just gets > the same percentage if no percentage is configured for individual resource > type. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9200: --- Description: In capacity scheduler, currently two resource allocations are supported. 1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types; 2. absolute values (YARN-5881) - each resource is configured an absolute values. Right now we can't mix these case together and it would also very confusing to mix them in one cluster. The second case actually is more targeting toward cloud env. In a non-cloud env, the ability to configure each resource independently is also useful, but percentage is preferable over absolute value. One thought here is to add the percentage configuration for each resource type on the queue. That would allow us to configure memory bounded queues, or CPU bounded queues. We can also keep backward compatible: each resource type just gets the same percentage if no percentage is configured for individual resource type. was: In capacity scheduler, currently two resource allocations are supported. 1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types; 2. absolute values (YARN-5881) - each resource is configured an absolute values. Right now we can't mix these case together and it would also very confusing to mix them in one cluster. The second case actually is more targeting toward cloud env. In a non-cloud env, the ability to configure each resource independently is also useful, but percentage is preferable over absolute value. One thought here is to add the percentage configuration for each resource type on the queue. That would allow us to configure memory bounded queues, or CPU bounded queues. 
> Enable resource configuration of queue capacity for different resources > independently > - > > Key: YARN-9200 > URL: https://issues.apache.org/jira/browse/YARN-9200 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Aihua Xu >Priority: Major > > In capacity scheduler, currently two resource allocations are supported. 1. > percentage allocation for child queues - the child queue gets a defined > percentage of the resources for all the resource types; 2. absolute values > (YARN-5881) - each resource is configured an absolute values. > Right now we can't mix these case together and it would also very confusing > to mix them in one cluster. The second case actually is more targeting toward > cloud env. > In a non-cloud env, the ability to configure each resource independently is > also useful, but percentage is preferable over absolute value. One thought > here is to add the percentage configuration for each resource type on the > queue. That would allow us to configure memory bounded queues, or CPU bounded > queues. We can also keep backward compatible: each resource type just gets > the same percentage if no percentage is configured for individual resource > type. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
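The proposal in the description, a percentage per resource type with a fallback to the single queue percentage, can be sketched as follows. This is an illustration of the idea only, not the capacity scheduler's code or configuration format. The class name, the `childShare` helper, and the numbers are hypothetical, and the resource names are used purely as map keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; does not use any Hadoop classes.
final class PerResourceCapacity {

    // Computes a child queue's share of each parent resource. A resource type with
    // no explicit percentage falls back to the queue-wide percentage, which keeps
    // existing single-percentage configurations working unchanged.
    static Map<String, Long> childShare(Map<String, Long> parentResources,
                                        double defaultPercent,
                                        Map<String, Double> perResourcePercent) {
        Map<String, Long> share = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : parentResources.entrySet()) {
            double percent = perResourcePercent.getOrDefault(e.getKey(), defaultPercent);
            share.put(e.getKey(), (long) Math.floor(e.getValue() * percent / 100.0));
        }
        return share;
    }

    public static void main(String[] args) {
        Map<String, Long> parent = new LinkedHashMap<>();
        parent.put("memory-mb", 1_000_000L);
        parent.put("vcores", 1_000L);

        // A memory-bound queue: 50% of memory, everything else at the default 20%.
        Map<String, Double> overrides = new LinkedHashMap<>();
        overrides.put("memory-mb", 50.0);

        System.out.println(childShare(parent, 20.0, overrides));
    }
}
```

With an empty override map every resource type gets the same percentage, which matches the backward-compatible behaviour the description asks for.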
[jira] [Assigned] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned YARN-9200: -- Assignee: Aihua Xu > Enable resource configuration of queue capacity for different resources > independently > - > > Key: YARN-9200 > URL: https://issues.apache.org/jira/browse/YARN-9200 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > In capacity scheduler, currently two resource allocations are supported. 1. > percentage allocation for child queues - the child queue gets a defined > percentage of the resources for all the resource types; 2. absolute values > (YARN-5881) - each resource is configured an absolute values. > Right now we can't mix these case together and it would also very confusing > to mix them in one cluster. The second case actually is more targeting toward > cloud env. > In a non-cloud env, the ability to configure each resource independently is > also useful, but percentage is preferable over absolute value. One thought > here is to add the percentage configuration for each resource type on the > queue. That would allow us to configure memory bounded queues, or CPU bounded > queues. We can also keep backward compatible: each resource type just gets > the same percentage if no percentage is configured for individual resource > type. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9200: --- Description: In capacity scheduler, currently two resource allocations are supported. 1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types; 2. absolute values (YARN-5881) - each resource is configured an absolute values. Right now we can't mix these case together and it would also very confusing to mix them in one cluster. The second case actually is more targeting toward cloud env. In a non-cloud env, the ability to configure each resource independently is also useful, but percentage is preferable over absolute value. One thought here is to add the percentage configuration for each resource type on the queue. That would allow us to configure memory bounded queues, or CPU bounded queues. was: In capacity scheduler, currently two resource allocations are supported. 1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types; 2. absolute values (YARN-5881) - each resource is configured an absolute values. Right now we can't mix these case together and it would also very confusing to mix them in one cluster. The second case actually is more targeting toward cloud env. In a non-cloud env, the ability to configure each resource independently is also useful, but in such env, percentage is preferable instead of absolute value. One thought here is to add the percentage configuration for each resource type on the queue. That would allow us to configure memory bounded queues, or CPU bounded queues. 
> Enable resource configuration of queue capacity for different resources > independently > - > > Key: YARN-9200 > URL: https://issues.apache.org/jira/browse/YARN-9200 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Aihua Xu >Priority: Major > > In capacity scheduler, currently two resource allocations are supported. 1. > percentage allocation for child queues - the child queue gets a defined > percentage of the resources for all the resource types; 2. absolute values > (YARN-5881) - each resource is configured an absolute values. > Right now we can't mix these case together and it would also very confusing > to mix them in one cluster. The second case actually is more targeting toward > cloud env. > In a non-cloud env, the ability to configure each resource independently is > also useful, but percentage is preferable over absolute value. One thought > here is to add the percentage configuration for each resource type on the > queue. That would allow us to configure memory bounded queues, or CPU bounded > queues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
[ https://issues.apache.org/jira/browse/YARN-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742626#comment-16742626 ]

Aihua Xu commented on YARN-9200:

[~wangda], [~sunilg], [~cheersyang] I'd like to hear your thoughts. Would this improvement work?

> Enable resource configuration of queue capacity for different resources independently
>
> Key: YARN-9200
> URL: https://issues.apache.org/jira/browse/YARN-9200
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: Aihua Xu
> Priority: Major
>
> In the capacity scheduler, two kinds of resource allocation are currently supported:
> 1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types;
> 2. absolute values (YARN-5881) - each resource is configured with an absolute value.
> Right now we can't mix these cases, and it would also be very confusing to mix them in one cluster. The second case is actually targeted more toward cloud environments.
> In a non-cloud environment, the ability to configure each resource independently is also useful, but in such an environment a percentage is preferable to an absolute value. One thought here is to add a percentage configuration for each resource type on the queue. That would allow us to configure memory-bound or CPU-bound queues.
[jira] [Created] (YARN-9200) Enable resource configuration of queue capacity for different resources independently
Aihua Xu created YARN-9200:

Summary: Enable resource configuration of queue capacity for different resources independently
Key: YARN-9200
URL: https://issues.apache.org/jira/browse/YARN-9200
Project: Hadoop YARN
Issue Type: Improvement
Components: capacity scheduler
Affects Versions: 3.1.0
Reporter: Aihua Xu

In the capacity scheduler, two kinds of resource allocation are currently supported:
1. percentage allocation for child queues - the child queue gets a defined percentage of the resources for all the resource types;
2. absolute values (YARN-5881) - each resource is configured with an absolute value.
Right now we can't mix these cases, and it would also be very confusing to mix them in one cluster. The second case is actually targeted more toward cloud environments.
In a non-cloud environment, the ability to configure each resource independently is also useful, but in such an environment a percentage is preferable to an absolute value. One thought here is to add a percentage configuration for each resource type on the queue. That would allow us to configure memory-bound or CPU-bound queues.
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.2.patch > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: (was: YARN-9116.2.patch) > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.2.patch > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch, YARN-9116.2.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738441#comment-16738441 ]

Aihua Xu commented on YARN-9116:

Thanks [~leftnoteasy] and [~cheersyang] for the suggestions. Just to clarify the behavior: to keep compatibility, a queue's maximum-allocation will not exceed the global setting (yarn.scheduler.capacity.maximum-allocation-mb); among the queues, a child inherits the setting from its parent, and a child queue can override the parent's value with a larger or smaller one while still respecting the global setting. That sounds reasonable to me. I will work on supporting maximum-allocation-mb/vcores on parent queues first and follow up on the general resource types.

> Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9116.1.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To achieve the larger container configuration, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with the desired values on the queues, since a queue configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so the leaf queues get such values by default.
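The behavior clarified in the comment above (a child inherits from its parent, may override with a larger or smaller value, and never exceeds the global setting) can be sketched as follows. This is an illustration only, not the patch: the function name and the dictionary of per-queue values are hypothetical, and rejecting a value above the global maximum with an error is an assumption, since the thread does not say how a violation is handled.

```python
from typing import Dict


def effective_max_allocation_mb(
    queue_path: str,
    queue_settings: Dict[str, int],
    global_max_mb: int,
) -> int:
    """Resolve a queue's maximum allocation (MB) by walking up the hierarchy.

    queue_settings maps a queue path such as "root.large" to an explicitly
    configured value. The nearest explicitly configured ancestor wins; if
    nothing is set up to the root, the global value applies.
    """
    path = queue_path
    while True:
        if path in queue_settings:
            value = queue_settings[path]
            break
        if "." not in path:
            # Reached the root without finding an explicit value.
            value = global_max_mb
            break
        path = path.rsplit(".", 1)[0]  # move to the parent queue
    if value > global_max_mb:
        # Assumption: a value above the global setting is rejected.
        raise ValueError(
            f"{path}: {value} MB exceeds the global maximum {global_max_mb} MB"
        )
    return value


# Values taken from the example discussed in the thread: root = 16G,
# root.large = 80G, with a hypothetical global maximum of 120G.
GLOBAL_MAX_MB = 120 * 1024
settings = {"root": 16 * 1024, "root.large": 80 * 1024}

default_queue = effective_max_allocation_mb("root.default", settings, GLOBAL_MAX_MB)
large_queue = effective_max_allocation_mb("root.large", settings, GLOBAL_MAX_MB)
large_child = effective_max_allocation_mb("root.large.etl", settings, GLOBAL_MAX_MB)
```

Here root.default inherits 16G from root, root.large overrides it with the larger 80G, and root.large.etl (a made-up child) inherits the 80G from root.large.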
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737351#comment-16737351 ] Aihua Xu commented on YARN-9116: [~cheersyang] As I understand currently you can only set maximum-allocation-mb on the leaf queue, not intermediate parent queues. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736083#comment-16736083 ]

Aihua Xu commented on YARN-9116:

[~cheersyang] In the example above, we are still introducing an incompatible change, since the root is set to 16G ({{yarn.scheduler.capacity.root.maximum-allocation-mb=16G}}) while the child queue root.large is set to 80G (a larger value). Do you think that is an acceptable change? What you are proposing is that the child can override the parent's value (larger or smaller) but won't exceed the global value, correct?

> Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9116.1.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To achieve the larger container configuration, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with the desired values on the queues, since a queue configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so the leaf queues get such values by default.
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734369#comment-16734369 ]

Aihua Xu commented on YARN-9116:

[~cheersyang] That was my initial idea (see YARN-9055): that we can override the parent setting. But it introduces an incompatibility, since it has always been assumed that a child queue can't have larger settings than its parents. Some clients, such as Spark, check the top-level settings and fail immediately if the resource request can't be satisfied.

> Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9116.1.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To achieve the larger container configuration, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with the desired values on the queues, since a queue configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so the leaf queues get such values by default.
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732283#comment-16732283 ]

Aihua Xu commented on YARN-9116:

Thanks [~cheersyang] for the comment. Happy new year. So you are suggesting the rules quoted at the end of this comment, is that correct? Even with such inheritance, that would still require a lot of queue-level configuration unless we introduce a new property. After we implement the inheritance mechanism, we would still have to set the global value to 120G/256vCores (the maximum value allowed in the cluster), then override all the top queues to 16G/16vCores, and set the large-container top queue to 120G/256vCores. I feel the current approach is simpler and more straightforward. Let me know if you think the inheritance implementation is still needed, but it seems we do need to add an additional configuration.

{noformat}
Queue level max inherits the value from its parent if it is not explicitly set
If queue level max is set explicitly, then it is honored without considering its parents
{noformat}

> Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9116.1.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To achieve the larger container configuration, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with the desired values on the queues, since a queue configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so the leaf queues get such values by default.
[jira] [Comment Edited] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725466#comment-16725466 ]

Aihua Xu edited comment on YARN-9116 at 12/20/18 12:41 AM:

Patch-1: this patch adds simple logic to give default memory/vcore values to queues that have no such configuration set. A new configuration, "yarn.scheduler.capacity.default-queue-maximum-allocation", is added to set the queue default for maximum allocation. I didn't implement queue inheritance, since I feel this keeps the configuration simpler. Let me know if it's needed and I can do that in a follow-up.

was (Author: aihuaxu):
Patch-1: in this patch, add the simple logic to give the default memory/vcore values to the queues if no configuration is set for such queues. A new configuration "yarn.scheduler.capacity.default-queue-maximum-allocation" is added to set the queue default for maximum allocation in the configuration.

> Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
>
> Key: YARN-9116
> URL: https://issues.apache.org/jira/browse/YARN-9116
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 2.7.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: YARN-9116.1.patch
>
> YARN-1582 adds support for a per-queue maximum-allocation-mb configuration, which is intended to support larger containers on dedicated queues (a larger maximum-allocation-mb/maximum-allocation-vcores for such queues). To achieve the larger container configuration, however, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with the desired values on the queues, since a queue configuration can't be larger than the cluster configuration. There are many queues in the system, and if we forget to configure such values when adding a new queue, that queue gets the default 120G/256, which typically is not what we want.
> We can come up with a queue-default configuration (set to a normal queue configuration like 16G/8), so the leaf queues get such values by default.
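The queue-default approach described for Patch-1 above (no inheritance, only a fallback value for queues with nothing configured) might look roughly like the sketch below. It is a guess at the shape of the logic, not the patch itself: the function and variable names are hypothetical, how the "yarn.scheduler.capacity.default-queue-maximum-allocation" setting maps onto a memory value is an assumption, and so is rejecting values above the cluster-wide maximum with an error.

```python
from typing import Dict, Optional


def queue_max_allocation_mb(
    queue_path: str,
    queue_settings: Dict[str, int],
    queue_default_mb: Optional[int],
    global_max_mb: int,
) -> int:
    """Pick a queue's maximum allocation (MB) without any inheritance.

    Order of precedence: the queue's own explicit value, then the
    queue-default value (if configured), then the global maximum.
    """
    if queue_path in queue_settings:
        value = queue_settings[queue_path]
    elif queue_default_mb is not None:
        value = queue_default_mb
    else:
        value = global_max_mb
    if value > global_max_mb:
        # Assumption: a queue value may not exceed the cluster setting.
        raise ValueError(
            f"{queue_path}: {value} MB exceeds the global maximum {global_max_mb} MB"
        )
    return value


# Numbers from the issue description: global 120G, normal queues 16G.
GLOBAL_MAX_MB = 120 * 1024
QUEUE_DEFAULT_MB = 16 * 1024
settings = {"root.large": 120 * 1024}  # the one large-container queue

new_queue = queue_max_allocation_mb("root.adhoc", settings, QUEUE_DEFAULT_MB, GLOBAL_MAX_MB)
large_queue = queue_max_allocation_mb("root.large", settings, QUEUE_DEFAULT_MB, GLOBAL_MAX_MB)
without_default = queue_max_allocation_mb("root.adhoc", settings, None, GLOBAL_MAX_MB)
```

The last call shows the problem the issue describes: without a queue default, a newly added queue silently gets the raised global value.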
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Attachment: YARN-9116.1.patch > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9116.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721603#comment-16721603 ] Aihua Xu commented on YARN-9116: [~cheersyang] Yes. Agree with you. I will implement toward such goal. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720919#comment-16720919 ] Aihua Xu edited comment on YARN-9116 at 12/14/18 4:52 AM: -- [~cheersyang] As I understand from the code, the logic to inherit from the parent is not implemented yet. We need to set such properties on the leaf queues. That is another approach I'm also thinking of. And also, we may not want the child queue to have larger value than the parent queue (by following the current behavior that queue value will not be larger than global level), then we may not set on the root queue, but on the children of the root queue. was (Author: aihuaxu): [~cheersyang] As I understand from the code, the logic to inherit from the parent is not implemented yet. We need to set such properties on the leaf queues. That is another approach I'm also thinking of but need more work. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. 
> We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
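The parent-inheritance idea from the comment above can be sketched as a walk up the queue path. This is a minimal sketch under stated assumptions, not the CapacityScheduler implementation: the class name, method name, and the use of a plain map keyed by dotted queue path are all hypothetical, and only memory is modeled. It does not enforce that a child value stays below its parent's value.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; names are hypothetical and this is not Hadoop code. */
final class InheritedMaxAllocationSketch {

    /**
     * Returns the maximum allocation (in MB) for queuePath: the value set on
     * the queue itself if present, otherwise the nearest ancestor's value,
     * otherwise the cluster-wide maximum.
     */
    static long resolveMaxMb(Map<String, Long> perQueueMb, String queuePath, long clusterMaxMb) {
        String path = queuePath;
        while (path != null) {
            Long value = perQueueMb.get(path);
            if (value != null) {
                return value;
            }
            // Move to the parent, e.g. "root.prod.etl" -> "root.prod".
            int lastDot = path.lastIndexOf('.');
            path = lastDot < 0 ? null : path.substring(0, lastDot);
        }
        // Nothing set anywhere on the path: fall back to the cluster setting.
        return clusterMaxMb;
    }

    public static void main(String[] args) {
        Map<String, Long> perQueueMb = new HashMap<>();
        perQueueMb.put("root.prod", 16384L);   // set on a child of root, not on root
        perQueueMb.put("root.large", 122880L); // dedicated larger-container queue
        System.out.println(resolveMaxMb(perQueueMb, "root.prod.etl", 122880L));
        System.out.println(resolveMaxMb(perQueueMb, "root.large", 122880L));
    }
}
```

Setting the value on the children of root rather than on root itself keeps the larger-container queue free to carry its own, larger value, which matches the concern raised in the comment.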
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720919#comment-16720919 ] Aihua Xu commented on YARN-9116: [~cheersyang] As I understand from the code, the logic to inherit from the parent is not implemented yet. We need to set such properties on the leaf queues. That is another approach I'm also thinking of, but it needs more work. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default.
[jira] [Commented] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720436#comment-16720436 ] Aihua Xu commented on YARN-9116: Thanks [~leftnoteasy]. I will look into what you mentioned above. It sounds like a good way to handle different resource types; originally I was only thinking of memory and vCores. [~cheersyang] What I'm trying to achieve is to configure a queue for larger containers. As I understand the implementation of YARN-1582, we have to take the following steps: # Configure the global maximum-allocation to 120G/256vCores # Configure regular queues to 16G/16vCores or other desired values # Configure the larger-container queue to 120G/256vCores The queue-default I'm talking about would simply be set to 16G/16vCores in this case. Without such a default, you have to set the value on every queue. It is only a default; you can still set a different value on any queue that needs one. > Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. 
There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
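The three configuration steps in the comment above, together with the proposed queue-default, can be sketched as a lookup order: explicit per-queue value first, then the queue-default, then the cluster maximum. This is a minimal sketch that models the configuration as a plain map rather than Hadoop's Configuration class. The cluster and per-queue property names follow the YARN naming discussed in the thread; the queue-default key is purely hypothetical, since the thread does not name one, and only memory is modeled.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; models the lookup with a plain map instead of Hadoop's Configuration. */
final class QueueDefaultSketch {
    static final String CLUSTER_MAX_MB = "yarn.scheduler.maximum-allocation-mb";
    static final String QUEUE_PREFIX = "yarn.scheduler.capacity.";
    static final String MAX_MB_SUFFIX = ".maximum-allocation-mb";
    // HYPOTHETICAL key standing in for the proposed queue-default setting.
    static final String QUEUE_DEFAULT_MAX_MB = "example.queue-default.maximum-allocation-mb";

    /** Assumes the cluster maximum is always present in conf. */
    static long effectiveMaxMb(Map<String, String> conf, String queuePath) {
        long clusterMax = Long.parseLong(conf.get(CLUSTER_MAX_MB));
        String explicit = conf.get(QUEUE_PREFIX + queuePath + MAX_MB_SUFFIX);
        String queueDefault = conf.get(QUEUE_DEFAULT_MAX_MB);

        long value;
        if (explicit != null) {
            value = Long.parseLong(explicit);      // per-queue override
        } else if (queueDefault != null) {
            value = Long.parseLong(queueDefault);  // proposed default for unconfigured queues
        } else {
            value = clusterMax;                    // behavior described in the issue today
        }
        // Keep the existing rule: a queue value may not exceed the cluster value.
        if (value > clusterMax) {
            throw new IllegalArgumentException(
                "Queue maximum allocation " + value + " MB exceeds cluster setting " + clusterMax + " MB");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CLUSTER_MAX_MB, "122880");                                   // step 1: 120G globally
        conf.put(QUEUE_DEFAULT_MAX_MB, "16384");                              // step 2 via one default: 16G
        conf.put(QUEUE_PREFIX + "root.large" + MAX_MB_SUFFIX, "122880");      // step 3: larger-container queue
        System.out.println(effectiveMaxMb(conf, "root.large"));
        System.out.println(effectiveMaxMb(conf, "root.newqueue"));
    }
}
```

With the default in place, a newly added queue that nobody configured resolves to 16G instead of silently inheriting the 120G cluster maximum.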
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Description: YARN-1582 adds the support of maximum-allocation-mb configuration per queue which is targeting to support larger container features on dedicated queues (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While to achieve larger container configuration, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with desired values on the queues since queue configuration can't be larger than cluster configuration. There are many queues in the system and if we forget to configure such values when adding a new queue, then such queue gets default 120G/256 which typically is not what we want. We can come up with a queue-default configuration (set to normal queue configuration like 16G/8), so the leaf queues gets such values by default. was: YARN-1582 adds the support of maximum-allocation-mb configuration per queue which is targeting to support larger container features on dedicated queues (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While to achieve larger container configuration, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with desired values on the queues since queue configuration can't be larger than cluster configuration. If we forget to configure such values when adding a new queue, then such queue gets default 120G/256 which typically is not what we want. We can come up with a queue-default configuration (set to normal queue configuration like 16G/8), so the leaf queues gets such values by default. 
> Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. There are > many queues in the system and if we forget to configure such values when > adding a new queue, then such queue gets default 120G/256 which typically is > not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
[ https://issues.apache.org/jira/browse/YARN-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9116: --- Description: YARN-1582 adds the support of maximum-allocation-mb configuration per queue which is targeting to support larger container features on dedicated queues (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While to achieve larger container configuration, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with desired values on the queues since queue configuration can't be larger than cluster configuration. If we forget to configure such values when adding a new queue, then such queue gets default 120G/256 which typically is not what we want. We can come up with a queue-default configuration (set to normal queue configuration like 16G/8), so the leaf queues gets such values by default. was: YARN-1582 adds the support of maximum-allocation-mb configuration per queue which is targeting to support larger container features on dedicated queues (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While to achieve larger container configuration, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with desired values on the queues since queue configuration can't be larger than cluster configuration. If we forget to configure such values when adding a new queue, then such queue gets default 120G/256 which typically is not what we want. We can come up with a top-queue-default configuration (set to normal queue configuration like 16G/8), so top queue and its children gets such values by default. 
> Capacity Scheduler: add the default maximum-allocation-mb and > maximum-allocation-vcores for the queues > -- > > Key: YARN-9116 > URL: https://issues.apache.org/jira/browse/YARN-9116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue > which is targeting to support larger container features on dedicated queues > (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . > While to achieve larger container configuration, we need to increase the > global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and > then override those configurations with desired values on the queues since > queue configuration can't be larger than cluster configuration. If we forget > to configure such values when adding a new queue, then such queue gets > default 120G/256 which typically is not what we want. > We can come up with a queue-default configuration (set to normal queue > configuration like 16G/8), so the leaf queues gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9115) Capacity Scheduler: larger container configuration improvement on the queue level
[ https://issues.apache.org/jira/browse/YARN-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9115: --- Description: We are trying to use the feature introduced from YARN-1582 to configure one larger container queue while we are seeing some issues or inconvenience. Use this jira to track the tasks for the improvement. > Capacity Scheduler: larger container configuration improvement on the queue > level > - > > Key: YARN-9115 > URL: https://issues.apache.org/jira/browse/YARN-9115 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > We are trying to use the feature introduced from YARN-1582 to configure one > larger container queue while we are seeing some issues or inconvenience. Use > this jira to track the tasks for the improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
[ https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719521#comment-16719521 ] Aihua Xu commented on YARN-9055: We currently assume that the queue configuration won't be greater than the cluster configuration, and some clients rely on that assumption to fail early by comparing the requested resources against the cluster configuration (e.g. Spark: https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L177). So we can't simply remove this check. > Capacity Scheduler: allow larger queue level maximum-allocation-mb to > override the cluster configuration > > > Key: YARN-9055 > URL: https://issues.apache.org/jira/browse/YARN-9055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9055.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue. > That feature gives the flexibility to give different memory requirements for > different queues. Such patch adds the limitation that the queue level > configuration can't exceed the cluster level default configuration, but I > feel it may make more sense to remove such limitation to allow any overrides > since > # Such configuration is controlled by the admin so it shouldn't get abused; > # It's common that typical queues require standard size containers while some > job (queues) have requirements for larger containers. With current > limitation, we have to set larger configuration on the cluster setting which > will cause resource abuse unless we override them on all the queues. > We can remove such limitation in CapacitySchedulerConfiguration.java so the > cluster setting provides the default value and queue setting can override it. 
> {noformat} >if (maxAllocationMbPerQueue > clusterMax.getMemorySize() > || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { > throw new IllegalArgumentException( > "Queue maximum allocation cannot be larger than the cluster setting" > + " for queue " + queue > + " max allocation per queue: " + result > + " cluster setting: " + clusterMax); > } > {noformat} > Let me know if it makes sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
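The client-side concern in the comment above can be illustrated with a small fail-early check. This is a hedged sketch, not Spark's actual code: the class and method names are hypothetical, and it only assumes what the comment states, namely that a client compares its request against the cluster-wide maximum before submitting.

```java
/** Illustrative only; not Spark's Client code. Names are hypothetical. */
final class ClientPrecheckSketch {

    /**
     * Rejects a request whose total memory is above the cluster-wide maximum.
     * The client sees only the cluster setting, not any per-queue value.
     */
    static void verifyRequest(long containerMb, long overheadMb, long clusterMaxMb) {
        long totalMb = containerMb + overheadMb;
        if (totalMb > clusterMaxMb) {
            throw new IllegalArgumentException(
                "Requested " + totalMb + " MB exceeds the cluster maximum of " + clusterMaxMb + " MB");
        }
    }

    public static void main(String[] args) {
        // Fits under a 16G cluster maximum: no exception.
        verifyRequest(8192, 1024, 16384);
        try {
            // A 100G request aimed at a larger-container queue is still rejected
            // client-side if the cluster maximum stays at 16G.
            verifyRequest(102400, 0, 16384);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the scheduler allowed a queue value above the cluster value, a client with a check like this would still reject the request before the scheduler ever saw it, which is why removing the scheduler-side check alone is not enough.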
[jira] [Created] (YARN-9116) Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues
Aihua Xu created YARN-9116: -- Summary: Capacity Scheduler: add the default maximum-allocation-mb and maximum-allocation-vcores for the queues Key: YARN-9116 URL: https://issues.apache.org/jira/browse/YARN-9116 Project: Hadoop YARN Issue Type: Sub-task Components: capacity scheduler Affects Versions: 2.7.0 Reporter: Aihua Xu Assignee: Aihua Xu YARN-1582 adds the support of maximum-allocation-mb configuration per queue which is targeting to support larger container features on dedicated queues (larger maximum-allocation-mb/maximum-allocation-vcores for such queue) . While to achieve larger container configuration, we need to increase the global maximum-allocation-mb/maximum-allocation-vcores (e.g. 120G/256) and then override those configurations with desired values on the queues since queue configuration can't be larger than cluster configuration. If we forget to configure such values when adding a new queue, then such queue gets default 120G/256 which typically is not what we want. We can come up with a top-queue-default configuration (set to normal queue configuration like 16G/8), so top queue and its children gets such values by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9115) Capacity Scheduler: larger container configuration improvement on the queue level
Aihua Xu created YARN-9115: -- Summary: Capacity Scheduler: larger container configuration improvement on the queue level Key: YARN-9115 URL: https://issues.apache.org/jira/browse/YARN-9115 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler Affects Versions: 2.7.0 Reporter: Aihua Xu Assignee: Aihua Xu -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
[ https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9055: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-9115 > Capacity Scheduler: allow larger queue level maximum-allocation-mb to > override the cluster configuration > > > Key: YARN-9055 > URL: https://issues.apache.org/jira/browse/YARN-9055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9055.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue. > That feature gives the flexibility to give different memory requirements for > different queues. Such patch adds the limitation that the queue level > configuration can't exceed the cluster level default configuration, but I > feel it may make more sense to remove such limitation to allow any overrides > since > # Such configuration is controlled by the admin so it shouldn't get abused; > # It's common that typical queues require standard size containers while some > job (queues) have requirements for larger containers. With current > limitation, we have to set larger configuration on the cluster setting which > will cause resource abuse unless we override them on all the queues. > We can remove such limitation in CapacitySchedulerConfiguration.java so the > cluster setting provides the default value and queue setting can override it. > {noformat} >if (maxAllocationMbPerQueue > clusterMax.getMemorySize() > || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { > throw new IllegalArgumentException( > "Queue maximum allocation cannot be larger than the cluster setting" > + " for queue " + queue > + " max allocation per queue: " + result > + " cluster setting: " + clusterMax); > } > {noformat} > Let me know if it makes sense. 
[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
[ https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701228#comment-16701228 ] Aihua Xu commented on YARN-9055: Thanks [~tgraves] for the comment. It will definitely introduce different behavior. In YARN-1582, we were trying to address the issue that some applications may request larger containers. How would you achieve that with minimal configuration? The only way I can think of is to increase the cluster configuration and then override it on every queue that doesn't require larger containers. > Capacity Scheduler: allow larger queue level maximum-allocation-mb to > override the cluster configuration > > > Key: YARN-9055 > URL: https://issues.apache.org/jira/browse/YARN-9055 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9055.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue. > That feature gives the flexibility to give different memory requirements for > different queues. Such patch adds the limitation that the queue level > configuration can't exceed the cluster level default configuration, but I > feel it may make more sense to remove such limitation to allow any overrides > since > # Such configuration is controlled by the admin so it shouldn't get abused; > # It's common that typical queues require standard size containers while some > job (queues) have requirements for larger containers. With current > limitation, we have to set larger configuration on the cluster setting which > will cause resource abuse unless we override them on all the queues. > We can remove such limitation in CapacitySchedulerConfiguration.java so the > cluster setting provides the default value and queue setting can override it. 
> {noformat} >if (maxAllocationMbPerQueue > clusterMax.getMemorySize() > || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { > throw new IllegalArgumentException( > "Queue maximum allocation cannot be larger than the cluster setting" > + " for queue " + queue > + " max allocation per queue: " + result > + " cluster setting: " + clusterMax); > } > {noformat} > Let me know if it makes sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
[ https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9055: --- Attachment: YARN-9055.1.patch > Capacity Scheduler: allow larger queue level maximum-allocation-mb to > override the cluster configuration > > > Key: YARN-9055 > URL: https://issues.apache.org/jira/browse/YARN-9055 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: YARN-9055.1.patch > > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue. > That feature gives the flexibility to give different memory requirements for > different queues. Such patch adds the limitation that the queue level > configuration can't exceed the cluster level default configuration, but I > feel it may make more sense to remove such limitation to allow any overrides > since > # Such configuration is controlled by the admin so it shouldn't get abused; > # It's common that typical queues require standard size containers while some > job (queues) have requirements for larger containers. With current > limitation, we have to set larger configuration on the cluster setting which > will cause resource abuse unless we override them on all the queues. > We can remove such limitation in CapacitySchedulerConfiguration.java so the > cluster setting provides the default value and queue setting can override it. > {noformat} >if (maxAllocationMbPerQueue > clusterMax.getMemorySize() > || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { > throw new IllegalArgumentException( > "Queue maximum allocation cannot be larger than the cluster setting" > + " for queue " + queue > + " max allocation per queue: " + result > + " cluster setting: " + clusterMax); > } > {noformat} > Let me know if it makes sense. 
[jira] [Commented] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
[ https://issues.apache.org/jira/browse/YARN-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699460#comment-16699460 ] Aihua Xu commented on YARN-9055: [~jlowe], [~leftnoteasy] and [~tgraves] Do you have any opinions on this? > Capacity Scheduler: allow larger queue level maximum-allocation-mb to > override the cluster configuration > > > Key: YARN-9055 > URL: https://issues.apache.org/jira/browse/YARN-9055 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > YARN-1582 adds the support of maximum-allocation-mb configuration per queue. > That feature gives the flexibility to give different memory requirements for > different queues. Such patch adds the limitation that the queue level > configuration can't exceed the cluster level default configuration, but I > feel it may make more sense to remove such limitation to allow any overrides > since > # Such configuration is controlled by the admin so it shouldn't get abused; > # It's common that typical queues require standard size containers while some > job (queues) have requirements for larger containers. With current > limitation, we have to set larger configuration on the cluster setting which > will cause resource abuse unless we override them on all the queues. > We can remove such limitation in CapacitySchedulerConfiguration.java so the > cluster setting provides the default value and queue setting can override it. > {noformat} >if (maxAllocationMbPerQueue > clusterMax.getMemorySize() > || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { > throw new IllegalArgumentException( > "Queue maximum allocation cannot be larger than the cluster setting" > + " for queue " + queue > + " max allocation per queue: " + result > + " cluster setting: " + clusterMax); > } > {noformat} > Let me know if it makes sense. 
[jira] [Created] (YARN-9055) Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration
Aihua Xu created YARN-9055: -- Summary: Capacity Scheduler: allow larger queue level maximum-allocation-mb to override the cluster configuration Key: YARN-9055 URL: https://issues.apache.org/jira/browse/YARN-9055 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Aihua Xu Assignee: Aihua Xu YARN-1582 adds the support of maximum-allocation-mb configuration per queue. That feature gives the flexibility to give different memory requirements for different queues. Such patch adds the limitation that the queue level configuration can't exceed the cluster level default configuration, but I feel it may make more sense to remove such limitation to allow any overrides since # Such configuration is controlled by the admin so it shouldn't get abused; # It's common that typical queues require standard size containers while some job (queues) have requirements for larger containers. With current limitation, we have to set larger configuration on the cluster setting which will cause resource abuse unless we override them on all the queues. We can remove such limitation in CapacitySchedulerConfiguration.java so the cluster setting provides the default value and queue setting can override it. {noformat} if (maxAllocationMbPerQueue > clusterMax.getMemorySize() || maxAllocationVcoresPerQueue > clusterMax.getVirtualCores()) { throw new IllegalArgumentException( "Queue maximum allocation cannot be larger than the cluster setting" + " for queue " + queue + " max allocation per queue: " + result + " cluster setting: " + clusterMax); } {noformat} Let me know if it makes sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
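The difference between the current check quoted in the issue and the proposed relaxation can be sketched side by side. This is a minimal sketch under stated assumptions, not a patch to CapacitySchedulerConfiguration.java: the class name, method name, and the allowExceedCluster flag are hypothetical, and only memory is modeled (the vcores case would be analogous).

```java
/** Illustrative only; names and the boolean switch are hypothetical. */
final class MaxAllocationCheckSketch {

    /**
     * Resolves a queue's maximum allocation in MB.
     * queueMaxMb is null when the queue has no explicit setting.
     * With allowExceedCluster = false this mirrors the check quoted in the issue;
     * with true it models the proposal: cluster value as default, queue value as override.
     */
    static long resolveQueueMaxMb(Long queueMaxMb, long clusterMaxMb, boolean allowExceedCluster) {
        if (queueMaxMb == null) {
            return clusterMaxMb; // the cluster setting provides the default
        }
        if (!allowExceedCluster && queueMaxMb > clusterMaxMb) {
            throw new IllegalArgumentException(
                "Queue maximum allocation cannot be larger than the cluster setting:"
                    + " queue=" + queueMaxMb + " MB, cluster=" + clusterMaxMb + " MB");
        }
        return queueMaxMb;
    }

    public static void main(String[] args) {
        // Proposed behavior: a 120G queue on a 16G cluster default is accepted.
        System.out.println(resolveQueueMaxMb(122880L, 16384L, true));
        // A queue without its own setting falls back to the cluster default.
        System.out.println(resolveQueueMaxMb(null, 16384L, true));
    }
}
```

Under the relaxed variant the cluster value can stay at the standard container size, so a forgotten queue gets the small default rather than the large one.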