[ https://issues.apache.org/jira/browse/YARN-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Ahuja reassigned YARN-10839:
--------------------------------------

    Assignee: Siddharth Ahuja

> queueMaxAppsDefault when set blindly caps the root queue's maxRunningApps
> setting to this value ignoring any individually overridden maxRunningApps
> setting for child queues in FairScheduler
> --------------------------------------------------------------------------
>
>                 Key: YARN-10839
>                 URL: https://issues.apache.org/jira/browse/YARN-10839
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>            Priority: Major
>
> [queueMaxAppsDefault|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format]
> sets the default running app limit for queues (including the root queue),
> which individual child queues can override through the maxRunningApps
> setting.
> Consider a simple FairScheduler XML as follows:
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <allocations>
>     <queue name="root">
>         <weight>1.0</weight>
>         <schedulingPolicy>drf</schedulingPolicy>
>         <aclSubmitApps>*</aclSubmitApps>
>         <aclAdministerApps>*</aclAdministerApps>
>         <queue name="default">
>             <weight>1.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>         <queue name="A">
>             <minResources>1024000 mb, 1000 vcores</minResources>
>             <maxRunningApps>15</maxRunningApps>
>             <weight>2.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>         <queue name="B">
>             <minResources>512000 mb, 500 vcores</minResources>
>             <maxRunningApps>10</maxRunningApps>
>             <weight>1.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>     </queue>
>     <queueMaxAppsDefault>3</queueMaxAppsDefault>
>     <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
>     <queuePlacementPolicy>
>         <rule name="specified" create="true"/>
>         <rule name="user" create="true"/>
>     </queuePlacementPolicy>
> </allocations>
> {code}
> Here:
> * {{queueMaxAppsDefault}} is set to 3, so any queue without its own
> {{maxRunningApps}} defaults to a limit of 3 running apps,
> * the root queue does not have any {{maxRunningApps}} limit set,
> * {{maxRunningApps}} for the child queues is 15 for root.A and 10 for root.B.
> Given the above, if users submit jobs to root.A, they are (incorrectly)
> capped at 3 running apps, not 15 (and likewise at 3 rather than 10 for
> root.B), because the root queue (the parent) is itself capped at 3 by the
> queueMaxAppsDefault setting.
> As a result, users see their apps stuck in the ACCEPTED state.
> Either the above FairScheduler XML should have been rejected by the
> ResourceManager, or the root queue should have been capped to the maximum
> maxRunningApps setting defined for a leaf queue.
> Possible solution: if the root queue has no maxRunningApps set and
> queueMaxAppsDefault is set to a lower value than maxRunningApps for an
> individual leaf queue, then the root queue should implicitly be capped to
> the latter instead of queueMaxAppsDefault.
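
The capping described in the report, and the proposed fix, can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not FairScheduler code: the class and method names ({{RootCapSketch}}, {{currentRootCap}}, {{proposedRootCap}}, {{runnableInLeaf}}) are hypothetical, and the effective limit of a leaf is modelled as the minimum of its own limit and the root's cap, which matches the behaviour described in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical model of the limits discussed in YARN-10839; not Hadoop code.
final class RootCapSketch {

    // Behaviour described in the report: a root queue without its own
    // maxRunningApps falls back to queueMaxAppsDefault.
    static int currentRootCap(OptionalInt rootMaxRunningApps, int queueMaxAppsDefault) {
        return rootMaxRunningApps.orElse(queueMaxAppsDefault);
    }

    // Proposed behaviour: if root has no explicit limit, raise its implicit cap
    // to the largest maxRunningApps explicitly set on a leaf queue whenever
    // that exceeds queueMaxAppsDefault.
    static int proposedRootCap(OptionalInt rootMaxRunningApps,
                               int queueMaxAppsDefault,
                               Map<String, Integer> leafMaxRunningApps) {
        if (rootMaxRunningApps.isPresent()) {
            return rootMaxRunningApps.getAsInt();
        }
        int cap = queueMaxAppsDefault;
        for (int leafLimit : leafMaxRunningApps.values()) {
            cap = Math.max(cap, leafLimit);
        }
        return cap;
    }

    // Assumption: a leaf can never run more apps than its parent allows.
    static int runnableInLeaf(int leafLimit, int rootCap) {
        return Math.min(leafLimit, rootCap);
    }

    public static void main(String[] args) {
        // Values taken from the allocation file in the report.
        Map<String, Integer> leaves = new LinkedHashMap<>();
        leaves.put("root.A", 15);
        leaves.put("root.B", 10);
        int queueMaxAppsDefault = 3;

        int current = currentRootCap(OptionalInt.empty(), queueMaxAppsDefault);
        int proposed = proposedRootCap(OptionalInt.empty(), queueMaxAppsDefault, leaves);

        for (Map.Entry<String, Integer> leaf : leaves.entrySet()) {
            System.out.println(leaf.getKey()
                    + " current=" + runnableInLeaf(leaf.getValue(), current)
                    + " proposed=" + runnableInLeaf(leaf.getValue(), proposed));
        }
    }
}
```

With the values from the report, the model gives an effective limit of 3 for both root.A and root.B under the current behaviour, and 15 and 10 respectively under the proposed one. An explicit {{maxRunningApps}} on the root queue is left untouched in both cases.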