[ https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258977#comment-16258977 ]
Wilfred Spiegelenburg commented on YARN-7534:
---------------------------------------------

I would like to work on this one if you don't mind.

I think two things are getting mixed up: the queue's used resources are not linked to the node. They are the sum of the resources of all containers from applications that run in the queue. A node heartbeat with a changed usage does not mean that the usage changed because an application in the queue changed it. It could have changed because a different queue or application added a container.

We are also not allocating anything yet at this point, and thus have not gone over the limit. That check happens later, when the application is updated. Here we only have a preliminary check to see if we can offer this node to the queue.

Another point to take into account: we are not checking what the application asked for here. That is the next step, which follows just below, when we run over all the applications that have a demand:

{code}
for (FSAppAttempt sched : fetchAppsWithDemand(true)) {
  if (SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) {
    continue;
  }
  assigned = sched.assignContainer(node);
{code}

This is the earliest point at which we can find out what the ask is. If more than one application in the queue has a demand, we walk over the list. We call [assignContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L830] and that is where the checks happen.

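The two-stage flow described above (a preliminary queue-level check on current usage, then a per-request check once the ask is known) can be sketched roughly as follows. This is a simplified, hypothetical model and not the Hadoop code: the class MaxShareSketch, its Queue type, the method canBeOfferedNode and the single memory dimension are all assumptions made for illustration. Only the name fitsInMaxShare is borrowed from the scheduler.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; these are NOT the Hadoop FairScheduler classes. */
final class MaxShareSketch {

  /** A queue tracking a single resource dimension (memory in MB) for brevity. */
  static final class Queue {
    final long maxMb;
    long usedMb;

    Queue(long maxMb, long usedMb) {
      this.maxMb = maxMb;
      this.usedMb = usedMb;
    }

    /** Preliminary check: no ask is known yet, so only current usage is compared. */
    boolean canBeOfferedNode() {
      return usedMb <= maxMb;
    }

    /** Per-request check: current usage plus the ask must stay within the maximum. */
    boolean fitsInMaxShare(long askMb) {
      return usedMb + askMb <= maxMb;
    }
  }

  /** Returns the MB assigned for this node offer, or 0 if no pending ask fits. */
  static long assignContainer(Queue queue, List<Long> pendingAsksMb) {
    if (!queue.canBeOfferedNode()) {
      return 0; // queue is already over its maximum, do not offer the node
    }
    for (long askMb : pendingAsksMb) {
      if (!queue.fitsInMaxShare(askMb)) {
        continue; // this ask would exceed the maximum, try the next one
      }
      queue.usedMb += askMb; // usage only grows after the per-request check passed
      return askMb;
    }
    return 0;
  }

  public static void main(String[] args) {
    Queue queue = new Queue(10240, 9216);
    // The 2048 MB ask is rejected, the 1024 MB ask fits exactly.
    System.out.println(assignContainer(queue, Arrays.asList(2048L, 1024L)));
    // The queue is now full: the preliminary check passes, the ask does not.
    System.out.println(assignContainer(queue, Arrays.asList(1024L)));
    System.out.println(queue.usedMb <= queue.maxMb);
  }
}
```

In this model the preliminary check alone would let a full queue through, but usage still cannot exceed the maximum because every ask is checked against the remaining headroom before it is counted.
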
One of the checks we perform is in hasContainerForNode for the FSAppAttempt:

{code}
} else if (!getQueue().fitsInMaxShare(resource)) {
  // The requested container must fit in queue maximum share
  updateAMDiagnosticMsg(resource,
      " exceeds current queue or its parents maximum resource allowed).");
  ret = false;
{code}

This makes the allocation fail, so we drop out and check the next request for the application. If all of those fail, we check the next application in the list of apps with demand.

Do you have any logs that show that this is not working as it should?

> Fair scheduler assign resources may exceed maxResources
> -------------------------------------------------------
>
>                 Key: YARN-7534
>                 URL: https://issues.apache.org/jira/browse/YARN-7534
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the
> queue have exceeded *maxResources* before assigning the container. This
> leads to the queue using more resources than *maxResources* after this
> container is assigned.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)