[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242298#comment-15242298 ]
Tao Jie commented on YARN-3126:
-------------------------------

I think this issue is quite common; we have hit the same problem. The root cause is that the max-limit check at assignment time should compare *current usage* + *resource to assign* against the *max resource limit*. However, when we have resources to assign to a queue, we know only the *current resource usage* and the *max resource limit*; we do not know the *resource to assign* until we actually assign it to an appAttempt. This patch seems to add an additional check (checkQueueResourceLimit) on the *leaf queue* before assigning to an AppAttempt, but the *parent queue*'s resource usage may still exceed its max resource limit. Also, we already have *FSQueue.assignContainerPreCheck* for the max resource limit; if we add a new check, the former one seems unnecessary. [~kasha], I would like to hear your thoughts.

> FairScheduler: queue's usedResource is always more than the maxResource limit
> -----------------------------------------------------------------------------
>
>                 Key: YARN-3126
>                 URL: https://issues.apache.org/jira/browse/YARN-3126
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.3.0
>        Environment: hadoop2.3.0. fair scheduler. spark 1.1.0.
>           Reporter: Xia Hu
>             Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources
>            Fix For: trunk-win
>
>        Attachments: resourcelimit-02.patch, resourcelimit-test.patch, resourcelimit.patch
>
>
> When submitting a Spark application (in both spark-on-yarn-cluster and spark-on-yarn-client mode), the queue's usedResources assigned by the fair scheduler can always exceed the queue's maxResources limit.
> Reading the fair scheduler code, I believe this issue happens because the requested resources are not checked when assigning a container.
> Here is the detail:
> 1. Choose a queue. In this step, assignContainerPreCheck verifies whether the queue's usedResource already exceeds its max.
> 2. Then choose an app in that queue.
> 3. Then choose a container. And here is the problem: there is no check whether this container would push the queue's resources over its max limit. If a queue's usedResource is 13G and its maxResource limit is 16G, a container asking for 4G of resources may still be assigned successfully.
> This problem always shows up with Spark applications, because different applications can request containers of different sizes.
> By the way, I have already applied the patch from YARN-2083.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
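To illustrate the check the comment argues for, here is a minimal sketch. It assumes a simplified, memory-only model: the names `SimpleQueue` and `fitsWithinMax` are hypothetical stand-ins, not the real FairScheduler API, which works on `Resource` objects over the `FSQueue` hierarchy. The point is that `usage + request <= max` must hold on the leaf queue and every ancestor, so a parent queue cannot be pushed over its limit either.

```java
// Illustrative sketch only: a simplified, memory-only model of the proposed
// check. SimpleQueue and fitsWithinMax are hypothetical names; the real
// FairScheduler uses Resource objects and the FSQueue class hierarchy.
public class QueueLimitCheck {

    /** Minimal stand-in for a queue: usage and max in MB, plus an optional parent. */
    static class SimpleQueue {
        final String name;
        final long usedMB;
        final long maxMB;
        final SimpleQueue parent; // null for the root queue

        SimpleQueue(String name, long usedMB, long maxMB, SimpleQueue parent) {
            this.name = name;
            this.usedMB = usedMB;
            this.maxMB = maxMB;
            this.parent = parent;
        }
    }

    /**
     * Check usage + request against the max limit on the leaf queue AND every
     * ancestor, so no queue in the chain is pushed over its limit.
     */
    static boolean fitsWithinMax(SimpleQueue queue, long requestMB) {
        for (SimpleQueue q = queue; q != null; q = q.parent) {
            if (q.usedMB + requestMB > q.maxMB) {
                return false; // this queue (or an ancestor) would exceed its max
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Numbers from the report: usage 13G, max 16G, container request 4G.
        SimpleQueue root = new SimpleQueue("root", 13_312, 16_384, null);
        SimpleQueue leaf = new SimpleQueue("root.spark", 13_312, 16_384, root);
        System.out.println(fitsWithinMax(leaf, 4_096)); // prints "false": 17G > 16G
        System.out.println(fitsWithinMax(leaf, 3_072)); // prints "true": exactly 16G
    }
}
```

With this kind of check done once over the whole ancestor chain at assignment time, the separate leaf-only check the patch adds would indeed be redundant, which is the point raised about assignContainerPreCheck above.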