[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503959#comment-17503959 ]
tuyu commented on YARN-11083:
-----------------------------

The failed test case is not related to this issue.

> Wrong ResourceLimit calc logic when using the DRC comparator causes too many "Failed to accept this proposal"
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-11083.001.patch
>
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric exceeds 50 thousand. To reproduce this case:
> Queue tree:
> {code:java}
>            Root max <60G, 100>
>           /
>          A max <60G, 100>
>         / \
>        A1            A2
>   max <5G, 100>  max <40G, 70>
> {code}
>
> Test this situation:
> A2 allocates <30G, 1>, so the remaining limit under A is <30G, 99>.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's max capacity is <5G, 100 vcores>.
> However, getCurrentLimitResource uses DominantResourceCalculator, and resourceCalculator.min returns resourceLimit == <30G, 99> because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and fails the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
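[Editorial note] The core of the bug above is the difference between a dominant-resource min (which selects one whole resource tuple by its dominant share) and a componentwise min (which clamps each dimension independently). A minimal standalone sketch of that difference follows; the `Res` record, `dominantShare`, `drcMin`, and `componentwiseMin` are hypothetical stand-ins for Hadoop's `Resource`/`Resources`/`DominantResourceCalculator`, assuming a cluster capacity of <60G, 100 vcores> as in the example.

```java
// Sketch of the bug: with queueMax = <5G, 100> and the parent's remaining
// limit = <30G, 99>, a DRC-style min returns <30G, 99> (smaller dominant
// share), silently discarding the 5G memory cap. A componentwise min keeps it.
public class DrcMinSketch {
    // Memory in GB, vcores as a count (illustrative, not Hadoop's Resource).
    record Res(long memGb, long vcores) {}

    // Dominant share relative to an assumed cluster of <60G, 100 vcores>.
    static double dominantShare(Res r) {
        return Math.max(r.memGb() / 60.0, r.vcores() / 100.0);
    }

    // DRC-style min: pick whichever whole tuple has the smaller dominant share.
    static Res drcMin(Res a, Res b) {
        return dominantShare(a) <= dominantShare(b) ? a : b;
    }

    // Componentwise min: clamp each dimension independently.
    static Res componentwiseMin(Res a, Res b) {
        return new Res(Math.min(a.memGb(), b.memGb()),
                       Math.min(a.vcores(), b.vcores()));
    }

    public static void main(String[] args) {
        Res queueMax = new Res(5, 100);    // A1's configured max <5G, 100>
        Res parentLimit = new Res(30, 99); // limit left under A after A2's allocation

        // drcMin picks <30G, 99>: queueMax's dominant share is 100/100 = 1.0,
        // parentLimit's is 99/100 = 0.99, so the memory cap of 5G is lost and
        // A1's <10G, 1> request passes the headroom check.
        System.out.println("drcMin           = " + drcMin(queueMax, parentLimit));

        // componentwiseMin yields <5G, 99>, which would reject the <10G, 1> request.
        System.out.println("componentwiseMin = " + componentwiseMin(queueMax, parentLimit));
    }
}
```

Under this sketch, the scheduler-side headroom check (drc-style min) admits a request that the commit-side `fitsIn`-style check (which compares every dimension) must reject, which is exactly the "Failed to accept this proposal" churn described above.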