[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398328#comment-15398328 ]
Jason Lowe commented on YARN-4280:
----------------------------------

Thanks for updating the patch! The parent queue code is now significantly cleaner.

The new CSAssignment copy constructor is no longer used, which made me wonder if we missed a case. Consider a scenario where a parent queue has at least two child queues. Trying to assign to one queue returns an amount to be blocked, so the parent limits are adjusted. Then trying to subsequently assign to another child queue returns yet another blocked result, but we will end up returning the second child queue's blocked amount to the parent's parent, which ignores the amount needed by the first (and higher priority at the moment) child queue.

> CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280-branch-2.009.patch,
> YARN-4280-branch-2.8.001.patch, YARN-4280-branch-2.8.002.patch,
> YARN-4280-branch-2.8.003.patch, YARN-4280.001.patch, YARN-4280.002.patch,
> YARN-4280.003.patch, YARN-4280.004.patch, YARN-4280.005.patch,
> YARN-4280.006.patch, YARN-4280.007.patch, YARN-4280.008.patch,
> YARN-4280.008_.patch, YARN-4280.009.patch, YARN-4280.010.patch,
> YARN-4280.011.patch, YARN-4280.012.patch, YARN-4280.013.patch
>
>
> Consider the following scenario:
> There are 2 queues A(25% of the total capacity) and B(75%), both can run at
> total cluster capacity. There are 2 applications, appX that runs on Queue A,
> always asking for 1G containers(non-AM) and appY runs on Queue B asking for 2
> GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, full with 1G containers releasing
> only one container at a time. appY comes in with a request of 2GB container
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it
> has higher priority and should reserve for its 2 GB request. Since this
> request puts the alloc+reserve above total capacity of the cluster,
> reservation is not made. appX comes in with a 1GB request and since 1GB is
> still available, the request is allocated.
> This can continue indefinitely causing priority inversion.
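
The postponement described in the issue text above can be reproduced with a toy model. The sketch below is not CapacityScheduler code: the class `PostponementSketch`, the method `ticksUntilAppYRuns`, and the flag `holdFreeSpaceForAppY` are hypothetical names, and the 8 GB cluster size is an arbitrary assumption. It only models the behavior stated in the description: appX fills the cluster with 1 GB containers and releases one per tick, while appY waits for 2 GB.

```java
// Toy model of the scenario in the issue description. All names are
// hypothetical; this is NOT the CapacityScheduler implementation.
final class PostponementSketch {
    static final int APP_X_CONTAINER_GB = 1; // appX always asks for 1 GB
    static final int APP_Y_REQUEST_GB = 2;   // appY asks for 2 GB

    /**
     * Returns the tick at which appY finally gets its 2 GB container,
     * or -1 if that never happens within maxTicks.
     *
     * @param clusterGb            total cluster memory (assumed value)
     * @param holdFreeSpaceForAppY if true, free memory that appY cannot use
     *                             yet is held back instead of going to appX
     */
    static int ticksUntilAppYRuns(int clusterGb,
                                  boolean holdFreeSpaceForAppY,
                                  int maxTicks) {
        int appXUsed = clusterGb; // appX starts at total cluster capacity
        int free = 0;

        for (int tick = 1; tick <= maxTicks; tick++) {
            // appX releases only one container at a time.
            if (appXUsed >= APP_X_CONTAINER_GB) {
                appXUsed -= APP_X_CONTAINER_GB;
                free += APP_X_CONTAINER_GB;
            }

            // appY is in the underserved queue, so it is offered space first.
            if (free >= APP_Y_REQUEST_GB) {
                return tick;
            }

            // appY does not fit. Without a hold, appX's next 1 GB request
            // fits in the free space and is allocated immediately.
            if (!holdFreeSpaceForAppY && free >= APP_X_CONTAINER_GB) {
                free -= APP_X_CONTAINER_GB;
                appXUsed += APP_X_CONTAINER_GB;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("without hold: " + ticksUntilAppYRuns(8, false, 1000));
        System.out.println("with hold:    " + ticksUntilAppYRuns(8, true, 1000));
    }
}
```

In this model, appY is never scheduled when the freed 1 GB is handed straight back to appX, and is scheduled on the second tick once the free space is held for it. How the actual patch decides what to hold, and how that amount propagates through parent queues, is outside what this sketch covers.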