[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kuhu Shukla updated YARN-4280:
------------------------------
    Attachment: YARN-4280.007.patch

Thank you so much [~leftnoteasy] for the detailed review and offline explanation. I have updated the patch for point 1: it now subtracts max(child.headroom, none()) from parentLimits if QUEUE_SKIPPED is received.

For point 2, I think it would still work as follows. Given the queue configuration in the above example with all queues at max-capacity=100%, when the first QUEUE_SKIPPED is received from a1 to a, the parent limit for a will be set to (50-2), since childlimits.getHeadroom will be 2. Now when {{getResourceLimitsOfChild}} is called with parentLimits=48, the value of {{parentMaxAvailableResource}} will be zero and the childLimit for a2 will be (0+24), which would prevent a2 from going through with its assignment request of 1.

Let me know your thoughts/concerns regarding this. Thanks a lot!

> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280.001.patch, YARN-4280.002.patch,
>                      YARN-4280.003.patch, YARN-4280.004.patch, YARN-4280.005.patch,
>                      YARN-4280.006.patch, YARN-4280.007.patch
>
>
> Consider the following scenario:
> There are 2 queues A(25% of the total capacity) and B(75%), both can run at
> total cluster capacity. There are 2 applications, appX that runs on Queue A,
> always asking for 1G containers(non-AM) and appY runs on Queue B asking for 2
> GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, full with 1G containers releasing
> only one container at a time.
> appY comes in with a request of 2GB container
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it
> has higher priority and should reserve for its 2 GB request. Since this
> request puts the alloc+reserve above total capacity of the cluster,
> reservation is not made. appX comes in with a 1GB request and since 1GB is
> still available, the request is allocated.
> This can continue indefinitely causing priority inversion.
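
To make the scenario in the description easier to follow, here is a rough toy model of it. This is only a sketch and not CapacityScheduler code: the function name, the 8 GB cluster size, and the `allow_reservation` flag are assumptions made up for illustration. The flag stands in for whether the 1 GB that appX frees can be reserved for appY's 2 GB request or is handed straight back to appX.

```python
def rounds_until_app_y_runs(total_gb=8, max_rounds=100, allow_reservation=False):
    """Toy model of the YARN-4280 scenario (hypothetical, not scheduler code).

    appX fills the cluster with 1 GB containers and releases one per round.
    appY has a single pending 2 GB request and is offered resources first.
    Returns the round in which appY's container fits, or None if it never does.
    """
    app_x_used = total_gb   # appX starts at total cluster capacity
    reserved = 0            # GB held back for appY's pending request
    for round_no in range(1, max_rounds + 1):
        app_x_used -= 1                  # appX releases one 1 GB container
        free = total_gb - app_x_used
        if free >= 2:
            return round_no              # appY's 2 GB container now fits
        if allow_reservation:
            reserved = free              # keep the freed space for appY
        if free - reserved >= 1:
            app_x_used += 1              # appX takes the freed 1 GB again
    return None


# Without a reservation appY is postponed for every simulated round.
print(rounds_until_app_y_runs(allow_reservation=False))
# With a reservation the second release leaves enough room for appY.
print(rounds_until_app_y_runs(allow_reservation=True))
```

With these assumed numbers, the first call never finds room for appY within the simulated rounds, while the second succeeds once two releases have accumulated.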