[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256541#comment-15256541 ]
Sunil G commented on YARN-4390: ------------------------------- HI [~leftnoteasy] Thanks for sharing patch here, Few comments/doubts from my side. 1. In {{CapacitySchedulerPreemptionUtils#deductPreemptableResourcesBasedSelectedCandidates}}, partition is retrieved from {{SchedulerNode}}. I think we can get this from {{RMContainer#getNodeLabelExpression}} itself considering we will update RMContainer labelExpression (via a set call) while we invoke {{replaceLabelsOnNode}}. 2. {{context.getQueueByPartition}} has a code path flow to return NULL eventhough its not possible to happen. With this assumption, null check is not handled in caller side. So is it better to raise an exception and skip that round of preemption to add more clarify? 3. From {{FifoCandidatesSelector}}, {code} CapacitySchedulerPreemptionUtils 72 .deductPreemptableResourcesBasedSelectedCandidates( {code} Ideally this code snippet helps to deduct already selected containers resource from its corresponding {{TempQueuePartition}}. On that note, I think this code need not have to be inside one such selector. Rather it may be better suited from the caller end and ensure that after each round of selection, possible deduction is done. 4. One doubt about the newly changed {{PreemptableResourceCalculator}} ctor. Could we get the information about {{considersReservedResourceWhenCalculateIdeal}} from {{preemptionContext}} itself. We could add this as a getter api so that code may be more clean. 5. {{getNodesForPreemption}} in {{ReservedContainerCandidatesSelector}} has to scan all nodes in cluster to get whether any node has a reserved container or not. For recent UI work, I thought having a metric to know the count of reserved nodes in cluster. Is it costlier to keep a reserved nodes list in scheduler? If so, could that be used here. Its just a thought, I havent really checked the cost of keeping such list. So its fine to scrap this idea also :) 6. In {{ReservedContainerCandidatesSelector}}, containers are sorted against launch time. Is it possible that we may select a high priority latest container against some low priority slightly old one. 7. As I see {{RMContainer}} is read-only interface. I feel {{setQueueName}} can be moved out from this interface and place in {{RMContainerImpl}}. I think we could downcast and use the same. Minor nits: 1. {{ReservedContainerCandidatesSelector#getPreemptionCandidatesOnNode}} has some commented code. It can be removed. 2. in {{TempQueuePerPartition}}, could we add {{reserved}} to {{toString}} for logging. I think test class looks very clean and readable after new framework change. It is very good and really useful :). I thought of trying this framework sometime tomorrow,and will share if I have comments. > Do surgical preemption based on reserved container in CapacityScheduler > ----------------------------------------------------------------------- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler > Affects Versions: 3.0.0, 2.8.0, 2.7.3 > Reporter: Eric Payne > Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)