[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Payne updated YARN-3769:
-----------------------------
    Attachment: YARN-3769-branch-2.7.006.patch

[~leftnoteasy], thanks for your comments.

{quote}
The problem is getUserResourceLimit is not always updated by scheduler. If a queue is not traversed by scheduler OR apps of a queue-user have long heartbeat interval, the user resource limit could be staled.
{quote}
Got it

{quote}
I found 0005 patch for trunk is computing user-limit every time and 0005 patch for 2.7 is using getUserResourceLimit.
{quote}
Yes, I was concerned about using the 2.7 version of {{computeUserLimit}}. It is different than the branch-2 and trunk versions, and it expects a {{required}} parameter which, in 2.7, is calculated in {{assignContainers}} based on an app's capability requests for a given container priority. I noticed that in branch-2 and trunk, it looks like this {{required}} parameter is just given the value of {{minimumAllocation}}. So, in {{YARN-3769-branch-2.7.006.patch}} I passed {{minimumAllocation}} in the {{required}} parameter of {{computeUserLimit}}.

> Preemption occurring unnecessarily because preemption doesn't consider user limit
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: YARN-3769-branch-2.002.patch,
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch,
> YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch,
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch,
> YARN-3769.003.patch, YARN-3769.004.patch, YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and
> then seeing the capacity scheduler giving them immediately back to queue A.
> This happens quite often and causes a lot of churn.
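
To make the {{computeUserLimit}} point in the comment above concrete, here is a minimal, self-contained sketch of why recomputing the user limit on demand (passing the minimum allocation as {{required}}) can give a different answer than a cached limit that has gone stale. It is a toy model under stated assumptions, not the code in any of the attached patches: resources are a single number, the class {{UserLimitSketch}}, the {{headroom}} helper, and all parameter names and values are hypothetical, and the formula is a simplification that omits the user-limit factor and rounding.

```java
// Toy model only: NOT the CapacityScheduler's actual LeafQueue code.
// All names and numbers here are hypothetical and chosen for illustration.
final class UserLimitSketch {

    /**
     * Simplified user-limit computation over a single resource dimension.
     * "required" plays the role of the parameter discussed in the comment;
     * the caller passes the minimum allocation instead of a per-request size.
     */
    static long computeUserLimit(long queueCapacity, long queueUsed,
                                 int activeUsers, int minUserLimitPercent,
                                 long required) {
        // If the queue is already at or over capacity, account for the request too.
        long currentCapacity =
            queueUsed < queueCapacity ? queueCapacity : queueUsed + required;
        // Each active user gets the larger of an equal share and the
        // configured minimum percentage of the current capacity.
        long equalShare = currentCapacity / Math.max(activeUsers, 1);
        long guaranteed = currentCapacity * minUserLimitPercent / 100;
        return Math.max(equalShare, guaranteed);
    }

    /** How much more a user could actually be given, capped by the user limit. */
    static long headroom(long pending, long userUsed, long userLimit) {
        return Math.max(0L, Math.min(pending, userLimit - userUsed));
    }

    public static void main(String[] args) {
        long minimumAllocation = 1; // hypothetical scheduler minimum allocation

        // Limit cached earlier, when the queue had only one active user.
        long staleLimit = computeUserLimit(100, 40, 1, 25, minimumAllocation);
        // Limit recomputed now, with four active users in the queue.
        long freshLimit = computeUserLimit(100, 40, 4, 25, minimumAllocation);

        long pending = 50;  // the user's outstanding requests
        long userUsed = 20; // what the user already holds

        System.out.println("stale limit=" + staleLimit
            + " headroom=" + headroom(pending, userUsed, staleLimit));
        System.out.println("fresh limit=" + freshLimit
            + " headroom=" + headroom(pending, userUsed, freshLimit));
    }
}
```

In this model, a preemption policy that sized a queue's demand from the stale limit would ask for more than the user can actually be given, which is the kind of mismatch the quoted staleness concern describes.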
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)