[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhihai xu updated YARN-4133:
----------------------------
    Attachment:     (was: YARN-4133.000.patch)

> Containers to be preempted leaks in FairScheduler preemption logic.
> -------------------------------------------------------------------
>
>                 Key: YARN-4133
>                 URL: https://issues.apache.org/jira/browse/YARN-4133
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This
> can cause missed preemptions because containers are wrongly removed from
> {{warnedContainers}}. The problem is in {{preemptResources}}.
> Two issues can cause containers to be wrongly removed from
> {{warnedContainers}}:
> First, the condition check is missing the container state
> {{RMContainerState.ACQUIRED}}:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>     container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we
> shouldn't remove the container from {{warnedContainers}}. A container should
> only be removed from {{warnedContainers}} if it is not in state
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} or
> {{RMContainerState.ACQUIRED}}.
> {code}
> if ((container.getState() == RMContainerState.RUNNING ||
>     container.getState() == RMContainerState.ALLOCATED) &&
>     isResourceGreaterThanNone(toPreempt)) {
>   warnOrKillContainer(container);
>   Resources.subtractFrom(toPreempt, container.getContainer().getResource());
> } else {
>   warnedIter.remove();
> }
> {code}
> Also, once containers are wrongly removed from {{warnedContainers}}, they
> will never be preempted, because they are already in
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("App " + getName() + " is going to preempt a running " +
>           "container");
>     }
>
>     RMContainer toBePreempted = null;
>     for (RMContainer container : getLiveContainers()) {
>       if (!getPreemptionContainers().contains(container) &&
>           (toBePreempted == null ||
>               comparator.compare(toBePreempted, container) > 0)) {
>         toBePreempted = container;
>       }
>     }
>     return toBePreempted;
>   }
> {code}
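The two problems in the {{warnedContainers}} loop can be illustrated with a small, self-contained Java model. This is a hedged sketch, not the YARN-4133 patch: {{State}}, {{Container}}, {{buggyPass}}, {{fixedPass}} and the {{memoryMb}} field are hypothetical stand-ins for {{RMContainerState}}, {{RMContainer}}, the loop in {{preemptResources}} and the {{Resource}} arithmetic, and {{isResourceGreaterThanNone(toPreempt)}} is simplified to {{toPreemptMb > 0}}.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

public class WarnedContainersSketch {
  // Hypothetical, simplified stand-in for RMContainerState (only the states used here).
  enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

  // Hypothetical stand-in for RMContainer: a name, a state and a memory size in MB.
  static final class Container {
    final String name;
    final State state;
    final long memoryMb;

    Container(String name, State state, long memoryMb) {
      this.name = name;
      this.state = state;
      this.memoryMb = memoryMb;
    }
  }

  // States in which a warned container is still live and may need to be killed later.
  static final EnumSet<State> LIVE =
      EnumSet.of(State.RUNNING, State.ALLOCATED, State.ACQUIRED);

  // Shape of the loop as quoted in the issue: ACQUIRED is not treated as live, and
  // every container reached after toPreemptMb drops to zero is removed as well.
  static long buggyPass(List<Container> warned, long toPreemptMb, List<String> acted) {
    Iterator<Container> it = warned.iterator();
    while (it.hasNext()) {
      Container c = it.next();
      if ((c.state == State.RUNNING || c.state == State.ALLOCATED) && toPreemptMb > 0) {
        acted.add(c.name); // stands in for warnOrKillContainer(c)
        toPreemptMb -= c.memoryMb;
      } else {
        it.remove();
      }
    }
    return toPreemptMb;
  }

  // Shape proposed in the issue: only containers that are no longer live are removed;
  // live containers stay in the warned list even when nothing is left to preempt.
  static long fixedPass(List<Container> warned, long toPreemptMb, List<String> acted) {
    Iterator<Container> it = warned.iterator();
    while (it.hasNext()) {
      Container c = it.next();
      if (!LIVE.contains(c.state)) {
        it.remove(); // finished container: safe to forget
      } else if (toPreemptMb > 0) {
        acted.add(c.name); // stands in for warnOrKillContainer(c)
        toPreemptMb -= c.memoryMb;
      }
    }
    return toPreemptMb;
  }

  // Example warned list: one ACQUIRED container and one COMPLETED container mixed in.
  static List<Container> sample() {
    List<Container> warned = new ArrayList<>();
    warned.add(new Container("c1", State.RUNNING, 1024));
    warned.add(new Container("c2", State.ACQUIRED, 1024));
    warned.add(new Container("c3", State.RUNNING, 1024));
    warned.add(new Container("c4", State.COMPLETED, 1024));
    warned.add(new Container("c5", State.RUNNING, 1024));
    return warned;
  }

  static List<String> names(List<Container> containers) {
    List<String> out = new ArrayList<>();
    for (Container c : containers) {
      out.add(c.name);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Container> buggy = sample();
    List<String> buggyActed = new ArrayList<>();
    buggyPass(buggy, 2048, buggyActed);

    List<Container> fixed = sample();
    List<String> fixedActed = new ArrayList<>();
    fixedPass(fixed, 2048, fixedActed);

    // prints "buggy: acted=[c1, c3] kept=[c1, c3]"
    System.out.println("buggy: acted=" + buggyActed + " kept=" + names(buggy));
    // prints "fixed: acted=[c1, c2] kept=[c1, c2, c3, c5]"
    System.out.println("fixed: acted=" + fixedActed + " kept=" + names(fixed));
  }
}
```

With 2048 MB to preempt, the buggy pass drops the ACQUIRED container c2 (first issue) and the still-running c5 that is reached after {{toPreemptMb}} hits zero (second issue). Per the issue, such containers are already in {{FSAppAttempt#preemptionMap}}, so {{preemptContainer}} would never select them again. The fixed pass keeps every live container and forgets only the COMPLETED one.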