[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627576#comment-14627576 ]
Rohith Sharma K S commented on YARN-3535:
-----------------------------------------

bq. For preemption, a killed container falls into one of two cases: the container has already been pulled by the AM, or it has not. In the first case, the AM should know that the container was killed and will re-ask for a container for the task. In the case where the container has not been pulled by the AM, the preemption kill leads to the same situation as this issue. So I think the request should not be recovered twice.

Ah, you are right. Basically, if an RMContainer has not been pulled by the AM, its state is ALLOCATED. When such an RMContainer is preempted, the resource request is recovered twice:
1. by the fix in this JIRA
2. by the kill-container event in CS

So removing *recoverResourceRequestForContainer(cont);* makes sense to me.

> ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>   Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>         Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM failed to start a container on the NM.
> The job then hung.
> AM logs are attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
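
To illustrate the double recovery described in the comment above, here is a minimal toy model in Java. It is only a sketch under stated assumptions: the class `RecoverOnceSketch`, its nested types, and the methods `kill`, `preempt`, and `pendingAfterPreempt` are all hypothetical and are not the actual Hadoop scheduler classes or the patch itself. The flag `alsoRecoverInScheduler` stands in for the extra recovery call that the comment suggests removing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of the double recovery; not real Hadoop code. */
public class RecoverOnceSketch {
    enum State { ALLOCATED, ACQUIRED, KILLED }

    static class ToyContainer {
        State state = State.ALLOCATED;   // ALLOCATED: not yet pulled by the AM
        final String request;            // the request this container satisfied
        ToyContainer(String request) { this.request = request; }
    }

    /** Requests the scheduler still has to satisfy. */
    final List<String> pending = new ArrayList<>();

    /** Path 1: killing a container that is still ALLOCATED restores its request. */
    void kill(ToyContainer c) {
        if (c.state == State.ALLOCATED) {
            pending.add(c.request);      // the AM never saw it, so re-queue the request
        }
        c.state = State.KILLED;
    }

    /** Path 2: preemption. The flag models the extra recovery in the scheduler. */
    void preempt(ToyContainer c, boolean alsoRecoverInScheduler) {
        if (alsoRecoverInScheduler) {
            pending.add(c.request);      // second recovery of the same request
        }
        kill(c);
    }

    /** Returns how many requests are pending after preempting one container. */
    public static int pendingAfterPreempt(boolean pulledByAm, boolean alsoRecoverInScheduler) {
        RecoverOnceSketch scheduler = new RecoverOnceSketch();
        ToyContainer c = new ToyContainer("request-1");
        if (pulledByAm) {
            c.state = State.ACQUIRED;    // the AM will re-ask on its own
        }
        scheduler.preempt(c, alsoRecoverInScheduler);
        return scheduler.pending.size();
    }

    public static void main(String[] args) {
        System.out.println("ALLOCATED, both paths recover: " + pendingAfterPreempt(false, true));
        System.out.println("ALLOCATED, kill path only:     " + pendingAfterPreempt(false, false));
        System.out.println("ACQUIRED, kill path only:      " + pendingAfterPreempt(true, false));
    }
}
```

In this toy model, preempting an ALLOCATED container with both paths active leaves two pending copies of the same request, while keeping only the kill path leaves exactly one. A container already pulled by the AM leaves none, because the AM is expected to re-ask.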