[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627594#comment-14627594 ]
Rohith Sharma K S commented on YARN-3535:
-----------------------------------------

{code}
for (ApplicationId appId : reconnectEvent.getRunningApplications()) {
  handleRunningAppOnNode(rmNode, rmNode.context, appId, rmNode.nodeId);
}
{code}
IIUC, this code updates the RMApp with the node details, so that the RMApp knows that some of its containers have run on this node. This part of the code does not kill the existing running containers. Running containers are killed when the NodeRemoved event is sent to the scheduler, and that event is triggered by the RMNodeImpl#Reconnected transition if noAppsRunning.

> ResourceRequest should be restored back to scheduler when RMContainer is
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>         Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch,
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM failed to start a container on the NM,
> and the job then hung.
> AM logs are attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
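
For readers following the thread, the reconnect behaviour described in the comment above can be sketched roughly as below. This is a simplified, self-contained model and not the actual RMNodeImpl code: the class `ReconnectSketch`, the method `onReconnect`, the string event names, and the node id `nm1:8041` are all hypothetical. It also assumes the two cases are mutually exclusive branches, which the comment implies but does not state outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the reconnect handling; not the real RMNodeImpl. */
class ReconnectSketch {

  /** Returns the events a node reconnect would emit in this simplified model. */
  static List<String> onReconnect(String nodeId, List<String> runningApps) {
    List<String> events = new ArrayList<>();
    boolean noAppsRunning = runningApps == null || runningApps.isEmpty();
    if (noAppsRunning) {
      // Per the comment: the NodeRemoved event is what leads the scheduler
      // to kill the containers still recorded for this node.
      events.add("NODE_REMOVED:" + nodeId);
    } else {
      for (String appId : runningApps) {
        // Per the comment: this only tells the app that some of its
        // containers have run on this node; nothing is killed here.
        events.add("APP_RUNNING_ON_NODE:" + appId + "@" + nodeId);
      }
    }
    return events;
  }

  public static void main(String[] args) {
    // Node reconnects while two applications are still running on it.
    System.out.println(onReconnect("nm1:8041", Arrays.asList("app_1", "app_2")));
    // Node reconnects with no running applications.
    System.out.println(onReconnect("nm1:8041", Collections.<String>emptyList()));
  }
}
```

In this model, no node-removal event is produced as long as the reconnecting node reports at least one running application, which matches the point made in the comment that the loop itself does not kill containers.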