[ https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659882#comment-16659882 ]
Billie Rinaldi commented on YARN-6167:
--------------------------------------

Thanks for taking a look, [~leftnoteasy]!

bq. Inside releaseContainers, why add following if?
There are two conditions where the pending release needs to be remembered, when RMContainer is null and when it is not null. It's not null in the cases I've tested so far, but maybe it would be null if the RM has been restarted.

bq. Why changes of RMContainerImpl required?
In my testing, when an NM is killed and restarted some time later, the NM registers and the RMNode is added before the NM has recovered the containers. The containers come in later with a node update event, and they look like newly launched containers. So in this case the RM already has an RMContainer in RUNNING state, and then it gets a LAUNCHED event for that container.

bq. Instead of handling "pendingRelease" for such containers, can we let AM handle it once NM comes back to normal state? AM should be notified after that.
The AM patch I've started working on to test this feature is already pretty complicated, but maybe it wouldn't be too difficult to have the AM remember which containers it should release. I'll look into it.

> RM option to delegate NM loss container action to AM
> ----------------------------------------------------
>
>                 Key: YARN-6167
>                 URL: https://issues.apache.org/jira/browse/YARN-6167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>            Priority: Major
>         Attachments: YARN-6167.01.patch
>
>
> Currently, if the RM times out an NM, the scheduler will kill all containers
> that were running on the NM. For some applications, in the event of a
> temporary NM outage, it might be better to delegate to the AM the decision
> whether to kill the containers and request new containers from the RM.
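
For readers following the thread, the bookkeeping discussed in the comment could be modeled roughly as below. This is a toy sketch under stated assumptions, not code from YARN-6167.01.patch: the class name, method names, string container IDs, and the "unreachable" set are all hypothetical stand-ins for the real RMContainer and scheduler types. It only illustrates two ideas from the comment: a release request is remembered whether or not the container is currently known, and a LAUNCHED report for a container already in RUNNING state is tolerated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: every name here is illustrative and is NOT taken from the YARN-6167 patch.
final class PendingReleaseSketch {
    enum State { RUNNING, RELEASED }

    // Stand-in for the RM's container lookup; a missing key plays the role of a null RMContainer.
    private final Map<String, State> containers = new HashMap<>();
    // Assumption: containers whose NM has timed out and has not reported them again yet.
    private final Set<String> unreachable = new HashSet<>();
    // Release requests that could not be carried out immediately.
    private final Set<String> pendingRelease = new HashSet<>();

    void track(String id) { containers.put(id, State.RUNNING); }

    void nodeLost(String id) { unreachable.add(id); }

    // The AM asks the RM to release a container.
    void release(String id) {
        State known = containers.get(id);
        if (known == null || unreachable.contains(id)) {
            // Remember the request in both cases: container unknown (null) or known (non-null).
            pendingRelease.add(id);
            return;
        }
        containers.put(id, State.RELEASED);
    }

    // A node update reports the container as launched once the NM is back.
    void launched(String id) {
        unreachable.remove(id);
        if (pendingRelease.remove(id)) {
            // A release was requested while the NM was away, so apply it now.
            containers.put(id, State.RELEASED);
            return;
        }
        // If the container is already tracked, the repeated LAUNCHED report changes nothing.
        containers.putIfAbsent(id, State.RUNNING);
    }

    State stateOf(String id) { return containers.get(id); }

    boolean isPendingRelease(String id) { return pendingRelease.contains(id); }
}
```

In this model, calling release on a container whose node is lost leaves it RUNNING and marked pending; the later launched call for the same ID completes the release. The alternative raised in the review, having the AM remember which containers to release, would move the pendingRelease set out of the RM side entirely.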