[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985299#comment-14985299
 ] 

Jason Lowe commented on YARN-4051:
----------------------------------

bq. For the RM's finish application or complete container requests, letting the 
RM retry seems a little complicated; should we do that?
Is it possible for the finish application or complete container requests to 
arrive at this point?  We should not be registering with the RM until we've 
completed the container recovery process.  As such, it should be impossible for 
the RM to tell us these things, since we should not even be talking to it at 
that point.  Similarly, I believe the cleanest fix for the stop container request 
race is to avoid opening the client port until all the containers have 
recovered.  I know there's some issue there where we need to know the bind 
address of the client port during recovery but don't want to start listening on 
the port yet.  If the RPC layer supported that, it'd be a lot cleaner to simply 
not "open the front doors" while we're still coming up and recovering -- then 
all these races simply aren't possible.
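
As a rough illustration of that ordering, here is a minimal sketch. It is not 
actual NodeManager code: the class GatedNodeManager, its methods, and the event 
strings are all hypothetical, and the "client port" and "RM registration" are 
modeled as plain flags rather than real RPC servers.  It only shows the idea 
that recovery finishes before either door is opened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these names do not exist in Hadoop YARN.
class GatedNodeManager {
    private final List<String> events = new ArrayList<>();
    private boolean clientPortOpen = false;
    private boolean registeredWithRm = false;

    // Startup order: recover every container first, then open the client
    // port, then register with the RM.
    synchronized void start(List<String> containersToRecover) {
        for (String id : containersToRecover) {
            events.add("recovered:" + id);
        }
        clientPortOpen = true;
        events.add("client-port-open");
        registeredWithRm = true;
        events.add("registered-with-rm");
    }

    // Stand-in for a client's stop container request. In a real server the
    // port would simply not be reachable yet; here we return false instead.
    synchronized boolean stopContainer(String id) {
        if (!clientPortOpen) {
            return false;
        }
        events.add("stopped:" + id);
        return true;
    }

    // Stand-in for the RM telling us an application has finished. The RM
    // cannot send this before we have registered with it.
    synchronized boolean finishApplication(String appId) {
        if (!registeredWithRm) {
            return false;
        }
        events.add("finished:" + appId);
        return true;
    }

    // Returns a copy of the recorded event order for inspection.
    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

With this ordering, a stop or finish request can only ever be recorded after 
every "recovered:" event, which is the property that rules out the race.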

> ContainerKillEvent is lost when container is  In New State and is recovering
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked and a container is in the 
> New state. When we finish the application, the container stays alive even 
> after the NM event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
