[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469147#comment-16469147 ]
Billie Rinaldi commented on YARN-8265:
--------------------------------------

[~eyang], thanks for working on this patch. There seem to be two problems. One is that a BECOME_READY event does not start the container status retriever. The bigger problem is that I think I was mistaken about when onContainerRestart is received. It looks like it is only received after the AM initiates a container restart, not after the NM relaunches a container. I don't see an NM client callback for informing the AM when the NM has decided to perform a container relaunch. So, I'll have to think about whether there is a workaround we could do in the AM, or if we'll just have to wait to fix this issue until a new NM callback is implemented.

> AM should retrieve new IP for restarted container
> -------------------------------------------------
>
>                 Key: YARN-8265
>                 URL: https://issues.apache.org/jira/browse/YARN-8265
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8265.001.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM
> only retrieves one IP for a container and then cancels the container status
> retriever. I suspect the issue would be solved by restarting the retriever
> (if it has been canceled) when the onContainerRestart callback is received,
> but we'll have to do some testing to make sure this works.
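
For illustration only, the "restart the retriever if it has been canceled" idea from the issue description could be sketched roughly as follows. This is not the service AM code. RestartableIpRetriever, onContainerRestarted, ipLookup, and onIp are hypothetical names introduced for the sketch. It also assumes that something actually tells the AM about a relaunch, which, per the comment above, the existing NM client callbacks may not do for an NM-initiated relaunch.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for the AM's container status retriever; not a YARN class.
final class RestartableIpRetriever {
    private final ScheduledExecutorService scheduler;
    private final Supplier<String> ipLookup;  // assumed to return null while no IP is known
    private final Consumer<String> onIp;      // receives each IP that is found
    private final long periodMillis;
    private ScheduledFuture<?> task;          // null while no poll is scheduled

    RestartableIpRetriever(ScheduledExecutorService scheduler, Supplier<String> ipLookup,
                           Consumer<String> onIp, long periodMillis) {
        this.scheduler = scheduler;
        this.ipLookup = ipLookup;
        this.onIp = onIp;
        this.periodMillis = periodMillis;
    }

    // Start periodic polling unless a poll is already scheduled.
    synchronized void start() {
        if (task == null) {
            task = scheduler.scheduleAtFixedRate(
                this::pollOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Stop polling; a later start() schedules a fresh poll.
    synchronized void cancel() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }

    synchronized boolean isRunning() {
        return task != null;
    }

    // Hypothetical hook: whatever signals a relaunch to the AM would call this.
    void onContainerRestarted() {
        start();
    }

    // One poll: once an IP is available, cancel the retriever and publish the IP.
    // Synchronized so that a concurrent start() cannot be lost between the two steps.
    private synchronized void pollOnce() {
        String ip = ipLookup.get();
        if (ip != null) {
            cancel();
            onIp.accept(ip);
        }
    }
}
```

With a scheduler from Executors.newSingleThreadScheduledExecutor(), the first start() publishes one IP and the retriever cancels itself, which mirrors the behaviour described in the issue. A later onContainerRestarted() schedules a new poll, so a changed IP is picked up. Which event should drive that hook is the open question in this thread.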