[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517235#comment-14517235 ]
Hudson commented on YARN-3464:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2127/])
Update CHANGES.txt - Pulled in YARN-3465, YARN-3516, and YARN-3464 to branch-2.7 (for 2.7.1) (kasha: rev 32cd2c8d429ddb87348299c00b7d851246a25b4e)
* hadoop-yarn-project/CHANGES.txt


> Race condition in LocalizerRunner kills localizer before localizing all resources
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3464
>                 URL: https://issues.apache.org/jira/browse/YARN-3464
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>             Fix For: 2.7.1
>
>         Attachments: YARN-3464.000.patch, YARN-3464.001.patch
>
>
> A race condition in LocalizerRunner causes container localization to time out.
> Currently, LocalizerRunner kills the ContainerLocalizer when the pending list
> of LocalizerResourceRequestEvents is empty.
> {code}
>         } else if (pending.isEmpty()) {
>           action = LocalizerAction.DIE;
>         }
> {code}
> If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the
> ContainerLocalizer because of the empty pending list, that
> LocalizerResourceRequestEvent will never be handled.
> Without a ContainerLocalizer, LocalizerRunner#update will never be called.
> The container stays in the LOCALIZING state until the AM kills it
> due to TASK_TIMEOUT.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
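
The sequence described in the issue can be replayed in a small, single-threaded model. This is only a sketch for illustration, not the NodeManager code and not the committed patch: `LocalizerRaceSketch`, `Runner`, `addResource`, `isLocalizerAlive`, and `pendingCount` are hypothetical names, and the real `LocalizerRunner#update` does considerably more than this. The model assumes that `update()` can only run while a localizer is alive to send heartbeats, which is the property the issue relies on.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the race in YARN-3464. All names here are hypothetical.
class LocalizerRaceSketch {
    enum LocalizerAction { LIVE, DIE }

    static class Runner {
        private final Queue<String> pending = new ArrayDeque<>();
        private boolean localizerAlive = true;

        // Models a LocalizerResourceRequestEvent reaching the runner.
        void addResource(String resource) {
            pending.add(resource);
        }

        // Models a heartbeat from the localizer. It can only happen
        // while the localizer process still exists.
        LocalizerAction update() {
            if (!localizerAlive) {
                throw new IllegalStateException("no localizer left to send a heartbeat");
            }
            if (pending.isEmpty()) {
                // Same decision as the snippet quoted in the issue:
                // an empty pending list means the localizer is told to die.
                localizerAlive = false;
                return LocalizerAction.DIE;
            }
            pending.poll(); // hand one resource to the localizer
            return LocalizerAction.LIVE;
        }

        boolean isLocalizerAlive() { return localizerAlive; }
        int pendingCount() { return pending.size(); }
    }

    public static void main(String[] args) {
        Runner runner = new Runner();
        runner.addResource("job.jar");
        runner.update();               // LIVE: job.jar is handed out
        runner.update();               // DIE: pending list happened to be empty
        runner.addResource("job.xml"); // late request, arrives after DIE
        // The late request is queued, but nothing will ever call update() again.
        System.out.println("pending=" + runner.pendingCount()
                + ", localizerAlive=" + runner.isLocalizerAlive());
    }
}
```

In the model, the late request stays in the pending list with no localizer left to pick it up, which corresponds to the container remaining in LOCALIZING until the TASK_TIMEOUT kill mentioned above.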