[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589432#comment-16589432 ]
Jason Lowe commented on YARN-8649:
----------------------------------

Thanks for updating the patch! The logic looks good overall, but I have some concerns about the logging that was added.

I think it's misleading to assume the NM is shutting down when this situation occurs. As I understand it, the main trigger for this scenario is a container getting killed while it is still localizing. That can happen when the NM shuts down, but it can also happen without the NM shutting down, so it seems inappropriate to assume this scenario means the NM is shutting down. There are already separate logs when the NM decides to shut down, so it is probably best to limit this log message to the fact that the resource was removed before we got around to localizing it and therefore will no longer be localized.

The warning log should show the source resource, similar to what is done in the public localization debug code that was added, rather than the local path. The local path won't mean as much as the resource that was requested, since that source resource path was logged when the container initially requested it.

There is debug logging in the public localizer case but not in the private case, which is inconsistent. Arguably, if it's useful for the public case, it would be useful for the private case too. Given there's already a loud warning log in the common getPathForLocalization code, I'm not sure the debug log in the public path adds any value, especially if we change the loud warning log to show the source path.
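To make the suggestion concrete, here is a rough, self-contained sketch of the kind of warning described above. It is not the patch and does not reproduce Hadoop's actual classes or signatures: the class LocalizationLogSketch, its map of tracked resources, the request/remove helpers, and the message wording are all hypothetical stand-ins. The only point is that the warning names the source resource and makes no claim about the NM shutting down.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for a local resources tracker.
// None of these names or signatures are taken from the Hadoop code base.
class LocalizationLogSketch {
    private static final Logger LOG =
        Logger.getLogger(LocalizationLogSketch.class.getName());

    // Source resource (as requested by the container) -> local directory.
    private final Map<String, String> tracked = new ConcurrentHashMap<>();

    // Record that a container requested this source resource.
    void request(String sourceResource, String localDir) {
        tracked.put(sourceResource, localDir);
    }

    // Drop the resource, e.g. because its container was killed mid-localization.
    void remove(String sourceResource) {
        tracked.remove(sourceResource);
    }

    // Build the warning text: it reports the source resource and states only
    // that it was removed before localization, not why.
    static String removedMessage(String sourceResource) {
        return "Resource " + sourceResource
            + " was removed before it could be localized"
            + " and will no longer be localized";
    }

    // Returns the local path to localize into, or null if the resource was
    // already removed. Callers are expected to handle the null result.
    String getPathForLocalization(String sourceResource) {
        String localDir = tracked.get(sourceResource);
        if (localDir == null) {
            LOG.warning(removedMessage(sourceResource));
            return null;
        }
        return localDir + "/" + Math.abs(sourceResource.hashCode());
    }
}
```

With a single warning like this in the common path, both the public and private localizer cases get the same message, which is the reason the extra debug log in the public path may be redundant.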
> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
>                 Key: YARN-8649
>                 URL: https://issues.apache.org/jira/browse/YARN-8649
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Major
>         Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by Jason Lowe.