[ https://issues.apache.org/jira/browse/YARN-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585372#comment-14585372 ]
Yuliya Feldman commented on YARN-3803:
--------------------------------------

This situation is easily reproducible by running any M/R job as a user with id < 500 on a cluster with a single NM using LinuxContainerExecutor.

So far the only solution I have found is to proceed with localization in DuplicateFetchResourceTransition if ref == 0. This does not look very clean with respect to the state transitions, but otherwise there is no evidence that the previous container's localization failed.

I would appreciate comments/thoughts on this.

> Application hangs after more then one localization attempt fails on the same NM
> -------------------------------------------------------------------------------
>
>                 Key: YARN-3803
>                 URL: https://issues.apache.org/jira/browse/YARN-3803
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0, 2.5.1
>            Reporter: Yuliya Feldman
>            Assignee: Yuliya Feldman
>            Priority: Minor
>
> In the sandbox (single node) environment with LinuxContainerExecutor, when
> the first application localization attempt fails, the second attempt cannot
> proceed, and the application subsequently hangs until the RM kills it as
> non-responding.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)