[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281045#comment-15281045 ]
Subru Krishnan commented on YARN-1815:
--------------------------------------

[~jianhe], there are actually two problems I noticed while working on this:
* With work-preserving restart in place, UAM works across RM restarts, but all its running containers are killed during recovery. I have a fix for this. I tested this with [~ellenfkh] and UAM works across RM restarts _without_ losing any work.
* We then hit the issue of the UAM final state not being recorded: subsequent failovers of the RM bring the unmanaged apps back to the _ACCEPTED_ state even though they had _COMPLETED_ in the past.

> RM should record final state for unmanaged AMs
> ----------------------------------------------
>
>                 Key: YARN-1815
>                 URL: https://issues.apache.org/jira/browse/YARN-1815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Subru Krishnan
>            Priority: Critical
>         Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch,
>                      yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM should record final state for unmanaged AMs

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)