[ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1354:
-----------------------------

    Attachment: YARN-1354-v1.patch

Patch that persists applications to a leveldb state store when recovery is 
enabled.  This patch also addresses YARN-1355 because application ACLs are 
persisted as part of the application details.
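
To make the idea concrete, here is a minimal sketch of the store/recover 
cycle.  It is not code from the patch: the class and method names, the key 
prefix, and the record layout are all hypothetical, and a sorted in-memory map 
stands in for leveldb so the example runs without extra dependencies.  The 
point is only that the ACLs travel inside the per-application record, so 
recovering the application recovers its ACLs as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch; a sorted in-memory map stands in for leveldb.
class AppStateStoreSketch {
    // Hypothetical key prefix, not taken from the patch.
    static final String APP_PREFIX = "applications/";

    // Hypothetical record: the app's user plus its ACLs, stored as one value.
    static final class AppRecord {
        final String user;
        final Map<String, String> acls;

        AppRecord(String user, Map<String, String> acls) {
            this.user = user;
            this.acls = acls;
        }
    }

    private final SortedMap<String, byte[]> db = new TreeMap<>();

    // Called when an application starts on the node.
    void storeApplication(String appId, AppRecord app) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(app.user);
            out.writeInt(app.acls.size());
            for (Map.Entry<String, String> acl : app.acls.entrySet()) {
                out.writeUTF(acl.getKey());
                out.writeUTF(acl.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        db.put(APP_PREFIX + appId, bytes.toByteArray());
    }

    // Called when an application finishes, so it is not recovered later.
    void removeApplication(String appId) {
        db.remove(APP_PREFIX + appId);
    }

    // Called on restart: scan the keys under the prefix and decode each record.
    Map<String, AppRecord> recoverApplications() {
        Map<String, AppRecord> recovered = new TreeMap<>();
        for (Map.Entry<String, byte[]> entry : db.tailMap(APP_PREFIX).entrySet()) {
            if (!entry.getKey().startsWith(APP_PREFIX)) {
                break; // past the last application key
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(entry.getValue()))) {
                String user = in.readUTF();
                int aclCount = in.readInt();
                Map<String, String> acls = new TreeMap<>();
                for (int i = 0; i < aclCount; i++) {
                    String aclType = in.readUTF();
                    acls.put(aclType, in.readUTF());
                }
                String appId = entry.getKey().substring(APP_PREFIX.length());
                recovered.put(appId, new AppRecord(user, acls));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return recovered;
    }
}
```

In this sketch a restart amounts to calling recoverApplications() and 
re-creating one application per returned entry.  The prefix scan mimics 
seeking a leveldb iterator to the first application key and reading until the 
prefix no longer matches.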

The review for MAPREDUCE-5652 noted that application completion events can be 
lost if the NM goes down.  One way to mitigate that is for the NM to send its 
list of active applications to the RM when it registers.  The RM can then tell 
the NM about any finished applications in the registration response or on the 
next NM heartbeat.  This initial patch does not address that yet, as I wanted 
to keep the patch size manageable and get some initial feedback.  After the 
feedback we can decide whether to address that corner case as part of this 
change or in a followup JIRA.
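
For discussion, the reconciliation could look roughly like the sketch below.  
Nothing here is an existing YARN API: the class and method names are 
hypothetical, application IDs are plain strings, and the sketch assumes the RM 
treats any application it no longer tracks as running as finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed registration-time reconciliation.
class RegistrationReconcileSketch {

    // RM side: of the apps the NM reports as active, pick the ones the RM
    // no longer considers running (assumed here to mean they finished).
    static List<String> findFinishedApps(Collection<String> nmActiveApps,
                                         Set<String> rmRunningApps) {
        List<String> finished = new ArrayList<>();
        for (String appId : nmActiveApps) {
            if (!rmRunningApps.contains(appId)) {
                finished.add(appId);
            }
        }
        return finished;
    }

    // NM side: drop the apps the RM reported as finished, as if the lost
    // completion events had been delivered.
    static void applyFinishedApps(Set<String> nmActiveApps,
                                  Collection<String> finishedApps) {
        nmActiveApps.removeAll(finishedApps);
    }

    public static void main(String[] args) {
        // Apps recovered by the NM after restart (made-up IDs).
        Set<String> nmActive = new TreeSet<>(Arrays.asList("app_1", "app_2", "app_3"));
        // Apps the RM still considers running; app_2 finished while the NM was down.
        Set<String> rmRunning = new HashSet<>(Arrays.asList("app_1", "app_3"));

        List<String> finished = findFinishedApps(nmActive, rmRunning);
        applyFinishedApps(nmActive, finished);

        System.out.println("finished while NM was down: " + finished);
        System.out.println("still active on NM: " + nmActive);
    }
}
```

Whether the finished list rides on the registration response or on the next 
heartbeat only changes where findFinishedApps is called on the RM side; the 
NM-side cleanup is the same in both cases.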

> Recover applications upon nodemanager restart
> ---------------------------------------------
>
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch
>
>
> The set of active applications in the nodemanager context needs to be 
> recovered for work-preserving nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
