[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated YARN-1354:
-----------------------------
Attachment: YARN-1354-v1.patch
Patch that persists applications to a leveldb state store when recovery is
enabled. This patch also addresses YARN-1355, because app ACLs are persisted as
part of the app details.
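
To make the idea concrete for reviewers, here is a rough sketch of the
store/remove/recover cycle. It is not the code in the patch: a sorted in-memory
map stands in for leveldb, and the class name, the "apps/" key prefix, and the
"user plus ACL string" record layout are all hypothetical, chosen only for
illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not the patch: a TreeMap stands in for leveldb.
final class AppStateStoreSketch {
    // Assumed key prefix; the real store may lay out its keys differently.
    private static final String APP_PREFIX = "apps/";

    // leveldb iterates keys in sorted order, so a TreeMap mimics prefix scans.
    private final TreeMap<String, byte[]> db = new TreeMap<>();

    // Persist the app details (user and ACLs together) under one key per app.
    public void storeApplication(String appId, String user, String acls) {
        String record = user + "\n" + acls;
        db.put(APP_PREFIX + appId, record.getBytes(StandardCharsets.UTF_8));
    }

    // Forget an application once it has finished on this node.
    public void removeApplication(String appId) {
        db.remove(APP_PREFIX + appId);
    }

    // On restart, scan every key under the prefix to rebuild the active set.
    public List<String> loadApplicationIds() {
        List<String> ids = new ArrayList<>();
        for (String key : db.tailMap(APP_PREFIX, true).keySet()) {
            if (!key.startsWith(APP_PREFIX)) {
                break; // walked past the last application key
            }
            ids.add(key.substring(APP_PREFIX.length()));
        }
        return ids;
    }

    // ACLs come back with the app record, so no separate ACL store is needed.
    public String loadAcls(String appId) {
        byte[] raw = db.get(APP_PREFIX + appId);
        if (raw == null) {
            return null;
        }
        String record = new String(raw, StandardCharsets.UTF_8);
        return record.substring(record.indexOf('\n') + 1);
    }
}
```

Keeping the ACLs inside the same record as the rest of the app details is why
a single store call can cover both this issue and YARN-1355 in the sketch.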
The review for MAPREDUCE-5652 noted a potential issue: application completion
events can be lost if they arrive while the NM is down. One way to mitigate
that would be for the NM to send its list of active applications to the RM when
it registers. The RM could then tell the NM about any finished applications in
the registration response or on the next NM heartbeat. That's not yet addressed
in this initial patch, as I wanted to keep the patch size manageable and get
some initial feedback. After the feedback we can decide whether to address that
corner case as part of this change or in a follow-up JIRA.
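
For discussion, the proposed mitigation could look roughly like the sketch
below. None of this is in the patch or in the existing NM/RM protocol: the
class and method names are hypothetical, and plain string app IDs stand in for
the real record types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the registration-time reconciliation idea.
final class RegistrationReconcileSketch {
    // RM side: of the apps the NM reports as active, pick out the ones the
    // RM no longer considers running. These completed while the NM was down.
    public static List<String> finishedApps(List<String> nmActiveApps,
                                            Set<String> rmRunningApps) {
        List<String> finished = new ArrayList<>();
        for (String appId : nmActiveApps) {
            if (!rmRunningApps.contains(appId)) {
                finished.add(appId);
            }
        }
        return finished;
    }

    // NM side: drop the apps the RM reported as finished, whether the list
    // arrived in the registration response or on a later heartbeat.
    public static void applyFinished(Set<String> nmActiveApps,
                                     List<String> finished) {
        nmActiveApps.removeAll(finished);
    }
}
```

The same list could be delivered on either path, so the NM-side handling would
not need to care which one the RM used.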
> Recover applications upon nodemanager restart
> ---------------------------------------------
>
> Key: YARN-1354
> URL: https://issues.apache.org/jira/browse/YARN-1354
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1354-v1.patch
>
>
> The set of active applications in the nodemanager context needs to be
> recovered for work-preserving nodemanager restart.