[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882025#comment-13882025 ]
Karthik Kambatla commented on YARN-1618:
----------------------------------------

bq. All we need to do is go from NEW->KILLED on KILL event and ignore START event in KILLED state.

Agree. Posted patch (yarn-1618-2.patch) to handle this. Tested the patch on a secure cluster, and verified that the RM no longer crashes when I run an Oozie job with an incorrect RM address.

bq. The point about saving app before scheduler acknowledges is a known issue. If that is the only issue, we can close as a duplicate of YARN-1507 which already exists.

I think there is merit in fixing the bug here, and in using YARN-1507 to have the app saved only after the scheduler acknowledges it.

> Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed
> applications. In the process, it adds transitions from NEW to FINAL_SAVING.
> This leads to the RM trying to update entries in the state-store that do not
> exist. On ZKRMStateStore, this leads to the RM crashing.
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For
> instance, this can happen when an app transitions from NEW to FINAL_SAVING.
> In these cases, the store should create the missing znode and handle the
> update.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
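A minimal sketch, in Java, of the transition change agreed on in the comment: an app killed while still NEW has no entry in the state store, so it goes straight to KILLED instead of through FINAL_SAVING (which would try to update a non-existent entry), and a START that arrives afterwards is ignored. The State/Event enums, the class name, and the table-driven machine are hypothetical simplifications for illustration only; this is not the actual RMAppImpl or Hadoop's StateMachineFactory, and the patch itself may be structured differently.

```java
import java.util.EnumMap;
import java.util.Map;

public class AppStateMachineSketch {

    // Hypothetical, simplified subset of application states (not the real RMAppState).
    enum State { NEW, NEW_SAVING, FINAL_SAVING, KILLED }

    // Hypothetical, simplified subset of events (not the real RMAppEventType).
    enum Event { START, KILL }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static {
        for (State s : State.values()) {
            TABLE.put(s, new EnumMap<>(Event.class));
        }
        // Normal path: START moves a new app on to be persisted.
        TABLE.get(State.NEW).put(Event.START, State.NEW_SAVING);
        // The fix: per the issue description, NEW + KILL previously led to
        // FINAL_SAVING and an update of a missing store entry. Nothing has been
        // stored yet, so go directly to KILLED.
        TABLE.get(State.NEW).put(Event.KILL, State.KILLED);
        // A late START for an app that was already killed is ignored (self-loop).
        TABLE.get(State.KILLED).put(Event.START, State.KILLED);
    }

    private State current = State.NEW;

    // Applies an event; events with no entry in the table are rejected.
    State handle(Event e) {
        State next = TABLE.get(current).get(e);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + e + " in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        AppStateMachineSketch app = new AppStateMachineSketch();
        System.out.println(app.handle(Event.KILL));  // prints KILLED
        System.out.println(app.handle(Event.START)); // prints KILLED (START ignored)
    }
}
```

Keeping the ignored START as an explicit self-loop, rather than silently dropping unknown events, means any other unexpected event in KILLED still surfaces as an error.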