[ 
https://issues.apache.org/jira/browse/YARN-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387624#comment-16387624
 ] 

Rohith Sharma K S commented on YARN-8001:
-----------------------------------------

In that case, the YARN client resubmits the application to the new RM with a new 
application id. To the user there is NO impact: the submitted application starts 
running and succeeds. You should see a client log message something like 
"Re-submit application <applicationId> with the same 
ApplicationSubmissionContext".
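
The resubmit-on-not-found behaviour described above can be sketched as a small, self-contained simulation. This is only an illustration, not Hadoop code: `FakeRM`, `Cluster`, and `wait_until_accepted` are hypothetical names, the state store is left out, and the sketch simply reuses the id it was given, so how the real client handles ids is not modelled here.

```python
class ApplicationNotFound(Exception):
    """Stand-in for YARN's ApplicationNotFoundException (hypothetical)."""


class FakeRM:
    """Hypothetical in-memory ResourceManager; not a Hadoop API."""

    def __init__(self):
        self.apps = {}

    def submit(self, app_id, context):
        self.apps[app_id] = context

    def get_state(self, app_id):
        if app_id not in self.apps:
            raise ApplicationNotFound(app_id)
        return "ACCEPTED"


class Cluster:
    """Two RMs, one active at a time; fail_over() swaps them."""

    def __init__(self):
        self.rms = [FakeRM(), FakeRM()]
        self.active = 0

    def active_rm(self):
        return self.rms[self.active]

    def fail_over(self):
        self.active = 1 - self.active


def wait_until_accepted(cluster, app_id, context, max_attempts=3):
    """Poll the active RM; if it does not know the app, resubmit the same context."""
    log = []
    for _ in range(max_attempts):
        try:
            return cluster.active_rm().get_state(app_id), log
        except ApplicationNotFound:
            log.append(
                f"Re-submit application {app_id} with the same "
                "ApplicationSubmissionContext"
            )
            cluster.active_rm().submit(app_id, context)
    raise RuntimeError("application never became visible to the active RM")


# Submit to the first RM, fail over before the client polls, then poll.
cluster = Cluster()
context = {"name": "demo-job"}  # assumed placeholder for the submission context
app_id = "application_1519310222933_0160"
cluster.active_rm().submit(app_id, context)
cluster.fail_over()
state, log = wait_until_accepted(cluster, app_id, context)
```

In this simulation the first poll hits the new active RM, which has no record of the application, so the client logs one resubmission and the second poll succeeds.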

What problem are you facing with this kind of restart? Wasn't your 
application resubmitted, and didn't it succeed? 

> Newly created Yarn application ID lost after RM failover
> --------------------------------------------------------
>
>                 Key: YARN-8001
>                 URL: https://issues.apache.org/jira/browse/YARN-8001
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.7.3, 2.9.0
>            Reporter: shanyu zhao
>            Priority: Major
>
> I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN 
> application was lost after an RM failover. It looks like, when handling an 
> application submission, the RM does not write the application to the state 
> store (we are using the ZooKeeper-based state store) before it responds to 
> the client. The RM later failed over to another RM, and all write calls to 
> the state store failed. The new RM recovers its state from the state store, 
> and this app is lost. 
>  
> The symptom is an error message on the client side claiming that a previously 
> submitted application ID does not exist:
> 2018-02-22 14:54:50,258 [JobControl] WARN  
> org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - 
> Invocation returned exception on [rm1] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1519310222933_0160' doesn't exist in RM. Please check 
> that the job submission was successful.
>  
> This is a timeline excerpted from the resource manager logs:
> 2018-02-22 14:54:06.7685260    headnode1    Storing application with id application_1519310222933_0160
> 2018-02-22 14:54:06.7685660    headnode1    application_1519310222933_0160 State change from NEW to NEW_SAVING
> 2018-02-22 14:54:17.8924760    headnode1    Transitioning to standby state
> 2018-02-22 14:54:30.3951160    headnode0    Transitioning to active state
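
To make the timing in the RM log excerpt above easier to read, the gaps between the four entries can be computed with a short script. This is a minimal sketch: the event labels are shortened, and `parse` and `gaps` are hypothetical helper names. It shows about 11.1 seconds between the NEW_SAVING transition and headnode1 going to standby, then about 12.5 seconds until headnode0 became active. The excerpt shows no further state change for the application in that window.

```python
from datetime import datetime

# (timestamp, host, event) copied from the RM log excerpt; labels shortened.
TIMELINE = [
    ("2018-02-22 14:54:06.7685260", "headnode1", "Storing application"),
    ("2018-02-22 14:54:06.7685660", "headnode1", "NEW -> NEW_SAVING"),
    ("2018-02-22 14:54:17.8924760", "headnode1", "Transitioning to standby state"),
    ("2018-02-22 14:54:30.3951160", "headnode0", "Transitioning to active state"),
]


def parse(ts):
    # The log has 7 fractional digits; datetime keeps at most 6, so drop the last.
    return datetime.strptime(ts[:-1], "%Y-%m-%d %H:%M:%S.%f")


def gaps(timeline):
    """Seconds elapsed between each pair of consecutive entries."""
    times = [parse(ts) for ts, _, _ in timeline]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


for (_, host, event), gap in zip(TIMELINE[1:], gaps(TIMELINE)):
    print(f"+{gap:.5f}s  {host}  {event}")
```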



