[ https://issues.apache.org/jira/browse/YARN-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rohithsharma updated YARN-929:
------------------------------

    Summary: 2 MRAppMaster running parallely for same Job Id  (was: 2 
MRAppMaster spawned for same Job Id)
    
> 2 MRAppMaster running parallely for same Job Id
> -----------------------------------------------
>
>                 Key: YARN-929
>                 URL: https://issues.apache.org/jira/browse/YARN-929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: rohithsharma
>
> Configuration:
>     yarn.resourcemanager.am.max-retries = 3
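For reference, the setting quoted above would typically be written as a property entry like the one below. This is only a sketch: the report does not say where the value was set, so placing it in yarn-site.xml is an assumption.

```xml
<!-- Assumed location: yarn-site.xml on the ResourceManager host -->
<property>
  <name>yarn.resourcemanager.am.max-retries</name>
  <value>3</value>
</property>
```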
> Scenario:
>     The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
>     After node expiry, the RM kills all containers running on this 
> NodeManager.
>     However, the MRAppMaster JVM is still running.
>     The RM spawns a 2nd-attempt MRAppMaster, since AM retries are configured as 3. 
> At this point, two MRAppMasters are running in parallel for the same job ID.
> The problem with running two MRAppMasters is that the 1st-attempt AppMaster 
> deletes the job information from HDFS, which causes a FileNotFoundException 
> in the 2nd-attempt MRAppMaster.
>      
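The failure sequence described above can be imitated locally with a small Python sketch. It is not Hadoop code: the temporary directory stands in for the job's staging area on HDFS, and the file name `job.xml` and the helpers `run_attempt`, `cleanup_staging`, and `simulate` are hypothetical names chosen for illustration.

```python
import os
import shutil
import tempfile


def run_attempt(staging_dir, attempt_id):
    """Hypothetical stand-in for an attempt reading its job information."""
    with open(os.path.join(staging_dir, "job.xml")) as f:
        return f"attempt {attempt_id} read {len(f.read())} bytes"


def cleanup_staging(staging_dir):
    """Hypothetical stand-in for the 1st attempt deleting the job information."""
    shutil.rmtree(staging_dir)


def simulate():
    # Shared staging directory (assumption: both attempts use the same one).
    staging_dir = tempfile.mkdtemp(prefix="job_staging_")
    with open(os.path.join(staging_dir, "job.xml"), "w") as f:
        f.write("<configuration/>")

    events = []
    # Attempt 1 starts and reads the job information normally.
    events.append(run_attempt(staging_dir, 1))
    # The NodeManager dies, but attempt 1 keeps running and later
    # removes the shared job information.
    cleanup_staging(staging_dir)
    # Attempt 2, started in the meantime, now finds nothing to read.
    try:
        events.append(run_attempt(staging_dir, 2))
    except FileNotFoundError:
        events.append("attempt 2 failed: FileNotFoundError")
    return events


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

The sketch runs the steps sequentially for clarity. In the reported scenario the two attempts are separate JVMs running at the same time, so the exact moment of the failure depends on timing.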

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
