rohithsharma created YARN-929:
---------------------------------

             Summary: 2 MRAppMaster spawned for same Job Id
                 Key: YARN-929
                 URL: https://issues.apache.org/jira/browse/YARN-929
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.0.5-alpha
            Reporter: rohithsharma

Configuration:
yarn.resourcemanager.am.max-retries = 3

Scenario:
The NodeManager is killed forcefully, i.e. using kill -9 NM_PID. After the node expires, the RM kills all the containers that were running on this NodeManager. However, the MRAppMaster JVM is still running. Since AM retries are configured as 3, the RM spawns a second MRAppMaster attempt.

Problem:
With two MRAppMasters running for the same job, the first-attempt AppMaster deletes the job information from HDFS, which causes a FileNotFoundException in the second-attempt MRAppMaster.
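
To illustrate the failure mode described above, here is a minimal local sketch. It does not use any Hadoop or YARN API: a temporary directory stands in for the job's staging area on HDFS, and the class name, method names, and the job.xml file name are all assumptions made for the illustration. It only shows the ordering problem, where the first attempt's cleanup runs before the second attempt reads the job files.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration only; not MRAppMaster code.
class DuplicateAttemptSketch {

    // Stand-in for the job's staging directory on HDFS (assumed layout).
    static File createStagingDir() throws IOException {
        File dir = Files.createTempDirectory("staging-job").toFile();
        File jobFile = new File(dir, "job.xml");
        Files.write(jobFile.toPath(),
                "<configuration/>".getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // What the still-running first attempt does when it finishes:
    // it removes the job information it believes it owns.
    static void cleanUpStagingDir(File dir) throws IOException {
        Files.deleteIfExists(new File(dir, "job.xml").toPath());
        Files.deleteIfExists(dir.toPath());
    }

    // What the second attempt does on startup: it reads the job file.
    // Opening a missing file throws FileNotFoundException.
    static String readJobFile(File dir) throws IOException {
        File jobFile = new File(dir, "job.xml");
        try (FileInputStream in = new FileInputStream(jobFile)) {
            byte[] buf = new byte[(int) jobFile.length()];
            int n = in.read(buf);
            return new String(buf, 0, Math.max(n, 0), StandardCharsets.UTF_8);
        }
    }

    // Returns true if the second attempt fails with FileNotFoundException
    // after the first attempt has cleaned up the shared directory.
    static boolean secondAttemptFailsAfterCleanup() {
        try {
            File staging = createStagingDir();
            cleanUpStagingDir(staging);      // attempt 1 finishes and cleans up
            try {
                readJobFile(staging);        // attempt 2 starts afterwards
                return false;
            } catch (FileNotFoundException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second attempt failed: "
                + secondAttemptFailsAfterCleanup());
    }
}
```

In the sketch both attempts share one directory and neither checks whether another attempt for the same job is still alive, which is the situation reported here.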