[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula resolved YARN-3639.
----------------------------------------
    Resolution: Duplicate

> It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3639
>                 URL: https://issues.apache.org/jira/browse/YARN-3639
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Xianyin Xin
>         Attachments: YARN-3639-recovery_log_1_app.txt
>
>
> If the active RM and NN go down at the same time, the new RM takes a long
> time to recover all apps. After analysis, we found that the root cause is the
> renewal of HDFS tokens during recovery. The HDFS client created by the renewer
> first tries to connect to the original NN, which times out after 10~20s, and
> only then tries to connect to the new NN.
> The entire recovery costs about 15*#apps seconds according to our test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
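
To put the reported 15*#apps figure in perspective, here is a minimal sketch of the arithmetic. It assumes token renewals happen one after another during recovery and that each app pays one connection timeout against the old NN; the function name and the default of 15 seconds are illustrative, taken from the estimate in the report rather than from any Hadoop API.

```python
def estimated_recovery_seconds(num_apps: int, timeout_per_app_s: float = 15.0) -> float:
    """Serial estimate: every recovered app waits out one timeout to the old NN.

    Hypothetical helper for illustration only; 15 s is the per-app average
    quoted in the report (the observed timeout range was 10~20 s).
    """
    if num_apps < 0:
        raise ValueError("num_apps must be non-negative")
    return num_apps * timeout_per_app_s


if __name__ == "__main__":
    for apps in (10, 100, 1000):
        seconds = estimated_recovery_seconds(apps)
        print(f"{apps} apps -> {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under these assumptions a cluster with 1000 recovering apps would spend 15000 s (about 250 minutes) on timeouts alone, which is why the delay grows linearly with the number of apps.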