[ 
https://issues.apache.org/jira/browse/YARN-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742077#comment-16742077
 ] 

Wilfred Spiegelenburg commented on YARN-9198:
---------------------------------------------

As I [commented in the previous 
jira|https://issues.apache.org/jira/browse/YARN-7913?focusedCommentId=16483490&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16483490]:
the CS (CapacityScheduler) and FS (FairScheduler) work differently, and this 
can happen for a number of reasons. ACL changes and changes to the queue 
configuration are among them. Simply removing a running application on restore 
is not correct. It breaks the restore, because you can no longer rely on it to 
bring back all running applications on a failover. We need to go back and fix 
the underlying issue around the queues and config.

BTW: the CS forces you to roll back the configuration change, which makes sure 
that the restore always works. That might be a solution, but because the FS 
manages queues more dynamically, it might not work there.

> Corrupted state from a previous version can still cause RM to fail with NPE 
> on FairScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-9198
>                 URL: https://issues.apache.org/jira/browse/YARN-9198
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 3.1.0, 2.8.5
>            Reporter: Dapeng Sun
>            Assignee: Dapeng Sun
>            Priority: Major
>         Attachments: YARN-9198.001.patch
>
>
> Previously, the RM could fail with an NPE (see YARN-4347 and YARN-4000). 
> After these fixes, FairScheduler still has the same potential issue.
>  
> 201x-xx-xx xx:xx:xx,xxx ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java)


