[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785845#comment-15785845
 ] 

Wangda Tan commented on YARN-6031:
----------------------------------

Thanks [~Ying Zhang] for updating patch, and thanks comments from [~templedf] / 
[~sunilg].

I agree with approach suggested by [~sunilg]. We can create RMAppImpl when 
InvalidResourceRequest thrown (For example, create a null ResourceRequest), and 
print proper error message 

After {{createAndPopulateNewRMApp}}, we can check if (amRequest == null) and 
(not unmanaged-am), if yes, send APP_REJECTED message, just like handling 
errors when credential parse error found, see {{submitApplication}}.

Thoughts?


> Application recovery failed after disabling node label
> ------------------------------------------------------
>
>                 Key: YARN-6031
>                 URL: https://issues.apache.org/jira/browse/YARN-6031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Ying Zhang
>            Assignee: Ying Zhang
>            Priority: Minor
>         Attachments: YARN-6031.001.patch
>
>
> Here is the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 10 more
> {noformat}
> During RM restart, application recovery failed due to that application had 
> node label expression specified while node label has been disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to