[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099594#comment-16099594
 ] 

Sunil G commented on YARN-6031:
-------------------------------

Hi [~jianhe]
A few doubts here:

bq.1. Below code catches InvalidLabelResourceRequestException and assumes that 
the error is because node-label becomes disabled
This code snippet catches InvalidLabelResourceRequestException and suppresses 
it only during recovery. If the AMResourceRequest was stored in the state 
store, then {{validateAndCreateResourceRequest}} must have succeeded when the 
app was submitted. During recovery, the same call throws an error only when 
node labels are disabled by conf. If the request is in the store, we can assume 
that the AM request is sane enough. Could you please give more context on what 
other scenario could throw the same exception during recovery?
On another note, when it is not recovery we reach {{throw e;}} and throw the 
same exception back.
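
To make the behaviour I am describing concrete, here is a minimal, 
self-contained sketch. It is not the patch code: RecoveryValidationSketch, 
validate, validateForApp and the nested InvalidLabelException are hypothetical 
stand-ins, and the AM request is reduced to a plain label-expression string.

```java
// Hypothetical sketch; none of these names are the real RMAppManager API.
class RecoveryValidationSketch {

    /** Stand-in for InvalidLabelResourceRequestException. */
    static class InvalidLabelException extends Exception {
        InvalidLabelException(String msg) {
            super(msg);
        }
    }

    /** Rejects a label expression when node labels are disabled by conf. */
    static String validate(String labelExpression, boolean nodeLabelsEnabled)
            throws InvalidLabelException {
        boolean hasLabel = labelExpression != null && !labelExpression.isEmpty();
        if (!nodeLabelsEnabled && hasLabel) {
            throw new InvalidLabelException(
                "node label not enabled but request contains label expression");
        }
        return labelExpression == null ? "" : labelExpression;
    }

    /**
     * Returns the validated request, or null when the failure is suppressed
     * so that the caller can still create the app and reject it afterwards.
     */
    static String validateForApp(String labelExpression, boolean nodeLabelsEnabled,
                                 boolean isRecovery) throws InvalidLabelException {
        try {
            return validate(labelExpression, nodeLabelsEnabled);
        } catch (InvalidLabelException e) {
            if (isRecovery) {
                return null; // suppressed only during recovery
            }
            throw e; // fresh submission: the same exception goes back to the caller
        }
    }
}
```

In this sketch a fresh submission with a label expression still fails fast, 
while a recovered app gets a null request and can be rejected later.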

bq.2. Below code directly transitions app to failed by using a Rejected event. 
The attempt state is not moved to failed
In RMAppManager#createAndPopulateNewRMApp, the app is only created, whether it 
is in submission or recovery mode. The attempt is not yet created at that 
point. Hence I think this won't be a problem.

bq.3. Is it ok to let the app continue in this scenario, it's less disruptive 
to the apps.
Without this change, the exception is thrown and the RM loses the context of 
such an app. To record and track such an app, we create the app and move it to 
the failed state. Recovery for the other apps then continues, and we keep the 
context of this app as well.
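
As a rough illustration of that outcome, the sketch below recovers a set of 
stored apps and records the offending one as FAILED instead of aborting the 
whole loop. RecoveryLoopSketch, its State enum and recoverAll are hypothetical 
names for illustration only, not the RM's real state machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; states and method names are illustrative only.
class RecoveryLoopSketch {

    enum State { RECOVERED, FAILED }

    /**
     * Walks every stored app (appId -> label expression). An app that carries a
     * label expression while node labels are disabled is tracked as FAILED;
     * the loop keeps going so the remaining apps are still recovered.
     */
    static Map<String, State> recoverAll(Map<String, String> storedLabelByApp,
                                         boolean nodeLabelsEnabled) {
        Map<String, State> tracked = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : storedLabelByApp.entrySet()) {
            String label = entry.getValue();
            boolean invalid = !nodeLabelsEnabled && label != null && !label.isEmpty();
            tracked.put(entry.getKey(), invalid ? State.FAILED : State.RECOVERED);
        }
        return tracked;
    }
}
```

Every app ends up in the returned map, so the failed one stays visible to the 
user rather than disappearing from the RM's view.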

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-6031
>                 URL: https://issues.apache.org/jira/browse/YARN-6031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.8.0
>            Reporter: Ying Zhang
>            Assignee: Ying Zhang
>            Priority: Minor
>             Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
>         Attachments: YARN-6031.001.patch, YARN-6031.002.patch, 
> YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, 
> YARN-6031.006.patch, YARN-6031.007.patch, YARN-6031-branch-2.8.001.patch
>
>
> Here are the repro steps:
> Enable node labels, restart the RM, configure CS properly, and run some jobs;
> Disable node labels, restart the RM, and the following exception is thrown:
> {noformat}
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 10 more
> {noformat}
> During RM restart, application recovery failed because the application had a 
> node label expression specified while node labels had been disabled.



