[ 
https://issues.apache.org/jira/browse/YARN-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138333#comment-15138333
 ] 

Karthik Kambatla commented on YARN-4679:
----------------------------------------

Thanks Jason. My bad - completely forgot the discussion around this. 

[~jianhe], [~vinodkv] - I vaguely remember us discussing the notion of a 
threshold for the fraction of previously connected nodes that have rejoined, 
in addition to this timeout. Do I remember right? Do you think it still makes 
sense, and can we use it as a proxy for recovery completion? 

> When work-preserving restart is enabled, the scheduler should wait for the 
> earlier of recovery completion and configured wait time
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4679
>                 URL: https://issues.apache.org/jira/browse/YARN-4679
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Karthik Kambatla
>
> When work-preserving restart is enabled, it appears the restart (or failover) 
> is unconditionally blocked for the configured delay even if the recovery 
> itself finishes sooner than this. This should be updated to wait for the 
> earlier of the two conditions. Also, it would be nice to allow setting the 
> config to -1 to indicate waiting as long as needed for the recovery to 
> complete. 
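
To make the proposed behavior concrete, here is a rough sketch of the wait logic described above. It is not taken from the YARN code base: the class name RecoveryWait, the method awaitRecovery, and the use of a CountDownLatch as the "recovery finished" signal are all assumptions made for illustration. The only parts drawn from the description are the rule itself (return at the earlier of recovery completion and the configured wait) and the proposed -1 meaning "no time limit".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not an actual YARN class.
public class RecoveryWait {

    /**
     * Blocks until recovery completes or waitMs elapses, whichever comes first.
     * A negative waitMs (e.g. -1) means wait until recovery completes.
     *
     * @param recoveryDone latch counted down once when recovery finishes (assumed signal)
     * @param waitMs       configured wait in milliseconds, or negative for no limit
     * @return true if recovery completed, false if the wait timed out or was interrupted
     */
    static boolean awaitRecovery(CountDownLatch recoveryDone, long waitMs) {
        try {
            if (waitMs < 0) {
                recoveryDone.await();   // no time limit
                return true;
            }
            // Returns as soon as the latch reaches zero, or after waitMs at the latest.
            return recoveryDone.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        done.countDown();   // pretend recovery already finished
        // Returns immediately even though the configured wait is 10 seconds.
        System.out.println(awaitRecovery(done, 10_000));
    }
}
```

With this shape, a recovery that finishes early releases the caller right away instead of blocking for the full configured delay, and the timeout still bounds the wait when recovery is slow.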



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
