[ 
https://issues.apache.org/jira/browse/YARN-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139685#comment-15139685
 ] 

Jason Lowe commented on YARN-4679:
----------------------------------

YARN-291 breaks not only with NM restart but also with RM restart.  I'm thinking
we want to persist at least some per-node state in the RM to help solve problems
like YARN-998 and YARN-2567.  However, that raises the question of what to do
when the NM rejoins and really has changed its resource availability (e.g., RAM
was added or removed).  In other words, is there a point where the rejoining
node's state should override the previous resource adjustment the admin applied
on the RM?  IMHO if the node's total resources haven't changed since the admin
specified the dynamic resource override, then the override should persist
across RM/NM restarts.  However, it becomes less clear to me when the NM rejoins
with changed resources, especially when the specified override is more than
what the NM advertised during the rejoin.
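
To make the cases above concrete, here is a minimal sketch of the decision the
RM would face when a node rejoins.  It is illustrative only: the class, enum,
and parameter names are hypothetical (not YARN APIs), it assumes the RM has
persisted both the override and the total the node advertised when the override
was set, and it models memory only.

```java
// Hypothetical sketch, not YARN code: classifies what to do with an
// admin-specified resource override when a node rejoins the RM.
final class OverridePolicySketch {

    enum Decision {
        KEEP_OVERRIDE,                   // node unchanged: override persists
        UNDECIDED,                       // node changed: policy still open
        UNDECIDED_OVERRIDE_EXCEEDS_NODE  // node changed and override > advertised
    }

    /**
     * @param advertisedAtOverrideMb total the NM advertised when the admin set the override
     * @param advertisedNowMb        total the NM advertises on rejoin
     * @param overrideMb             the admin-specified override
     */
    static Decision decide(long advertisedAtOverrideMb,
                           long advertisedNowMb,
                           long overrideMb) {
        if (advertisedNowMb == advertisedAtOverrideMb) {
            // The node's total resources have not changed since the override.
            return Decision.KEEP_OVERRIDE;
        }
        if (overrideMb > advertisedNowMb) {
            // The least clear case: the override asks for more than the node now reports.
            return Decision.UNDECIDED_OVERRIDE_EXCEEDS_NODE;
        }
        return Decision.UNDECIDED;
    }
}
```

The two UNDECIDED outcomes are deliberately left unresolved, since the right
behavior there is exactly the open question.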

If we're talking about adjusting the NM's available resources only on the NM 
side then we run into a number of race conditions.  Unlike the RM, the NM is 
unaware of any resource allocations assigned to it until the AM gets around to 
launching the container.  If the NM decides to lower its resources, it could 
easily receive container launch requests afterwards that would violate its new 
total allocation.  I believe it "works" as long as we're willing to live with 
some amount of "overage" on NMs due to these races, but if the scenario demands 
that no such overage occur (e.g.: we're sharing the node's resources with 
something outside of YARN) then it's best to handle the dynamic node resource 
changes on the RM side.
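
As a rough illustration of the "overage" in question, the sketch below computes
how far an NM ends up over its new total when launch requests for containers
the RM had already allocated arrive after the NM lowered its resources.  The
class and method names are hypothetical, and only memory is modeled.

```java
// Hypothetical sketch, not YARN code: overage caused by container launches
// that arrive after the NM has lowered its own total.
final class NmOverageSketch {

    /**
     * @param newTotalMb     the NM's lowered total memory
     * @param runningMb      memory of containers already running on the NM
     * @param lateLaunchesMb memory of containers allocated by the RM before the
     *                       change but launched by their AMs afterwards
     * @return memory in use beyond the new total, or 0 if there is none
     */
    static long overageMb(long newTotalMb, long runningMb, long[] lateLaunchesMb) {
        long usedMb = runningMb;
        for (long launchMb : lateLaunchesMb) {
            usedMb += launchMb;
        }
        return Math.max(0L, usedMb - newTotalMb);
    }
}
```

For example, an NM that drops to 4096 MB with 3072 MB already running and then
receives late launches of 2048 MB and 1024 MB ends up 2048 MB over its new
total.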

> When work-preserving restart is enabled, the scheduler should wait for the 
> earlier of recovery completion and configured wait time
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4679
>                 URL: https://issues.apache.org/jira/browse/YARN-4679
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Karthik Kambatla
>
> When work-preserving restart is enabled, it appears the restart (or failover) 
> is unconditionally blocked for the configured delay even if the recovery 
> itself finishes sooner than this. This should be updated to wait for the 
> earlier of the two conditions. Also, it would be nice to allow setting the 
> config to -1 to indicate waiting as long as needed for the recovery to be 
> completed. 
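
The behavior requested in the description above could be sketched as follows.
This is not the scheduler's actual code: the class and method names are
hypothetical, and it assumes recovery completion is signalled through a
java.util.concurrent.CountDownLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not YARN code: wait for the earlier of recovery
// completion and the configured wait time; a negative wait time means
// "wait as long as recovery takes".
final class RecoveryWaitSketch {

    /**
     * @param recoveryDone latch counted down to zero when recovery completes
     * @param waitMs       configured wait in milliseconds; negative waits indefinitely
     * @return true if recovery completed, false if the wait time elapsed first
     *         or the waiting thread was interrupted
     */
    static boolean awaitRecovery(CountDownLatch recoveryDone, long waitMs) {
        try {
            if (waitMs < 0) {
                recoveryDone.await();   // no timeout: block until recovery is done
                return true;
            }
            // Returns as soon as recovery completes, or after waitMs at the latest.
            return recoveryDone.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return false;
        }
    }
}
```

With this shape, a recovery that finishes early unblocks the caller right away
instead of holding it for the full configured delay.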



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)