[ 
https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161931#comment-17161931
 ] 

Prabhu Joseph commented on YARN-10352:
--------------------------------------

Thanks [~wangda] for the review comments. I have addressed them in 
[^YARN-10352-004.patch].

bq.  With this, we can proactively relocate containers to different nodes 
before the 10 mins timeout. 

Yes, right. I have filed YARN-10357 to track this.

Currently the NM does not unregister from the RM when Node Recovery is enabled, 
so that the existing running containers are not affected. Instead, I think it 
can send unRegisterNM with a boolean flag set, which the RM can use to stop 
scheduling on that node and to preempt allocated (but not yet acquired) 
containers without disturbing the containers running on it. The RM would then 
also report the correct cluster available resources, excluding the stopped 
nodes.

NodeStatusUpdaterImpl#serviceStop

{code}
if (this.registeredWithRM && !this.isStopped
    && !isNMUnderSupervisionWithRecoveryEnabled()
    && !context.getDecommissioned() && !failedToConnect) {
  unRegisterNM();
}
{code}
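
To make the unRegisterNM-with-a-flag idea more concrete, here is a rough, 
self-contained sketch of how the RM side might treat such a request. It is only 
an illustration and not taken from any patch: TrackedNode, UnregisterSketch, 
ContainerState and the keepRunningContainers flag are all hypothetical names, 
and memory in MB stands in for the full resource model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified container states (assumption; real YARN tracks more). */
enum ContainerState { ALLOCATED, ACQUIRED, RUNNING }

/** Hypothetical stand-in for the RM's view of one NM; not a YARN class. */
final class TrackedNode {
    final String name;
    final int memoryMb;
    boolean schedulable = true;
    final Map<String, ContainerState> containers = new HashMap<>();

    TrackedNode(String name, int memoryMb) {
        this.name = name;
        this.memoryMb = memoryMb;
    }
}

/** Hypothetical RM-side handling of an unregister request carrying a flag. */
final class UnregisterSketch {
    private final Map<String, TrackedNode> nodes = new HashMap<>();

    void register(TrackedNode node) {
        nodes.put(node.name, node);
    }

    /**
     * Handles an unregister request and returns the ids of released containers.
     * With keepRunningContainers set, the node stays known to the RM but is no
     * longer scheduled on, and only ALLOCATED (not yet acquired) containers
     * are released. Without the flag, the node and all its containers go away.
     */
    List<String> unregister(String nodeName, boolean keepRunningContainers) {
        List<String> released = new ArrayList<>();
        TrackedNode node = nodes.get(nodeName);
        if (node == null) {
            return released;
        }
        if (!keepRunningContainers) {
            released.addAll(node.containers.keySet());
            node.containers.clear();
            nodes.remove(nodeName);
            return released;
        }
        node.schedulable = false;
        node.containers.entrySet().removeIf(e -> {
            if (e.getValue() == ContainerState.ALLOCATED) {
                released.add(e.getKey());
                return true;
            }
            return false;
        });
        return released;
    }

    /** Cluster memory counted only over nodes that can still be scheduled on. */
    int schedulableMemoryMb() {
        int total = 0;
        for (TrackedNode n : nodes.values()) {
            if (n.schedulable) {
                total += n.memoryMb;
            }
        }
        return total;
    }
}
```

With two 8192 MB nodes, calling unregister("worker0", true) in this sketch 
releases only worker0's ALLOCATED containers, leaves its RUNNING ones in place, 
and drops the schedulable memory to worker1's 8192 MB.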



> Skip schedule on not heartbeated nodes in Multi Node Placement
> --------------------------------------------------------------
>
>                 Key: YARN-10352
>                 URL: https://issues.apache.org/jira/browse/YARN-10352
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0, 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>              Labels: capacityscheduler, multi-node-placement
>         Attachments: YARN-10352-001.patch, YARN-10352-002.patch, 
> YARN-10352-003.patch, YARN-10352-004.patch
>
>
> When Node Recovery is enabled, stopping an NM does not unregister it from the 
> RM. The RM's active nodes therefore still include the stopped node until the 
> NM Liveliness Monitor expires it after the configured timeout 
> (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 
> minutes, Multi Node Placement keeps assigning containers to that node. It 
> needs to exclude nodes that have not heartbeated within the configured 
> heartbeat interval 
> (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000 ms), similar 
> to the Asynchronous Capacity Scheduler threads 
> (CapacityScheduler#shouldSkipNodeSchedule).
> *Repro:*
> 1. Enable Multi Node Placement 
> (yarn.scheduler.capacity.multi-node-placement-enabled) and Node Recovery 
> (yarn.nodemanager.recovery.enabled).
> 2. Have only one NM running, say worker0.
> 3. Stop worker0 and start another NM, say worker1.
> 4. Submit a sleep job. The containers will time out because they are assigned 
> to the stopped NM worker0.
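
The exclusion described in the issue can be sketched as a simple filter over 
candidate nodes. This is a minimal, self-contained illustration and not the 
patch itself: NodeInfo and HeartbeatFilter are hypothetical names, and the 
threshold of two missed heartbeat intervals is an assumption chosen to tolerate 
a single late heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a scheduler node; not a YARN class. */
final class NodeInfo {
    final String name;
    final long lastHeartbeatMs; // monotonic time of the last heartbeat

    NodeInfo(String name, long lastHeartbeatMs) {
        this.name = name;
        this.lastHeartbeatMs = lastHeartbeatMs;
    }
}

final class HeartbeatFilter {
    /**
     * Returns true when the node should be skipped for scheduling.
     * Assumption: a node counts as stale once more than two heartbeat
     * intervals have elapsed since its last heartbeat.
     */
    static boolean shouldSkip(NodeInfo node, long nowMs, long heartbeatIntervalMs) {
        return nowMs - node.lastHeartbeatMs > 2 * heartbeatIntervalMs;
    }

    /** Keeps only the nodes that have heartbeated recently enough. */
    static List<NodeInfo> candidates(List<NodeInfo> nodes, long nowMs,
                                     long heartbeatIntervalMs) {
        List<NodeInfo> result = new ArrayList<>();
        for (NodeInfo node : nodes) {
            if (!shouldSkip(node, nowMs, heartbeatIntervalMs)) {
                result.add(node);
            }
        }
        return result;
    }
}
```

In the repro scenario, a stopped worker0 whose last heartbeat is several 
seconds old would be filtered out with a 1000 ms interval, while a freshly 
started worker1 would remain a candidate.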



