[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780108#comment-16780108
 ] 

Wilfred Spiegelenburg commented on YARN-6487:
---------------------------------------------

The removal of continuous scheduling is based on performance numbers and 
locking issues.

Continuous scheduling was introduced to help speed up allocating containers in 
a small cluster that did not have a large number of heartbeats coming in. This 
would happen in clusters that were running a mixed load of containers with an 
emphasis on longer-running containers. In those clusters, waiting for NM 
heartbeats would hold up assigning containers when a burst of requests came in.

The side effect, however, is that when a cluster grows (100+ nodes) the number 
of heartbeats that need processing starts interfering with the continuous 
scheduling thread and other internal threads. This causes thread starvation 
and, in the worst case, scheduling comes to a standstill.
The improvements that have been made in the scheduler, which now allow you to 
assign multiple containers per heartbeat and still spread the load over 
multiple nodes, have made continuous scheduling unneeded in all but the 
smallest clusters. In those clusters, changing the NM heartbeat interval can be 
used to work around that.
So we really do not need it anymore. If turned on in large clusters it can 
cause a lot of side effects, which is why we decided to deprecate it.
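
As a rough sketch of the settings involved (the values are illustrative 
assumptions, not recommendations, and mapping the scheduler improvements 
mentioned above to these particular properties is my reading rather than 
something stated in this issue), a small cluster could leave continuous 
scheduling off and rely on multiple assignment plus a shorter NM heartbeat 
interval in yarn-site.xml:

```xml
<!-- yarn-site.xml fragment; values are illustrative assumptions -->
<configuration>
  <!-- Leave the deprecated continuous scheduling thread disabled -->
  <property>
    <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
    <value>false</value>
  </property>
  <!-- Allow more than one container to be assigned per node heartbeat -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>
  </property>
  <!-- Size the per-heartbeat assignment dynamically so load still spreads over nodes -->
  <property>
    <name>yarn.scheduler.fair.dynamic.max.assign</name>
    <value>true</value>
  </property>
  <!-- Small clusters only: shorter NM heartbeat interval (example value, in ms) -->
  <property>
    <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
    <value>500</value>
  </property>
</configuration>
```

Lowering the heartbeat interval increases the number of heartbeats the RM has 
to process, so it only makes sense in the small clusters described above.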

We could think about completely decoupling scheduling from the NM heartbeat to 
remove the locking, but that would be a far bigger task that affects all 
schedulers.

> FairScheduler: remove continuous scheduling (YARN-1010)
> -------------------------------------------------------
>
>                 Key: YARN-6487
>                 URL: https://issues.apache.org/jira/browse/YARN-6487
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
