[ https://issues.apache.org/jira/browse/YARN-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877833#comment-15877833 ]
Tao Jie commented on YARN-6101:
-------------------------------

[~He Tianyi], thank you for sharing your case.
Today, scheduling is triggered by the NM heartbeat: when an NM heartbeats, the scheduler selects containers to assign to that NM. This makes it difficult to find the globally best node on which to run a container for an application.
It seems that YARN-5139 improves the scheduling logic: we first find a set of candidate nodes for each resource request, and then a NodeScorer measures which node is the best one to allocate on. In that case, the node's utilization should be considered.

> Delay scheduling for node resource balance
> ------------------------------------------
>
>                 Key: YARN-6101
>                 URL: https://issues.apache.org/jira/browse/YARN-6101
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: He Tianyi
>            Priority: Minor
>         Attachments: YARN-6101.preliminary.0000.patch
>
>
> We observed that, in today's cluster, usage of Spark has dramatically
> increased.
> This introduced a new issue: CPU/MEM utilization on a single node may
> become imbalanced because Spark is generally more memory intensive. For
> example, a node with capability (48 cores, 192 GB memory) cannot satisfy
> a (1 core, 2 GB memory) request if its currently used resource is (20
> cores, 191 GB memory), even with plenty of total available resource
> across the whole cluster.
> One way to avoid this situation is to introduce some strategy during
> scheduling.
> This JIRA proposes a delay-scheduling-like approach to achieve better
> balance between different types of resources on each node.
> The basic idea is to consider the dominant resource of each node: when a
> scheduling opportunity on a particular node is offered to a resource
> request, make sure the allocation changes the dominant resource of the
> node, or, in the worst case, allocate at once when the number of offered
> scheduling opportunities exceeds a certain number.
> With YARN SLS and a simulation file with a hybrid workload (MR+Spark),
> the approach improved cluster resource usage by nearly 5%. After
> deploying it to production, we observed an 8% improvement.
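
The basic idea in the quoted description could be sketched roughly as follows. This is only one reading of it, not the attached patch: the class `BalanceDelaySketch`, the `Res` type, and the parameters `offered` and `maxOffered` are hypothetical names made up for illustration, and "dominant resource" is assumed here to mean the resource type with the highest used/capacity share on the node.

```java
// Hypothetical sketch: these names are illustrative, not YARN or patch APIs.
final class BalanceDelaySketch {
    enum Kind { CPU, MEMORY }

    /** A (cores, memory in GB) pair. */
    static final class Res {
        final long cores;
        final long memGb;

        Res(long cores, long memGb) {
            this.cores = cores;
            this.memGb = memGb;
        }

        Res plus(Res other) {
            return new Res(cores + other.cores, memGb + other.memGb);
        }

        boolean fitsIn(Res capacity) {
            return cores <= capacity.cores && memGb <= capacity.memGb;
        }
    }

    /** Resource type with the larger used/capacity share (ties go to CPU). */
    static Kind dominant(Res used, Res capacity) {
        double cpuShare = (double) used.cores / capacity.cores;
        double memShare = (double) used.memGb / capacity.memGb;
        return cpuShare >= memShare ? Kind.CPU : Kind.MEMORY;
    }

    /**
     * Decide whether to take a scheduling opportunity on a node.
     * offered:    opportunities already offered to this request (assumed counter)
     * maxOffered: threshold after which we stop waiting (assumed setting)
     */
    static boolean shouldAllocate(Res used, Res request, Res capacity,
                                  int offered, int maxOffered) {
        Res after = used.plus(request);
        if (!after.fitsIn(capacity)) {
            return false; // the node cannot hold the request at all
        }
        if (offered > maxOffered) {
            return true; // worst case: allocate at once
        }
        // Otherwise allocate only if the node's dominant resource changes.
        return dominant(used, capacity) != dominant(after, capacity);
    }
}
```

Under these assumptions, on a (48 cores, 192 GB) node with (20 cores, 100 GB) in use, a (1 core, 2 GB) request is skipped at first because memory stays dominant, while an (8 cores, 2 GB) request is accepted because it makes CPU dominant. Once the number of offered opportunities exceeds the threshold, the first request is allocated anyway.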