[ https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nathan Roberts updated YARN-5202:
---------------------------------
    Attachment: YARN-5202.patch

Originally branched from commit: 42f90ab885d9693fcc1e52f9637f7de4111110ae

> Dynamic Overcommit of Node Resources - POC
> ------------------------------------------
>
>                 Key: YARN-5202
>                 URL: https://issues.apache.org/jira/browse/YARN-5202
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-5202.patch
>
>
> This Jira presents a proof-of-concept implementation (a collaboration
> between [~jlowe] and myself) of dynamic over-commit in YARN.
> The type of over-commit implemented in this Jira is similar to, but not as
> full-featured as, what is being implemented via YARN-1011. YARN-1011 is where
> we see ourselves heading, but we needed something quick and completely
> transparent so that we could test it at scale with our varying workloads
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much
> additional capacity we can achieve with over-commit approaches, and has
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic;
>   simply adjust individual nodes' available resources based on current node
>   utilization and then let the scheduler do what it normally does.
> - Over-commit slowly, pull back aggressively: if things are looking good and
>   there is demand, slowly add resources. If memory starts to look
>   over-utilized, aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves: if memory utilization on a node
>   gets too high, preempt something, preferably from a preemptable queue.
> A patch against trunk will be attached shortly. Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.
>   Since the patch is mainly meant to explain the approach, we did not test
>   it against trunk beyond a basic build and the basic unit tests.
> - The key pieces of functionality are in {{SchedulerNode}},
>   {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of
>   the patch is mainly UI, config, metrics, tests, and some minor code
>   duplication (e.g. to optimize node resource changes, we treat an over-commit
>   resource change differently from an updateNodeResource change, because
>   remove_node/add_node is too expensive for the frequency of over-commit
>   changes).
> - We only over-commit memory at this point.
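The "over-commit slowly, pull back aggressively" goal amounts to a small per-node control loop driven by measured memory utilization. The Java sketch below is hypothetical: the class name `OvercommitPolicy`, the low/high water marks, the additive step, the cap, and the halving factor are all assumptions for illustration, not code from YARN-5202.patch. Like the POC, it only adjusts memory.

```java
// Hypothetical sketch of "over-commit slowly, pull back aggressively".
// Names, thresholds, step size, and the halving factor are assumptions,
// not taken from YARN-5202.patch. Only memory is over-committed.
final class OvercommitPolicy {
    private final long physicalMemMb;    // memory the node really has
    private final long maxOvercommitMb;  // cap on extra memory advertised
    private final long growStepMb;       // small additive step when healthy
    private final double lowWaterMark;   // grow only below this utilization
    private final double highWaterMark;  // shrink above this utilization
    private final double preemptMark;    // node preempts above this utilization

    private long overcommitMb = 0;       // extra memory currently advertised

    OvercommitPolicy(long physicalMemMb, long maxOvercommitMb, long growStepMb,
                     double lowWaterMark, double highWaterMark, double preemptMark) {
        if (!(lowWaterMark < highWaterMark && highWaterMark <= preemptMark)) {
            throw new IllegalArgumentException("need low < high <= preempt");
        }
        this.physicalMemMb = physicalMemMb;
        this.maxOvercommitMb = maxOvercommitMb;
        this.growStepMb = growStepMb;
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
        this.preemptMark = preemptMark;
    }

    /**
     * Called once per monitoring interval with measured memory utilization
     * (0.0 to 1.0) and whether the scheduler has pending demand.
     * Returns the total memory (MB) the node should advertise.
     */
    long update(double memUtilization, boolean pendingDemand) {
        if (memUtilization > highWaterMark) {
            // Pull back aggressively: halve the over-commit (assumed factor).
            overcommitMb /= 2;
        } else if (memUtilization < lowWaterMark && pendingDemand) {
            // Over-commit slowly: one small step, never past the cap.
            overcommitMb = Math.min(overcommitMb + growStepMb, maxOvercommitMb);
        }
        // Between the water marks (or with no demand), hold steady.
        return physicalMemMb + overcommitMb;
    }

    /** True when the node should preempt a container to protect itself. */
    boolean shouldPreempt(double memUtilization) {
        return memUtilization > preemptMark;
    }
}
```

The asymmetry (additive growth, multiplicative shrink) is one way to encode the stated goal; the actual step sizes and thresholds used in the patch are not given in this message.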
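For the node self-protection goal (preempt something when memory utilization gets too high, preferably from a preemptable queue), one possible victim-selection rule is sketched below. `RunningContainer`, `NodeSelfProtection`, and the tie-break on the most recently started container are hypothetical assumptions; the message does not say how the patch chooses which container to preempt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of a running container; fields are assumptions.
final class RunningContainer {
    final String id;
    final boolean preemptableQueue; // true if its queue allows preemption
    final long memMb;
    final long startTimeMs;

    RunningContainer(String id, boolean preemptableQueue, long memMb, long startTimeMs) {
        this.id = id;
        this.preemptableQueue = preemptableQueue;
        this.memMb = memMb;
        this.startTimeMs = startTimeMs;
    }
}

final class NodeSelfProtection {
    // Assumed preference order: containers from preemptable queues first
    // (false sorts before true), then the most recently started, so that
    // the least completed work is lost.
    static final Comparator<RunningContainer> PREFERENCE =
        Comparator.comparing((RunningContainer c) -> !c.preemptableQueue)
                  .thenComparing(Comparator.comparingLong(
                      (RunningContainer c) -> c.startTimeMs).reversed());

    /** Picks containers in preference order until at least mbToFree is covered. */
    static List<RunningContainer> pickVictims(List<RunningContainer> running, long mbToFree) {
        List<RunningContainer> ordered = new ArrayList<>(running);
        ordered.sort(PREFERENCE);
        List<RunningContainer> victims = new ArrayList<>();
        long freed = 0;
        for (RunningContainer c : ordered) {
            if (freed >= mbToFree) {
                break;
            }
            victims.add(c);
            freed += c.memMb;
        }
        return victims;
    }
}
```

Non-preemptable containers are still eligible once the preemptable ones are exhausted, reflecting the "preferably" in the design goal rather than a hard restriction.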