[ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322717#comment-15322717
 ] 

Jason Lowe commented on YARN-5202:
----------------------------------

We're fine with some of the work being moved into that effort.  We're just 
posting the proof-of-concept prototype we've been running to stimulate 
discussion around overcommit and have people try it out if they're interested 
in what the POC does.  The only reason we didn't go with YARN-1011 directly is 
that we needed something that could be implemented quickly and transparently to 
applications in the short term.  It doesn't use the concept of guaranteed vs. 
opportunistic containers but rather simply scales the cluster dynamically 
node-by-node and lets the scheduler divvy up the extra capacity based on its 
existing queue configs.  It does mean containers can be shot that weren't being 
shot before, so SLA-critical apps could be impacted.  That's why we think 
ultimately a guaranteed vs. opportunistic approach like YARN-1011 is a better 
long-term solution.

As for specific parts that could be moved, I agree the UI/metrics stuff would 
be useful, as well as the fast-path for updating a node's resources in the 
scheduler.  As I mentioned in YARN-5215, it's very expensive to adjust a node's 
resources in the scheduler with the traditional approach, so if we may end up 
doing it dynamically on every node heartbeat, it's critical to make sure that's 
as cheap as possible.  The nodemanager "self-preservation" preemption logic is 
probably also very relevant.  It would need to be updated to add the guaranteed 
vs. opportunistic container distinction, but otherwise seems close to what we 
would want there.


> Dynamic Overcommit of Node Resources - POC
> ------------------------------------------
>
>                 Key: YARN-5202
>                 URL: https://issues.apache.org/jira/browse/YARN-5202
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-5202.patch
>
>
> This Jira presents a proof-of-concept implementation (a collaboration 
> between [~jlowe] and myself) of dynamic over-commit in YARN. 
> The type of over-commit implemented in this Jira is similar to, but not as 
> full-featured as, what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic: 
> simply adjust individual nodes' available resources based on current node 
> utilization and then let the scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 
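
For readers skimming the thread, the "over-commit slowly, pull back aggressively" goal quoted above can be sketched roughly as follows. This is not code from YARN-5202.patch: the class name, the thresholds, the fixed growth step, and the choice to halve the over-commit on pull-back are all assumptions made for illustration, and it covers memory only, as the POC does.

```java
/**
 * Hypothetical sketch of a "grow slowly, shrink aggressively" policy for one
 * node's memory over-commit. All names and constants are illustrative
 * assumptions; none of this is taken from the actual patch.
 */
final class OvercommitPolicy {
    private final long physicalMb;      // memory the node is configured with
    private final long maxOvercommitMb; // cap on extra advertised memory
    private final long growStepMb;      // small additive step when growing
    private final double growBelow;     // grow only below this utilization
    private final double shrinkAbove;   // pull back above this utilization
    private long overcommitMb = 0;      // extra memory currently advertised

    OvercommitPolicy(long physicalMb, long maxOvercommitMb, long growStepMb,
                     double growBelow, double shrinkAbove) {
        this.physicalMb = physicalMb;
        this.maxOvercommitMb = maxOvercommitMb;
        this.growStepMb = growStepMb;
        this.growBelow = growBelow;
        this.shrinkAbove = shrinkAbove;
    }

    /**
     * Called once per (simulated) node heartbeat.
     *
     * @param memUtilization   fraction of physical memory in use, 0.0 to 1.0
     * @param hasPendingDemand whether the scheduler has unsatisfied requests
     * @return total memory in MB to advertise to the scheduler for this node
     */
    long onHeartbeat(double memUtilization, boolean hasPendingDemand) {
        if (memUtilization > shrinkAbove) {
            // Pull back aggressively: halve the over-commit in one step.
            overcommitMb /= 2;
        } else if (memUtilization < growBelow && hasPendingDemand) {
            // Over-commit slowly: add one small step, never past the cap.
            overcommitMb = Math.min(maxOvercommitMb, overcommitMb + growStepMb);
        }
        // Between the two thresholds the advertised size is left unchanged.
        return physicalMb + overcommitMb;
    }
}
```

With a 64 GB node, a 1 GB step, and thresholds of 0.6 and 0.85 (again, made-up numbers), two quiet heartbeats with demand raise the advertised size by 2 GB, and a single heartbeat above 0.85 gives half of that back.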



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
