[ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427275#comment-15427275
 ] 

Ha Son Hai commented on YARN-5202:
----------------------------------

Hi [~nroberts]!

Would you mind explaining a bit more about the parameter 
"RM_OVERCOMMIT_MEM_MAX_FACTOR"? I set it to a different value, but the 
ResourceManager log still reports the value 0. The same happens for 
vcoreFactor. I wonder if it's a bug?

By the way, is RM_OVERCOMMIT_MEM_MAX_FACTOR redundant with 
RM_OVERCOMMIT_MEM_INCREMENT, given that one is a ratio and the other is in MB?
If I have a node with 32 GB of RAM and set RM_OVERCOMMIT_MEM_MAX_FACTOR to 2, 
does that mean I can over-commit up to 2 times the total memory (when 
utilization is very low), i.e. 64 GB?

Sorry if this is a "basic" or a "stupid" question. I have just started working 
with the Hadoop code, so a lot of it is new to me.
Thanks a lot for your clarification. I attached your explanation of the 
parameters below.

    + RM_OVERCOMMIT_MEM_INCREMENT: Specifies the largest memory increment in 
megabytes when enlarging a node's total resource for overcommit. Once 
incremented, at least one container must be launched on the node before the 
value is increased further. A value <= 0 disables memory overcommit.
    + RM_OVERCOMMIT_MEM_MAX_FACTOR: Maximum amount of memory to overcommit as 
a factor of the total node memory. A value <= 0 disables memory overcommit.
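For concreteness, here is a yarn-site.xml sketch of how I understand these two 
knobs would be set together. The property key names below are my guesses for 
illustration only; the real keys are whatever the RM_OVERCOMMIT_* constants in 
the patch map to in YarnConfiguration.

```xml
<!-- Hypothetical key names for illustration; check the patch's
     RM_OVERCOMMIT_* constants in YarnConfiguration for the real ones. -->
<property>
  <!-- Grow a node's advertised memory at most 1 GB at a time. -->
  <name>yarn.resourcemanager.overcommit.memory.increment-mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Never advertise more than 2x the node's physical memory
       (e.g. a 32 GB node could be grown to at most 64 GB). -->
  <name>yarn.resourcemanager.overcommit.memory.max-factor</name>
  <value>2.0</value>
</property>
```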


> Dynamic Overcommit of Node Resources - POC
> ------------------------------------------
>
>                 Key: YARN-5202
>                 URL: https://issues.apache.org/jira/browse/YARN-5202
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-5202-branch2.7-uber.patch, YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic; 
> simply adjust individual nodes' available resources based on current node 
> utilization and then let the scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 
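
The "over-commit slowly, pull back aggressively" policy described in the 
design goals above can be sketched roughly as follows. This is not the actual 
patch code; the constants and the 95% pull-back threshold are assumptions made 
for illustration.

```java
// Hypothetical sketch of the slow-grow / fast-shrink overcommit policy:
// while memory utilization is low, grow a node's advertised memory by a
// small increment; once utilization gets too high, drop all overcommit
// at once and fall back to the node's physical memory.
public class OvercommitSketch {
    // Assumed knobs, mirroring the constants discussed in this issue.
    static final long MEM_INCREMENT_MB = 1024;   // RM_OVERCOMMIT_MEM_INCREMENT
    static final double MEM_MAX_FACTOR = 2.0;    // RM_OVERCOMMIT_MEM_MAX_FACTOR
    static final double HIGH_UTILIZATION = 0.95; // assumed pull-back threshold

    /**
     * Returns the node's next advertised memory given its physical memory,
     * the currently advertised (possibly over-committed) memory, and the
     * fraction of physical memory actually in use.
     */
    static long nextAdvertisedMemMb(long physicalMb, long advertisedMb,
                                    double utilization) {
        long cap = (long) (physicalMb * MEM_MAX_FACTOR);
        if (utilization >= HIGH_UTILIZATION) {
            return physicalMb;  // pull back aggressively: drop all overcommit
        }
        // grow slowly: one increment at a time, never past the cap
        return Math.min(cap, advertisedMb + MEM_INCREMENT_MB);
    }

    public static void main(String[] args) {
        long phys = 32 * 1024;  // a 32 GB node, as in the question above
        long adv = phys;
        // Low utilization: advertised memory creeps up one increment.
        adv = nextAdvertisedMemMb(phys, adv, 0.40);
        System.out.println(adv);  // 33792
        // High utilization: overcommit is removed in a single step.
        adv = nextAdvertisedMemMb(phys, adv, 0.97);
        System.out.println(adv);  // 32768
    }
}
```

With MEM_MAX_FACTOR = 2.0, a 32 GB node would top out at 64 GB of advertised 
memory, which matches the reading of the factor asked about above.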



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
