[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321648#comment-15321648 ]

Jason Lowe commented on YARN-5215:
----------------------------------

Ah, so the headline was a bit misleading.  Most people saw it and thought this 
feature was going to schedule more containers on the server, but this is 
essentially the opposite of YARN-1011 and YARN-5202.  It scales nodes down 
from their original size as node utilization increases rather than scaling 
them up from the original size as utilization decreases.  Instead of 
overcommit, this feature is "undercommit."  ;)

I'm OK with the idea of the feature in general, and I don't think it will be 
horribly incompatible.  In fact, I think features like YARN-1011 / YARN-5202 
could emulate this behavior by tuning the "guaranteed" node size for YARN very 
low but allowing it to drastically overcommit up to the original node 
capability.  In other words, rather than starting nodes big and scaling down, 
we start nodes small and scale up when we can.  The YARN-5202 patch is already 
dynamically scaling the node size based on the reported total node 
utilization, so it will respond to increasing external load similarly.  The 
only thing missing there is that it won't go below the original node size no 
matter how bad the utilization gets, so either that would need to change or, 
as I mentioned, users would tune the feature differently to get this behavior.
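
To make the emulation concrete, here's a trivial sketch (all names and numbers 
are mine, not from either patch) showing that "start small and grow up to the 
original capability" and "start big and shrink toward a floor" land on the 
same schedulable size for the same external load:

{code:java}
// Hypothetical illustration only; simplified to memory in MB.
public class NodeSizeEmulation {

  static long schedulableMb(long originalCapabilityMb,
                            long guaranteedMb,
                            long externalUsedMb) {
    // Overcommit view: grow from the small guaranteed size up to whatever
    // headroom the external load leaves, never past the original capability.
    long overcommitView = Math.min(originalCapabilityMb,
        Math.max(guaranteedMb, originalCapabilityMb - externalUsedMb));

    // Undercommit view (this JIRA): shrink from the original capability as
    // external load grows, never below the configured floor.
    long undercommitView = Math.max(guaranteedMb,
        originalCapabilityMb - externalUsedMb);

    // For sane inputs (floor <= original capability, external load >= 0)
    // the two views agree.
    assert overcommitView == undercommitView;
    return undercommitView;
  }

  public static void main(String[] args) {
    // 96 GB node, 8 GB guaranteed floor, 40 GB consumed by external load.
    System.out.println(schedulableMb(96 * 1024, 8 * 1024, 40 * 1024)); // 57344
  }
}
{code}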

Any thoughts on whether this is better implemented as an overcommit setup 
rather than an undercommit setup?  It may be confusing if YARN has two 
separate features doing essentially the same thing from opposite viewpoints.  
Also, the guaranteed containers from YARN-1011 are going to be difficult to 
guarantee if this feature can preempt them based on external node load.  
Arguably the user should configure a guaranteed YARN capacity on these nodes, 
and then YARN can opportunistically use the node's remaining capacity when it 
appears available.
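
Purely as an illustration of that split (names are hypothetical, not from any 
patch), the opportunistic portion would just be whatever the external load 
leaves beyond the guaranteed slice:

{code:java}
// Hypothetical illustration only; simplified to memory in MB.
public class OpportunisticCapacity {

  static long opportunisticMb(long physicalMb,
                              long guaranteedYarnMb,
                              long externalUsedMb) {
    // Whatever the external processes are not using, beyond the guaranteed
    // slice, can be offered opportunistically; never negative.
    return Math.max(0, physicalMb - guaranteedYarnMb - externalUsedMb);
  }

  public static void main(String[] args) {
    // 64 GB node, 48 GB guaranteed to YARN, 10 GB of external load:
    // 6 GB can be offered opportunistically; guaranteed containers are safe.
    System.out.println(opportunisticMb(64 * 1024, 48 * 1024, 10 * 1024)); // 6144
  }
}
{code}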

If we do go with this approach, it seems like this patch is quite a ways off.  
Besides unit tests, conf switches, and boundary condition hardening, I think 
it will be confusing for users and admins to monitor.  Simply adjusting the 
SchedulerNode will semantically accomplish the desired task as far as 
scheduling containers goes, but the UI, queue, and cluster metrics will not 
reflect the reality of the scheduler.  For example, if most of the nodes have 
been significantly scaled back due to external load, the scheduler UI will 
show a heavily underutilized cluster when in reality it may be completely full 
and unable to schedule another container.  That's going to be very confusing 
to users.  And there are no metrics showing how much has been scaled back -- 
the user would have to go to the scheduler nodes page, sum the node 
capabilities themselves, notice the sum is significantly lower than the 
reported cluster total, and assume it must be this feature causing the 
anomaly.

At a minimum, I would think the cluster size should change (along with the 
queue portions of that size) so the utilization shown for the YARN cluster in 
the scheduler UI is accurate.  That still leaves the user to divine why their 
cluster size floats around over time when they aren't adding or removing 
nodes, which is why we may need another metric showing how much has been 
"stolen" by external node load outside of YARN.  Maybe we still have an 
overcommit metric, but it goes negative when original capacity has been 
removed by external factors?  Not sure how best to represent it without 
over-cluttering the UI with a bunch of feature-specific fields.
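
For example, a signed metric along these lines (the name is hypothetical) 
would cover both directions with a single field:

{code:java}
// Hypothetical illustration only: positive when YARN is scheduling beyond the
// node's original capability, negative when external load has taken original
// capacity away.
public class OvercommitDelta {

  static long overcommitDeltaMb(long currentSchedulableMb,
                                long originalCapabilityMb) {
    return currentSchedulableMb - originalCapabilityMb;
  }

  public static void main(String[] args) {
    System.out.println(overcommitDeltaMb(72 * 1024, 64 * 1024)); //  8192: 8 GB overcommitted
    System.out.println(overcommitDeltaMb(56 * 1024, 64 * 1024)); // -8192: 8 GB lost to external load
  }
}
{code}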

This was addressed in the YARN-5202 patch by adjusting the queue and cluster 
metrics as we adjust the scheduler node, and metrics and UI fields were also 
added to show the amount of overcommit.  Note that in the YARN-5202 patch we 
added a fast path for adjusting the node's size in the scheduler.  The typical 
remove-old-node, add-new-node form of updating is quite expensive since it 
computes unnecessary things like node label diffs and updates the metrics 
twice, once for the removal and once for the add.  Since this kind of feature 
is going to be adjusting node sizes all the time, a node adjustment needs to 
be as cheap as possible while still keeping the UI and metrics up to date.
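
Roughly, the fast path just applies the size delta in place.  A simplified 
stand-in sketch (these are not the real SchedulerNode or metrics classes, just 
an illustration of the idea):

{code:java}
// Hypothetical illustration only.
public class FastNodeResize {

  static class Node {
    long totalMb;       // schedulable size currently advertised by the node
    long allocatedMb;   // already handed out to containers
  }

  static class ClusterTotals {
    long totalMb;       // cluster-wide schedulable total shown in the UI
    long availableMb;   // cluster-wide available
  }

  // Instead of removing and re-adding the node (which recomputes node-label
  // bookkeeping and updates the metrics twice), apply only the delta.
  static void resizeNode(Node node, ClusterTotals totals, long newTotalMb) {
    long delta = newTotalMb - node.totalMb;
    node.totalMb = newTotalMb;

    // Keep the cluster totals consistent with the node change so the UI
    // reflects the real schedulable capacity; available may go negative if
    // the node is already over-allocated relative to its new size.
    totals.totalMb += delta;
    totals.availableMb += delta;
  }
}
{code}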


> Scheduling containers based on external load in the servers
> -----------------------------------------------------------
>
>                 Key: YARN-5215
>                 URL: https://issues.apache.org/jira/browse/YARN-5215
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Inigo Goiri
>         Attachments: YARN-5215.000.patch
>
>
> Currently YARN runs containers on the servers assuming that they own all 
> the resources.  The proposal is to use the utilization information from the 
> node and the containers to estimate how much is consumed by external 
> processes and to schedule based on this estimate.


