[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083957#comment-15083957 ]

Bikas Saha commented on YARN-1011:
----------------------------------

I agree with preferring natural container churn over preemption to avoid lost
work, though the issue of clearly defining the scheduler policy still remains.

bq.  If we were oversubscribing 10X then I'd probably want it for sure, but if 
it's at most 2X capacity then worst case is a container only gets 50% of the 
resource it had requested. Obviously for something like memory this has to be 
closely controlled because going over the physical capabilities of the machine 
has very significant consequences. But for CPU, I'd definitely be inclined to 
live with the occasional 50% worst case for all containers, in order to avoid 
the 1/1024th worst case for OPPORTUNISTIC containers on a busy node.
I did not understand this. Does this mean it's OK for normal containers to run
50% slower in the presence of opportunistic containers? If so, there are
scenarios where this may not be a valid choice, e.g. when a cluster runs a mix
of SLA and non-SLA jobs. Non-SLA jobs are fine with their containers being
slowed down to increase cluster utilization through opportunistic containers,
because we get higher overall throughput. But SLA jobs cannot afford to miss
deadlines because their tasks ran 50% slower.
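
To put numbers on the 50% vs. 1/1024th trade-off, here is a minimal Python
sketch of a simplified proportional-share model, similar in spirit to Linux CFS
`cpu.shares` but not its exact behavior. It assumes every container on the node
is CPU-bound and that CPU is divided strictly in proportion to a per-container
weight; the container names and weights are hypothetical.

```python
from fractions import Fraction


def cpu_fractions(containers, node_vcores):
    """Return, per container, the fraction of its requested vcores it receives.

    containers: list of (name, requested_vcores, weight) tuples.
    Simplified model (an assumption, not YARN or CFS semantics): every
    container is CPU-bound and the node's CPU is split strictly in
    proportion to weight. A value below 1 means the container is slowed down.
    """
    total_weight = sum(weight for _, _, weight in containers)
    fractions = {}
    for name, requested, weight in containers:
        granted = Fraction(node_vcores) * weight / total_weight
        fractions[name] = granted / requested
    return fractions


# 8-vcore node oversubscribed 2X: 8 guaranteed vcores plus 8 opportunistic.
# Case 1: weight proportional to request for everyone -> all slow down equally.
equal = cpu_fractions(
    [("guaranteed-a", 4, 4), ("guaranteed-b", 4, 4),
     ("opportunistic-c", 4, 4), ("opportunistic-d", 4, 4)],
    node_vcores=8,
)

# Case 2: guaranteed containers get a much larger weight per vcore
# (hypothetical 1024 per vcore), opportunistic containers a token weight of 1.
tiered = cpu_fractions(
    [("guaranteed-a", 4, 4 * 1024), ("guaranteed-b", 4, 4 * 1024),
     ("opportunistic-c", 4, 1), ("opportunistic-d", 4, 1)],
    node_vcores=8,
)

print(equal["guaranteed-a"], equal["opportunistic-c"])    # 1/2 1/2
print(tiered["guaranteed-a"], tiered["opportunistic-c"])  # 4096/4097 1/4097
```

With request-proportional weights every container, SLA ones included, absorbs
the 50% slowdown; with tiered weights the guaranteed containers are almost
unaffected and the opportunistic ones are nearly starved. Which end of that
range is acceptable depends on whether SLA jobs share the node.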

IMO, the litmus test for a feature like this would be to take an existing
cluster that has low utilization because tasks ask for more resources than they
need 100% of the time, turn this feature on, and get better cluster utilization
and throughput without affecting the existing workload, whatever the internal
implementation details are. Agree?
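
The litmus test can be phrased as a before/after comparison of the same
workload on the same cluster. The sketch below is one hypothetical way to encode
it in Python; the metric names (`utilization`, `throughput`, `p95_runtime_s`)
and the 5% tolerance are illustrative assumptions, not YARN metrics.

```python
from dataclasses import dataclass


@dataclass
class ClusterRun:
    """Aggregate metrics for one run of the same workload (hypothetical)."""
    utilization: float    # mean fraction of cluster resources actually used
    throughput: float     # e.g. completed jobs per hour
    p95_runtime_s: float  # 95th-percentile runtime of the existing jobs


def passes_litmus_test(baseline: ClusterRun, with_feature: ClusterRun,
                       max_runtime_regression: float = 0.05) -> bool:
    """True if the feature raises utilization and throughput while the
    existing workload's tail runtime regresses by at most the given ratio."""
    better_utilization = with_feature.utilization > baseline.utilization
    better_throughput = with_feature.throughput > baseline.throughput
    runtime_limit = baseline.p95_runtime_s * (1 + max_runtime_regression)
    workload_unaffected = with_feature.p95_runtime_s <= runtime_limit
    return better_utilization and better_throughput and workload_unaffected


before = ClusterRun(utilization=0.45, throughput=120.0, p95_runtime_s=600.0)
good = ClusterRun(utilization=0.70, throughput=150.0, p95_runtime_s=615.0)
slow = ClusterRun(utilization=0.80, throughput=170.0, p95_runtime_s=900.0)

print(passes_litmus_test(before, good))  # True
print(passes_litmus_test(before, slow))  # False: existing jobs got much slower
```

The existing workload is judged by a tail runtime rather than an average
because, as in the SLA example above, deadlines are missed by the slow jobs,
not by the mean.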

bq. 50% of maximum-under-utilized resource of past 30 min for each NM can be 
used to allocate opportunistic containers.
These are heuristics, and each may be valid under different circumstances. What
we should do is step back and ask where this optimization actually comes from.
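
The quoted heuristic can be sketched per NodeManager as below. This is one
literal reading of "50% of maximum-under-utilized resource of past 30 min",
written in Python; the `Sample` record and the `conservative` variant are
hypothetical additions, used to show how far two plausible readings of the
same rule can diverge.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One NodeManager resource sample (hypothetical shape)."""
    timestamp_s: float
    allocated: float  # resources allocated to containers, e.g. vcores
    used: float       # resources those containers actually used


def opportunistic_budget(samples, now_s, window_s=30 * 60, fraction=0.5,
                         conservative=False):
    """Resources one NM could offer to opportunistic containers.

    Literal reading of the quoted heuristic: `fraction` of the largest idle
    (allocated minus used) amount seen in the last `window_s` seconds.
    With conservative=True the smallest idle amount is used instead, which
    only counts slack that was present for the whole window.
    """
    idle = [max(0.0, s.allocated - s.used)
            for s in samples if now_s - s.timestamp_s <= window_s]
    if not idle:
        return 0.0
    slack = min(idle) if conservative else max(idle)
    return fraction * slack


samples = [
    Sample(timestamp_s=-2000.0, allocated=16.0, used=0.0),  # outside window
    Sample(timestamp_s=0.0, allocated=16.0, used=4.0),      # idle 12
    Sample(timestamp_s=600.0, allocated=16.0, used=14.0),   # idle 2
    Sample(timestamp_s=1200.0, allocated=16.0, used=8.0),   # idle 8
]
print(opportunistic_budget(samples, now_s=1800.0))                     # 6.0
print(opportunistic_budget(samples, now_s=1800.0, conservative=True))  # 1.0
```

On the same samples the two readings differ by a factor of six, which is why it
helps to know where the under-utilization comes from before picking one.
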
Observation: the cluster is under-utilized despite being fully allocated.
Possible reasons:
1) Tasks are incorrectly over-allocated: they never use the resources they ask
for, so we can safely run additional opportunistic containers. Here the feature
compensates for poorly configured applications. Probably a valid scenario, but
is it common?
2) Tasks are correctly allocated but don't use their capacity to the limit all
the time. E.g. Terasort uses high CPU only during the sort, not for the entire
length of the job, but its containers ask for enough CPU to finish the sort in
the desired time. This is typical application behavior, where resource usage
varies over time. Here the feature soaks up the fallow resources in the cluster
while tasks are not using their quoted capacity.
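
The two reasons leave different fingerprints in a container's usage history,
which suggests one way to measure which of them is common on a real cluster. A
minimal Python sketch, assuming per-container usage samples are available; the
function name and both thresholds are hypothetical.

```python
def classify_underuse(requested, usage_samples,
                      peak_threshold=0.5, busy_threshold=0.8):
    """Label a container's usage pattern relative to its request.

    - "over-allocated" (reason 1): even its peak stays well below the request.
    - "time-varying" (reason 2): it gets close to its request at times,
      but its average usage is much lower.
    - "fully-used": average usage is close to the request.
    Thresholds are illustrative assumptions, as fractions of the request.
    """
    peak = max(usage_samples)
    mean = sum(usage_samples) / len(usage_samples)
    if peak < peak_threshold * requested:
        return "over-allocated"
    if mean < busy_threshold * requested:
        return "time-varying"
    return "fully-used"


# Containers that each requested 4 vcores, sampled four times.
print(classify_underuse(4, [0.5, 0.8, 0.6, 0.7]))  # over-allocated
print(classify_underuse(4, [0.5, 3.9, 3.8, 0.4]))  # time-varying (Terasort-like)
print(classify_underuse(4, [3.6, 3.9, 3.7, 3.8]))  # fully-used
```

Tallying these labels across the containers of a cluster would give a rough,
data-driven answer to which case is the common one.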

The arguments and assumptions we make need to be considered in the light of 
which of 1 or 2 is the common case and where this feature will be useful.

While it's useful to have configuration knobs, a complex, dynamic feature like
this one is basically reacting to runtime observations, so it may be quite hard
to tune it statically through manual configuration. Some limits, such as the
maximum over-allocation, are easy to configure and probably required, but we
should aim to make this feature work by itself instead of relying exclusively
on configuration (hell :P) to make it usable.
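
One way to read this is: keep a few static safety limits and let the node
derive everything else from what it observes. The Python sketch below is a
hypothetical illustration of that split, not a YARN API. The only configured
values are a hard over-allocation cap and a headroom fraction; the amount
actually offered to opportunistic containers tracks an exponentially weighted
moving average (EWMA) of observed usage.

```python
class OverAllocationController:
    """Hypothetical per-node controller: static limits, adaptive target.

    Configured (static): max_overallocation_ratio caps total allocation at
    capacity * (1 + ratio); headroom keeps a safety margin below capacity.
    Adaptive: an EWMA of observed usage decides how much extra is offered.
    """

    def __init__(self, capacity, max_overallocation_ratio=0.5,
                 headroom=0.1, alpha=0.2):
        self.capacity = capacity
        self.max_ratio = max_overallocation_ratio
        self.headroom = headroom
        self.alpha = alpha  # EWMA smoothing factor in (0, 1]
        self.smoothed_used = None

    def observe(self, used):
        """Fold one usage sample into the EWMA."""
        if self.smoothed_used is None:
            self.smoothed_used = used
        else:
            self.smoothed_used = (self.alpha * used
                                  + (1 - self.alpha) * self.smoothed_used)

    def extra_allocatable(self, allocated):
        """Extra resources to offer opportunistic containers right now."""
        if self.smoothed_used is None:
            return 0.0  # no observations yet: do not over-allocate
        target_used = self.capacity * (1 - self.headroom)
        adaptive = max(0.0, target_used - self.smoothed_used)
        hard_cap = max(0.0, self.capacity * (1 + self.max_ratio) - allocated)
        return min(adaptive, hard_cap)


quiet = OverAllocationController(capacity=16.0)
quiet.observe(4.0)
print(quiet.extra_allocatable(allocated=16.0))  # 8.0: the static cap binds

busy = OverAllocationController(capacity=16.0)
busy.observe(15.0)
print(busy.extra_allocatable(allocated=16.0))   # 0.0: no observed slack
```

Users set only the bounds; the per-node behavior inside them follows runtime
observations. A smaller `alpha` reacts more slowly to bursts, trading some
utilization for safety.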

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocates more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
