[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738985#comment-13738985
 ] 

Sandy Ryza commented on YARN-1024:
----------------------------------

I've been thinking a lot about this, and wanted to propose a modified approach, 
inspired by an offline discussion with Arun and his max-vcores idea 
(https://issues.apache.org/jira/browse/YARN-1024?focusedCommentId=13730074&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13730074).

First, my assumptions about how CPUs work:
* A CPU is essentially a bathtub full of processing power that can be doled out 
to threads, with a limit per thread based on the power of each core within it.
* To give X processing power to a thread means that within a standard unit of 
time, roughly some number of instructions proportional to X can be executed for 
that thread. 
* No more than a certain amount of processing power (the amount of processing 
power per core) can be given to each thread.
* We can use CGroups to say that a task gets some fraction of the system's 
processing power.
* This means that if we have 5 cores with Y processing power each, we can give 
5 threads Y processing power each, or 6 threads 5Y/6 processing power each, but 
we can't give 4 threads 5Y/4 processing power each.
* It never makes sense to use CGroups assign a higher fraction of the system's 
processing power than (numthreads the task can take advantage of / number of 
cores) to a task.
* Equivalently, if my CPU has X processing power per core, it never makes sense 
to assign more than (numthreads the task can take advantage of) * X processing 
power to a task.

So as long as we account for that last constraint, we can essentially view 
processing power as a fluid resource like memory.  With this in mind, we can:
1. Split virtual cores into cores and yarnComputeUnitsPerCore.  Requests can 
include both and nodes can be configured with both.
2. Have a cluster-defined maxComputeUnitsPerCore, which would be the smallest 
yarnComputeUnitsPerCore on any node.  We min all yarnComputeUnitsPerCore 
requests with this number when they hit the RM.
3. Use YCUs, not cores, for scheduling.  I.e. the scheduler thinks of a node's 
CPU capacity in terms of the number of YCUs it can handle and thinks of a 
resource's CPU request in terms of its (normalized yarnComputeUnitsPerCore * # 
cores).  We use YCUs for DRF.
4. If we make YCUs small enough, no need for fractional anything.

This reduces to a number-of-cores-based approach if all containers are 
requested with yarnComputeUnitsPerCore=infinity, and reduces to a YCU approach 
if maxComputeUnitsPerCore is set to infinity.  Predictability, simplicity, and 
scheduling flexibility can be traded off per cluster without overloading the 
same concept with multiple definitions.

This doesn't take into account heteregeneous hardware within a cluster, but I 
think (2) can be tweaked to handle this by holding a value for each node  (can 
elaborate on how this would work).  It also doesn't take into account pinning 
threads to CPUs, but I don't think it's any less extensible for ultimately 
dealing with this than other proposals.

Sorry for the longwindedness.  Bobby, would this provide the flexibility you're 
looking for?
                
> Define a virtual core unambigiously
> -----------------------------------
>
>                 Key: YARN-1024
>                 URL: https://issues.apache.org/jira/browse/YARN-1024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that 
> it's easy to migrate applications between clusters.
> For e.g. here is Amazon EC2 definition of ECU: 
> http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
> equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to