[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725413#comment-13725413 ]

Allen Wittenauer commented on YARN-972:
---------------------------------------

Do we have an example of a workload that needs 'fractional cores'?  Is this 
workload even appropriate for Hadoop?

As one of the probably crazy people who support systems that do extremely 
non-MR things at large scales, I'd prefer to see two things implemented:
* I need a processor this fast (in GHz)
* I need a processor that supports this instruction set

But I'd position the GHz question differently than what has been proposed 
above.  If I say my workload needs a 1GHz processor but there is only a 4GHz 
processor available, then the workload would get the whole 4GHz processor. If 
another workload comes in that needs a 4GHz processor but only a 2GHz processor 
is available, it needs to wait.  
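
To make that concrete, here's a rough sketch of the matching rule I have in 
mind.  Everything in it is hypothetical; ProcessorSpec and 
WholeProcessorMatcher are made up for illustration, and nothing like them 
exists in YARN's API:

    import java.util.List;
    import java.util.Optional;

    // Hypothetical types, for illustration only.
    class ProcessorSpec {
      final double ghz;           // clock speed of this physical processor
      boolean allocated = false;  // handed out whole, or still free

      ProcessorSpec(double ghz) { this.ghz = ghz; }
    }

    class WholeProcessorMatcher {
      // A workload asking for at least requestGhz gets the first free
      // processor that is fast enough: the whole processor, never a
      // fraction of one.  If nothing free is fast enough, the request
      // waits (empty result), e.g. a 4GHz ask when only 2GHz is free.
      static Optional<ProcessorSpec> match(List<ProcessorSpec> processors,
                                           double requestGhz) {
        for (ProcessorSpec p : processors) {
          if (!p.allocated && p.ghz >= requestGhz) {
            p.allocated = true;   // a 1GHz ask still takes all 4GHz
            return Optional.of(p);
          }
        }
        return Optional.empty();
      }
    }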

Treating speed as fractions gets into another problem: 2x2GHz != 4GHz, just as 
having 1/4 of 4 different cores != 1 core.  Throw Linux cpusets into the mix 
and we've got a major hairball.

Also, I'm a bit leery of our usage of the term core here and elsewhere in 
Hadoop-land.  As [~ste...@apache.org] points out, there are impacts on the Lx 
caches when sharing load.  This is also true of most SMT implementations, 
such as Intel's HyperThreading.  So if we accept the way Linux (and most 
other OSes) presents CPU threads as equivalent cores, there is *already* a 
performance hit and users are *already* getting fractional performance just 
by treating those threads as "real" cores.
                
> Allow requests and scheduling for fractional virtual cores
> ----------------------------------------------------------
>
>                 Key: YARN-972
>                 URL: https://issues.apache.org/jira/browse/YARN-972
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, scheduler
>    Affects Versions: 2.0.5-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> As this idea sparked a fair amount of discussion on YARN-2, I'd like to go 
> deeper into the reasoning.
> Currently the virtual core abstraction hides two orthogonal goals.  The first 
> is that a cluster might have heterogeneous hardware and that the processing 
> power of different makes of cores can vary wildly.  The second is that 
> different (combinations of) workloads can require different levels of 
> granularity.  E.g. one admin might want every task on their cluster to use at 
> least a core, while another might want applications to be able to request 
> quarters of cores.  The former would configure a single vcore per core.  The 
> latter would configure four vcores per core.
> I don't think that the abstraction is a good way of handling the second goal. 
> Having virtual cores refer to different magnitudes of processing power on 
> different clusters will make the difficult problem of deciding how many cores 
> to request for a job even more confusing.
> Can we not handle this with dynamic oversubscription?
> Dynamic oversubscription, i.e. adjusting the number of cores offered by a 
> machine based on measured CPU-consumption, should work as a complement to 
> fine-granularity scheduling.  Dynamic oversubscription is never going to be 
> perfect, as the amount of CPU a process consumes can vary widely over its 
> lifetime.  A task that first loads a bunch of data over the network and then 
> performs complex computations on it will suffer if additional CPU-heavy tasks 
> are scheduled on the same node because its initial CPU-utilization was low.  
> To guard against this, we will need to be conservative with how we 
> dynamically oversubscribe.  If a user wants to explicitly hint to the 
> scheduler that their task will not use much CPU, the scheduler should be able 
> to take this into account.
> On YARN-2, there are concerns that including floating point arithmetic in the 
> scheduler will slow it down.  I question this assumption, and it is perhaps 
> worth debating, but I think we can sidestep the issue by multiplying 
> CPU-quantities inside the scheduler by a decently sized number like 1000 and 
> keeping the computations in integer arithmetic.
> The relevant APIs are marked as evolving, so there's no need for the change 
> to delay 2.1.0-beta.
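
A minimal sketch of the integer workaround proposed in the last paragraph 
above, assuming a scale factor of 1000 (call them "milli-vcores").  This is 
illustration only, not actual YARN scheduler code:

    // Fixed-point CPU accounting: scale fractional vcores by 1000 once at
    // the API boundary, then do all scheduler math on plain ints.
    class MilliVcores {
      static final int SCALE = 1000;

      // 0.25 vcores becomes 250 milli-vcores; rounding happens exactly
      // once, at conversion time.
      static int fromVcores(double vcores) {
        return (int) Math.round(vcores * SCALE);
      }

      // Everything downstream stays in integer arithmetic.  Summing 100
      // milli-vcores ten times gives exactly 1000, while summing 0.1 ten
      // times in doubles does not give exactly 1.0.
      static boolean fits(int requestedMilli, int availableMilli) {
        return requestedMilli <= availableMilli;
      }
    }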

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
