[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074393#comment-15074393
 ] 

Karthik Kambatla commented on YARN-4257:
----------------------------------------

[~leftnoteasy] - thanks for the ping; I hadn't seen this JIRA.

We need the FairScheduler to allow 0 allocation - Llama relies on it. 
Basically, Llama (a long-running service serving multiple millisecond-long 
queries) can't always tolerate the scheduling latency of YARN or the latency of 
spawning containers. To ensure Llama works with YARN while YARN adds support 
for long-running services, Llama requests resources from YARN and "lends" them 
to individual queries. To avoid segmentation, we normalize all requests - each 
request is split into a linear combination of <1 unit of memory, 0 cpu> and <0 
memory, 1 unit of cpu>. 
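
To make the normalization concrete, here is a minimal sketch of the split. 
The class and method names (NormalizeSketch, Res, normalize) are hypothetical 
and are not Llama or YARN APIs; the sketch only assumes that a request can be 
expressed as whole units of memory and cpu.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NormalizeSketch and Res are hypothetical names,
// not classes from Llama or YARN.
final class NormalizeSketch {

    /** A resource vector measured in whole units of memory and cpu. */
    static final class Res {
        final int memoryUnits;
        final int cpuUnits;

        Res(int memoryUnits, int cpuUnits) {
            this.memoryUnits = memoryUnits;
            this.cpuUnits = cpuUnits;
        }
    }

    /**
     * Splits a request into unit requests: one <1 memory, 0 cpu> per unit of
     * memory and one <0 memory, 1 cpu> per unit of cpu.
     */
    static List<Res> normalize(Res request) {
        List<Res> parts = new ArrayList<>();
        for (int i = 0; i < request.memoryUnits; i++) {
            parts.add(new Res(1, 0)); // memory-only unit, asks for 0 cpu
        }
        for (int i = 0; i < request.cpuUnits; i++) {
            parts.add(new Res(0, 1)); // cpu-only unit, asks for 0 memory
        }
        return parts;
    }
}
```

For example, normalize(new Res(3, 2)) yields three memory-only requests and 
two cpu-only requests. Every unit request asks for 0 of one resource, which is 
why the scheduler has to accept 0.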

Also, is the motivation for the current change just code cleanup? Could we have 
scheduler-specific validations followed by a call to super.validateConf() for 
common checks? 
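
A rough sketch of what I have in mind, assuming the common checks are "minimum 
is non-negative and does not exceed maximum". The classes below are 
hypothetical stand-ins (a plain Map instead of Configuration, made-up class 
names), not the real AbstractYarnScheduler or its subclasses, and only the 
memory properties are shown.

```java
import java.util.Map;

// Hypothetical stand-in for the shared base class; not the real
// AbstractYarnScheduler. A Map replaces Hadoop's Configuration to keep
// the sketch self-contained.
abstract class BaseSchedulerSketch {
    static final String MIN_MB = "yarn.scheduler.minimum-allocation-mb";
    static final String MAX_MB = "yarn.scheduler.maximum-allocation-mb";

    /** Common checks shared by all schedulers (assumed for illustration). */
    protected void validateConf(Map<String, Integer> conf) {
        int min = conf.get(MIN_MB);
        int max = conf.get(MAX_MB);
        if (min < 0 || min > max) {
            throw new IllegalArgumentException(
                "Invalid memory allocation: min=" + min + ", max=" + max);
        }
    }
}

// A scheduler that accepts a minimum allocation of 0: it adds no checks of
// its own and simply inherits the common ones.
class ZeroFriendlySchedulerSketch extends BaseSchedulerSketch {
}

// A scheduler that rejects 0: its specific check runs first, then the
// common checks via super.validateConf().
class StrictSchedulerSketch extends BaseSchedulerSketch {
    @Override
    protected void validateConf(Map<String, Integer> conf) {
        if (conf.get(MIN_MB) == 0) {
            throw new IllegalArgumentException(
                MIN_MB + " must be greater than 0 for this scheduler");
        }
        super.validateConf(conf);
    }
}
```

With this shape, each scheduler keeps its current behaviour for 0 and the 
duplicated min/max comparison lives in one place.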

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4257
>                 URL: https://issues.apache.org/jira/browse/YARN-4257
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Swapnil Daingade
>            Assignee: Rich Haase
>              Labels: easyfix
>         Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> cpu and memory (with minor differences). FairScheduler supports 0 as minimum 
> allocation for cpu and memory, while CapacityScheduler and FifoScheduler do 
> not. We can move this code to AbstractYarnScheduler (avoids code duplication) 
> and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs yarn on mesos. 
> Myriad supports a feature called fine grained scaling (fgs). In fgs, a NM is 
> launched with zero capacity (0 cpu and 0 mem). When a yarn container is to be 
> run on the NM, a mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted mesos offer. On completion of the 
> yarn container, resources are returned to Mesos and the NM capacity is 
> scaled back down to zero (cpu & mem). 
> In ResourceTrackerService.registerNodeManager, yarn checks if the NM capacity 
> is at least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for 
> these properties (the FairScheduler one does allow 0). This behaviour 
> should be consistent or at least be overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
