[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141266#comment-15141266
 ] 

Carlo Curino commented on YARN-1051:
------------------------------------

[~grey], I suggest you read the attached techreport for full context, but 
let me try to summarize the ideas here.

*General Idea*
The reservation system receives reservation requests from users over a period 
of time. Note that each reservation can request resources well ahead of time 
(e.g., I need 10 containers for 1 hour tomorrow, sometime between 3pm and 6pm). 
The planner will try to "fit" all these reservations in the plan agenda, while 
respecting the user constraints (e.g., amount of resources and 
start_time/deadline) and the physical constraints of the plan (which is a 
"queue", and thus has access to a portion of the cluster capacity). The APIs 
exposed to the users allow them to express their flexibility (e.g., for a 
map-only job I can express the fact that I can run with up to 10 parallel 
containers, but also 1 container at a time); this allows the plan to fit more 
jobs by "deforming" them. A side effect of this is that we can provide support 
for gang semantics (e.g., I need 10 concurrent containers for 1h). 
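
To make the flexibility idea concrete, here is a toy sketch in plain Python 
(not the actual YARN API; `ReservationRequest` and its fields are simplified 
stand-ins): a request carries both the resource need and the bounds within 
which the planner may "deform" it.

```python
import math
from dataclasses import dataclass

@dataclass
class ReservationRequest:
    """Toy stand-in for a reservation request; not the real YARN API."""
    num_containers: int   # total containers needed (e.g., 10)
    duration: int         # hours each container runs (e.g., 1)
    min_concurrency: int  # gang size: containers that must run together
    max_concurrency: int  # upper bound on parallelism
    arrival: int          # earliest start hour (e.g., 15 == 3pm)
    deadline: int         # latest finish hour (e.g., 18 == 6pm)

    def min_makespan(self) -> int:
        """Shortest possible runtime: run max_concurrency containers at once."""
        waves = math.ceil(self.num_containers / self.max_concurrency)
        return waves * self.duration

    def is_feasible(self) -> bool:
        """The request can fit in its window at all (ignoring other tenants)."""
        return self.min_makespan() <= self.deadline - self.arrival

# The map-only job from the text: 10 containers, 1h each,
# can run anywhere from 1 to 10 at a time, window 3pm-6pm.
r = ReservationRequest(10, 1, 1, 10, 15, 18)
```

Gang semantics fall out of the same shape: setting min_concurrency == 
max_concurrency == 10 expresses "I need 10 concurrent containers for 1h".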

The key intuition is that each job might temporarily use a large amount of 
resources, but we control very explicitly when it should yield resources back 
to other jobs. This explicit time-multiplexing gives very strong guarantees to 
each job (i.e., if the reservation was accepted you will get your resources), 
but allows us to densely pack the cluster agenda (and thus get high utilization 
/ high ROI). Moreover, best-effort jobs can be run on separate queues with the 
standard set of scheduling invariants provided by 
FairScheduler/CapacityScheduler. 
 
*SharingPolicy*
Another interesting area in which enterprise settings can extend/innovate is 
the choice of "SharingPolicy". The SharingPolicy is the way we determine 
(besides physical resource availability) how many resources a 
tenant/reservation can ask for in the Plan. This applies both 
per-reservation and across reservations from a user (or group). We have 
contributed so far a couple of simple policies that enforce instantaneous and 
over-time limits (e.g., each user can grab up to 30% of the plan 
instantaneously, but no more than an average of 5% 
over a 24h period of time). Internally at MS, we are developing other policies 
that are specific to business rules we care to enforce in our clusters. By 
design, creating a new SharingPolicy that matches your business settings is 
fairly easy (narrow API and easy configuration 
mechanics). Since the Plan stores past (up to a window of time), present, and 
future reservations, the policy can be very sophisticated and explicit. Also, 
given the run-length-encoded representation of the allocations, the algorithms 
can be quite efficient. 
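
A minimal sketch of such a check (toy Python, not the real SharingPolicy 
interface) over a run-length-encoded allocation, using the 30%-instantaneous / 
5%-average-over-24h example above:

```python
# Run-length-encoded allocation for one user:
# a list of (start_hour, end_hour, fraction_of_plan) intervals.
# Toy sketch of an instantaneous + over-time limit check; not the
# actual YARN SharingPolicy interface.
def within_limits(rle, inst_cap=0.30, avg_cap=0.05, window=24):
    # Instantaneous check: no single interval may exceed inst_cap.
    if any(frac > inst_cap for _, _, frac in rle):
        return False
    # Over-time check: total plan-fraction-hours, averaged over the window.
    used = sum((end - start) * frac for start, end, frac in rle)
    return used / window <= avg_cap

bursty = [(0, 2, 0.30), (10, 12, 0.25)]  # short bursts, low average: OK
greedy = [(0, 12, 0.20)]                 # average 0.10 > 0.05: rejected
```

Because the allocation is run-length encoded, the check costs one pass over 
the intervals rather than one per time step.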

*ReservationAgent*
The reservation agents are the core of the placement logic. We developed a few, 
which optimize for different things (e.g., minimize cost of the allocation by 
smoothing out the plan, or placing as late/early as possible in the window of 
feasibility). Again, this is an area of possible 
enhancement, where business logic can kick in and choose to prioritize certain 
types of allocations. 
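
For instance, a (much simplified) earliest-fit agent over a discretized plan 
might look like the following toy sketch; this is illustrative only and not 
any of the agents actually contributed.

```python
def place_earliest(free, containers, duration, arrival, deadline):
    """Toy earliest-fit agent: find the first start hour in
    [arrival, deadline - duration] where `containers` fit for `duration`
    consecutive steps. `free` is the plan's spare capacity per hour.
    Commits the allocation and returns the start hour, or None."""
    for start in range(arrival, deadline - duration + 1):
        if all(free[t] >= containers for t in range(start, start + duration)):
            for t in range(start, start + duration):
                free[t] -= containers  # commit the allocation
            return start
    return None

plan = [20] * 24        # 20 spare containers every hour of the day...
plan[16] = 5            # ...except 4pm, where only 5 are free
# 10 gang containers for 1 hour, window 3pm-6pm: lands at 3pm.
start = place_earliest(plan, 10, 1, 15, 18)
```

A latest-fit or cost-smoothing agent would differ only in the order in which 
candidate start times are tried.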

*Enforcement mechanics*
Finally, in order to "enforce" these planned decisions, we use dynamically 
created and resized queues (each reservation can contain one or more jobs, so 
the queue mechanism is a natural fit for reuse). Note that [~acmurthy]'s 
comment was fairly technical, and related to this 
last point. He was proposing to leverage application priorities instead of 
queues as the enforcement mechanism. Both are feasible, and each has some pros 
and cons. Overall, using queues allowed us to reuse more of the existing 
mechanisms (e.g., relying on the preemption 
policy, and all of the advancements people are contributing there).
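
Concretely, enforcement amounts to re-deriving queue capacities from the plan 
at each step, roughly as in this toy sketch (not the actual 
CapacityScheduler/FairScheduler code; the shapes here are assumptions for 
illustration):

```python
def queue_capacities(plan_allocations, hour, best_effort="default"):
    """Toy sketch of plan enforcement: size one dynamic queue per active
    reservation from the plan, and give the best-effort queue the rest.
    `plan_allocations` maps reservation_id -> {hour: fraction_of_cluster}."""
    caps = {rid: alloc.get(hour, 0.0)
            for rid, alloc in plan_allocations.items()}
    caps = {rid: c for rid, c in caps.items() if c > 0}  # drop idle ones
    caps[best_effort] = 1.0 - sum(caps.values())         # leftover capacity
    return caps

plan = {"res_1": {15: 0.5, 16: 0.5}, "res_2": {16: 0.25}}
```

As reservations start and end, the per-reservation queues grow and shrink, and 
the preemption policy reclaims containers from queues that are now over their 
(new, smaller) capacity.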


> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-1051
>                 URL: https://issues.apache.org/jira/browse/YARN-1051
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler, resourcemanager, scheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>             Fix For: 2.6.0
>
>         Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, workflows, and helps for 
> gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
