[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628305#comment-13628305
 ] 

Carlo Curino commented on YARN-45:
----------------------------------

Alejandro, thanks for the feedback, and yes, you are spot on. I think what you 
propose is akin to the Set<ResourceRequest> we have (which, if I understand 
correctly, is similar to the PreemptResource thing you describe). We plan to 
support this, and it covers one set of use cases very well, i.e., when we 
have a "broad" request and are OK with the AM resolving it as it sees fit. 
As you point out, this is good because it lets the AM be smart about what 
to return, and thus more likely to avoid expensive preemptions in favor of 
cheap ones, or even to return a container that is not data-local in place of 
one that is data-local, etc. 

However, this feels contrived when we know precisely what we want back from a 
certain AM (e.g., we want to preempt a specific container). For this purpose 
the Set<ContainerID>-based preemption is easier to use, and it also simplifies 
the bookkeeping done in the RM (in our preemption policy) to decide when to 
"kill" a container if the AM does not preempt it within a certain timeout. 
This is a good match with the FairScheduler internals, and we adapted 
CapacityScheduler to leverage this too by means of a preemption monitor. This 
will be clearer when we release the actual monitor (in the next few days), but 
the idea is that if we talk to the AM in terms of a Set<ContainerID>, there is 
no ambiguity in detecting when the AM is ignoring us and we therefore have to 
move on with container killing (e.g., to enforce capacity/fairness). By 
contrast, with ResourceRequest or something like it, we might not know whether 
the resource I want back now is the same one I wanted in some previous 
iteration (hence I am being ignored by the AM) or whether the two just happen 
to be the same/similar. 
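
As a rough illustration of the container-based bookkeeping described above, 
here is a minimal Java sketch. It is not the preemption monitor from the 
patch. The class name PreemptionTracker, the update() method, and the use of 
plain String ids in place of ContainerID are all assumptions made for the 
example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the monitor shipped with YARN-45. */
class PreemptionTracker {
    private final long timeoutMs;
    // Container id (a String stand-in for ContainerID) -> time we first asked for it.
    private final Map<String, Long> firstAsked = new HashMap<>();

    PreemptionTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /**
     * Called once per policy iteration with the containers we currently want back.
     * Returns the containers the AM has not released within the timeout.
     */
    List<String> update(Set<String> wanted, long nowMs) {
        // Forget containers we no longer want (already released, or no longer needed).
        firstAsked.keySet().retainAll(wanted);
        List<String> toKill = new ArrayList<>();
        for (String id : wanted) {
            // Record the first time this exact container was requested.
            long since = firstAsked.computeIfAbsent(id, k -> nowMs);
            if (nowMs - since >= timeoutMs) {
                toKill.add(id); // the AM ignored a request for this specific container
            }
        }
        return toKill;
    }
}
```

Because each entry is keyed by a container id, a repeated request is 
unambiguous: the same id appearing in consecutive iterations means the AM has 
not acted on it. With a resource-shaped request there is no such key to match 
on.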

If we can devise a simple way to leverage a single resource-based 
representation for both scenarios, I would be happy to drop the 
Set<ContainerID>, but so far we haven't found a clean way to do it, so we 
provisioned for both Set<ResourceRequest> and/or Set<ContainerID> to be 
optionally part of a PreemptRequest. The current semantics are that these are 
disjoint sets of resources we want (some called out as containers, and some 
expressed as resources), but we don't have a strong reason for this not to be 
a tagged union.
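
To make the "both sets optional, and disjoint" semantics concrete, here is a 
small Java sketch of what such a message could look like. The names 
PreemptRequestSketch, ResourceAsk, containerIds, and resourceAsks are 
hypothetical, and String ids again stand in for ContainerID; the real record 
in the patch may be shaped differently.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical shape of a preemption message; all names are illustrative. */
final class PreemptRequestSketch {

    /** Stand-in for a ResourceRequest: "count" containers of "memoryMb" each. */
    static final class ResourceAsk {
        final int memoryMb;
        final int count;

        ResourceAsk(int memoryMb, int count) {
            this.memoryMb = memoryMb;
            this.count = count;
        }
    }

    // Specific containers the RM wants back (may be empty).
    final Set<String> containerIds;
    // Broad asks the AM may satisfy with containers of its choosing (may be empty).
    final Set<ResourceAsk> resourceAsks;

    /** Either argument may be null, meaning "this part of the request is absent". */
    PreemptRequestSketch(Set<String> containerIds, Set<ResourceAsk> resourceAsks) {
        this.containerIds = containerIds == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(new HashSet<>(containerIds));
        this.resourceAsks = resourceAsks == null
            ? Collections.<ResourceAsk>emptySet()
            : Collections.unmodifiableSet(new HashSet<>(resourceAsks));
    }

    /** True when the RM is asking for nothing back. */
    boolean isEmpty() {
        return containerIds.isEmpty() && resourceAsks.isEmpty();
    }
}
```

Under the disjoint reading, a request carrying both sets asks for the named 
containers in addition to the broad resources. A tagged-union variant would 
instead allow exactly one of the two sets to be populated per request.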

Do you think the above covers the use case you have in mind, or am I missing 
something? (BTW, I am very curious to hear what your use case is.)



                
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Chris Douglas
>            Assignee: Carlo Curino
>         Attachments: YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict 
> enforcement of resource invariants in the cluster. Individual allocations of 
> containers must be reclaimed, or reserved, to restore the global invariants 
> when cluster load shifts. In some cases, the ApplicationMaster can respond to 
> fluctuations in resource availability without losing the work already 
> completed by that task (MAPREDUCE-4584). Supplying it with this information 
> would be helpful for overall cluster utilization [1]. To this end, we want to 
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
