[ https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830368#comment-13830368 ]

Carlo Curino commented on YARN-1434:
------------------------------------

Srikanth, what we observed (again in a noisy environment, so this still needs 
to be validated) is that the AM returning containers keeps its position as 
"under capacity" relative to the others: because it returned a bunch of 
containers, it is picked again as the highest in priority. As a consequence it 
is wasting containers in a way that, in our small setup, was harming other 
jobs' opportunity to get access to containers. 
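A toy model of the effect, as a minimal sketch under loudly stated assumptions: 
the scheduler offers each free container to the job with the lowest 
held/share ratio (a simplification, not the actual CapacityScheduler or 
FairScheduler code), the normal job finishes one task per heartbeat, and the 
yielding job returns everything on its next heartbeat. `Job`, `pick_next` and 
`run` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    share: float         # fraction of the cluster the job is entitled to (assumed)
    yields_back: bool    # True: the AM returns every container on its next heartbeat
    held: int = 0        # containers currently held
    offers: int = 0      # containers the scheduler has handed to this job so far

def pick_next(jobs):
    # "Most under capacity" = lowest held/share ratio; min() breaks ties by list order.
    return min(jobs, key=lambda j: j.held / j.share)

def run(jobs, capacity, heartbeats):
    for _ in range(heartbeats):
        for j in jobs:
            if j.yields_back:
                j.held = 0              # yielder hands back everything it got last time
            elif j.held > 0:
                j.held -= 1             # normal job: one task finishes per heartbeat (assumed)
        free = capacity - sum(j.held for j in jobs)
        for _ in range(free):           # offer free containers one at a time
            j = pick_next(jobs)
            j.held += 1
            j.offers += 1
    return {j.name: j.offers for j in jobs}

jobs = [Job("A", 0.5, yields_back=False), Job("B", 0.5, yields_back=True)]
print(run(jobs, capacity=10, heartbeats=3))   # {'A': 7, 'B': 15}
```

After the first heartbeat, B drops back to zero usage, sorts first, and takes 
five of every six offered containers while running nothing on them. This toy 
run says nothing about behavior at scale.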

If Robert has a few spare cycles, he will try to make a minimal patch to the MR 
AM that makes it behave maliciously and try again on the CapacityScheduler; 
maybe Sandy could try it with the FairScheduler? 

If we confirm that this is indeed a problem, and that it is substantial in 
non-trivial scenarios (we noticed it for 2 jobs in 2 queues on 10 machines and 
are not sure whether it has an impact at scale), we might need to tweak the 
schedulers' logic to penalize users that yield back lots of containers (e.g., 
by accounting for those containers against the user quota for n seconds or 
something similar).


> Single Job can affect fairshare of others
> -----------------------------------------
>
>                 Key: YARN-1434
>                 URL: https://issues.apache.org/jira/browse/YARN-1434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Priority: Minor
>
> A job receiving containers and deciding not to use them and yielding them 
> back in the next heartbeat could significantly affect the amount of resources 
> given to other jobs. 
> This is because by yielding containers back the job appears always to be 
> under-capacity (more than others) so it is picked to be the next to receive 
> containers.
> Observed by Robert Grandl, to be independently confirmed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
