[ https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830368#comment-13830368 ]
Carlo Curino commented on YARN-1434:
------------------------------------

Srikanth, what we observed (again in a noisy environment, so to be validated) is that the AM returning containers maintains its position as "under capacity" w.r.t. other machines, since it returned a bunch of containers, so it is picked again as highest in priority. As a consequence it wastes containers in a way that, in our small setup, was harming other jobs' opportunity to get access to containers.

If Robert has a few spare cycles, he will try to make a minimal patch to the MR AM that makes it behave maliciously and try again on the CapacityScheduler, and maybe Sandy could try it with the fair scheduler?

If we confirm this is indeed a problem, and that it is substantial for non-trivial scenarios (we noticed it for 2 jobs in 2 queues on 10 machines, not sure whether it has an impact at scale), we might need to tweak the schedulers' logic to penalize users that yield back lots of containers (e.g., accounting for those containers against the user quota for n seconds or something).

> Single Job can affect fairshare of others
> -----------------------------------------
>
>                 Key: YARN-1434
>                 URL: https://issues.apache.org/jira/browse/YARN-1434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Priority: Minor
>
> A job receiving containers and deciding not to use them and yielding them
> back in the next heartbeat could significantly affect the amount of resources
> given to other jobs.
> This is because by yielding containers back the job appears always to be
> under-capacity (more than others) so it is picked to be the next to receive
> containers.
> Observed by Robert Grandl, to be independently confirmed.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
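
To make the effect described above easier to reason about, here is a minimal toy sketch. It is not the CapacityScheduler or FairScheduler logic; it only assumes a scheduler that hands each free container to whichever job currently looks most under capacity. The function `simulate`, the `penalty_ticks` parameter, and the 10-container cluster are all hypothetical names and values chosen for illustration. One job keeps its containers, the other yields everything back on the next heartbeat. With `penalty_ticks > 0`, yielded containers keep counting against the yielding job for that many heartbeats, along the lines of the "account against the user quota for n seconds" idea.

```python
def simulate(ticks, capacity=10, penalty_ticks=0):
    """Toy model: one job keeps its containers, the other yields them back every heartbeat.

    Returns (useful, wasted): container-ticks held by the well-behaved job and
    container-ticks granted to the yielding job, which does no work with them.
    """
    held_good = 0
    held_bad = 0
    penalties = []  # (expiry_tick, count) for containers the yielding job gave back
    useful = wasted = 0

    for t in range(ticks):
        # The yielding job hands back everything it received on the previous heartbeat.
        if held_bad and penalty_ticks > 0:
            penalties.append((t + penalty_ticks, held_bad))
        held_bad = 0

        # Drop expired penalties, then compute what is still charged to the yielding job.
        penalties = [(exp, n) for exp, n in penalties if exp > t]
        charged_bad = sum(n for _, n in penalties)

        # Hand out free containers one at a time to the job that looks most under capacity
        # (ties go to the well-behaved job).
        free = capacity - held_good - held_bad
        for _ in range(free):
            if held_good <= held_bad + charged_bad:
                held_good += 1
            else:
                held_bad += 1

        useful += held_good
        wasted += held_bad
    return useful, wasted


if __name__ == "__main__":
    print("no penalty:   ", simulate(10))
    print("with penalty: ", simulate(10, penalty_ticks=3))
```

In this toy setup, `simulate(10)` leaves the well-behaved job with 50 container-ticks while the other 50 are granted to the yielding job and wasted. `simulate(10, penalty_ticks=3)` gives the well-behaved job 85 and wastes 15. Whether a real scheduler shows a comparable shift is exactly what still needs to be confirmed.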