[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Karthik Kambatla (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091037#comment-14091037
]

Karthik Kambatla commented on YARN-2026:

Thanks for bearing with us on this JIRA, Ashwin. That patch looks mostly good.
Minor comments:
# This is a very subjective opinion. In ComputeFairShares, would it be
cleaner/simpler to rename the existing {{public computeShares}} to {{private
computeSharesInternal}}, and add a new {{public computeShares}} that calls
the internal version with only the active queues?
# Thanks for adding a bunch of tests in TestFairSchedulerFairShare. Post
YARN-1474,
## setup() need not call
{{scheduler.setRMContext(resourceManager.getRMContext());}}
## configureClusterWithQueuesAndOneNode need not call the following:
{code}
scheduler.init(conf);
scheduler.start();
scheduler.reinitialize(conf, resourceManager.getRMContext());
{code}
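A minimal Java sketch of the refactor suggested in the first comment, under stated assumptions: {{SketchQueue}}, its {{isActive()}} rule (at least one runnable app) and the plain weight-proportional split are hypothetical simplifications, not the actual FairScheduler classes, and the real ComputeFairShares also has to honor min/max shares. The point is only the shape: the public method filters out inactive queues and delegates to a private internal method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified queue: only what this sketch needs. */
final class SketchQueue {
  final String name;
  final double weight;
  final int runnableApps;
  double fairShare; // computed output, same unit as totalResource

  SketchQueue(String name, double weight, int runnableApps) {
    this.name = name;
    this.weight = weight;
    this.runnableApps = runnableApps;
  }

  /** Assumption: a queue is "active" if it has at least one runnable app. */
  boolean isActive() {
    return runnableApps > 0;
  }
}

final class ComputeFairSharesSketch {
  private ComputeFairSharesSketch() {}

  /** New public entry point: inactive queues get zero, active ones split the total. */
  public static void computeShares(List<SketchQueue> queues, double totalResource) {
    List<SketchQueue> active = new ArrayList<>();
    for (SketchQueue q : queues) {
      if (q.isActive()) {
        active.add(q);
      } else {
        q.fairShare = 0.0;
      }
    }
    computeSharesInternal(active, totalResource);
  }

  /** The former public method, now private: a plain split by weight. */
  private static void computeSharesInternal(List<SketchQueue> queues, double totalResource) {
    double totalWeight = 0.0;
    for (SketchQueue q : queues) {
      totalWeight += q.weight;
    }
    for (SketchQueue q : queues) {
      q.fairShare = totalWeight == 0.0 ? 0.0 : totalResource * q.weight / totalWeight;
    }
  }
}
```

For example, calling {{computeShares}} on childQ1..childQ10 with a total of 80 (percent) when only childQ1 has runnable apps gives childQ1 all 80 and the other nine 0.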

Fair scheduler : Fair share for inactive queues causes unfair allocation in
some scenarios
--

Key: YARN-2026
URL: https://issues.apache.org/jira/browse/YARN-2026
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
Labels: scheduler
Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt,
YARN-2026-v4.txt

Problem 1 - While using hierarchical queues in the fair scheduler, we have seen a few
scenarios where a leaf queue with the least fair share can take the
majority of the cluster and starve a sibling parent queue that has a greater
weight/fair share, and preemption doesn’t kick in to reclaim resources.
The root cause seems to be that the fair share of a parent queue is distributed
to all its children irrespective of whether each child is an active or an inactive
(no apps running) queue. Preemption based on fair share kicks in only if the
usage of a queue is less than 50% of its fair share and its demand is greater
than its usage. When there are many queues under a parent queue (with a high
fair share), each child queue’s fair share becomes really low. As a result, when
only a few of these child queues have apps running, they reach their *tiny* fair
share quickly, and preemption doesn’t happen even if other (non-sibling) leaf
queues are hogging the cluster.
This can be solved by dividing the fair share of a parent queue only among its
active child queues.
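The preemption condition described above can be sketched as follows. This is a hedged illustration, not the FairScheduler source: the helper {{isStarvedForFairShare}} and the formulation "usage below the smaller of half the fair share and the demand" are assumptions about how the 50% rule is applied; all values are percentages of the cluster.

```java
/**
 * Hypothetical helper sketching the fair-share preemption rule above.
 * Assumption: a queue is starved when usage < min(0.5 * fairShare, demand).
 * All values are percentages of the cluster.
 */
final class FairShareStarvation {
  /** Fraction of the fair share below which preemption may kick in. */
  static final double FAIR_SHARE_FRACTION = 0.5;

  private FairShareStarvation() {}

  static boolean isStarvedForFairShare(double usage, double fairShare, double demand) {
    // Never target more than the queue actually asks for.
    double target = Math.min(FAIR_SHARE_FRACTION * fairShare, demand);
    return usage < target;
  }

  public static void main(String[] args) {
    // Fair share diluted to 8%: 5% usage is above the 4% threshold.
    System.out.println(isStarvedForFairShare(5.0, 8.0, 30.0));  // false
    // Same queue holding an 80% share: target = min(40, 30) = 30.
    System.out.println(isStarvedForFairShare(5.0, 80.0, 30.0)); // true
  }
}
```

The sketch makes the dilution problem concrete: the same 5% usage and 30% demand is or is not "starved" depending only on how large the queue's fair share is.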
Here is an example describing the problem and the proposed solution:
root.lowPriorityQueue is a leaf queue with weight 2.
root.HighPriorityQueue is a parent queue with weight 8.
root.HighPriorityQueue has 10 child leaf queues:
root.HighPriorityQueue.childQ(1..10)
The above config results in root.HighPriorityQueue having an 80% fair share
(8/(2+8)), and each of its ten child queues having an 8% fair share (80%/10).
Preemption would happen for a child queue only if its usage is below 4% (0.5*8=4).
Let's say that at the moment no apps are running in any of the
root.HighPriorityQueue.childQ(1..10) queues, and a few apps are running in
root.lowPriorityQueue, which is taking up 95% of the cluster.
Up to this point, the behavior of FS is correct.
Now, let's say root.HighPriorityQueue.childQ1 gets a big job that requires 30%
of the cluster. It would get only the available 5% of the cluster, and
preemption wouldn't kick in since 5% is above 4% (half its fair share). This is bad
considering childQ1 is under a high-priority parent queue that has an *80% fair
share*.
Until root.lowPriorityQueue starts relinquishing containers, we would see the
following allocation on the scheduler page:
*root.lowPriorityQueue = 95%*
*root.HighPriorityQueue.childQ1 = 5%*
This can be solved by distributing a parent’s fair share only to active
queues.
So in the example above, since childQ1 is the only active queue
under root.HighPriorityQueue, it would get all of its parent’s fair share, i.e.
80%.
This would cause preemption to reclaim the 30% needed by childQ1 from
root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
Problem 2 - Also note that a similar situation can happen between
root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2
hogs the cluster. childQ2 can take up 95% of the cluster, and childQ1 would be
stuck at 5% until childQ2 starts relinquishing containers. We would like childQ1
and childQ2 each to get half of root.HighPriorityQueue’s fair share, i.e. 40%,
which would ensure childQ1 gets up to 40% of the resources, if needed, through
preemption.
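The numbers in both problems can be reproduced with a small recursive sketch. Assumptions: {{QueueNode}} is a hypothetical queue-tree type (not FSQueue), a leaf is active when it has at least one running app, a parent is active when any child is, shares are a plain weight-proportional split with no min/max shares, and all child queues have weight 1. The {{dynamic}} flag switches between dividing a parent's share among all children (current behavior) and only among active ones (the proposal).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical queue-tree node; not the FairScheduler FSQueue class. */
final class QueueNode {
  final String name;
  final double weight;
  final int runningApps; // only meaningful for leaves
  final List<QueueNode> children = new ArrayList<>();

  QueueNode(String name, double weight, int runningApps) {
    this.name = name;
    this.weight = weight;
    this.runningApps = runningApps;
  }

  QueueNode add(QueueNode child) {
    children.add(child);
    return this;
  }

  boolean isActive() {
    if (children.isEmpty()) {
      return runningApps > 0;
    }
    for (QueueNode c : children) {
      if (c.isActive()) {
        return true;
      }
    }
    return false;
  }
}

final class FairShareTree {
  private FairShareTree() {}

  /** Fills 'out' with each leaf's fair share (percent of cluster). */
  static void assign(QueueNode node, double share, boolean dynamic, Map<String, Double> out) {
    if (node.children.isEmpty()) {
      out.put(node.name, share);
      return;
    }
    double totalWeight = 0.0;
    for (QueueNode c : node.children) {
      if (!dynamic || c.isActive()) {
        totalWeight += c.weight;
      }
    }
    for (QueueNode c : node.children) {
      boolean eligible = !dynamic || c.isActive();
      double childShare = (eligible && totalWeight > 0) ? share * c.weight / totalWeight : 0.0;
      assign(c, childShare, dynamic, out);
    }
  }

  /** The example tree; lowPriorityQueue always has running apps. */
  static QueueNode example(int childQ1Apps, int childQ2Apps) {
    QueueNode high = new QueueNode("root.HighPriorityQueue", 8, 0);
    for (int i = 1; i <= 10; i++) {
      int apps = i == 1 ? childQ1Apps : i == 2 ? childQ2Apps : 0;
      high.add(new QueueNode("root.HighPriorityQueue.childQ" + i, 1, apps));
    }
    return new QueueNode("root", 1, 0)
        .add(new QueueNode("root.lowPriorityQueue", 2, 1))
        .add(high);
  }

  static Map<String, Double> shares(QueueNode root, boolean dynamic) {
    Map<String, Double> out = new LinkedHashMap<>();
    assign(root, 100.0, dynamic, out);
    return out;
  }

  public static void main(String[] args) {
    String q1 = "root.HighPriorityQueue.childQ1";
    // Problem 1: only childQ1 is active under HighPriorityQueue.
    System.out.println(shares(example(1, 0), false).get(q1)); // 8.0
    System.out.println(shares(example(1, 0), true).get(q1));  // 80.0
    // Problem 2: childQ1 and childQ2 are both active.
    System.out.println(shares(example(1, 1), true).get(q1));  // 40.0
  }
}
```

The static run shows the diluted 8% share (and hence the 4% preemption threshold) from Problem 1; the dynamic runs give childQ1 80% when it is the only active child and 40% when childQ2 is active too, matching the figures above.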

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Ashwin Shankar (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091308#comment-14091308
]

Ashwin Shankar commented on YARN-2026:

Thanks [~kasha]. All comments addressed in v5 patch.

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091516#comment-14091516
]

Hadoop QA commented on YARN-2026:

{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12660716/YARN-2026-v5.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new
or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-YARN-Build/4564//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4564//console

This message is automatically generated.

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Karthik Kambatla (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091577#comment-14091577
]

Karthik Kambatla commented on YARN-2026:

+1. Checking this in..

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-07 Thread Ashwin Shankar (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089907#comment-14089907
]

Ashwin Shankar commented on YARN-2026:

Based on an offline discussion with [~kasha], the plan is to change the present
fair share implementation by making it dynamic, i.e. calculating fair share only
for active queues. We would implement static fair share in YARN-2393 and the Web
UI changes in YARN-2360.

[~kasha],
In my latest (v4) patch, you will find that I’ve abstracted out the fair share
calculation into a method, ComputeFairShares#getWeightToResourceRatio, and I
invoke it with the active schedulables. I’ve implemented it this way for two reasons:
1. When we do static fair share in YARN-2393, we can just reuse that method by
passing all schedulables.
2. It didn’t seem clean to me to mix “active queue” logic into the fair share
computation.
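To illustrate the design described in this comment, here is a hedged sketch of a ratio-finding helper that can be fed either the active schedulables (dynamic fair share) or all of them (static fair share). Our understanding is that ComputeFairShares searches for a weight-to-resource ratio R such that each schedulable's share, R times its weight clamped to its min and max share, adds up to the available total; the {{Sched}} class, its fields, the iteration count and the signatures below are illustrative assumptions, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical schedulable: weight, min/max share, and whether it has apps. */
final class Sched {
  final double weight;
  final double minShare;
  final double maxShare;
  final boolean active;

  Sched(double weight, double minShare, double maxShare, boolean active) {
    this.weight = weight;
    this.minShare = minShare;
    this.maxShare = maxShare;
    this.active = active;
  }

  /** Share this schedulable would get for a given weight-to-resource ratio. */
  double shareAt(double ratio) {
    return Math.min(maxShare, Math.max(minShare, ratio * weight));
  }
}

final class WeightToResourceRatio {
  private static final int ITERATIONS = 50; // assumed search depth

  private WeightToResourceRatio() {}

  private static double used(List<Sched> scheds, double ratio) {
    double sum = 0.0;
    for (Sched s : scheds) {
      sum += s.shareAt(ratio);
    }
    return sum;
  }

  /** Binary-searches the ratio R at which the clamped shares fill 'total'. */
  static double getWeightToResourceRatio(List<Sched> scheds, double total) {
    if (scheds.isEmpty()) {
      return 0.0;
    }
    // Grow an upper bound until the shares cover the total (or max shares cap it).
    double hi = 1.0;
    while (used(scheds, hi) < total && hi < 1e12) {
      hi *= 2.0;
    }
    double lo = 0.0;
    for (int i = 0; i < ITERATIONS; i++) {
      double mid = (lo + hi) / 2.0;
      if (used(scheds, mid) < total) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return hi;
  }

  /** Dynamic fair share: only active schedulables take part. */
  static List<Sched> active(List<Sched> all) {
    List<Sched> out = new ArrayList<>();
    for (Sched s : all) {
      if (s.active) {
        out.add(s);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Sched> all = new ArrayList<>();
    all.add(new Sched(1.0, 0.0, Double.MAX_VALUE, true)); // childQ1
    for (int i = 2; i <= 10; i++) {
      all.add(new Sched(1.0, 0.0, Double.MAX_VALUE, false)); // idle childQ2..10
    }
    double staticRatio = getWeightToResourceRatio(all, 80.0);
    double dynamicRatio = getWeightToResourceRatio(active(all), 80.0);
    System.out.printf("static childQ1 share: %.2f%n", all.get(0).shareAt(staticRatio));   // 8.00
    System.out.printf("dynamic childQ1 share: %.2f%n", all.get(0).shareAt(dynamicRatio)); // 80.00
  }
}
```

Keeping the "which schedulables count" decision outside the ratio search is what lets the same helper serve both the dynamic and the static fair share.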

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-07 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090026#comment-14090026
]

Hadoop QA commented on YARN-2026:

{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12660479/YARN-2026-v4.txt
against trunk revision .