[
https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223996#comment-16223996
]
Steven Rand edited comment on YARN-7391 at 10/29/17 1:27 PM:
-
[~templedf] and [~yufeigu], thanks for commenting. Apologies for not including
my use case in the original description. We run multiple long-running Spark
applications, each of which uses Spark's dynamic allocation feature, and
therefore has a demand which fluctuates over time. At any point, the demand of
any given app can be quite low (e.g., only an AM container), or quite high
(e.g., hundreds of executors).
Historically, we've run each app in its own leaf queue, since the Fair
Scheduler has not always supported preemption inside a leaf queue. We've found
that since the fair share of a parent queue is split evenly among all of its
active leaf queues, the fair share of each app is the same, regardless of its
demand. This causes our apps with higher demand to have fair shares that are
too low for them to preempt enough resources to even get close to meeting their
demand. If fair share were based on demand, then our apps with lower demand
would be unaffected, but our apps with higher demand could have high enough
weights to preempt a reasonable number of resources away from apps that are
over their fair shares.
This problem led us to consider running more apps inside the same leaf queue,
which is feasible now that the Fair Scheduler supports preemption between apps
in the same leaf queue. We'd hoped to use the size-based weight
feature to achieve the goal of the more demanding apps having high enough fair
shares to preempt sufficient resources away from other apps. However, in
experimenting with this feature, the results were somewhat underwhelming. Yes,
the more demanding apps now have higher fair shares, but not by enough to
significantly impact allocation.
Consider, for example, the rather extreme case of 10 apps running in a leaf
queue, where 9 of them are requesting 20GB each, and 1 of them is requesting
1024GB. The weight of each of the 9 less demanding apps is about 14.3, and the
weight of the highly demanding app is about 20.0. So the highly demanding app
winds up with about 13.5% (20/148) of the queue's fair share, compared to the
10% it would have with size-based weight turned off, despite having a demand
that's more than 5x that of the other 9 put together. I know the example is a
bit silly, but I wanted to show that even with huge differences in demand, the
current behavior of size-based weight doesn't produce major differences in
weights.
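For reference, here is a rough sketch of the arithmetic behind those numbers.
It assumes the size-based weight is log2(1 + demand in MB), which reproduces
the 14.3 and 20.0 figures above. The function and variable names are made up
for illustration and are not taken from the scheduler code.

```python
import math


def size_based_weight(demand_mb):
    # Assumed formula: log base 2 of (1 + memory demand in MB).
    return math.log2(1 + demand_mb)


# 9 apps requesting 20GB each, 1 app requesting 1024GB.
small_weight = size_based_weight(20 * 1024)
large_weight = size_based_weight(1024 * 1024)

total_weight = 9 * small_weight + large_weight
large_share = large_weight / total_weight

# Ratio of the large app's demand to the other 9 apps combined.
demand_ratio = 1024 / (9 * 20)

print(f"small app weight: {small_weight:.1f}")
print(f"large app weight: {large_weight:.1f}")
print(f"large app share of fair share: {large_share:.1%}")
print(f"large demand vs. other 9 combined: {demand_ratio:.1f}x")
```

Under that assumption the large app gets roughly 13% to 14% of the queue's
fair share, versus the 10% it would get with equal weights, because the
logarithm compresses a ~51x difference in demand into a ~1.4x difference in
weight.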
Does that make sense? Happy to provide more info if helpful.