[jira] [Comment Edited] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-29 Thread Steven Rand (JIRA)

[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223996#comment-16223996 ]

Steven Rand edited comment on YARN-7391 at 10/29/17 1:27 PM:
-------------------------------------------------------------

[~templedf] and [~yufeigu], thanks for commenting. Apologies for not including 
my use case in the original description. We run multiple long-running Spark 
applications, each of which uses Spark's dynamic allocation feature, and 
therefore has a demand which fluctuates over time. At any point, the demand of 
any given app can be quite low (e.g., only an AM container), or quite high 
(e.g., hundreds of executors).

Historically, we've run each app in its own leaf queue, since the Fair 
Scheduler has not always supported preemption inside a leaf queue. We've found 
that since the fair share of a parent queue is split evenly among all of its 
active leaf queues, the fair share of each app is the same, regardless of its 
demand. This causes our apps with higher demand to have fair shares that are 
too low for them to preempt enough resources to even get close to meeting their 
demand. If fair share were based on demand, then our apps with lower demand 
would be unaffected, but our apps with higher demand could have high enough 
weights to preempt a reasonable number of resources away from apps that are 
over their fair shares.

This problem led us to consider running more apps inside the same leaf queue, 
which is no longer an issue now that the Fair Scheduler supports preemption 
between apps in the same leaf queue. We'd hoped to use the size-based weight 
feature to achieve the goal of the more demanding apps having high enough fair 
shares to preempt sufficient resources away from other apps. However, when we 
experimented with this feature, the results were somewhat underwhelming. Yes, 
the more demanding apps now have higher fair shares, but not by enough to 
significantly impact allocation.

Consider, for example, the rather extreme case of 10 apps running in a leaf 
queue, where 9 of them are requesting 20GB each, and 1 of them is requesting 
1024GB. The weight of each of the 9 less demanding apps is about 14.3, and the 
weight of the highly demanding app is about 20.0. So the highly demanding app 
winds up with about 13.5% (20/148) of the queue's fair share, despite having a 
demand that's more than 5x that of the other 9 put together, as opposed to the 
10% it would have with size-based weight turned off. I know the example is a 
bit silly, but I wanted to show that even with huge differences in demand, the 
current behavior of size-based weight doesn't produce major differences in 
weights.
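
For concreteness, here's a quick standalone check of that arithmetic against 
the current formula, {{Math.log1p(demand) / Math.log(2)}} (just a sketch; the 
class and method names are for illustration only, and demands are assumed to 
be in MB, matching {{Resource#getMemorySize}}):

{code}
// Illustrative sketch: the current log-based weight applied to the
// 10-app example (demands in MB).
public class LogWeightExample {

  // Current size-based weight: log2(demand + 1)
  static double logWeight(long demandMb) {
    return Math.log1p(demandMb) / Math.log(2);
  }

  public static void main(String[] args) {
    double small = logWeight(20L * 1024);    // 20 GB   -> ~14.3
    double large = logWeight(1024L * 1024);  // 1024 GB -> ~20.0
    double total = 9 * small + large;        // ~148
    // Share of total weight for the demanding app: ~13.4%, i.e.
    // roughly the 13.5% figure above.
    System.out.printf("small=%.1f, large=%.1f, share=%.1f%%%n",
        small, large, 100 * large / total);
  }
}
{code}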

Does that make sense? Happy to provide more info if helpful.


[jira] [Comment Edited] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-25 Thread Daniel Templeton (JIRA)

[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219191#comment-16219191 ]

Daniel Templeton edited comment on YARN-7391 at 10/25/17 5:57 PM:
------------------------------------------------------------------

Do you have a particular scenario where the log-based weight isn't giving 
reasonable results?  My concern would be that a sqrt-based weight would allow a 
big-memory app to starve out smaller apps.  Without a motivating issue, I'm not 
super excited about changing this part of the code.

On a tangentially related note, I was profiling the scheduler yesterday and 
noticed that we spend a ton of our scheduling time waiting for the lock in this 
method.  Looks like a good candidate for caching in {{FSAppAttempt}}.



> Consider square root instead of natural log for size-based weight
> ------------------------------------------------------------------
>
> Key: YARN-7391
> URL: https://issues.apache.org/jira/browse/YARN-7391
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>
> Currently for size-based weight, we compute the weight of an app using this 
> code from 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L377:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.log1p(app.getDemand().getMemorySize()) / Math.log(2);
>   }
> {code}
> Because the natural log function grows slowly, the weights of two apps with 
> hugely different memory demands can be quite similar. For example, {{weight}} 
> evaluates to 14.3 for an app with a demand of 20 GB, and evaluates to 19.9 
> for an app with a demand of 1000 GB. The app with the much larger demand will 
> still have a higher weight, but not by a large amount relative to the sum of 
> those weights.
> I think it's worth considering a switch to a square root function, which will 
> grow more quickly. In the above example, the app with a demand of 20 GB now 
> has a weight of 143, while the app with a demand of 1000 GB now has a weight 
> of 1012. These weights seem more reasonable relative to each other given the 
> difference in demand between the two apps.
> The above example is admittedly a bit extreme, but I believe that a square 
> root function would also produce reasonable results in general.
> The code I have in mind would look something like:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.sqrt(app.getDemand().getMemorySize());
>   }
> {code}
> Would people be comfortable with this change?
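
For a quick side-by-side of the two formulas at the demands cited in the 
description (again a sketch with an illustrative class name; demands in MB):

{code}
// Compare the current log-based weight with the proposed sqrt-based
// weight for the two demands in the description (in MB).
public class WeightComparison {
  public static void main(String[] args) {
    long[] demandsMb = {20L * 1024, 1000L * 1024};  // 20 GB and 1000 GB
    for (long d : demandsMb) {
      double logW = Math.log1p(d) / Math.log(2);  // ~14.3 and ~20.0
      double sqrtW = Math.sqrt(d);                // ~143 and ~1012
      System.out.printf("%7d MB: log=%.2f  sqrt=%.2f%n", d, logW, sqrtW);
    }
  }
}
{code}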


