[
https://issues.apache.org/jira/browse/YARN-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047727#comment-18047727
]
ASF GitHub Bot commented on YARN-8833:
--------------------------------------
github-actions[bot] closed pull request #439: YARN-8833 fix compute shares may
lock the scheduling process
URL: https://github.com/apache/hadoop/pull/439
> Avoid potential integer overflow when computing fair shares
> -----------------------------------------------------------
>
> Key: YARN-8833
> URL: https://issues.apache.org/jira/browse/YARN-8833
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: liyakun
> Assignee: liyakun
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1, 2.9.3
>
> Attachments: YARN-8833-branch-2.003.patch,
> YARN-8833-branch-2.1.patch, YARN-8833-branch-2.2.patch, YARN-8833.1.patch,
> YARN-8833.2.patch, YARN-8833.3.patch, YARN-8833.patch
>
>
> When w2rRatio is used to compute fair shares, an int overflow can occur and
> send the computation into an infinite loop.
> Because the thread computing the shares holds the writeLock, it may block the
> scheduling thread.
> This issue occurred in a production environment, and we have already fixed it.
>
> Added 2018-10-29: elaboration of the problem
> /**
>  * Compute the resources that would be used given a weight-to-resource ratio
>  * w2rRatio, for use in the computeFairShares algorithm as described in #
>  */
> private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
>     Collection<? extends Schedulable> schedulables, String type) {
>   int resourcesTaken = 0;
>   for (Schedulable sched : schedulables) {
>     int share = computeShare(sched, w2rRatio, type);
>     resourcesTaken += share;
>   }
>   return resourcesTaken;
> }
> The variable resourcesTaken is an int. It accumulates the results of
> computeShare(Schedulable sched, double w2rRatio, String type), each of which
> is a value between the min share and the max share of a queue.
> For example, when there are 3 queues, each with min share = max share =
> Integer.MAX_VALUE, resourcesTaken exceeds the int range and wraps around, so
> it can become negative.
> When resourceUsedWithWeightToResourceRatio(double w2rRatio, Collection<?
> extends Schedulable> schedulables, String type) returns a negative number, the
> loop in computeSharesInternal(), which runs while the scheduler lock is held,
> may never exit.
>
> // org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> This may block the scheduling thread.
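
The wraparound described in the quoted report can be reproduced in isolation. The sketch below is illustrative only: the class FairShareOverflowSketch and its methods sumWithIntAccumulator and sumSaturating are hypothetical names, not part of Hadoop, and the saturating guard is one possible mitigation, not necessarily the change that was committed for YARN-8833. It shows that two shares of Integer.MAX_VALUE already sum to a negative int, and that a third wraps the total back to a positive but still incorrect value.

```java
import java.util.Arrays;
import java.util.List;

public class FairShareOverflowSketch {

    // Mirrors the accumulation pattern in the quoted method: an int running total.
    static int sumWithIntAccumulator(List<Integer> shares) {
        int resourcesTaken = 0;
        for (int share : shares) {
            resourcesTaken += share; // wraps silently past Integer.MAX_VALUE
        }
        return resourcesTaken;
    }

    // Hypothetical guard (an assumption, not the committed patch): saturate at
    // Integer.MAX_VALUE instead of wrapping. Assumes every share is non-negative.
    static int sumSaturating(List<Integer> shares) {
        int resourcesTaken = 0;
        for (int share : shares) {
            if (Integer.MAX_VALUE - resourcesTaken < share) {
                return Integer.MAX_VALUE; // adding this share would overflow
            }
            resourcesTaken += share;
        }
        return resourcesTaken;
    }

    public static void main(String[] args) {
        List<Integer> two = Arrays.asList(Integer.MAX_VALUE, Integer.MAX_VALUE);
        List<Integer> three = Arrays.asList(
            Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE);

        System.out.println(sumWithIntAccumulator(two));   // -2
        System.out.println(sumWithIntAccumulator(three)); // 2147483645
        System.out.println(sumSaturating(two));           // 2147483647
        System.out.println(sumSaturating(three));         // 2147483647
    }
}
```

With the saturating variant the total never drops below the true sum's clamp, so a comparison such as `used < totalResource` cannot be kept true by a wrapped value alone.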