[ https://issues.apache.org/jira/browse/YARN-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737851#comment-16737851 ]
Wilfred Spiegelenburg commented on YARN-8833:
---------------------------------------------

The patch looks good, +1.
The check implemented in {{safeAdd}} is equivalent to the {{addExact}} check from the JVM, so we are good.
Thank you for porting to branch-2 [~yoelee]

> Avoid potential integer overflow when computing fair shares
> -----------------------------------------------------------
>
>                 Key: YARN-8833
>                 URL: https://issues.apache.org/jira/browse/YARN-8833
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: liyakun
>            Assignee: liyakun
>            Priority: Major
>             Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
>         Attachments: YARN-8833-branch-2.003.patch,
> YARN-8833-branch-2.1.patch, YARN-8833-branch-2.2.patch, YARN-8833.1.patch,
> YARN-8833.2.patch, YARN-8833.3.patch, YARN-8833.patch
>
>
> When w2rRatio is used to compute fair shares, an int overflow can occur
> and send the computation into an infinite loop.
> Since the thread computing the shares holds the writeLock, it may block the
> scheduling thread.
> This issue occurred in a production environment, and we have already fixed it.
>
> Added 2018-10-29: elaboration of the problem
> /**
>  * Compute the resources that would be used given a weight-to-resource ratio
>  * w2rRatio, for use in the computeFairShares algorithm as described in #
>  */
> private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
>     Collection<? extends Schedulable> schedulables, String type) {
>   int resourcesTaken = 0;
>   for (Schedulable sched : schedulables) {
>     int share = computeShare(sched, w2rRatio, type);
>     resourcesTaken += share;
>   }
>   return resourcesTaken;
> }
> The variable resourcesTaken is an int. It accumulates the results of
> computeShare(Schedulable sched, double w2rRatio, String type), each of which
> is a value between the min share and max share of a queue.
> For example, when there are 3 queues, each with min share = max share =
> Integer.MAX_VALUE, resourcesTaken exceeds the int range and wraps around,
> so it can become a negative number.
> When resourceUsedWithWeightToResourceRatio(double w2rRatio, Collection<?
> extends Schedulable> schedulables, String type) returns a negative number,
> the loop in computeSharesInternal(), which runs while the scheduler lock is
> held, may never exit.
>
> // org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> This may block the scheduling thread.
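
To illustrate the overflow described above and the kind of check the comment refers to, here is a minimal standalone sketch. It is not the code from the attached patches: the class name and the {{naiveSum}} and {{saturatingSum}} helpers are hypothetical, and the body of {{safeAdd}} is an assumption that only borrows the sign-based overflow test used by {{Math.addExact}}, clamping the result instead of throwing.

```java
// Illustrative sketch only; not taken from the YARN-8833 patches.
class FairShareOverflowSketch {

    // Plain int accumulation, as in the quoted loop: wraps silently on overflow.
    static int naiveSum(int[] shares) {
        int resourcesTaken = 0;
        for (int share : shares) {
            resourcesTaken += share;
        }
        return resourcesTaken;
    }

    // Hypothetical saturating add. Overflow happened iff both operands have
    // the opposite sign of the wrapped result (the test Math.addExact uses).
    static int safeAdd(int a, int b) {
        int r = a + b;
        if (((a ^ r) & (b ^ r)) < 0) {
            // Clamp toward the sign of the operands instead of wrapping.
            return a < 0 ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        }
        return r;
    }

    // Same accumulation, but every step goes through safeAdd.
    static int saturatingSum(int[] shares) {
        int resourcesTaken = 0;
        for (int share : shares) {
            resourcesTaken = safeAdd(resourcesTaken, share);
        }
        return resourcesTaken;
    }

    public static void main(String[] args) {
        int[] shares = {Integer.MAX_VALUE, Integer.MAX_VALUE};
        System.out.println(naiveSum(shares));      // prints -2
        System.out.println(saturatingSum(shares)); // prints 2147483647
    }
}
```

With a clamped sum, a loop of the form "while (sum < totalResource) rMax *= 2.0" can still terminate, because the sum never drops below the largest share already added and tops out at Integer.MAX_VALUE, which is at least as large as any int totalResource.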