[ https://issues.apache.org/jira/browse/YARN-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690937#comment-16690937 ]
Hudson commented on YARN-8833:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #15456 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15456/])
YARN-8833. Avoid potential integer overflow when computing fair shares. (wwei: rev d027a24f0349b60efa5125c330058f123771748f)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java


> Avoid potential integer overflow when computing fair shares
> -----------------------------------------------------------
>
>                 Key: YARN-8833
>                 URL: https://issues.apache.org/jira/browse/YARN-8833
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: liyakun
>            Assignee: liyakun
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-8833.1.patch, YARN-8833.2.patch, YARN-8833.3.patch,
> YARN-8833.patch
>
>
> When w2rRatio is used to compute fair shares, the sum of the shares can
> overflow an int, and the computation can then enter an infinite loop.
> Because the thread computing the shares holds the writeLock, this can block
> the scheduling thread.
> This issue occurred in a production environment with 8500 nodes, and we have
> already fixed it.
>
> Added 2018-10-29: elaboration of the problem
>
> /**
>  * Compute the resources that would be used given a weight-to-resource ratio
>  * w2rRatio, for use in the computeFairShares algorithm as described in #
>  */
> private static int resourceUsedWithWeightToResourceRatio(double w2rRatio,
>     Collection<? extends Schedulable> schedulables, String type) {
>   int resourcesTaken = 0;
>   for (Schedulable sched : schedulables) {
>     int share = computeShare(sched, w2rRatio, type);
>     resourcesTaken += share;
>   }
>   return resourcesTaken;
> }
>
> The variable resourcesTaken is an int that accumulates the results of
> computeShare(Schedulable sched, double w2rRatio, String type), each of which
> is a value between the min share and the max share of a queue.
> For example, when there are 3 queues and each has min share = max share =
> Integer.MAX_VALUE, resourcesTaken exceeds the int range and wraps around: it
> is already negative (-2) after the second queue, and after the third it is
> 2147483645, which is still wrong.
> When resourceUsedWithWeightToResourceRatio() returns such a wrapped value (for
> example a negative number), the following loop in computeSharesInternal(),
> which runs while holding the scheduler lock, may never exit:
>
> // org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
>
> In this example, doubling rMax cannot help, because every share is already
> pinned at its queue's max share, so the wrapped sum never changes.
> This can block the scheduling thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
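The overflow and the non-terminating doubling loop described in the report can be reproduced with a small stand-alone model. This is a minimal sketch under stated assumptions, not the committed patch (rev d027a24f...) and not the real YARN classes: WEIGHTS, MIN_SHARES, MAX_SHARES, usedInt, usedSaturating and doublings are hypothetical names, computeShare here is a simplified stand-in that clamps weight * w2rRatio to [min share, max share], and totalResource is assumed to be Integer.MAX_VALUE. The "saturating" variant shows one possible way to avoid the wrap-around: accumulate in a long and cap the result at Integer.MAX_VALUE.

```java
import java.util.function.DoubleToLongFunction;

/**
 * Hypothetical model of the loop in ComputeFairShares (not the real classes or
 * the committed fix). Each queue's share is weight * w2rRatio clamped to
 * [minShare, maxShare]; here all three queues use the scenario from the report.
 */
class FairShareOverflowDemo {

  // Hypothetical queue settings: 3 queues with min share = max share = Integer.MAX_VALUE.
  static final double[] WEIGHTS = {1.0, 1.0, 1.0};
  static final int[] MIN_SHARES = {Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE};
  static final int[] MAX_SHARES = {Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE};

  /** Simplified stand-in for computeShare(): clamp weight * w2rRatio to [min, max]. */
  static int computeShare(int i, double w2rRatio) {
    double share = WEIGHTS[i] * w2rRatio;
    share = Math.max(share, MIN_SHARES[i]);
    share = Math.min(share, MAX_SHARES[i]);
    return (int) share;
  }

  /** Original pattern: the int accumulator silently wraps around on overflow. */
  static long usedInt(double w2rRatio) {
    int resourcesTaken = 0;
    for (int i = 0; i < WEIGHTS.length; i++) {
      resourcesTaken += computeShare(i, w2rRatio);
    }
    return resourcesTaken;
  }

  /** Sketch of a fix: accumulate in a long and cap at Integer.MAX_VALUE. */
  static long usedSaturating(double w2rRatio) {
    long resourcesTaken = 0;
    for (int i = 0; i < WEIGHTS.length; i++) {
      resourcesTaken += computeShare(i, w2rRatio); // shares are non-negative here
      if (resourcesTaken >= Integer.MAX_VALUE) {
        return Integer.MAX_VALUE; // the sum cannot usefully grow any further
      }
    }
    return resourcesTaken;
  }

  /**
   * Runs the "double rMax while used < totalResource" loop, but gives up after
   * maxIterations doublings and returns -1 (the real loop would spin forever).
   * Otherwise returns how many doublings were needed.
   */
  static int doublings(DoubleToLongFunction used, long totalResource, int maxIterations) {
    double rMax = 1.0;
    int n = 0;
    while (used.applyAsLong(rMax) < totalResource) {
      if (n == maxIterations) {
        return -1;
      }
      rMax *= 2.0; // eventually becomes Infinity; shares stay clamped at max
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    long totalResource = Integer.MAX_VALUE; // assumed cluster-wide total
    System.out.println("int sum: " + usedInt(1.0));               // 2147483645
    System.out.println("saturating sum: " + usedSaturating(1.0)); // 2147483647
    System.out.println("int loop: "
        + doublings(FairShareOverflowDemo::usedInt, totalResource, 2000));        // -1
    System.out.println("saturating loop: "
        + doublings(FairShareOverflowDemo::usedSaturating, totalResource, 2000)); // 0
  }
}
```

With the int accumulator the wrapped sum (2147483645) stays below totalResource no matter how large rMax grows, so the bounded loop gives up; with the saturating sum the first check already satisfies the condition. Capping at Integer.MAX_VALUE rather than throwing (for example with Math.addExact) keeps the comparison against totalResource meaningful instead of aborting the share computation.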