[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620180#comment-16620180 ]
Hadoop QA commented on YARN-8771: --------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 0s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 24 unchanged - 0 fixed = 27 total (was 24) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 7s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}184m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSchedulingRequestUpdate | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8771 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940341/YARN-8771.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a31e09774439 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 27978bc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21871/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21871/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21871/testReport/ | | Max. process+thread count | 925 (vs. ulimit of 10000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21871/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --------------------------------------------------------------------------------------- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler > Affects Versions: 3.2.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch, > YARN-8771.003.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org