[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550502#comment-15550502 ]
Tony Peng commented on YARN-4477:
---------------------------------

I'm also getting this problem with assignMultiple=false. [~kasha] [~rdub] what was your offline discussion?

> FairScheduler: Handle condition which can result in an infinite loop in
> attemptScheduling.
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4477
>                 URL: https://issues.apache.org/jira/browse/YARN-4477
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Tao Jie
>            Assignee: Tao Jie
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: YARN-4477.001.patch, YARN-4477.002.patch,
> YARN-4477.003.patch, YARN-4477.004.patch
>
>
> This problem was introduced by YARN-4270, which adds a limit on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>       " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
>     getMetrics().reserveResource(getUser(), container.getResource());
>     RMContainer rmContainer =
>         super.reserve(node, priority, null, container);
>     node.reserveResource(this, priority, rmContainer);
>     setReservation(node);
>   } else {
>     RMContainer rmContainer = node.getReservedContainer();
>     super.reserve(node, priority, rmContainer, container);
>     node.reserveResource(this, priority, rmContainer);
>     setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, no reservation is set on the current node.
> But in attemptScheduling in FairScheduler:
> {code}
> while (node.getReservedContainer() == null) {
>   boolean assignedContainer = false;
>   if (!queueMgr.getRootQueue().assignContainer(node).equals(
>       Resources.none())) {
>     assignedContainers++;
>     assignedContainer = true;
>   }
>
>   if (!assignedContainer) { break; }
>   if (!assignMultiple) { break; }
>   if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
> }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is
> not equal to Resources.none().
> As a result, if multiple assignment is enabled and maxAssign is unlimited, this
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of
> the limit.
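
For anyone trying to reproduce this, the loop condition in the quoted description can be sketched in isolation. This is only a toy model, not the actual FairScheduler code: the class AttemptSchedulingSketch, the Node stand-in, the string constants standing in for Resources.none() and FairScheduler.CONTAINER_RESERVED, the `fixed` flag, and the `safetyCap` guard are all hypothetical names introduced for illustration. It assumes the reservation is always refused by the threshold check, with multiple assignment enabled and maxAssign unlimited.

```java
public class AttemptSchedulingSketch {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "none";
    static final String CONTAINER_RESERVED = "reserved";

    /** Toy node: reservedContainer stays null unless a reservation is actually made. */
    static class Node {
        Object reservedContainer = null;
        // Assumption: the threshold check always refuses the reservation.
        boolean reservationExceedsThreshold = true;
    }

    /**
     * Toy assignContainer. When the threshold refuses the reservation, the node is
     * never marked as reserved. With fixed == false it still reports
     * CONTAINER_RESERVED (the behavior described in the issue); with fixed == true
     * it reports NONE (the behavior the reporter proposes).
     */
    static String assignContainer(Node node, boolean fixed) {
        if (!node.reservationExceedsThreshold) {
            node.reservedContainer = new Object();
            return CONTAINER_RESERVED;
        }
        return fixed ? NONE : CONTAINER_RESERVED;
    }

    /** Runs the quoted while loop and returns the number of iterations executed. */
    static int runLoop(boolean fixed, int safetyCap) {
        Node node = new Node();
        boolean assignMultiple = true; // multiple assignment enabled
        int maxAssign = -1;            // unlimited
        int assignedContainers = 0;
        int iterations = 0;
        // safetyCap is not in the original loop; it only keeps this sketch from hanging.
        while (node.reservedContainer == null && iterations < safetyCap) {
            iterations++;
            boolean assignedContainer = false;
            if (!assignContainer(node, fixed).equals(NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("reported behavior, iterations: " + runLoop(false, 1000));
        System.out.println("proposed behavior, iterations: " + runLoop(true, 1000));
    }
}
```

In this model the reported behavior runs until the artificial cap is hit, since none of the three break conditions can fire, while the proposed return value exits on the first pass. It does not cover the assignMultiple=false case mentioned in the comment above.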