[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309038#comment-17309038 ]
Siddharth Ahuja edited comment on YARN-10705 at 3/25/21, 11:36 PM:
-------------------------------------------------------------------
Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.

Tested this on a single-node cluster, using a distribution generated by compiling the patch on trunk, with the steps below:
* Set {{yarn.resourcemanager.scheduler.class}} to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler}}.
* Started YARN on the single-node cluster; it has 1 NodeManager with 8GB available to run containers.
* Enabled DEBUG logging for the FSLeafQueue class to check for debug logs:
{code}
bin/yarn daemonlog -setlevel localhost:8088 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue DEBUG
{code}
* Checked for the DEBUG allocation message in the RM logs:
{code}
tail -f rmlogs.log | grep "DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue"
{code}
* Ran the first job, requiring 1 AM + 1 non-AM container of 2GB each, so 4GB out of 8GB are used up:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=2048 -m 1 -mt 600000
{code}
* Ran a second job, requiring 1 AM + 1 non-AM container of 2GB and 4GB respectively. The 2nd application starts, i.e. its AM starts, but there is no room for the 4GB container yet, so a reservation is made for the 4GB non-AM container.
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.map.memory.mb=4096 -m 1 -mt 600000
{code}
* With the patch applied, only the following 3 lines are present when the reservation occurs, which is the expected behaviour:
{code}
2021-03-25 17:54:13,475 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:2048, vCores:1>
2021-03-25 17:54:20,507 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.sidtheadmin container:<memory:2048, vCores:1>
2021-03-25 17:54:35,558 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:<memory:2048, vCores:1>
{code}
Without the patch, however, this misleading line was additionally logged:
{code}
2021-03-25 17:54:43,589 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.somequeue container:<memory:-1, vCores:0>
{code}
No JUnits are required: the change only removes a log line and does not alter functionality, so re-running the existing JUnits should suffice.

was (Author: sahuja):
Added a patch to ensure that logging only happens in the case of an actual container assignment/allocation, not a reservation.
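The effect of the patched guard can be sketched in a minimal, dependency-free Java example. Note that the {{Resource}} record, {{NONE}}, and {{CONTAINER_RESERVED}} below are stand-ins I introduce for illustration, not the real Hadoop types ({{Resources.none()}} and {{FairScheduler.CONTAINER_RESERVED}}); only the conditional logic mirrors the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class ReservedLogSketch {
    // Stand-in for org.apache.hadoop.yarn.api.records.Resource.
    record Resource(long memory, int vCores) {
        @Override public String toString() {
            return "<memory:" + memory + ", vCores:" + vCores + ">";
        }
    }

    // Stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED
    // (the latter is the <memory:-1, vCores:0> sentinel seen in the logs).
    static final Resource NONE = new Resource(0, 0);
    static final Resource CONTAINER_RESERVED = new Resource(-1, 0);

    // Returns the debug lines that would be emitted for a list of
    // assignment results, using the patched guard.
    static List<String> debugLines(String queue, List<Resource> results) {
        List<String> lines = new ArrayList<>();
        for (Resource assigned : results) {
            if (!assigned.equals(NONE)) {
                // Patched check: log only real assignments, not reservations.
                if (!assigned.equals(CONTAINER_RESERVED)) {
                    lines.add("Assigned container in queue:" + queue
                        + " container:" + assigned);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Resource> results = List.of(
            new Resource(2048, 1),  // real allocation -> logged
            CONTAINER_RESERVED,     // reservation -> silently skipped
            NONE);                  // nothing happened -> skipped
        // Prints exactly one line, for the real 2GB allocation.
        debugLines("root.somequeue", results).forEach(System.out::println);
    }
}
```

Running this logs only the genuine allocation; the reservation sentinel no longer produces a misleading `<memory:-1, vCores:0>` line.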
> Misleading DEBUG log for container assignment needs to be removed when the
> container is actually reserved, not assigned in FairScheduler
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10705
>                 URL: https://issues.apache.org/jira/browse/YARN-10705
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>            Priority: Minor
>         Attachments: YARN-10705.001.patch
>
>
> The following DEBUG logs are emitted if a container reservation is made when a node has been offered to the queue in FairScheduler:
> {code}
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: application_1610442362681_2607's resource request is reserved.
> 2021-02-10 07:33:55,049 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: Assigned container in queue:root.pj_dc_pe container:<memory:-1, vCores:0>
> {code}
> The latter log seems to indicate a bad container assignment with a <memory:-1, vCores:0> resource allocation, whereas in fact it is a spurious log that shouldn't have been emitted in the first place.
> This log comes from [1], after an application attempt with an unmet demand is checked for container assignment/reservation.
> If the container for this app attempt is reserved on the node, then <memory:-1, vCores:0> is returned from [2].
> From [3]:
> {quote}
> * If an assignment was made, returns the resources allocated to the
> * container. If a reservation was made, returns
> * FairScheduler.CONTAINER_RESERVED. If no assignment or reservation was
> * made, returns an empty resource.
> {quote}
> We check for the empty resource at [4], but not for FairScheduler.CONTAINER_RESERVED, before logging a message specifically about a container assignment, which is incorrect.
> Instead of:
> {code}
> if (!assigned.equals(none())) {
>   LOG.debug("Assigned container in queue:{} container:{}",
>       getName(), assigned);
>   break;
> }
> {code}
> it should be:
> {code}
> // check if an assignment or a reservation was made.
> if (!assigned.equals(none())) {
>   // only log container assignment if there is
>   // an actual assignment, not a reservation.
>   if (!assigned.equals(FairScheduler.CONTAINER_RESERVED)
>       && LOG.isDebugEnabled()) {
>     LOG.debug("Assigned container in queue:" + getName() + " " +
>         "container:" + assigned);
>   }
>   break;
> }
> {code}
> [1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L356
> [2] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L911
> [3] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L842
> [4] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L355

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
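The three-way return contract quoted from [3] (an allocated resource, the CONTAINER_RESERVED sentinel, or an empty resource) can be modeled as a small classifier. The types below are hypothetical stand-ins for illustration, not Hadoop's Resource API; the point is that callers must distinguish all three outcomes, not just "empty vs non-empty":

```java
public class AssignOutcomeSketch {
    // Stand-in for Hadoop's Resource; record equality compares components.
    record Resource(long memory, int vCores) {}

    // Stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final Resource NONE = new Resource(0, 0);
    static final Resource CONTAINER_RESERVED = new Resource(-1, 0);

    // Classify an assignContainer-style return value per the quoted contract.
    static String classify(Resource assigned) {
        if (assigned.equals(NONE)) {
            return "NO_ASSIGNMENT";   // nothing allocated or reserved
        }
        if (assigned.equals(CONTAINER_RESERVED)) {
            return "RESERVED";        // reservation only, no real resources
        }
        return "ASSIGNED";            // actual container allocation
    }

    public static void main(String[] args) {
        System.out.println(classify(new Resource(2048, 1))); // ASSIGNED
        System.out.println(classify(CONTAINER_RESERVED));    // RESERVED
        System.out.println(classify(NONE));                  // NO_ASSIGNMENT
    }
}
```

The bug at [4] is exactly the missing middle branch: treating every non-empty result as an assignment and logging the sentinel as if it were an allocated container.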