[ https://issues.apache.org/jira/browse/YARN-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875800#comment-17875800 ]

ASF GitHub Bot commented on YARN-11702:
---------------------------------------

hadoop-yetus commented on PR #6990:
URL: https://github.com/apache/hadoop/pull/6990#issuecomment-2304316830

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  17m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  3s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m  1s |  |  trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04  |
   | +1 :green_heart: |  compile  |   6m 58s |  |  trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05  |
   | +1 :green_heart: |  checkstyle  |   2m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 16s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 12s |  |  trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04  |
   | +1 :green_heart: |  javadoc  |   3m  0s |  |  trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05  |
   | +1 :green_heart: |  spotbugs  |   6m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 25s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m  3s |  |  the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04  |
   | +1 :green_heart: |  javac  |   7m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 58s |  |  the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05  |
   | +1 :green_heart: |  javac  |   6m 58s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 50s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m  2s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 53s |  |  the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04  |
   | +1 :green_heart: |  javadoc  |   2m 42s |  |  the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05  |
   | +1 :green_heart: |  spotbugs  |   6m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 49s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 12s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 51s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 109m 48s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 322m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6990/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6990 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 1200633d4eca 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8da713d7469b11f75059674ef6b0ad2c131631b2 |
   | Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6990/4/testReport/ |
   | Max. process+thread count | 938 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6990/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix Yarn over allocating containers
> -----------------------------------
>
>                 Key: YARN-11702
>                 URL: https://issues.apache.org/jira/browse/YARN-11702
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, fairscheduler, scheduler, yarn
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> *Replication Steps:*
> Apache Spark 3.5.1 and Apache Hadoop 3.3.6 (Capacity Scheduler)
>  
> {code:java}
> spark.executor.memory            1024M
> spark.driver.memory              2048M
> spark.executor.cores             1
> spark.executor.instances 20
> spark.dynamicAllocation.enabled false{code}
>  
> Based on this setup, there should be 20 Spark executors, but from the 
> ResourceManager (RM) UI, I could see that 32 executors were allocated and 12 
> of them were released within seconds. On analyzing the Spark ApplicationMaster 
> (AM) logs, the following entries were observed (the 1408 MB ask is the 1024 MB 
> executor memory plus Spark's default 384 MB memory overhead).
>  
> {code:java}
> 4/06/24 14:10:14 INFO YarnAllocator: Will request 20 executor container(s) 
> for  ResourceProfile Id: 0, each with 1 core(s) and 1408 MB memory. with 
> custom resources: <memory:1408, max memory:2147483647, vCores:1, max 
> vCores:2147483647>
> 24/06/24 14:10:14 INFO YarnAllocator: Received 8 containers from YARN, 
> launching executors on 8 of them.
> 24/06/24 14:10:14 INFO YarnAllocator: Received 8 containers from YARN, 
> launching executors on 8 of them.
> 24/06/24 14:10:14 INFO YarnAllocator: Received 12 containers from YARN, 
> launching executors on 4 of them.
> 24/06/24 14:10:17 INFO YarnAllocator: Received 4 containers from YARN, 
> launching executors on 0 of them.
> {code}
> It was clear from the logs that the 12 extra allocated containers were being 
> ignored on the Spark side. In order to debug this further, additional log lines 
> were added to the 
> [AppSchedulingInfo|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java#L427]
>  class where container requests are incremented and decremented, to expose 
> additional information about each request.
>  
> {code:java}
> 2024-06-24 14:10:14,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (IPC Server handler 42 on default port 8030): Updates PendingContainers: 0 
> Incremented by: 20 SchedulerRequestKey{priority=0, allocationRequestId=0, 
> containerToUpdate=null} for: appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 20 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 19 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 18 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 17 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 16 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,113 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 15 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,113 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 14 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,113 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 13 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 12 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 11 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,362 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 10 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,363 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 9 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,363 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 8 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,363 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 7 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,363 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 6 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,364 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 5 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,454 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (IPC Server handler 35 on default port 8030): Updates PendingContainers: 4 
> Decremented by: 4 SchedulerRequestKey{priority=0, allocationRequestId=0, 
> containerToUpdate=null} for: appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,454 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (IPC Server handler 35 on default port 8030): Updates PendingContainers: 0 
> Incremented by: 12 SchedulerRequestKey{priority=0, allocationRequestId=0, 
> containerToUpdate=null} for: appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,578 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 12 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 11 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,614 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 10 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,614 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 9 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,614 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 8 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,615 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 7 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,615 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 6 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,615 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 5 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,829 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 4 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,829 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 3 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,864 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 2 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,864 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 1 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:14,874 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (IPC Server handler 42 on default port 8030): Updates PendingContainers: 0 
> Incremented by: 4 SchedulerRequestKey{priority=0, allocationRequestId=0, 
> containerToUpdate=null} for: appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:15,080 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 4 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:15,081 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 3 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:15,115 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 2 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:15,115 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Allocate Updates 
> PendingContainers: 1 Decremented by: 1 SchedulerRequestKey{priority=0, 
> allocationRequestId=0, containerToUpdate=null} for: 
> appattempt_1719234929152_0004_000001
> 2024-06-24 14:10:17,931 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (IPC Server handler 3 on default port 8030): checking for deactivate of 
> application :application_1719234929152_0004
> 2024-06-24 14:10:20,743 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo 
> (SchedulerEventDispatcher:Event Processor): Application 
> application_1719234929152_0004 requests cleared {code}
> *RCA*
> The issue appears to stem from the asynchronous nature of the AMRMClient 
> request/response communication. Containers newly allocated by YARN are handed 
> back as part of the response to the client's next request, causing additional 
> allocations. Please check the following walkthrough of the logs above (a sketch 
> of the heartbeat loop follows the list):
>  # At 2024-06-24 14:10:14,075, the AM sends a request for 20 containers to YARN; 
> since this is the first request, YARN has not yet allocated any containers and 
> returns 0 containers in the response.
>  # At 2024-06-24 14:10:14,454, the AM sends a request for 12 containers, which 
> suggests that before this time there was an empty (heartbeat) request to YARN 
> that returned 8 allocated containers in its response. For this request as 
> well, YARN returned 8 allocated containers in the response.
>  # At 2024-06-24 14:10:14,874, the AM sends a request for 4 more containers, 
> for which YARN returns 12 allocated containers (granted against the previous 
> request) in the response.
>  # Since the container requests are now exhausted, the AM sends an empty request 
> to YARN, which returns 4 allocated containers (granted against the previous request).
>  # So in total 32 containers were allocated for a 20-container request. The 
> problem is that the AM (client) is notified about allocated containers only on 
> the next heartbeat/request, causing the inconsistency.
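> A minimal, hypothetical sketch of an AM-side heartbeat loop (not the actual 
> Spark YarnAllocator code; the class name and host are placeholders) showing the 
> timing described above: the containers handed back by allocate() were granted 
> against the ask table of an earlier heartbeat, so the AM can receive more 
> containers than it still needs and has to release the surplus.
>  
> {code:java}
> import java.util.List;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.yarn.api.records.Container;
> import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
> import org.apache.hadoop.yarn.api.records.Priority;
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.client.api.AMRMClient;
> import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
> 
> public class HeartbeatLoopSketch {
>   public static void main(String[] args) throws Exception {
>     AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
>     amrmClient.init(new Configuration());
>     amrmClient.start();
>     amrmClient.registerApplicationMaster("localhost", 0, "");
> 
>     // Ask for 20 executors of 1 vCore / 1408 MB, mirroring the logs above.
>     Resource capability = Resource.newInstance(1408, 1);
>     for (int i = 0; i < 20; i++) {
>       amrmClient.addContainerRequest(
>           new ContainerRequest(capability, null, null, Priority.newInstance(0)));
>     }
> 
>     int launched = 0;
>     while (launched < 20) {
>       // Heartbeat: the containers returned here were granted against the asks
>       // outstanding at the time of the PREVIOUS heartbeat; the RM only learns
>       // about the reduced ask from the table sent with THIS call, which is why
>       // it can keep granting containers the AM no longer needs.
>       List<Container> granted = amrmClient.allocate(0.1f).getAllocatedContainers();
>       for (Container c : granted) {
>         if (launched < 20) {
>           launched++;  // launch an executor on this container (launch code omitted)
>           // A real allocator would also call removeContainerRequest(...) here;
>           // the over-allocation occurs because the RM may already have granted
>           // more containers before that reduced ask reaches it.
>         } else {
>           // Surplus container caused by the one-heartbeat lag: give it back.
>           amrmClient.releaseAssignedContainer(c.getId());
>         }
>       }
>       Thread.sleep(1000);
>     }
>     amrmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
>   }
> }
> {code}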
>  
> *Solutions*
> *1. Modify the AM request on the YARN/RM side*
> Normalize/update the AM container request to account for containers that have 
> already been allocated but not yet reported to the AM. This can be done on the 
> respective scheduler side, i.e. 
> newContainerAsk = AMContainerAsk - AllocatedContainers (a sketch follows).
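> A minimal, hypothetical sketch of the proposed arithmetic (method and class 
> names are illustrative, not the actual patch):
>  
> {code:java}
> // Hypothetical helper, not the YARN-11702 patch itself: reduce the AM's
> // incoming ask by the containers already allocated against the same
> // SchedulerRequestKey but not yet reported back to the AM.
> public final class AskNormalizationSketch {
> 
>   private AskNormalizationSketch() {
>   }
> 
>   static int normalizeAsk(int amContainerAsk, int allocatedButUnreported) {
>     return Math.max(0, amContainerAsk - allocatedButUnreported);
>   }
> 
>   public static void main(String[] args) {
>     // e.g. the AM re-asks for 12 while 12 containers are already allocated but
>     // not yet reported: nothing new should be recorded as pending.
>     System.out.println(normalizeAsk(12, 12)); // 0
>     // e.g. only 8 of the 12 re-asked containers are already covered: 4 remain.
>     System.out.println(normalizeAsk(12, 8));  // 4
>   }
> }
> {code}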
> *2. Make the AM aware of allocated containers*
> Before the client makes an allocate request, the AM should check with the RM 
> about containers that have already been allocated. This approach requires 
> additional AM-RM communication, which can be expensive when there is a large 
> number of allocate requests (one possible sketch follows).
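> One possible (hypothetical) realization of this option using only the existing 
> AMRMClient API: before adding a new batch of ContainerRequests, drain any 
> containers the RM has already allocated via an extra empty heartbeat and reduce 
> the new ask accordingly. The extra allocate() round trip per ask is the 
> overhead mentioned above.
>  
> {code:java}
> import java.util.List;
> 
> import org.apache.hadoop.yarn.api.records.Container;
> import org.apache.hadoop.yarn.api.records.Priority;
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.client.api.AMRMClient;
> import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
> 
> public class DrainBeforeAskSketch {
> 
>   /** Returns how many containers were actually asked for after draining. */
>   static int requestMissing(AMRMClient<ContainerRequest> client,
>                             int wanted, Resource capability) throws Exception {
>     // Extra heartbeat with no new asks: it only picks up containers that the
>     // RM has already allocated for previous requests.
>     List<Container> alreadyAllocated =
>         client.allocate(0.1f).getAllocatedContainers();
> 
>     // Hand alreadyAllocated to the launcher (omitted), then ask only for the rest.
>     int stillNeeded = Math.max(0, wanted - alreadyAllocated.size());
>     for (int i = 0; i < stillNeeded; i++) {
>       client.addContainerRequest(
>           new ContainerRequest(capability, null, null, Priority.newInstance(0)));
>     }
>     return stillNeeded;
>   }
> }
> {code}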
>  
> *PS: I could see similar behavior with Apache Tez, and it could happen with the 
> latest hadoop-3.4.0 version as well.*
>  
>  


