[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877774#comment-15877774 ]
Bibin A Chundatt commented on YARN-3884:
----------------------------------------

Thank you all. I think it makes sense to publish only ALLOCATED containers to ATS. [~varun_saxena] and I had discussed the same approach initially, but we thought it would be a behaviour change; our understanding was that it would be good to have the reserved container info in ATS.

[~varun_saxena], publishing attempt metrics for the stale case can be taken up in another JIRA.

> App History status not updated when RMContainer transitions from RESERVED to KILLED
> -----------------------------------------------------------------------------------
>
> Key: YARN-3884
> URL: https://issues.apache.org/jira/browse/YARN-3884
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Environment: Suse11 Sp3
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Labels: oct16-easy
> Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, YARN-3884.0003.patch, YARN-3884.0004.patch, YARN-3884.0005.patch, YARN-3884.0006.patch, YARN-3884.0007.patch, YARN-3884.0008.patch
>
>
> Setup
> ===============
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===============
> 1. Submit apps to Queue 1 with 512 MB and 1 core
> 2. Submit apps to Queue 2 with 512 MB and 5 cores
> Lots of containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to RESERVED
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257
> absoluteUsedCapacity=0.65625 used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to ALLOCATED
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1435849994778_0001 CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e24_1435849994778_0001_01_000014 of capacity <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6 containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available after allocation
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1435849994778_0001_000001 container=Container: [ContainerId: container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318, NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, Priority: 20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:2560,
> vCores:5>, usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, numContainers=5 clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.default stats: default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>, usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e24_1435849994778_0001_01_000014 Container Transitioned from ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Application application_1435849994778_0002 unreserved on node host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13>, currently has 0 at priority 20; currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf user-resources=<memory:2560, vCores:21>
> 2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId:
> container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318, NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>, Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.9166667 absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5
> 2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1435849994778_0002_000001 released container container_e24_1435849994778_0002_01_000013 on node: host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In the application history server, the status gets updated to -1000 (INVALID) but the end time is not updated, so the Elapsed Time keeps changing.
> Please check the attached snapshots.
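
To illustrate why a missing end time makes the Elapsed Time keep changing, here is a minimal sketch. It assumes that the history UI falls back to the current time when no finish time has been recorded; that fallback is an assumption for illustration, and the helper `elapsed_ms` with its parameters is hypothetical, not taken from the YARN code base. The 2541 ms value is simply the gap between the NEW to RESERVED log line (20:45:31,169) and the completedContainer line (20:45:33,710) in the log above.

```python
import time


def elapsed_ms(start_ms, finish_ms, now_ms=None):
    """Hypothetical elapsed-time helper (illustration only, not YARN code)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Assumption: a finish time of 0 (or less) means "never recorded",
    # so the current time is used as the end of the interval.
    end_ms = finish_ms if finish_ms > 0 else now_ms
    return max(end_ms - start_ms, 0)


start = 1_000  # hypothetical start timestamp in milliseconds

# Finish time never published: the result depends on when the page is viewed.
print(elapsed_ms(start, 0, now_ms=start + 5_000))    # 5000
print(elapsed_ms(start, 0, now_ms=start + 60_000))   # 60000

# Finish time recorded: the result is stable regardless of "now".
print(elapsed_ms(start, start + 2_541, now_ms=start + 60_000))  # 2541
```

Under this assumption, a container that is killed while RESERVED and never gets a finish time written would show a growing Elapsed Time on every refresh, which matches the reported symptom.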