[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated

Varun Saxena (JIRA) Fri, 11 Nov 2016 17:08:15 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658709#comment-15658709
 ]


Varun Saxena commented on YARN-3884:
------------------------------------

[~bibinchundatt], kindly rebase the patch. It's no longer applying cleanly.

Otherwise, the core changes in the patch, based on the approach we decided on, 
look fine to me. We do not need to update attempt metrics or report the 
container to the attempt, as that report is used to ack back to the NM.

A few comments, though.
# In TestCapacityScheduler#createReservation, why are you iterating thrice over 
node update events and then breaking from the loop? The comment above says we 
need to wait, but there is no sleep. The loop doesn't seem to be required anyway.
# The killApp call right at the end of the test case doesn't seem to be necessary.
# FinishedTransition and ReservationFinishTransition have similar code. We can 
probably move the entity-publishing code to a separate method and call 
it from both places.
# Can we move this test case to a reservation-specific test class like 
TestReservations, if possible?
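
To illustrate the third point, here is a minimal, self-contained sketch of the kind of refactoring I mean. It is not the actual RMContainerImpl code: Publisher, Container, publishContainerFinished and the method bodies are hypothetical stand-ins, and only the two transition names are taken from the patch. The point is only that both transitions call one shared helper.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedPublishSketch {

    /** Hypothetical stand-in for whatever publishes container history. */
    static class Publisher {
        final List<String> events = new ArrayList<>();

        void containerFinished(String containerId, long finishTime) {
            events.add(containerId + " finished at " + finishTime);
        }
    }

    /** Hypothetical, stripped-down container holding only what the sketch needs. */
    static class Container {
        final String id;
        final Publisher publisher;
        long finishTime = -1L; // -1 means "not finished yet"

        Container(String id, Publisher publisher) {
            this.id = id;
            this.publisher = publisher;
        }
    }

    /** Shared helper: record the finish time and publish the finish event. */
    static void publishContainerFinished(Container c, long now) {
        c.finishTime = now;
        c.publisher.containerFinished(c.id, now);
    }

    /** Sketch of the normal finish path. */
    static void finishedTransition(Container c, long now) {
        publishContainerFinished(c, now);
        // Attempt metrics and the report to the attempt would follow here (omitted).
    }

    /** Sketch of the RESERVED -> finished path: publish only. */
    static void reservationFinishTransition(Container c, long now) {
        publishContainerFinished(c, now);
    }

    public static void main(String[] args) {
        Publisher publisher = new Publisher();
        Container reserved = new Container("container_reserved", publisher);
        Container running = new Container("container_running", publisher);

        reservationFinishTransition(reserved, 1000L);
        finishedTransition(running, 2000L);

        publisher.events.forEach(System.out::println);
    }
}
```

With this shape, the reservation path sets the end time and publishes the event without touching attempt metrics, which matches the approach discussed above.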

> RMContainerImpl transition from RESERVED to KILL apphistory status not updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3884
>                 URL: https://issues.apache.org/jira/browse/YARN-3884
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Suse11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>              Labels: oct16-easy
>         Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
> Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch
>
>
> Setup
> ===============
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===============
> 1. Submit apps to Queue 1 with 512 MB and 1 core
> 2. Submit apps to Queue 2 with 512 MB and 5 cores
> Lots of containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to 
> RESERVED
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
> used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.96875 
> absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to 
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=AM Allocated Container        TARGET=SchedulerApp     
> RESULT=SUCCESS  APPID=application_1435849994778_0001    
> CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_e24_1435849994778_0001_01_000014 of capacity 
> <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6 
> containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available 
> after allocation
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1435849994778_0001_000001 
> container=Container: [ContainerId: 
> container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318, 
> NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, 
> Priority: 20, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:2560, vCores:5>, 
> usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, 
> numContainers=5 clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.default stats: default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>, 
> usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 
> used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Null container completed...
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Application application_1435849994778_0002 unreserved  on node host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13>, currently has 0 at priority 20; 
> currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf 
> user-resources=<memory:2560, vCores:21>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> completedContainer container=Container: [ContainerId: 
> container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318, 
> NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>, 
> Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> completedContainer queue=root usedCapacity=0.9166667 
> absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1435849994778_0002_000001 released container 
> container_e24_1435849994778_0002_01_000013 on node: host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In the application history server the status gets updated to -1000 (INVALID),
> but the end time is not updated, so Elapsed Time keeps changing.
> Please check the attached snapshots.



