[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602522#comment-16602522 ]
niu commented on YARN-8513: --------------------------- Debug dump: {code:java} 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing shopee-test-cluster04:45454 of type STATUS_UPDATE 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: nodeUpdate: shopee-test-cluster04:45454 cluster capacity: <memory:1351680, vCores:240> 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Node being looked for scheduling shopee-test-cluster04:45454 availableResource: <memory:82944, vCores:77> 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to schedule on node: shopee-test-cluster04, available: <memory:82944, vCores:77> 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Trying to assign containers to child-queue of root 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Check assign to queue: root nodePartition: , usedResources: <memory:1095680, vCores:8>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity: 0.81060606, max-capacity: 1.0 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: printChildQueues - queue: root child-queues: root.dwusedCapacity=(1.1842697), label=(*)root.devusedCapacity=(0.016571993), label=(*) 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Trying to assign to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Check assign to queue: dev nodePartition: , usedResources: <memory:7168, vCores:1>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity: 0.0053030304, max-capacity: 0.6 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Headroom calculation for user work: userLimit=<memory:432640, vCores:77> queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1> partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: pre-assignContainers for application application_1535930391687_0019 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1535930391687_0019 headRoom=<memory:425472, vCores:76> currentConsumption=7168 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet: Request={AllocationRequestId: 0, Priority: 1, Capability: <memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true, Execution Type Request: null, Node Label Expression: } 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator: assignContainers: node=shopee-test-cluster04 application=application_1535930391687_0019 priority=1 pendingAsk=<per-allocation-resource=<memory:360448, vCores:2>,repeat=3> type=OFF_SWITCH 2018-09-03 11:44:11,175 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: Reserved container application=application_1535930391687_0019 resource=<memory:360448, vCores:2> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@65ed660 cluster=<memory:1351680, vCores:240> 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: post-assignContainers for application application_1535930391687_0019 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1535930391687_0019 headRoom=<memory:425472, vCores:76> currentConsumption=7168 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet: Request={AllocationRequestId: 0, Priority: 1, Capability: <memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true, Execution Type Request: null, Node Label Expression: } 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 --> <memory:360448, vCores:2>, OFF_SWITCH 2018-09-03 11:44:11,175 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606 used=<memory:1095680, vCores:8> cluster=<memory:1351680, vCores:240> 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: ParentQ=root assignedSoFarInThisIteration=<memory:360448, vCores:2> usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Try to commit allocation proposal=New org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest: RESERVED=[(Application=appattempt_1535930391687_0019_000001; Node=shopee-test-cluster04:45454; Resource=<memory:360448, vCores:2>)] 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Headroom calculation for user work: userLimit=<memory:432640, vCores:77> queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1> partition= 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource=<memory:1095680, vCores:8> exceeded maxResourceLimit of the queue =<memory:1351680, vCores:240> 2018-09-03 11:44:11,175 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Assigned maximum number of off-switch containers: 1, assignments so far: resource:<memory:360448, vCores:2>; type:OFF_SWITCH; excessReservation:null; applicationid:null; skipped:NONE; fulfilled reservation:false; allocations(count/resource):0/<memory:0, vCores:0>; reservations(count/resource):1/<memory:360448, vCores:2> 2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.ipc.Server: got #68890 2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 30 on 8031: Call#68890 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.151:60900 for RpcKind RPC_PROTOCOL_BUFFER 2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:work (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: Served: nodeHeartbeat, queueTime= 1 procesingTime= 0 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: STATUS_UPDATE 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing shopee-test-cluster03:45454 of type STATUS_UPDATE 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 30 on 8031: responding to Call#68890 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.151:60900 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 30 on 8031: responding to Call#68890 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.151:60900 Wrote 42 bytes. 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: nodeUpdate: shopee-test-cluster03:45454 cluster capacity: <memory:1351680, vCores:240> 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Node being looked for scheduling shopee-test-cluster03:45454 availableResource: <memory:90112, vCores:78> 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to schedule on node: shopee-test-cluster03, available: <memory:90112, vCores:78> 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Trying to assign containers to child-queue of root 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Check assign to queue: root nodePartition: , usedResources: <memory:1095680, vCores:8>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity: 0.81060606, max-capacity: 1.0 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: printChildQueues - queue: root child-queues: root.dwusedCapacity=(1.1842697), label=(*)root.devusedCapacity=(0.016571993), label=(*) 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Trying to assign to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Check assign to queue: dev nodePartition: , usedResources: <memory:7168, vCores:1>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity: 0.0053030304, max-capacity: 0.6 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Headroom calculation for user work: userLimit=<memory:432640, vCores:77> queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1> partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: pre-assignContainers for application application_1535930391687_0019 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1535930391687_0019 headRoom=<memory:425472, vCores:76> currentConsumption=7168 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet: Request={AllocationRequestId: 0, Priority: 1, Capability: <memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true, Execution Type Request: null, Node Label Expression: } 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator: assignContainers: node=shopee-test-cluster03 application=application_1535930391687_0019 priority=1 pendingAsk=<per-allocation-resource=<memory:360448, vCores:2>,repeat=3> type=OFF_SWITCH 2018-09-03 11:44:11,288 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: Reserved container application=application_1535930391687_0019 resource=<memory:360448, vCores:2> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@65ed660 cluster=<memory:1351680, vCores:240> 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: post-assignContainers for application application_1535930391687_0019 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1535930391687_0019 headRoom=<memory:425472, vCores:76> currentConsumption=7168 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet: Request={AllocationRequestId: 0, Priority: 1, Capability: <memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true, Execution Type Request: null, Node Label Expression: } 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 --> <memory:360448, vCores:2>, OFF_SWITCH 2018-09-03 11:44:11,288 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606 used=<memory:1095680, vCores:8> cluster=<memory:1351680, vCores:240> 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: ParentQ=root assignedSoFarInThisIteration=<memory:360448, vCores:2> usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Try to commit allocation proposal=New org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest: RESERVED=[(Application=appattempt_1535930391687_0019_000001; Node=shopee-test-cluster03:45454; Resource=<memory:360448, vCores:2>)] 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:432640, vCores:77>, userSpecificUserLimit=<memory:432640, vCores:77>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Headroom calculation for user work: userLimit=<memory:432640, vCores:77> queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1> partition= 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource=<memory:1095680, vCores:8> exceeded maxResourceLimit of the queue =<memory:1351680, vCores:240> 2018-09-03 11:44:11,288 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal 2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Assigned maximum number of off-switch containers: 1, assignments so far: resource:<memory:360448, vCores:2>; type:OFF_SWITCH; excessReservation:null; applicationid:null; skipped:NONE; fulfilled reservation:false; allocations(count/resource):0/<memory:0, vCores:0>; reservations(count/resource):1/<memory:360448, vCores:2> 2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.ipc.Server: got #440 2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8032: Call#440 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48970 for RpcKind RPC_PROTOCOL_BUFFER 2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:work (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) 2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying access-type VIEW_APP for work (auth:SIMPLE) on application application_1535930391687_0019 owned by work 2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: Served: getApplicationReport, queueTime= 0 procesingTime= 1 2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8032: responding to Call#440 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48970 2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8032: responding to Call#440 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48970 Wrote 358 bytes. 2018-09-03 11:44:11,989 DEBUG org.apache.hadoop.ipc.Server: got #3118 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 27 on 8032: Call#3118 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48370 for RpcKind RPC_PROTOCOL_BUFFER 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:work (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying access-type VIEW_APP for work (auth:SIMPLE) on application application_1535930391687_0012 owned by work 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: Served: getApplicationReport, queueTime= 1 procesingTime= 0 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 27 on 8032: responding to Call#3118 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48370 2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 27 on 8032: responding to Call#3118 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.65.205.148:48370 Wrote 361 bytes. 2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.ipc.Server: got #502725 2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 31 on 8031: Call#502725 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.150:38836 for RpcKind RPC_PROTOCOL_BUFFER 2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:work (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: Served: nodeHeartbeat, queueTime= 1 procesingTime= 0 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 31 on 8031: responding to Call#502725 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.150:38836 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: STATUS_UPDATE 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 31 on 8031: responding to Call#502725 Retry#0 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 10.65.205.150:38836 Wrote 42 bytes. 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing shopee-test-cluster02:45454 of type STATUS_UPDATE 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: nodeUpdate: shopee-test-cluster02:45454 cluster capacity: <memory:1351680, vCores:240> 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Node being looked for scheduling shopee-test-cluster02:45454 availableResource: <memory:82944, vCores:77> 2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to schedule on node: shopee-test-cluster02, available: <memory:82944, vCores:77>{code} > CapacityScheduler infinite loop when queue is near fully utilized > ----------------------------------------------------------------- > > Key: YARN-8513 > URL: https://issues.apache.org/jira/browse/YARN-8513 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn > Affects Versions: 3.1.0, 2.9.1 > Environment: Ubuntu 14.04.5 and 16.04.4 > YARN is configured with one label and 5 queues. > Reporter: Chen Yufei > Priority: Major > Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, > jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, > yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, > yarn3-resourcemanager.log, yarn3-top > > > ResourceManager does not respond to any request when queue is near fully > utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM > restart, it can recover running jobs and start accepting new ones. > > Seems like CapacityScheduler is in an infinite loop printing out the > following log messages (more than 25,000 lines in a second): > > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.99816763 > absoluteUsedCapacity=0.99816763 used=<memory:16170624, vCores:1577> > cluster=<memory:29441544, vCores:5792>}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application attempt=appattempt_1530619767030_1652_000001 > container=null > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 > clusterResource=<memory:29441544, vCores:5792> type=NODE_LOCAL > requestedPartition=}} > > I encounter this problem several times after upgrading to YARN 2.9.1, while > the same configuration works fine under version 2.7.3. > > YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a > similar problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org