[ https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916038#comment-16916038 ]
Bibin A Chundatt commented on YARN-9738:
----------------------------------------

[~BilwaST] As discussed offline, we need to handle a get on nodes with a null key.

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-9738
>                 URL: https://issues.apache.org/jira/browse/YARN-9738
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: YARN-9738-001.patch, YARN-9738-002.patch
>
>
> *Env:*
> Server OS: Ubuntu
> No. of cluster nodes: 9120 NMs
> Env mode: [Secure / Non secure] Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in running state
> 35K applications were in pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client with 2 threads, across 10 queues in total
> 2. As application submissions increase, each distributed shell application calls getClusterNodes
> *ClientRMService#getClusterNodes calls ClusterNodeTracker#getNodeReport, which waits on the read lock that guards the nodes map.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x00007f75095de000 nid=0x1949c waiting on condition [0x00007f74cff78000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock.*
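
The null-key concern in the comment matters for the proposed change because ConcurrentHashMap.get(null) throws a NullPointerException, whereas HashMap.get(null) simply returns null when no null key is mapped. A minimal sketch of a null-safe lookup follows. The class NodeTrackerSketch, its String keys and values, and the addNode helper are hypothetical stand-ins for illustration; this is not the actual ClusterNodeTracker code or the attached patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a node tracker; not the actual ClusterNodeTracker code.
public class NodeTrackerSketch {
    // ConcurrentHashMap reads do not block, so no read lock is taken on lookup.
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    // Hypothetical helper used only to populate the map in this sketch.
    public void addNode(String nodeId, String report) {
        nodes.put(nodeId, report);
    }

    // Returns the report for nodeId, or null if the id is null or unknown.
    public String getNodeReport(String nodeId) {
        // ConcurrentHashMap.get(null) throws NullPointerException,
        // whereas HashMap.get(null) returns null when no null key is mapped.
        if (nodeId == null) {
            return null;
        }
        return nodes.get(nodeId);
    }
}
```

With the guard in place, a null id yields null, which is what a caller would have seen from a HashMap with no null key mapped, so callers that already check for a null report need no change.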