[ https://issues.apache.org/jira/browse/YARN-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051499#comment-17051499 ]
Wangda Tan commented on YARN-10180:
-----------------------------------

Thanks [~prabhujoseph] for filing this! I think we should solve this in the short term (make sure a blocked write doesn't stop the releasing thread) and also in the long term (the number of threads for the ATS client should be bounded instead of growing linearly with the number of apps; in a large cluster it is normal to have several thousand concurrently running apps). It is also worth checking whether the RM has the same issue.

> TimelineV2ClientImpl$TimelineEntityDispatcher threads leak
> ----------------------------------------------------------
>
>                 Key: YARN-10180
>                 URL: https://issues.apache.org/jira/browse/YARN-10180
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> TimelineV2ClientImpl$TimelineEntityDispatcher threads leak when the NM Timeline
> Dispatcher thread is waiting for a synchronous putEntities call to complete and
> that call hangs for some reason. The STOP_TIMELINE_CLIENT event for completed
> applications then waits in the dispatcher queue, causing the threads started by
> ApplicationImpl -> TimelineV2ClientImpl to leak.
>
> {code}
> "pool-19133-thread-1" #1362413 prio=5 os_prio=0 tid=0x00007f027bab0800 nid=0x4786c waiting on condition [0x00007efdbb2bf000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000004272df388> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:426)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>
> *NM Timeline dispatcher Thread*
> {code}
> "NM Timeline dispatcher" #283 prio=5 os_prio=0 tid=0x00007f02db875000 nid=0x25bc22 waiting on condition [0x00007f0255de9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000411d71310> (a org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:545)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:335)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.handleNMTimelineEvent(NMTimelinePublisher.java:145)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ForwardingEventHandler.handle(NMTimelinePublisher.java:427)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ForwardingEventHandler.handle(NMTimelinePublisher.java:422)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>
> cc [~leftnoteasy]
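The short-term and long-term fixes suggested above can be sketched together in plain Java. This is a minimal, hypothetical illustration, not the YARN-10180 patch. `SharedTimelinePublisher`, `putSync`, and the timeout values are invented names. Each write is a stand-in `Runnable` rather than a real `TimelineV2ClientImpl.putEntities` call. The sketch makes two assumptions:

* One fixed-size pool is shared by all applications, so the thread count does not grow with the number of apps.
* A synchronous put waits only a bounded time and cancels the write on timeout. This frees the calling event thread to process later events, such as stopping a client.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch (not the actual YARN code): a bounded pool shared by all
 * apps, plus synchronous puts that give up after a timeout instead of blocking
 * the caller forever in Future.get().
 */
class SharedTimelinePublisher {
    private final ExecutorService pool;
    private final long putTimeoutMs;

    SharedTimelinePublisher(int threads, long putTimeoutMs) {
        // Thread count is fixed up front; it does not grow with the number of apps.
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "timeline-publisher");
            t.setDaemon(true); // a hung writer must not keep the JVM alive
            return t;
        });
        this.putTimeoutMs = putTimeoutMs;
    }

    /** Runs a write and waits at most putTimeoutMs; returns false on timeout or failure. */
    boolean putSync(Runnable write) throws InterruptedException {
        Future<?> f = pool.submit(write);
        try {
            f.get(putTimeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the hung write so its pool thread can be reused
            return false;
        } catch (ExecutionException e) {
            return false;   // the write itself threw
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        SharedTimelinePublisher pub = new SharedTimelinePublisher(2, 200);
        CountDownLatch never = new CountDownLatch(1);
        // Simulates a write that never completes (e.g. the collector never responds).
        boolean ok = pub.putSync(() -> {
            try {
                never.await();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("hung put completed: " + ok);   // prints "hung put completed: false"
        // The caller is no longer stuck, so later events are still processed.
        boolean ok2 = pub.putSync(() -> {});
        System.out.println("next put completed: " + ok2);  // prints "next put completed: true"
        pub.shutdown();
    }
}
```

`cancel(true)` only frees the pool thread if the hung write responds to interruption. Classic `java.net` socket reads do not, so a real fix would likely also need a request timeout on the HTTP client. Whether a timed-out entity may be dropped or should be retried is a separate policy decision.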