[ 
https://issues.apache.org/jira/browse/YARN-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051499#comment-17051499
 ] 

Wangda Tan commented on YARN-10180:
-----------------------------------

Thanks [~prabhujoseph] for filing this! 

I think we should address this in the short term by making sure a blocked 
write does not prevent threads from being released. We also need a long-term 
fix: the number of ATS client threads should be bounded instead of growing 
linearly with the number of apps, since a large cluster normally has several 
thousand concurrently running apps. 
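
A minimal sketch of how the short-term and long-term ideas could fit together, 
not the actual TimelineV2ClientImpl code. The class name 
SharedTimelineDispatcher, its constructor parameters, and putSync are all 
hypothetical: one fixed-size pool is shared by every app (bounded threads), 
and the synchronous wait uses a timeout so a hung write cannot block the 
caller forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names exist in Hadoop. */
final class SharedTimelineDispatcher implements AutoCloseable {
    // One fixed-size pool shared by all applications, so the thread count
    // stays constant regardless of how many apps are running.
    private final ExecutorService pool;
    private final long timeoutMillis;

    SharedTimelineDispatcher(int maxThreads, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(maxThreads);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs a write and waits for it, but only up to the configured timeout.
     * Returns true if the write finished in time, false otherwise.
     */
    boolean putSync(Callable<Void> write) {
        Future<Void> future = pool.submit(write);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the stuck write. This frees the worker only if the
            // write responds to interruption.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the write itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        try (SharedTimelineDispatcher d = new SharedTimelineDispatcher(4, 500)) {
            boolean ok = d.putSync(() -> null); // stand-in for a real write
            System.out.println("write completed in time: " + ok);
        }
    }
}
```

With a timed wait, the calling dispatcher thread returns after at most the 
timeout and can go on to process queued events such as STOP_TIMELINE_CLIENT. 
Whether a timed-out write should be retried or dropped is a separate design 
question that this sketch does not address.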

It is also worth checking whether the RM has the same issue.

> TimelineV2ClientImpl$TimelineEntityDispatcher threads leak
> ----------------------------------------------------------
>
>                 Key: YARN-10180
>                 URL: https://issues.apache.org/jira/browse/YARN-10180
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> TimelineV2ClientImpl$TimelineEntityDispatcher threads leak when the NM 
> Timeline Dispatcher thread is waiting for a synchronous putEntities call to 
> complete and that call hangs for some reason. The STOP_TIMELINE_CLIENT 
> events for completed applications then wait in the dispatcher queue, so the 
> threads started by ApplicationImpl -> TimelineV2ClientImpl leak.
> {code}
> "pool-19133-thread-1" #1362413 prio=5 os_prio=0 tid=0x00007f027bab0800 nid=0x4786c waiting on condition [0x00007efdbb2bf000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000004272df388> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:426)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> *NM Timeline dispatcher Thread*
> {code}
> "NM Timeline dispatcher" #283 prio=5 os_prio=0 tid=0x00007f02db875000 nid=0x25bc22 waiting on condition [0x00007f0255de9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000411d71310> (a org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:545)
>         at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:335)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.handleNMTimelineEvent(NMTimelinePublisher.java:145)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ForwardingEventHandler.handle(NMTimelinePublisher.java:427)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ForwardingEventHandler.handle(NMTimelinePublisher.java:422)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> cc [~leftnoteasy]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
