[jira] [Created] (YARN-8493) LogAggregation in NodeManager is put off because great amount of long running app

JayceAu (JIRA) Tue, 03 Jul 2018 23:17:17 -0700

JayceAu created YARN-8493:
-----------------------------

             Summary: LogAggregation in NodeManager is put off because great 
amount of long running app
                 Key: YARN-8493
                 URL: https://issues.apache.org/jira/browse/YARN-8493
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.6.0
            Reporter: JayceAu
             Fix For: 2.6.0



h2. Issue summary

In our YARN cluster, it takes 30 minutes on average for an app's log to show 
up on the web UI after the app has finished. This problem is caused by the 
limited threadPool size in NodeManager.

NodeManager sets aside one appLogAggregator to do log aggregation for each 
app that has containers running on this NodeManager. Each appLogAggregator 
occupies one thread in the threadPool until its app has finished in the whole 
cluster. NodeManager uses a FixedThreadPool (default size is 100) instead of 
the CachedThreadPool used in older versions. At peak times in our production 
environment, more than 350 appLogAggregators are running or queued in the 
thread pool, and the apps whose aggregators are queued suffer from long log 
aggregation latency.
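
The queuing effect described above can be reproduced outside of Hadoop. The 
following is only a minimal sketch, assuming that each long running app's 
aggregator simply blocks its thread until the app finishes; the class and 
method names are hypothetical and are not NodeManager code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical simulation; not taken from the NodeManager source. */
final class FixedPoolStarvation {

    /**
     * Returns true if the aggregator of an already finished app could not start
     * while long running apps held every pool thread, and did run once they ended.
     */
    static boolean finishedAppWaitsForLongRunningApps(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch longRunningAppsFinish = new CountDownLatch(1);
        CountDownLatch poolFull = new CountDownLatch(poolSize);
        CountDownLatch queuedAggregatorDone = new CountDownLatch(1);
        try {
            // One aggregator per long running app; each holds a thread until its app ends.
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> {
                    poolFull.countDown();
                    try {
                        longRunningAppsFinish.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            poolFull.await();

            // Aggregator of an app that has already finished: it can only queue.
            pool.execute(queuedAggregatorDone::countDown);
            boolean blocked = !queuedAggregatorDone.await(200, TimeUnit.MILLISECONDS);

            // The long running apps finish, so their threads are released.
            longRunningAppsFinish.countDown();
            boolean ranLater = queuedAggregatorDone.await(5, TimeUnit.SECONDS);
            return blocked && ranLater;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling FixedPoolStarvation.finishedAppWaitsForLongRunningApps(2) models a 
pool of 2 threads; the same reasoning applies to the default size of 100 when 
more than 100 apps are running.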
h2. Possible Solution

We can increase yarn.nodemanager.logaggregation.threadpool-size-max to a higher 
value to work around it. But the problem will happen again if the number of 
running apps increases, and a larger pool creates a lot of idle threads that 
just wait for log aggregation. 

Our solution is to keep each appLogAggregator out of the threadPool until its 
app has finished:
 # give each appLogAggregator a callback that puts it into the threadPool; the 
callback is not called until the appLogAggregator is notified
 # if rollingMonitorInterval is greater than 0, NodeManager will set aside one 
thread in LogAggregationService to do log aggregation periodically for all the 
running apps
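
The two steps above could be sketched roughly as follows. This is not a patch 
against NodeManager: DeferredLogAggregation, Aggregator, appStarted, 
appFinished and rollingPass are hypothetical names, and the upload is reduced 
to a counter so that the example stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these names are the NodeManager's real API. */
final class DeferredLogAggregation {

    /** Stand-in for an appLogAggregator; it only counts upload cycles. */
    static final class Aggregator {
        final AtomicInteger uploads = new AtomicInteger();
        void uploadLogs() { uploads.incrementAndGet(); }
    }

    private final ExecutorService pool;
    private final ScheduledExecutorService rollingThread =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, Aggregator> runningApps = new ConcurrentHashMap<>();

    DeferredLogAggregation(int poolSize, long rollingMonitorIntervalSeconds) {
        pool = Executors.newFixedThreadPool(poolSize);
        // Step 2: a single thread serves all running apps when rolling is enabled.
        if (rollingMonitorIntervalSeconds > 0) {
            rollingThread.scheduleWithFixedDelay(this::rollingPass,
                    rollingMonitorIntervalSeconds, rollingMonitorIntervalSeconds,
                    TimeUnit.SECONDS);
        }
    }

    /** Registering a running app does not occupy a pool thread. */
    Aggregator appStarted(String appId) {
        Aggregator aggregator = new Aggregator();
        runningApps.put(appId, aggregator);
        return aggregator;
    }

    /** Step 1: the callback; the aggregator enters the pool only when notified. */
    Future<?> appFinished(String appId) {
        Aggregator aggregator = runningApps.remove(appId);
        if (aggregator == null) {
            throw new IllegalArgumentException("unknown app: " + appId);
        }
        return pool.submit(aggregator::uploadLogs);
    }

    /** One rolling cycle over every app that is still running. */
    void rollingPass() {
        runningApps.values().forEach(Aggregator::uploadLogs);
    }

    void shutdown() {
        pool.shutdown();
        rollingThread.shutdownNow();
    }

    /** 350 running apps, a pool of 2 threads; returns the upload count of app-0. */
    static int demo() {
        DeferredLogAggregation service = new DeferredLogAggregation(2, 0);
        try {
            Aggregator first = service.appStarted("app-0");
            for (int i = 1; i < 350; i++) {
                service.appStarted("app-" + i);
            }
            service.rollingPass();                                 // one rolling upload
            service.appFinished("app-0").get(5, TimeUnit.SECONDS); // final upload
            return first.uploads.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            service.shutdown();
        }
    }
}
```

In this sketch the 349 apps that are still running hold no pool thread, so the 
final upload of app-0 does not have to wait behind them even though the pool 
has only 2 threads.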


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
