[ https://issues.apache.org/jira/browse/YARN-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JayceAu updated YARN-8493:
--------------------------
    Attachment: YARN-8493.001.patch

> LogAggregation in NodeManager is put off because great amount of long running app
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-8493
>                 URL: https://issues.apache.org/jira/browse/YARN-8493
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: JayceAu
>            Priority: Major
>             Fix For: 2.6.0
>
>         Attachments: YARN-8493.001.patch
>
>
> h2. Issue summary
> In our YARN cluster, it takes 30 minutes on average for an app's logs to
> appear in the web UI after the app has finished. The cause is the limited
> size of the log aggregation thread pool in NodeManager.
> NodeManager sets aside an appLogAggregator to do log aggregation for each
> application running on it. This appLogAggregator occupies one thread in the
> thread pool until the application has finished across the whole cluster.
> NodeManager uses a FixedThreadPool (default size 100) instead of the
> CachedThreadPool used in older versions. At peak times in our production
> environment, more than 350 AppLogAggregators are running or queued in the
> thread pool, and the queued apps suffer from high log aggregation latency.
> h2. Possible Solution
> We can increase yarn.nodemanager.logaggregation.threadpool-size-max to a
> higher value to work around this. But the problem will recur if the number
> of running apps grows, and a larger pool creates many idle threads waiting
> for log aggregation.
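The queuing effect described in the issue summary can be reproduced outside Hadoop. The following is a minimal sketch, not NodeManager code: the class name `FixedPoolQueueDemo` and the method `queuedAggregators` are hypothetical, and a latch stands in for "the app is still running". It only assumes that each task holds its pool thread until the app finishes, as the report describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixedPoolQueueDemo {

    /**
     * Submits `apps` tasks that each hold a pool thread until the shared
     * latch is released, and returns how many tasks are left waiting in
     * the queue while all pool threads are occupied.
     */
    static int queuedAggregators(int poolSize, int apps) throws InterruptedException {
        // Same construction that Executors.newFixedThreadPool(poolSize) uses.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch appsFinished = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, apps));
        try {
            for (int i = 0; i < apps; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        // Stand-in for "wait until the app finishes".
                        appsFinished.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Wait until every pool thread is busy with a long-running task.
            started.await();
            return pool.getQueue().size();
        } finally {
            appsFinished.countDown();   // let all "apps" finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Figures from the report: pool size 100, 350 aggregators at peak.
        System.out.println(queuedAggregators(100, 350));
    }
}
```

With the figures from the report, every task beyond the first 100 sits in the queue until a running app finishes, no matter how little work it has to do. Raising the pool size only moves that threshold.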
> Our solution is to keep the appLogAggregator out of the threadPool until
> the app has finished:
> # give each appLogAggregator a callback that puts it into the threadPool;
> the callback is not called until the appLogAggregator is notified
> # if rollingMonitorInterval is greater than 0, NodeManager sets aside a
> single thread in LogAggregationService to do log aggregation for all
> running apps periodically
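The two steps of the proposed solution can be sketched as follows. This is an illustration under stated assumptions, not the content of YARN-8493.001.patch: `DeferredAggregationSketch`, `Aggregator`, `appStarted`, `appFinished`, and `rollOnce` are hypothetical names, and "uploading" is reduced to appending a string to a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DeferredAggregationSketch {

    /** Hypothetical stand-in for an appLogAggregator; holds no thread while idle. */
    static final class Aggregator implements Runnable {
        private final String appId;
        private final Consumer<Aggregator> enqueue; // callback supplied by the service
        private final List<String> uploads;         // records what was "uploaded"

        Aggregator(String appId, Consumer<Aggregator> enqueue, List<String> uploads) {
            this.appId = appId;
            this.enqueue = enqueue;
            this.uploads = uploads;
        }

        /** Notification that the app finished: only now enter the thread pool. */
        void appFinished() { enqueue.accept(this); }

        /** Final aggregation, executed on a pool thread. */
        @Override public void run() { uploads.add(appId + ":final"); }

        /** Periodic aggregation, executed by the single rolling thread. */
        void rollingUpload() { uploads.add(appId + ":rolling"); }
    }

    final ExecutorService pool;
    final ScheduledExecutorService roller = Executors.newSingleThreadScheduledExecutor();
    final Map<String, Aggregator> running = new ConcurrentHashMap<>();
    final List<String> uploads = new CopyOnWriteArrayList<>();

    DeferredAggregationSketch(int poolSize, long rollingMonitorIntervalSec) {
        pool = Executors.newFixedThreadPool(poolSize);
        if (rollingMonitorIntervalSec > 0) {
            // Step 2: one thread serves rolling aggregation for all running apps.
            roller.scheduleAtFixedRate(this::rollOnce, rollingMonitorIntervalSec,
                rollingMonitorIntervalSec, TimeUnit.SECONDS);
        }
    }

    /** Step 1: register the aggregator without taking a pool thread. */
    void appStarted(String appId) {
        running.put(appId, new Aggregator(appId, pool::execute, uploads));
    }

    void appFinished(String appId) {
        Aggregator a = running.remove(appId);
        if (a != null) {
            a.appFinished();
        }
    }

    void rollOnce() { running.values().forEach(Aggregator::rollingUpload); }

    void stop() throws InterruptedException {
        roller.shutdownNow();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    /** Runs a small scenario by hand and returns the sorted upload record. */
    static List<String> demo() throws InterruptedException {
        DeferredAggregationSketch s = new DeferredAggregationSketch(2, 0);
        s.appStarted("app1");
        s.appStarted("app2");
        s.appStarted("app3");
        s.rollOnce();            // all three apps are still running
        s.appFinished("app2");
        s.rollOnce();            // only app1 and app3 remain
        s.appFinished("app1");
        s.appFinished("app3");
        s.stop();
        List<String> sorted = new ArrayList<>(s.uploads);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

In this sketch a running app costs one map entry rather than one pool thread, so the pool only has to absorb apps that have actually finished. A real implementation would also need to keep a rolling upload and the final upload of the same app from overlapping, which the sketch ignores.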