[ https://issues.apache.org/jira/browse/YARN-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986613#comment-16986613 ]
zhoukang commented on YARN-8364:
--------------------------------

I will work on this.

> NM aggregation thread should be able to exempt pool
> ---------------------------------------------------
>
>                 Key: YARN-8364
>                 URL: https://issues.apache.org/jira/browse/YARN-8364
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>            Reporter: Oleksandr Shevchenko
>            Priority: Major
>
> For now, we have a limited NM aggregation thread pool whose size is
> configured by the property
> yarn.nodemanager.logaggregation.threadpool-size-max=100.
> When an application starts, it takes one thread from the pool and holds
> it until the application finishes. As a result, another application can
> aggregate its logs only when the previous application has finished.
> Just for example:
> yarn.nodemanager.logaggregation.threadpool-size-max=1
> 1. Start long-running application app1
> 2. Start short application app2
> 3. app2 finishes
> 4. app1 finishes
> 5. Logs of app1 are aggregated
> 6. Logs of app2 are aggregated
> In a real cluster, we can have many long-running jobs (for example Spark
> streaming), so short-running applications do not aggregate their logs for
> a long time. The problem appears if the average number of jobs exceeds the
> thread pool size. All threads are occupied by some applications, and as a
> result there is a huge delay between an application finishing and its logs
> being uploaded.
> It would be good to improve this behavior.
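
The six-step example above can be reproduced with a small toy model. This is only a sketch of the behavior described in the issue, not the NodeManager code: it assumes each application's aggregation task occupies one pool worker until that application finishes, and the names simulate, aggregator, finished and uploaded are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(pool_size):
    """Return the order in which the logs of app1 and app2 are aggregated."""
    # One event per application, set when that application finishes.
    finished = {"app1": threading.Event(), "app2": threading.Event()}
    uploaded = []  # order in which aggregation completes
    lock = threading.Lock()

    def aggregator(app):
        # Assumption from the issue text: the worker stays occupied for the
        # whole lifetime of the application, then uploads its logs.
        finished[app].wait()
        with lock:
            uploaded.append(app)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        f1 = pool.submit(aggregator, "app1")  # 1. start long-running app1
        f2 = pool.submit(aggregator, "app2")  # 2. start short app2
        finished["app2"].set()                # 3. app2 finishes
        if pool_size > 1:
            # A second worker is free, so app2 is aggregated right away.
            # With pool_size=1 its task is still queued behind app1.
            f2.result()
        finished["app1"].set()                # 4. app1 finishes
        f1.result()
        f2.result()
    return uploaded


print(simulate(1))  # ['app1', 'app2']: app2 waits for app1 (steps 5 and 6)
print(simulate(2))  # ['app2', 'app1']: app2 is aggregated as soon as it ends
```

With a pool of one worker, app2 is aggregated only after app1 has finished, even though app2 finished first. With a second worker the delay disappears, which is why the problem only shows up once the number of running applications exceeds the pool size.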