[ https://issues.apache.org/jira/browse/YARN-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zizon updated YARN-9934:
------------------------
    Attachment:     (was: YARN-8246.patch)

> LogAggregationService should not submit aggregator when app dir creation fail
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9934
>                 URL: https://issues.apache.org/jira/browse/YARN-9934
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>            Reporter: Zizon
>            Priority: Minor
>
> Before submitting a log aggregation runnable, LogAggregationService tries
> to create the aggregated log dir.
> In some cases this may fail (e.g. the number of dirs exceeds the max limit).
>
> If creation fails and the runnable is still submitted to
> LogAggregationService, it may run forever when an app state transition
> misbehaves (e.g. the application complete event is not handled correctly,
> thus keeping appFinishing of AppLogAggregatorImpl always true).
>
> In our production (version 2.7.3), this caused a huge number of dangling
> aggregators (400+ LogAggregationService threads alive on some nodes, where
> the NodeManager is configured with only 50+ vCPUs).
>
> The patch throws the creation exception early, which avoids starting
> unnecessary log polling.
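
For illustration, the early-throw idea described above could be sketched roughly as follows. This is not the attached patch and not the actual LogAggregationService code: EarlyFailSketch, AppDirCreator, initApp and submittedApps are hypothetical names, and a plain list stands in for the thread pool that would run the aggregators.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class EarlyFailSketch {

    /** Hypothetical stand-in for the step that creates the aggregated log dir. */
    interface AppDirCreator {
        void create(String appId) throws IOException;
    }

    private final AppDirCreator dirCreator;
    // Records which apps had an aggregator submitted; a real service
    // would hand a runnable to a thread pool here instead.
    private final List<String> submitted = new ArrayList<>();

    EarlyFailSketch(AppDirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /** Creates the app dir first; any failure propagates before submission. */
    void initApp(String appId) throws IOException {
        dirCreator.create(appId); // throws on failure, so the next line is skipped
        submitted.add(appId);     // reached only when the directory was created
    }

    List<String> submittedApps() {
        return submitted;
    }

    public static void main(String[] args) {
        // Simulated creator: fails for one app, as if a directory limit were hit.
        EarlyFailSketch service = new EarlyFailSketch(appId -> {
            if (appId.equals("app_2")) {
                throw new IOException("directory item limit exceeded");
            }
        });
        for (String appId : new String[] {"app_1", "app_2", "app_3"}) {
            try {
                service.initApp(appId);
            } catch (IOException e) {
                System.out.println("skipped " + appId + ": " + e.getMessage());
            }
        }
        System.out.println("submitted: " + service.submittedApps());
    }
}
```

The point of the ordering in initApp is that a failed directory creation never leaves a polling runnable behind; the caller sees the exception and can decide how to report it.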