[ 
https://issues.apache.org/jira/browse/YARN-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zizon updated YARN-9934:
------------------------
    Attachment:     (was: YARN-8246.patch)

> LogAggregationService should not submit aggregator when app dir creation fail
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9934
>                 URL: https://issues.apache.org/jira/browse/YARN-9934
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>            Reporter: Zizon
>            Priority: Minor
>
> Before submitting a log aggregation runnable, LogAggregationService tries 
> to create the aggregated log dir.
> In some cases, creation may fail (e.g. the number of dirs exceeds the max limit).
>  
> When creation fails and the runnable is still submitted to 
> LogAggregationService, the runnable may run forever if some app state 
> transition misbehaves (e.g. the application complete event is not handled 
> correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).
>  
> In our production cluster (version 2.7.3), this caused a huge number of 
> dangling aggregators (400+ LogAggregationService threads alive on some nodes, 
> where the NodeManager is configured with only 50+ vCPUs).
>  
> The patch throws the creation exception early, which avoids starting 
> unnecessary log polling.
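
The ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch or the real Hadoop code: EarlyFailSketch, DirCreator, initApp and submittedCount are hypothetical names made up for this example. The point is just that the directory creation error propagates before anything reaches the thread pool.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the service; NOT the real LogAggregationService.
class EarlyFailSketch {
    // Hypothetical hook that creates the per-application aggregated log dir.
    interface DirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicInteger submitted = new AtomicInteger();
    private final DirCreator dirCreator;

    EarlyFailSketch(DirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    // Create the dir first; submit the aggregator only if creation succeeded.
    void initApp(String appId, Runnable aggregator) throws IOException {
        dirCreator.createAppDir(appId); // an IOException here skips the submit below
        submitted.incrementAndGet();
        pool.execute(aggregator);
    }

    int submittedCount() {
        return submitted.get();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        // Simulate a creation failure such as a directory item limit being hit.
        EarlyFailSketch svc = new EarlyFailSketch(appId -> {
            throw new IOException("dir limit exceeded for " + appId);
        });
        try {
            svc.initApp("application_0001", () -> System.out.println("aggregating"));
        } catch (IOException e) {
            System.out.println("not submitted: " + e.getMessage());
        } finally {
            svc.shutdown();
        }
        System.out.println("submitted = " + svc.submittedCount());
    }
}
```

With this ordering a failed creation leaves no aggregator thread behind, so no runnable depends on a later application event to terminate.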



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
