[
https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584686#comment-17584686
]
ASF GitHub Bot commented on YARN-11277:
---------------------------------------
slfan1989 commented on code in PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#discussion_r954621494
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java:
##########
@@ -149,47 +157,52 @@ private void recover() throws IOException {
@Override
public void handle(LogHandlerEvent event) {
switch (event.getType()) {
- case APPLICATION_STARTED:
- LogHandlerAppStartedEvent appStartedEvent =
- (LogHandlerAppStartedEvent) event;
- this.appOwners.put(appStartedEvent.getApplicationId(),
- appStartedEvent.getUser());
- this.dispatcher.getEventHandler().handle(
- new ApplicationEvent(appStartedEvent.getApplicationId(),
- ApplicationEventType.APPLICATION_LOG_HANDLING_INITED));
+ case APPLICATION_STARTED:
+ LogHandlerAppStartedEvent appStartedEvent =
+ (LogHandlerAppStartedEvent) event;
+ this.appOwners.put(appStartedEvent.getApplicationId(),
+ appStartedEvent.getUser());
+ this.dispatcher.getEventHandler().handle(
+ new ApplicationEvent(appStartedEvent.getApplicationId(),
+ ApplicationEventType.APPLICATION_LOG_HANDLING_INITED));
+ break;
+ case CONTAINER_FINISHED:
+ // Ignore
+ break;
+ case APPLICATION_FINISHED:
+ LogHandlerAppFinishedEvent appFinishedEvent =
+ (LogHandlerAppFinishedEvent) event;
+ ApplicationId appId = appFinishedEvent.getApplicationId();
+ String user = appOwners.remove(appId);
+ if (user == null) {
+ LOG.error("Unable to locate user for " + appId);
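Review Comment:
Please use parameterized logging with a `{}` placeholder instead of string concatenation, e.g. `LOG.error("Unable to locate user for {}", appId);`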
> trigger deletion of log-dir by size for NonAggregatingLogHandler
> ----------------------------------------------------------------
>
> Key: YARN-11277
> URL: https://issues.apache.org/jira/browse/YARN-11277
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.4.0
> Reporter: Xianming Lei
> Priority: Minor
> Labels: pull-request-available
>
> In our YARN cluster, the log files of some containers grow too large, which
> frequently pushes the NodeManager into the unhealthy state. Logs that are
> too large could be deleted immediately instead of waiting for
> yarn.nodemanager.log.retain-seconds to elapse.
> Cluster environment:
> # 8,000+ nodes
> # 500,000+ apps / day
> Configuration:
> # yarn.nodemanager.log.retain-seconds = 3 days
> # yarn.log-aggregation-enable=false
>
>
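The size-based trigger described in the issue can be sketched roughly as follows. This is only an illustration of the idea, not the code from PR #4797: the class `LogDirSizeCheck`, its methods, and the `maxBytes` threshold are hypothetical names that do not exist in Hadoop, and the real change may compute or act on the size differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Hadoop or PR #4797. */
public final class LogDirSizeCheck {

  private LogDirSizeCheck() {
  }

  /** Sums the sizes, in bytes, of all regular files under dir (recursively). */
  public static long sizeOf(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths
          .filter(Files::isRegularFile)
          .mapToLong(p -> p.toFile().length())
          .sum();
    }
  }

  /**
   * Returns true when the log directory is larger than maxBytes.
   * maxBytes is an assumed, caller-supplied threshold.
   */
  public static boolean exceeds(Path dir, long maxBytes) throws IOException {
    return sizeOf(dir) > maxBytes;
  }
}
```

A caller could evaluate `LogDirSizeCheck.exceeds(appLogDir, maxBytes)` when the application finishes and, if it returns true, schedule the deletion right away rather than after the retention delay.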
--
This message was sent by Atlassian Jira
(v8.20.10#820010)