[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636460#comment-17636460 ]

ASF GitHub Bot commented on YARN-11277:
---------------------------------------

aajisaka commented on code in PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#discussion_r1027616899


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java:
##########
@@ -149,47 +156,52 @@ private void recover() throws IOException {
   @Override
   public void handle(LogHandlerEvent event) {
     switch (event.getType()) {
-      case APPLICATION_STARTED:
-        LogHandlerAppStartedEvent appStartedEvent =
-            (LogHandlerAppStartedEvent) event;
-        this.appOwners.put(appStartedEvent.getApplicationId(),
-            appStartedEvent.getUser());
-        this.dispatcher.getEventHandler().handle(
-            new ApplicationEvent(appStartedEvent.getApplicationId(),
-                ApplicationEventType.APPLICATION_LOG_HANDLING_INITED));
+    case APPLICATION_STARTED:
+      LogHandlerAppStartedEvent appStartedEvent =
+          (LogHandlerAppStartedEvent) event;
+      this.appOwners.put(appStartedEvent.getApplicationId(),
+          appStartedEvent.getUser());
+      this.dispatcher.getEventHandler().handle(
+          new ApplicationEvent(appStartedEvent.getApplicationId(),
+              ApplicationEventType.APPLICATION_LOG_HANDLING_INITED));
+      break;
+    case CONTAINER_FINISHED:
+      // Ignore
+      break;
+    case APPLICATION_FINISHED:
+      LogHandlerAppFinishedEvent appFinishedEvent =
+          (LogHandlerAppFinishedEvent) event;
+      ApplicationId appId = appFinishedEvent.getApplicationId();
+      String user = appOwners.remove(appId);
+      if (user == null) {
+        LOG.error("Unable to locate user for " + appId);
+        // send LOG_HANDLING_FAILED out
+        NonAggregatingLogHandler.this.dispatcher.getEventHandler().handle(
+            new ApplicationEvent(appId,
+                ApplicationEventType.APPLICATION_LOG_HANDLING_FAILED));
         break;
-      case CONTAINER_FINISHED:
-        // Ignore
-        break;
-      case APPLICATION_FINISHED:
-        LogHandlerAppFinishedEvent appFinishedEvent =
-            (LogHandlerAppFinishedEvent) event;
-        ApplicationId appId = appFinishedEvent.getApplicationId();
+      }
+      LogDeleterRunnable logDeleter = new LogDeleterRunnable(user, appId);
+      long appLogSize = calculateSizeOfAppLogs(user, appId);
+      long deletionTimestamp = System.currentTimeMillis()
+          + this.deleteDelaySeconds * 1000;
+      LogDeleterProto deleterProto = LogDeleterProto.newBuilder()
+          .setUser(user)
+          .setDeletionTime(deletionTimestamp)
+          .build();
+      try {
+        stateStore.storeLogDeleter(appId, deleterProto);
+      } catch (IOException e) {
+        LOG.error("Unable to record log deleter state", e);
+      }
+      // delete no delay if log size exceed deleteThreshold
+      if (enableTriggerDeleteBySize && appLogSize >= deleteThreshold) {

Review Comment:
   Hi @leixm, thank you for your update.
   
   1. Can we calculate the size of the application log directory only when the feature is enabled?
   2. Can we use `sched.schedule(logDeleter, 0, TimeUnit.SECONDS);` to delete the files in the background?
   
   The code would look something like this:
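   ```java
   try {
     boolean logDeleterStarted = false;
     if (enableTriggerDeleteBySize) {
       // Only walk the application log directory when the feature is enabled.
       final long appLogSize = calculateSizeOfAppLogs(user, appId);
       if (appLogSize >= deleteThreshold) {
         ...
         // Logs are too large: delete right away, but on the scheduler thread.
         sched.schedule(logDeleter, 0, TimeUnit.SECONDS);
         logDeleterStarted = true;
       }
     }
     if (!logDeleterStarted) {
       // Otherwise keep the normal retention delay.
       sched.schedule(logDeleter, this.deleteDelaySeconds, TimeUnit.SECONDS);
     }
   } catch (RejectedExecutionException e) {
     // The scheduler refused the task (e.g. during shutdown): delete inline.
     logDeleter.run();
   }
   ```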
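The two suggestions can be exercised in isolation. Below is a minimal, self-contained sketch, assuming a plain `ScheduledExecutorService` stands in for the handler's `sched` field. The class name `SizeTriggeredLogDeletion`, the `scheduleDeletion` method, and the use of a `LongSupplier` to defer the size calculation are hypothetical illustration choices, not part of the PR.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical, stripped-down stand-in for the scheduling logic suggested in
 * the review; it is not the actual NonAggregatingLogHandler code.
 */
class SizeTriggeredLogDeletion {
  private final ScheduledExecutorService sched;
  private final boolean enableTriggerDeleteBySize;
  private final long deleteThreshold;     // bytes
  private final long deleteDelaySeconds;  // normal retention delay

  SizeTriggeredLogDeletion(ScheduledExecutorService sched,
      boolean enableTriggerDeleteBySize, long deleteThreshold,
      long deleteDelaySeconds) {
    this.sched = sched;
    this.enableTriggerDeleteBySize = enableTriggerDeleteBySize;
    this.deleteThreshold = deleteThreshold;
    this.deleteDelaySeconds = deleteDelaySeconds;
  }

  /**
   * Schedules logDeleter and returns the delay used, in seconds, or -1 if the
   * executor rejected the task and the deleter ran on the caller's thread.
   */
  long scheduleDeletion(Runnable logDeleter, LongSupplier appLogSize) {
    long delaySeconds = deleteDelaySeconds;
    // Thanks to && short-circuiting, the supplier (i.e. the directory walk)
    // is only invoked when the size trigger is enabled.
    if (enableTriggerDeleteBySize
        && appLogSize.getAsLong() >= deleteThreshold) {
      delaySeconds = 0; // too large: delete now, but still in the background
    }
    try {
      sched.schedule(logDeleter, delaySeconds, TimeUnit.SECONDS);
      return delaySeconds;
    } catch (RejectedExecutionException e) {
      // e.g. the executor has been shut down: fall back to deleting inline.
      logDeleter.run();
      return -1;
    }
  }
}
```

Computing a single delay value replaces the `logDeleterStarted` flag from the snippet above without changing the behavior. In the handler, the supplier could be something like `() -> calculateSizeOfAppLogs(user, appId)`, so the potentially expensive size calculation never runs when the feature is disabled.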





> trigger deletion of log-dir by size for NonAggregatingLogHandler
> ----------------------------------------------------------------
>
>                 Key: YARN-11277
>                 URL: https://issues.apache.org/jira/browse/YARN-11277
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.4.0
>            Reporter: Xianming Lei
>            Priority: Minor
>              Labels: pull-request-available
>
> In our YARN cluster, the log files of some containers are too large, which 
> causes the NodeManager to frequently become unhealthy. Logs that are too 
> large could be deleted directly instead of waiting for 
> yarn.nodemanager.log.retain-seconds to elapse.
> Cluster environment:
>  # 8k+ nodes
>  # 500k+ apps per day
> Configuration:
>  # yarn.nodemanager.log.retain-seconds=259200 (3 days)
>  # yarn.log-aggregation-enable=false
>  
>  
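The proposal comes down to measuring how much disk an application's logs occupy before deciding whether to honor the retention delay. The following is a minimal sketch of that measurement using `java.nio.file.Files.walkFileTree`. It assumes the caller already knows the application's log directories (for example, one per configured NodeManager log dir). `AppLogDirSize` and `sizeOfAppLogs` are hypothetical names, not the `calculateSizeOfAppLogs` helper from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;

/** Hypothetical helper that sums the bytes used by an application's log dirs. */
final class AppLogDirSize {
  private AppLogDirSize() {
  }

  /**
   * Returns the total size in bytes of all regular files under the given
   * directories. Missing directories count as zero, and files that cannot be
   * read (e.g. removed by a concurrent deleter mid-walk) are skipped.
   */
  static long sizeOfAppLogs(List<Path> appLogDirs) {
    final long[] total = {0L};
    SimpleFileVisitor<Path> visitor = new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (attrs.isRegularFile()) {
          total[0] += attrs.size();
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // Skip entries whose attributes could not be read.
        return FileVisitResult.CONTINUE;
      }
    };
    for (Path dir : appLogDirs) {
      if (!Files.isDirectory(dir)) {
        continue;
      }
      try {
        Files.walkFileTree(dir, visitor);
      } catch (IOException e) {
        // Surface unexpected traversal failures without a checked exception.
        throw new UncheckedIOException(e);
      }
    }
    return total[0];
  }
}
```

The result would then be compared against the configured threshold, and only when the size trigger is enabled.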



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
