[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler

ASF GitHub Bot (Jira) Sat, 01 Oct 2022 11:23:05 -0700


    [ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611965#comment-17611965 ]


ASF GitHub Bot commented on YARN-11277:
---------------------------------------

aajisaka commented on code in PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#discussion_r985128293


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
   </property>
 
   <property>
+    <name>yarn.nodemanager.log.delete.threshold.mb</name>
+    <value>102400</value>
+    <description>
+      Optional.
+      Default is 102400

Review Comment:
   Would you remove this line? The default is already documented in
`<value>102400</value>`.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
   </property>
 
   <property>
+    <name>yarn.nodemanager.log.delete.threshold.mb</name>
+    <value>102400</value>
+    <description>
+      Optional.
+      Default is 102400
+      Trigger log-dir deletion when size bigger than

Review Comment:
   What size? Total log size or the largest log file size?
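
To make the two readings concrete, here is a minimal standalone sketch of how they differ. The class and method names are hypothetical and are not taken from the PR; it only illustrates "total size of everything under the log dir" versus "size of the largest single file".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical illustration only; not code from the PR. */
public class LogDirSizeSketch {

    /** Sum of the sizes of all regular files under dir. */
    public static long totalSize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(LogDirSizeSketch::sizeOf)
                    .sum();
        }
    }

    /** Size of the largest regular file under dir, or 0 if there is none. */
    public static long largestFileSize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(LogDirSizeSketch::sizeOf)
                    .max()
                    .orElse(0L);
        }
    }

    /** Wraps the checked exception so the method can be used in a stream. */
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two fake container log files of 300 and 100 bytes.
        Path dir = Files.createTempDirectory("container-logs");
        Files.write(dir.resolve("stdout"), new byte[300]);
        Files.write(dir.resolve("stderr"), new byte[100]);
        System.out.println("total   = " + totalSize(dir));
        System.out.println("largest = " + largestFileSize(dir));
    }
}
```

Whichever of the two the threshold is compared against, the description should say so explicitly.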



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4943,6 +4943,16 @@
     </description>
   </property>
 
+  <property>
+    <name>yarn.nodemanager.log.trigger.delete.by-size.enabled</name>
+    <value>false</value>
+    <description>
+      Optional.
+      Default is false

Review Comment:
   Remove this line. false is already documented in `<value>false</value>`.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java:
##########
@@ -90,6 +93,12 @@ protected void serviceInit(Configuration conf) throws Exception {
     this.deleteDelaySeconds =
         conf.getLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS,
                 YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS);
+    this.enableTriggerDeleteBySize =
+        conf.getBoolean(YarnConfiguration.NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED,
+            YarnConfiguration.DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED);
+    this.deleteThresholdMb =
+        conf.getLong(YarnConfiguration.NM_LOG_DELETE_THRESHOLD_MB,
+            YarnConfiguration.DEFAULE_NM_LOG_DELETE_THRESHOLD_MB);

Review Comment:
   I recommend removing `mb` from the parameter name and using
`conf.getLongBytes` so that values can carry a suffix such as `100g`. Also,
could you document how to use the suffix in yarn-site.xml, as below?
   ```
   You can use the following suffixes (case insensitive): k(kilo), m(mega),
g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g,
etc.), or provide the complete size in bytes (such as 134217728 for 128 MB).
   ```
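
For reference, a minimal standalone sketch of the suffix semantics described above, assuming binary (power-of-1024) multiples as in the 128 MB = 134217728 example. This is not Hadoop's implementation of `getLongBytes`; the class and method names are hypothetical and it only illustrates what the documented values would resolve to.

```java
import java.util.Locale;

/** Hypothetical illustration of size suffixes; not Hadoop's parser. */
public class SizeSuffixSketch {

    /** Parses values such as "128k", "512m", "1g", or a plain byte count. */
    public static long parseBytes(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size value");
        }
        // The suffix's position in this string gives its power of 1024 minus one.
        int index = "kmgtpe".indexOf(s.charAt(s.length() - 1));
        if (index < 0) {
            return Long.parseLong(s); // no suffix: the value is already in bytes
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        // Each suffix step multiplies by 1024, i.e. a left shift by 10 bits;
        // multiplyExact throws ArithmeticException instead of overflowing silently.
        return Math.multiplyExact(number, 1L << (10 * (index + 1)));
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("128k"));
        System.out.println(parseBytes("100g"));
        System.out.println(parseBytes("134217728"));
    }
}
```

Under these assumptions `100g` resolves to 107374182400 bytes, which is the same amount as the current default of 102400 MB.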



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:
##########
@@ -4694,6 +4694,16 @@ public static boolean areNodeLabelsEnabled(
   public static final String DEFAULT_YARN_WORKFLOW_ID_TAG_PREFIX =
       "workflowid:";
 
+  /** Enabled trigger log-dir deletion by size for NonAggregatingLogHandler. */
+  public static final String NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = NM_PREFIX +
+      "log.trigger.delete.by-size.enabled";
+  public static final boolean DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = false;
+
+  /** Trigger log-dir deletion when size bigger than yarn.nodemanager.log.delete.threshold.mb.
+   *  Depends on yarn.nodemanager.log.trigger.delete.by-size.enabled = true. */
+  public static final String NM_LOG_DELETE_THRESHOLD_MB = NM_PREFIX + "log.delete.threshold.mb";
+  public static final long DEFAULE_NM_LOG_DELETE_THRESHOLD_MB = 100 * 1024;

Review Comment:
   typo: DEFAULE -> DEFAULT





> trigger deletion of log-dir by size for NonAggregatingLogHandler
> ----------------------------------------------------------------
>
>                 Key: YARN-11277
>                 URL: https://issues.apache.org/jira/browse/YARN-11277
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.4.0
>            Reporter: Xianming Lei
>            Priority: Minor
>              Labels: pull-request-available
>
> In our YARN cluster, the log files of some containers grow too large, which 
> causes the NodeManager to switch to the unhealthy state frequently. Logs 
> that are too large could be deleted immediately, without waiting for 
> yarn.nodemanager.log.retain-seconds to elapse.
> Cluster environment:
>  # 8k+ nodes
>  # 500k+ apps / day
> Configuration:
>  # yarn.nodemanager.log.retain-seconds=3days
>  # yarn.log-aggregation-enable=false
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

