[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611965#comment-17611965 ]
ASF GitHub Bot commented on YARN-11277:
---------------------------------------
aajisaka commented on code in PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#discussion_r985128293
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
</property>
<property>
+ <name>yarn.nodemanager.log.delete.threshold.mb</name>
+ <value>102400</value>
+ <description>
+ Optional.
+ Default is 102400
Review Comment:
Would you remove this line? The default is already documented in
`<value>102400</value>`.
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4954,6 +4964,18 @@
</property>
<property>
+ <name>yarn.nodemanager.log.delete.threshold.mb</name>
+ <value>102400</value>
+ <description>
+ Optional.
+ Default is 102400
+ Trigger log-dir deletion when size bigger than
Review Comment:
What size? Total log size or the largest log file size?
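The two readings the reviewer distinguishes can differ a lot for a directory that holds many containers' logs. Below is a minimal standalone sketch that computes both. It assumes a plain local log directory walked with `java.nio.file`, and the class and method names are hypothetical, not part of PR #4797:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: the two candidate meanings of "size" for a log dir.
final class LogDirSize {
  private LogDirSize() {}

  /** Sum of the sizes of all regular files under dir (recursive). */
  static long totalBytes(Path dir) throws IOException {
    try (Stream<Path> files = Files.walk(dir)) {
      return files.filter(Files::isRegularFile)
          .mapToLong(LogDirSize::sizeOf)
          .sum();
    }
  }

  /** Size of the single largest regular file under dir, or 0 if there is none. */
  static long largestFileBytes(Path dir) throws IOException {
    try (Stream<Path> files = Files.walk(dir)) {
      return files.filter(Files::isRegularFile)
          .mapToLong(LogDirSize::sizeOf)
          .max()
          .orElse(0L);
    }
  }

  // Files.size throws a checked IOException, which a lambda cannot rethrow directly.
  private static long sizeOf(Path p) {
    try {
      return Files.size(p);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Total size tracks the disk space the application occupies, while the largest-file measure only reacts to a single runaway log file.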
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -4943,6 +4943,16 @@
</description>
</property>
+ <property>
+ <name>yarn.nodemanager.log.trigger.delete.by-size.enabled</name>
+ <value>false</value>
+ <description>
+ Optional.
+ Default is false
Review Comment:
Remove this line. The default `false` is already documented in `<value>false</value>`.
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java:
##########
@@ -90,6 +93,12 @@ protected void serviceInit(Configuration conf) throws Exception {
this.deleteDelaySeconds =
conf.getLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS,
YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS);
+ this.enableTriggerDeleteBySize =
+ conf.getBoolean(YarnConfiguration.NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED,
+ YarnConfiguration.DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED);
+ this.deleteThresholdMb =
+ conf.getLong(YarnConfiguration.NM_LOG_DELETE_THRESHOLD_MB,
+ YarnConfiguration.DEFAULE_NM_LOG_DELETE_THRESHOLD_MB);
Review Comment:
I recommend removing `mb` from the parameter name and using
`conf.getLongBytes` to allow suffixes such as `100g`. Also, could you document
how to use the suffix in yarn-site.xml, as below?
```
You can use the following suffix (case insensitive): k(kilo), m(mega),
g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g,
etc.), Or provide complete size in bytes (such as 134217728 for 128 MB).
```
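To illustrate the suffix rules in the suggested description, here is a standalone sketch that parses such values into bytes. It assumes binary multiples (k = 1024, m = 1024^2, and so on), which matches the quoted example of 134217728 bytes for 128 MB. It is a hypothetical stand-in written for this comment, not Hadoop's implementation. In the NodeManager itself, `Configuration.getLongBytes(name, defaultValue)` would do the parsing.

```java
import java.util.Locale;

// Hypothetical stand-in illustrating the documented suffix rules:
// k, m, g, t, p, e (case insensitive) as binary multiples, or a plain byte count.
final class SizeSuffix {
  private SizeSuffix() {}

  // Index i maps to a multiplier of 2^(10 * (i + 1)): k=2^10 ... e=2^60.
  private static final String SUFFIXES = "kmgtpe";

  static long parseBytes(String value) {
    String s = value.trim().toLowerCase(Locale.ROOT);
    if (s.isEmpty()) {
      throw new NumberFormatException("empty size");
    }
    int idx = SUFFIXES.indexOf(s.charAt(s.length() - 1));
    String digits = idx < 0 ? s : s.substring(0, s.length() - 1);
    long number = Long.parseLong(digits);
    int shift = idx < 0 ? 0 : 10 * (idx + 1);
    // Reject negative sizes and values that would overflow a signed 64-bit long.
    if (number < 0 || number > (Long.MAX_VALUE >> shift)) {
      throw new NumberFormatException("size out of range: " + value);
    }
    return number << shift;
  }
}
```

With these rules, `100g` becomes 107374182400 bytes, the same amount as the current default of 102400 MB.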
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:
##########
@@ -4694,6 +4694,16 @@ public static boolean areNodeLabelsEnabled(
public static final String DEFAULT_YARN_WORKFLOW_ID_TAG_PREFIX =
"workflowid:";
+ /** Enabled trigger log-dir deletion by size for NonAggregatingLogHandler. */
+ public static final String NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = NM_PREFIX +
+ "log.trigger.delete.by-size.enabled";
+ public static final boolean DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED =
+ false;
+
+ /** Trigger log-dir deletion when size bigger than yarn.nodemanager.log.delete.threshold.mb.
+ * Depends on yarn.nodemanager.log.trigger.delete.by-size.enabled = true. */
+ public static final String NM_LOG_DELETE_THRESHOLD_MB = NM_PREFIX + "log.delete.threshold.mb";
+ public static final long DEFAULE_NM_LOG_DELETE_THRESHOLD_MB = 100 * 1024;
Review Comment:
typo: DEFAULE -> DEFAULT
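Taken together, the comments on `YarnConfiguration` point toward constants along the following lines. This is a hypothetical sketch, not the final patch: the key name `yarn.nodemanager.log.delete.threshold` is an assumed result of dropping `mb`, and the default is converted to bytes (102400 MB = 100 GiB) so that it can be passed to `getLongBytes`.

```java
// Hypothetical shape of the keys after applying the review feedback:
// no "mb" in the name, DEFAULT spelled correctly, default expressed in bytes.
final class NmLogDeletionKeys {
  private NmLogDeletionKeys() {}

  // Mirrors YarnConfiguration.NM_PREFIX.
  static final String NM_PREFIX = "yarn.nodemanager.";

  static final String NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED =
      NM_PREFIX + "log.trigger.delete.by-size.enabled";
  static final boolean DEFAULT_NM_LOG_TRIGGER_DELETE_BY_SIZE_ENABLED = false;

  /** Assumed renamed key; the patch under review uses "log.delete.threshold.mb". */
  static final String NM_LOG_DELETE_THRESHOLD = NM_PREFIX + "log.delete.threshold";

  /** 100 GiB in bytes, the same amount as the patch's 102400 MB default. */
  static final long DEFAULT_NM_LOG_DELETE_THRESHOLD = 100L * 1024 * 1024 * 1024;
}
```

Under these assumed names, `serviceInit` could read the threshold with `conf.getLongBytes(YarnConfiguration.NM_LOG_DELETE_THRESHOLD, YarnConfiguration.DEFAULT_NM_LOG_DELETE_THRESHOLD)`.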
> trigger deletion of log-dir by size for NonAggregatingLogHandler
> ----------------------------------------------------------------
>
> Key: YARN-11277
> URL: https://issues.apache.org/jira/browse/YARN-11277
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.4.0
> Reporter: Xianming Lei
> Priority: Minor
> Labels: pull-request-available
>
> In our YARN cluster, the log files of some containers are too large, which
> causes the NodeManager to frequently switch to the unhealthy state. Logs
> that are too large could be deleted directly instead of waiting for
> yarn.nodemanager.log.retain-seconds to elapse.
> Cluster environment:
> # 8k+ nodes
> # 500k+ apps / day
> Configuration:
> # yarn.nodemanager.log.retain-seconds=259200 (3 days)
> # yarn.log-aggregation-enable=false
>
>
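The requested behaviour reduces to a small decision at the point where the handler currently applies `deleteDelaySeconds`. Below is a minimal sketch. It assumes that the measured size is the total size of the application's log directory (the open question raised in the review) and that a size above the threshold means "delete now" rather than after yarn.nodemanager.log.retain-seconds. The class and method are hypothetical.

```java
// Hypothetical decision helper for size-triggered log-dir deletion.
final class LogDeletionPolicy {
  private LogDeletionPolicy() {}

  /**
   * Returns the delay in seconds before an application's log dir is deleted:
   * 0 when size-triggered deletion is enabled and the dir is bigger than the
   * threshold, otherwise the normal retain period.
   */
  static long deleteDelaySeconds(boolean bySizeEnabled, long dirBytes,
      long thresholdBytes, long retainSeconds) {
    if (bySizeEnabled && dirBytes > thresholdBytes) {
      return 0L; // too large: delete right away instead of waiting
    }
    return retainSeconds;
  }
}
```

With the reporter's settings (retain-seconds of 3 days, aggregation disabled), an application whose logs exceed the threshold would be cleaned up immediately instead of occupying the disk for 259200 seconds.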
--
This message was sent by Atlassian Jira
(v8.20.10#820010)