[jira] [Commented] (YARN-10565) Refactor CS queue initialization to simplify weight mode calculation
[ https://issues.apache.org/jira/browse/YARN-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468411#comment-17468411 ]

Andras Gyori commented on YARN-10565:
-------------------------------------

[~bteke] Is it still a viable fix? I think its description is obsolete.

> Refactor CS queue initialization to simplify weight mode calculation
> --------------------------------------------------------------------
>
>                 Key: YARN-10565
>                 URL: https://issues.apache.org/jira/browse/YARN-10565
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: YARN-10565.001.patch, YARN-10565.002.patch
>
> In YARN-10504 weight mode support was introduced to CS. This jira is a
> follow-up to simplify and restructure the initialization, so that the
> weight/absolute/percentage mode calculation is easier to understand and
> modify.
> To be refactored:
> * In ParentQueue.java#1099 the error message should be more specific,
>   instead of the generic {{LOG.error("Fatal issue found: e", e);}}
> * AutoCreatedLeafQueue.clearConfigurableFields should clear
>   NORMALIZED_WEIGHT, just to be on the safe side
> * Uncomment the commented assertions in
>   TestCapacitySchedulerAutoCreatedQueueBase.validateEffectiveMinResource
> * Check whether the assertion modification in TestRMWebServices is
>   absolutely necessary or whether it could be hiding a bug.
> * Same for TestRMWebServicesForCSWithPartitions.java
> Additional information:
> The original flow was modified to allow the dynamic weight-capacity
> calculation. This resulted in a new flow, which is now harder to understand.
> With a cleanup it could be made simpler and the duplicate calculations could
> be avoided. The changed functionality should either be explained (if deemed
> correct) or fixed (see YARN-10590). Investigate how the CS reinit works; it
> could contain some possibly redundant initialization code fragments.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10925) Simplify AbstractCSQueue#setupQueueConfigs
[ https://issues.apache.org/jira/browse/YARN-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468409#comment-17468409 ]

Andras Gyori commented on YARN-10925:
-------------------------------------

[~bteke] [~snemeth] I find setupQueueConfigs already in a good enough state. What is your opinion about it?

> Simplify AbstractCSQueue#setupQueueConfigs
> ------------------------------------------
>
>                 Key: YARN-10925
>                 URL: https://issues.apache.org/jira/browse/YARN-10925
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Benjamin Teke
>            Priority: Minor
[jira] [Updated] (YARN-11057) NodeManager may generate too many empty log dirs when we configure many log dirs
[ https://issues.apache.org/jira/browse/YARN-11057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yao Guangdong updated YARN-11057:
---------------------------------
    Description:
NodeManager may generate too many empty log dirs when we configure many log dirs in NonAggregationLogHandler mode. For example: we have 24 disks and 512G of memory; assume that the average time cost is 1 min for every container and that the average container size is 4G. Then the number of containers running in parallel on one server is 512G / 4G = 128. Every container generates more than 24 directories under the current policy, so the total number of directories created in one week is 128 * 24 * (60 * 24 * 7) = 30,965,760, which will consume too many inodes on the server and affect

  (was: NodeManager may generate too many empty log dirs when we configure many log dirs in NonAggregationLogHandler mode. For example: We have 24 disks, 512G memory, average time for one container )

> NodeManager may generate too many empty log dirs when we configure many log
> dirs
> --------------------------------------------------------------------------
>
>                 Key: YARN-11057
>                 URL: https://issues.apache.org/jira/browse/YARN-11057
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.7.7, 3.3.1
>            Reporter: Yao Guangdong
>            Assignee: Yao Guangdong
>            Priority: Major
>
> NodeManager may generate too many empty log dirs when we configure many log
> dirs in NonAggregationLogHandler mode. For example: we have 24 disks and
> 512G of memory; assume that the average time cost is 1 min for every
> container and that the average container size is 4G. Then the number of
> containers running in parallel on one server is 512G / 4G = 128. Every
> container generates more than 24 directories under the current policy, so
> the total number of directories created in one week is
> 128 * 24 * (60 * 24 * 7) = 30,965,760, which will consume too many inodes
> on the server and affect
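The back-of-the-envelope arithmetic in the description above can be sketched as a few lines of code. This is only an illustration of the reporter's estimate; the class and method names are invented for the sketch, and the memory, disk, and lifetime figures are the example's assumptions, not measured values.

```java
// Estimate of empty log dirs created per week under the current policy,
// using the figures from the YARN-11057 example (all inputs are assumptions).
public class LogDirEstimate {

    // total dirs = parallel containers * dirs per container * container
    // turnovers per slot per week (avg container lifetime = 1 minute,
    // so each slot turns over once per minute)
    static long totalDirs(long parallelContainers, long dirsPerContainer,
                          long minutesPerWeek) {
        return parallelContainers * dirsPerContainer * minutesPerWeek;
    }

    public static void main(String[] args) {
        long parallel = 512 / 4;     // 512G memory / 4G per container = 128
        long dirs = 24;              // at least one dir per configured log dir (24 disks)
        long minutes = 60 * 24 * 7;  // minutes in a week = 10080
        System.out.println(totalDirs(parallel, dirs, minutes)); // 30965760
    }
}
```

This reproduces the 30,965,760 figure in the description, and since the source treats 24 dirs per container as a lower bound, the real inode consumption would be even higher.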
[jira] [Updated] (YARN-11057) NodeManager may generate too many empty log dirs when we configure many log dirs
[ https://issues.apache.org/jira/browse/YARN-11057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yao Guangdong updated YARN-11057:
---------------------------------
    Description:
NodeManager may generate too many empty log dirs when we configure many log dirs in NonAggregationLogHandler mode. For example: We have 24 disks, 512G memory, average time for one container

  (was: NodeManager may generate too many empty log dirs when we configure many log dirs in NonAggregationLogHandler mode. For example: We have 24 disks, 512G memory,)

> NodeManager may generate too many empty log dirs when we configure many log
> dirs
> --------------------------------------------------------------------------
>
>                 Key: YARN-11057
>                 URL: https://issues.apache.org/jira/browse/YARN-11057
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.7.7, 3.3.1
>            Reporter: Yao Guangdong
>            Assignee: Yao Guangdong
>            Priority: Major
>
> NodeManager may generate too many empty log dirs when we configure many log
> dirs in NonAggregationLogHandler mode. For example: We have 24 disks, 512G
> memory, average time for one container
[jira] [Updated] (YARN-11053) AuxService should not use class name as default system classes
[ https://issues.apache.org/jira/browse/YARN-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated YARN-11053:
---------------------------------
    Fix Version/s: 3.3.2
                       (was: 3.3.3)

Cherry-picked to branch-3.3.2.

> AuxService should not use class name as default system classes
> --------------------------------------------------------------
>
>                 Key: YARN-11053
>                 URL: https://issues.apache.org/jira/browse/YARN-11053
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: auxservices
>    Affects Versions: 3.3.1
>            Reporter: Cheng Pan
>            Assignee: Cheng Pan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.2
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Following the Apache Spark documentation to configure the Spark Shuffle
> Service as a YARN AuxService,
> https://spark.apache.org/docs/3.2.0/running-on-yarn.html#running-multiple-versions-of-the-spark-shuffle-service
> {code:java}
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>spark_shuffle</value>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
>   <value>/opt/apache/spark/yarn/*</value>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>   <value>org.apache.spark.network.yarn.YarnShuffleService</value>
> </property>
> {code}
> but it failed with the following exception:
> {code:java}
> 2021-12-02 15:34:00,886 INFO util.ApplicationClassLoader: classpath: [file:/opt/apache/spark/yarn/spark-3.2.0-yarn-shuffle.jar]
> 2021-12-02 15:34:00,886 INFO util.ApplicationClassLoader: system classes: [org.apache.spark.network.yarn.YarnShuffleService]
> 2021-12-02 15:34:00,887 INFO service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:482)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:761)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:327)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:494)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>     at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>     at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:165)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxServiceFromLocalClasspath(AuxServices.java:242)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxService(AuxServices.java:271)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:452)
>     ... 10 more
> {code}
> A workaround is adding:
> {code:java}
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.system-classes</name>
>   <value>not.existed.class</value>
> </property>
> {code}
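The log lines above hint at the root cause: the aux service's "system classes" list defaults to the service class name itself, and a class matched by that list is loaded only through the parent (NodeManager) classloader, never from the aux service's configured classpath. The following is a simplified, hypothetical sketch of that matching idea; Hadoop's real ApplicationClassLoader additionally supports wildcards and "-" negation, and the class and method names here are illustrative, not Hadoop's API.

```java
import java.util.List;

// Simplified sketch: a class matched by the system-classes list is delegated
// to the parent classloader and never looked up on the custom classpath.
public class SystemClassCheck {

    // Returns true when the class name equals a listed entry or falls under
    // a listed package prefix (a deliberate simplification of the real rules).
    static boolean isSystemClass(String name, List<String> systemClasses) {
        for (String pattern : systemClasses) {
            if (name.equals(pattern) || name.startsWith(pattern + ".")) {
                return true; // parent-only delegation for this class
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Defaulting system-classes to the service class name itself means the
        // service class bypasses the configured classpath entirely, so it must
        // already be on the NodeManager's classpath -> ClassNotFoundException
        // when the jar only exists on the aux-service classpath.
        List<String> defaults =
            List.of("org.apache.spark.network.yarn.YarnShuffleService");
        System.out.println(isSystemClass(
            "org.apache.spark.network.yarn.YarnShuffleService", defaults)); // true
    }
}
```

This also explains why the workaround works: pointing system-classes at a non-existent class name means nothing matches, so the service class is resolved from the configured classpath as intended.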
[jira] [Updated] (YARN-11055) In cgroups-operations.c some fprintf format strings don't end with "\n"
[ https://issues.apache.org/jira/browse/YARN-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11055:
----------------------------------
    Labels: cgroups easyfix pull-request-available  (was: cgroups easyfix)

> In cgroups-operations.c some fprintf format strings don't end with "\n"
> -----------------------------------------------------------------------
>
>                 Key: YARN-11055
>                 URL: https://issues.apache.org/jira/browse/YARN-11055
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Minor
>              Labels: cgroups, easyfix, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cgroups-operations.c some {{fprintf}}s are missing a newline character at
> the end, leading to hard-to-parse error message output.
> Example:
> https://github.com/apache/hadoop/blame/b225287913ac366a531eacfa0266adbdf03d883e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c#L130
[jira] [Assigned] (YARN-11055) In cgroups-operations.c some fprintf format strings don't end with "\n"
[ https://issues.apache.org/jira/browse/YARN-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov reassigned YARN-11055:
------------------------------------
    Assignee: Gera Shegalov

> In cgroups-operations.c some fprintf format strings don't end with "\n"
> -----------------------------------------------------------------------
>
>                 Key: YARN-11055
>                 URL: https://issues.apache.org/jira/browse/YARN-11055
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Minor
>              Labels: cgroups, easyfix
>
> In cgroups-operations.c some {{fprintf}}s are missing a newline character at
> the end, leading to hard-to-parse error message output.
> Example:
> https://github.com/apache/hadoop/blame/b225287913ac366a531eacfa0266adbdf03d883e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c#L130