[jira] [Commented] (YARN-10565) Refactor CS queue initialization to simplify weight mode calculation

2022-01-03 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468411#comment-17468411
 ] 

Andras Gyori commented on YARN-10565:
-

[~bteke] Is this fix still viable? I think its description is obsolete. 

> Refactor CS queue initialization to simplify weight mode calculation
> 
>
> Key: YARN-10565
> URL: https://issues.apache.org/jira/browse/YARN-10565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10565.001.patch, YARN-10565.002.patch
>
>
> In YARN-10504 weight mode support was introduced to CS. This jira is a 
> follow-up to simplify and restructure the initialization so that the 
> weight/absolute/percentage mode calculation is easier to understand and modify.
> To be refactored:
> * In ParentQueue.java#1099 the error message should be more specific, instead 
> of the {{LOG.error("Fatal issue found: e", e);}}
> * AutoCreatedLeafQueue.clearConfigurableFields should clear NORMALIZED_WEIGHT 
> just to be on the safe side
> * Uncomment the commented assertions in 
> TestCapacitySchedulerAutoCreatedQueueBase.validateEffectiveMinResource
> * Check whether the assertion modification in TestRMWebServices is absolutely 
> necessary or could be hiding a bug.
> * Same for TestRMWebServicesForCSWithPartitions.java
> Additional information:
> The original flow was modified to allow the dynamic weight-capacity 
> calculation. This resulted in a new flow that is harder to understand. 
> With a cleanup it could be made simpler and the duplicate calculations 
> could be avoided. 
> The changed functionality should either be explained (if deemed correct) or 
> fixed (see YARN-10590).
> Investigate how the CS reinit works; it could contain some redundant 
> initialization code fragments.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10925) Simplify AbstractCSQueue#setupQueueConfigs

2022-01-03 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468409#comment-17468409
 ] 

Andras Gyori commented on YARN-10925:
-

[~bteke] [~snemeth] I find setupQueueConfigs to be in good enough shape already. 
What is your opinion about it?

> Simplify AbstractCSQueue#setupQueueConfigs
> --
>
> Key: YARN-10925
> URL: https://issues.apache.org/jira/browse/YARN-10925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>







[jira] [Updated] (YARN-11057) NodeManager may generate too many empty log dirs when we configure many log dirs

2022-01-03 Thread Yao Guangdong (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yao Guangdong updated YARN-11057:
-
Description: NodeManager may generate too many empty log dirs when we 
configure many log dirs in NonAggregationLogHandler mode. For example: we have 
24 disks and 512G memory; assume the average runtime is 1 min per container 
and the average container size is 4G. Then the number of containers running 
in parallel on one server is 512G / 4G = 128. Under the current policy every 
container will generate more than 24 directories, so the total number of 
directories in one week is 128 * 24 * (60 * 24 * 7) = 30,965,760, which will 
consume too many inodes on the server and affect   (was: NodeManager may 
generate too many empty log dirs when we configure many log dirs in 
NonAggregationLogHandler mode.For example: We have 24 disks, 512G memory,average 
time for one container )

> NodeManager may generate too many empty log dirs when we configure many log 
> dirs
> 
>
> Key: YARN-11057
> URL: https://issues.apache.org/jira/browse/YARN-11057
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.7, 3.3.1
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Major
>
> NodeManager may generate too many empty log dirs when we configure many log 
> dirs in NonAggregationLogHandler mode. For example: we have 24 disks and 512G 
> memory; assume the average runtime is 1 min per container and the average 
> container size is 4G. Then the number of containers running in parallel on one 
> server is 512G / 4G = 128. Under the current policy every container will 
> generate more than 24 directories, so the total number of directories in one 
> week is 128 * 24 * (60 * 24 * 7) = 30,965,760, which will consume too many 
> inodes on the server and affect 






[jira] [Updated] (YARN-11057) NodeManager may generate too many empty log dirs when we configure many log dirs

2022-01-03 Thread Yao Guangdong (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yao Guangdong updated YARN-11057:
-
Description: NodeManager may generate too many empty log dirs when we 
configure many log dirs in NonAggregationLogHandler mode. For example: we have 
24 disks and 512G memory; average time for one container   (was: NodeManager may 
generate too many empty log dirs when we configure many log dirs in 
NonAggregationLogHandler mode.For example: We have 24 disks, 512G memory,)

> NodeManager may generate too many empty log dirs when we configure many log 
> dirs
> 
>
> Key: YARN-11057
> URL: https://issues.apache.org/jira/browse/YARN-11057
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.7, 3.3.1
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Major
>
> NodeManager may generate too many empty log dirs when we configure many log 
> dirs in NonAggregationLogHandler mode. For example: we have 24 disks and 512G 
> memory; average time for one container 






[jira] [Updated] (YARN-11053) AuxService should not use class name as default system classes

2022-01-03 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-11053:
-
Fix Version/s: 3.3.2
   (was: 3.3.3)

Cherry-picked to branch-3.3.2.

> AuxService should not use class name as default system classes
> --
>
> Key: YARN-11053
> URL: https://issues.apache.org/jira/browse/YARN-11053
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Following Apache Spark document to configure Spark Shuffle Service as YARN 
> AuxService,
> [https://spark.apache.org/docs/3.2.0/running-on-yarn.html#running-multiple-versions-of-the-spark-shuffle-service]
>  
> {code:java}
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>spark_shuffle</value>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
>   <value>/opt/apache/spark/yarn/*</value>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>   <value>org.apache.spark.network.yarn.YarnShuffleService</value>
> </property>
> {code}
> but it failed with an exception:
> {code:java}
> 2021-12-02 15:34:00,886 INFO util.ApplicationClassLoader: classpath: 
> [file:/opt/apache/spark/yarn/spark-3.2.0-yarn-shuffle.jar]
> 2021-12-02 15:34:00,886 INFO util.ApplicationClassLoader: system classes: 
> [org.apache.spark.network.yarn.YarnShuffleService]
> 2021-12-02 15:34:00,887 INFO service.AbstractService: Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.ClassNotFoundException: 
> org.apache.spark.network.yarn.YarnShuffleService
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:482)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:761)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:327)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:494)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.network.yarn.YarnShuffleService
> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
> at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:165)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxServiceFromLocalClasspath(AuxServices.java:242)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxService(AuxServices.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:452)
> ... 10 more
> {code}
> A workaround is to add:
> {code:java}
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.system-classes</name>
>   <value>not.existed.class</value>
> </property>
> {code}
>  






[jira] [Updated] (YARN-11055) In cgroups-operations.c some fprintf format strings don't end with "\n"

2022-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11055:
--
Labels: cgroups easyfix pull-request-available  (was: cgroups easyfix)

> In cgroups-operations.c some fprintf format strings don't end with "\n" 
> 
>
> Key: YARN-11055
> URL: https://issues.apache.org/jira/browse/YARN-11055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Minor
>  Labels: cgroups, easyfix, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cgroups-operations.c some {{fprintf}} format strings are missing a newline 
> character at the end, leading to hard-to-parse error message output. 
> Example: 
> https://github.com/apache/hadoop/blame/b225287913ac366a531eacfa0266adbdf03d883e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c#L130
>  






[jira] [Assigned] (YARN-11055) In cgroups-operations.c some fprintf format strings don't end with "\n"

2022-01-03 Thread Gera Shegalov (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov reassigned YARN-11055:


Assignee: Gera Shegalov

> In cgroups-operations.c some fprintf format strings don't end with "\n" 
> 
>
> Key: YARN-11055
> URL: https://issues.apache.org/jira/browse/YARN-11055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.3.1
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Minor
>  Labels: cgroups, easyfix
>
> In cgroups-operations.c some {{fprintf}} format strings are missing a newline 
> character at the end, leading to hard-to-parse error message output. 
> Example: 
> https://github.com/apache/hadoop/blame/b225287913ac366a531eacfa0266adbdf03d883e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c#L130
>  


