[ 
https://issues.apache.org/jira/browse/YARN-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845916#comment-17845916
 ] 

ASF GitHub Bot commented on YARN-11692:
---------------------------------------

brumi1024 commented on code in PR #6821:
URL: https://github.com/apache/hadoop/pull/6821#discussion_r1598441803


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java:
##########
@@ -101,18 +120,18 @@ private static CGroupsHandler 
getInitializedCGroupsHandler(Configuration conf)
    */
 
   public static CGroupsHandler getCGroupsHandler() {
-    return cGroupsHandler;
+    return cgroupsV2Enabled ? cGroupsV2Handler : cGroupsV1Handler;

Review Comment:
   getCGroupsHandler method is called from DefaultOOMHandler, 
CGroupElasticMemoryController, which seemingly rely on v1 memory limits, so the 
related changes should be addressed in 
[YARN-11678](https://issues.apache.org/jira/browse/YARN-11678). It's also 
called from the CGroupsResourceCalculator, which will handle the v1/v2 
differences separately. 
   
   It's called from OCIContainerRuntime, but that class doesn't use it 
cgroupHandler, DockerLinuxContainerRuntime calls it to check resource limits, 
and RuncContainerRuntime calls it for checking and storing cpu limits. Having 
this return the v1 or v2 handler based on the config will result in issues, 
even in mixed mode, where for example the CPU is mounted in a v1 structure, so 
I suggest leaving this as is for now, and updating it once 
DockerLinuxContainerRuntime and RuncContainerRuntime is updated to handle the 
v2 files.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java:
##########
@@ -63,35 +63,54 @@ public class ResourceHandlerModule {
    * as resource metrics functionality. We need to ensure that the same
    * instance is used for both.
    */
+  private static volatile CGroupsHandler cGroupsV1Handler;
+  private static volatile CGroupsHandler cGroupsV2Handler;
   private static volatile TrafficControlBandwidthHandlerImpl
       trafficControlBandwidthHandler;
   private static volatile NetworkPacketTaggingHandlerImpl
       networkPacketTaggingHandlerImpl;
-  private static volatile CGroupsHandler cGroupsHandler;
   private static volatile CGroupsBlkioResourceHandlerImpl
       cGroupsBlkioResourceHandler;
   private static volatile MemoryResourceHandler
       cGroupsMemoryResourceHandler;
   private static volatile CpuResourceHandler
       cGroupsCpuResourceHandler;
 
-  /**
-   * Returns an initialized, thread-safe CGroupsHandler instance.
-   */
-  private static CGroupsHandler getInitializedCGroupsHandler(Configuration 
conf)
+  private static void initializeCGroupsHandlers(Configuration conf) throws 
ResourceHandlerException {
+    initializeCGroupsV1Handler(conf);
+    if (cgroupsV2Enabled) {
+      initializeCGroupsV2Handler(conf);
+    }
+  }
+
+  private static void initializeCGroupsV1Handler(Configuration conf)
       throws ResourceHandlerException {
-    if (cGroupsHandler == null) {
+    if (cGroupsV1Handler == null) {
       synchronized (CGroupsHandler.class) {
-        if (cGroupsHandler == null) {
-          cGroupsHandler = cgroupsV2Enabled
-              ? new CGroupsV2HandlerImpl(conf, 
PrivilegedOperationExecutor.getInstance(conf))
-              : new CGroupsHandlerImpl(conf, 
PrivilegedOperationExecutor.getInstance(conf));
-          LOG.debug("Value of CGroupsHandler is: {}", cGroupsHandler);
+        if (cGroupsV1Handler == null) {

Review Comment:
   Do we need the null check twice?



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##########
@@ -2087,6 +2087,20 @@
     <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
   </property>
 
+  <property>
+    <description>This property sets the mount path for CGroups v2.
+      This parameter is optional, and needed to be set only in mixed mode,
+      when CGroups v2 is mounted in a different path than Cgroups v1.
+      For example, in hybrid mode, CGroups v1 controllers can be mounted in
+      /sys/fs/cgroup, while v2 can be mounted in /sys/fs/cgroup/unified folder.

Review Comment:
   > v1 controllers can be mounted in /sys/fs/cgroup
   
   Might be a bit misleading in this format, maybe say something like this:
   "v1 controllers can be mounted under /sys/fs/cgroup/ (for example 
/sys/fs/cgroup/cpu,cpuacct)". Otherwise it can seem like the cgroup v2 FS is 
mounted under a cgroup v1 FS.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java:
##########
@@ -63,35 +63,54 @@ public class ResourceHandlerModule {
    * as resource metrics functionality. We need to ensure that the same
    * instance is used for both.
    */
+  private static volatile CGroupsHandler cGroupsV1Handler;

Review Comment:
   Nit, since we're renaming both variables. According to the [official 
terminology](https://docs.kernel.org/admin-guide/cgroup-v2.html):
   
   > “cgroup” stands for “control group” and is never capitalized. The singular 
form is used to designate the whole feature and also as a qualifier as in 
“cgroup controllers”. When explicitly referring to multiple individual control 
groups, the plural form “cgroups” is used.
   
   I think cGroupsV1Handler/cGroupsV2Handler should actually be 
cgroupV1Handler/cgroupV2Handler. I wanted to do this in YARN-11672, but since I 
haven't touched the original handler variable postponed it - this seems like a 
good place to change it, if you agree.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java:
##########
@@ -63,35 +63,54 @@ public class ResourceHandlerModule {
    * as resource metrics functionality. We need to ensure that the same
    * instance is used for both.
    */
+  private static volatile CGroupsHandler cGroupsV1Handler;
+  private static volatile CGroupsHandler cGroupsV2Handler;
   private static volatile TrafficControlBandwidthHandlerImpl
       trafficControlBandwidthHandler;
   private static volatile NetworkPacketTaggingHandlerImpl
       networkPacketTaggingHandlerImpl;
-  private static volatile CGroupsHandler cGroupsHandler;
   private static volatile CGroupsBlkioResourceHandlerImpl
       cGroupsBlkioResourceHandler;
   private static volatile MemoryResourceHandler
       cGroupsMemoryResourceHandler;
   private static volatile CpuResourceHandler
       cGroupsCpuResourceHandler;
 
-  /**
-   * Returns an initialized, thread-safe CGroupsHandler instance.
-   */
-  private static CGroupsHandler getInitializedCGroupsHandler(Configuration 
conf)
+  private static void initializeCGroupsHandlers(Configuration conf) throws 
ResourceHandlerException {

Review Comment:
   Nit: based on the comment above: initializeCGroupsHandlers -> 
initializeCgroupHandlers, and the same applies for the individual init methods.





> Support mixed cgroup v1/v2 controller structure
> -----------------------------------------------
>
>                 Key: YARN-11692
>                 URL: https://issues.apache.org/jira/browse/YARN-11692
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Peter Szucs
>            Priority: Major
>              Labels: pull-request-available
>
> There were heavy changes on the device side in cgroup v2. To keep supporting 
> FGPAs and GPUs short term, mixed structures where some of the cgroup 
> controllers are from v1 while others from v2 should be supported. More info: 
> https://dropbear.xyz/2023/05/23/devices-with-cgroup-v2/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to