[ 
https://issues.apache.org/jira/browse/YARN-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840843#comment-17840843
 ] 

ASF GitHub Bot commented on YARN-11674:
---------------------------------------

tomicooler commented on code in PR #6751:
URL: https://github.com/apache/hadoop/pull/6751#discussion_r1579625169


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2CpuResourceHandlerImpl.java:
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+
+/**
+ * An implementation for using CGroups V2 to restrict CPU usage on Linux. The
+ * implementation supports 3 different controls - restrict usage of all YARN
+ * containers, restrict relative usage of individual YARN containers and
+ * restrict usage of individual YARN containers. Admins can set the overall CPU
+ * to be used by all YARN containers - this is implemented by setting
+ * cpu.max to the value desired. If strict resource usage mode is not enabled,
+ * cpu.weight is set for individual containers - this prevents containers from
+ * exceeding the overall limit for YARN containers but individual containers
+ * can use as much of the CPU as available(under the YARN limit). If strict
+ * resource usage is enabled, then container can only use the percentage of
+ * CPU allocated to them and this is again implemented using cpu.max.
+ */
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public class CGroupsV2CpuResourceHandlerImpl extends AbstractCGroupsCpuResourceHandler {
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int CPU_DEFAULT_WEIGHT = 100; // cgroup v2 default
+  static final int CPU_DEFAULT_WEIGHT_OPPORTUNISTIC = 1;
+  static final int CPU_MAX_WEIGHT = 10000;
+  static final String NO_LIMIT = "max";
+
+
+  CGroupsV2CpuResourceHandlerImpl(CGroupsHandler cGroupsHandler) {
+    super(cGroupsHandler);
+  }
+
+  @Override
+  protected void updateCgroupMaxCpuLimit(String cgroupId, String max, String period)
+      throws ResourceHandlerException {
+    String cpuMaxLimit = cGroupsHandler.getCGroupParam(CPU, cgroupId,

Review Comment:
   Maybe add a short comment here documenting the `cpu.max` file format: https://docs.kernel.org/admin-guide/cgroup-v2.html#cpu-interface-files
   
   ```
   A read-write two value file which exists on non-root cgroups. The default is “max 100000”.
   
   The maximum bandwidth limit. It’s in the following format:
   
   $MAX $PERIOD
   which indicates that the group may consume up to $MAX in each $PERIOD duration. “max” for $MAX indicates no limit. If only one number is written, $MAX is updated.
   ```
   
   
   Also, please rename `cpuMaxLimit` to `currentCpuMax`.
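   
   To illustrate the format above, here is a minimal sketch of parsing and rendering a `cpu.max` value. `CpuMax` and its methods are hypothetical names, not part of the PR or of Hadoop; the only facts assumed are the `$MAX $PERIOD` layout and the `max` keyword from the kernel documentation.
   
   ```java
   import java.util.OptionalLong;

   /** Hypothetical holder for the two fields of a cgroup v2 cpu.max file. */
   final class CpuMax {
     static final String NO_LIMIT = "max";

     private final OptionalLong maxUs; // empty means "max", i.e. no limit
     private final long periodUs;

     private CpuMax(OptionalLong maxUs, long periodUs) {
       this.maxUs = maxUs;
       this.periodUs = periodUs;
     }

     /** Parses file content of the form "$MAX $PERIOD", e.g. "max 100000". */
     static CpuMax parse(String content) {
       String[] parts = content.trim().split("\\s+");
       if (parts.length != 2) {
         throw new IllegalArgumentException("Expected '$MAX $PERIOD', got: " + content);
       }
       OptionalLong max = NO_LIMIT.equals(parts[0])
           ? OptionalLong.empty()
           : OptionalLong.of(Long.parseLong(parts[0]));
       return new CpuMax(max, Long.parseLong(parts[1]));
     }

     boolean isUnlimited() {
       return !maxUs.isPresent();
     }

     OptionalLong getMaxUs() {
       return maxUs;
     }

     long getPeriodUs() {
       return periodUs;
     }

     /** Renders the value back into the "$MAX $PERIOD" form. */
     String format() {
       String max = isUnlimited() ? NO_LIMIT : Long.toString(maxUs.getAsLong());
       return max + " " + periodUs;
     }
   }
   ```
   
   With this sketch, `CpuMax.parse("max 100000").isUnlimited()` is true, and `CpuMax.parse("50000 100000").format()` gives back `50000 100000`.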



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/AbstractCGroupsCpuResourceHandler.java:
##########
@@ -0,0 +1,219 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.ExecutionType;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.util.NodeManagerHardwareUtils;
+import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public abstract class AbstractCGroupsCpuResourceHandler implements CpuResourceHandler {
+
+  static final Logger LOG =
+       LoggerFactory.getLogger(AbstractCGroupsCpuResourceHandler.class);
+
+  protected CGroupsHandler cGroupsHandler;
+  private boolean strictResourceUsageMode = false;
+  private float yarnProcessors;
+  private int nodeVCores;
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int MAX_QUOTA_US = 1000 * 1000;
+  @VisibleForTesting
+  static final int MIN_PERIOD_US = 1000;
+
+  AbstractCGroupsCpuResourceHandler(CGroupsHandler cGroupsHandler) {
+    this.cGroupsHandler = cGroupsHandler;
+  }
+
+  @Override
+  public List<PrivilegedOperation> bootstrap(Configuration conf)
+      throws ResourceHandlerException {
+    return bootstrap(
+        ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf), conf);
+  }
+
+  @VisibleForTesting
+  List<PrivilegedOperation> bootstrap(
+      ResourceCalculatorPlugin plugin, Configuration conf)
+      throws ResourceHandlerException {
+    this.strictResourceUsageMode = conf.getBoolean(
+        YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE,
+        YarnConfiguration.DEFAULT_NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE);
+    this.cGroupsHandler.initializeCGroupController(CPU);
+    nodeVCores = NodeManagerHardwareUtils.getVCores(plugin, conf);
+
+    // cap overall usage to the number of cores allocated to YARN
+    yarnProcessors = NodeManagerHardwareUtils.getContainersCPUs(plugin, conf);
+    int systemProcessors = NodeManagerHardwareUtils.getNodeCPUs(plugin, conf);
+    boolean existingCpuLimits;
+    existingCpuLimits = cpuLimitExists(
+        cGroupsHandler.getPathForCGroup(CPU, ""));
+
+    if (systemProcessors != (int) yarnProcessors) {
+      LOG.info("YARN containers restricted to " + yarnProcessors + " cores");
+      int[] limits = getOverallLimits(yarnProcessors);
+      updateCgroupMaxCpuLimit("", String.valueOf(limits[1]), String.valueOf(limits[0]));
+    } else if (existingCpuLimits) {
+      LOG.info("Removing CPU constraints for YARN containers.");
+      updateCgroupMaxCpuLimit("", String.valueOf(-1), null);
+    }
+    return null;
+  }
+
+  protected abstract void updateCgroupMaxCpuLimit(String cgroupId, String quota, String period)
+      throws ResourceHandlerException;
+  protected abstract boolean cpuLimitExists(String path) throws ResourceHandlerException;
+
+
+  @VisibleForTesting
+  @InterfaceAudience.Private
+  public static int[] getOverallLimits(float yarnProcessors) {

Review Comment:
   NIT: For readability, we could introduce a wrapper over this that returns an object with explicit `period` and `quota` fields. The newly introduced `updateCgroupMaxCpuLimit` and similar methods could take this object as a parameter.
   
   Since `getOverallLimits` is also used in the deprecated LCE code, we must keep it too, so I'm fine with keeping it as is.
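   
   A rough sketch of the wrapper idea, for illustration only. `CpuLimits` and `fromArray` are hypothetical names; the array layout (index 0 is the period, index 1 the quota) is inferred from the `bootstrap` call site above, which passes `limits[1]` as the quota and `limits[0]` as the period.
   
   ```java
   /** Hypothetical wrapper that names the two values of the int[] limits array. */
   final class CpuLimits {
     private final int periodUs;
     private final int quotaUs;

     CpuLimits(int periodUs, int quotaUs) {
       this.periodUs = periodUs;
       this.quotaUs = quotaUs;
     }

     /**
      * Adapts a two-element array, assuming index 0 holds the period and
      * index 1 holds the quota (an assumption taken from the call site).
      */
     static CpuLimits fromArray(int[] limits) {
       if (limits == null || limits.length != 2) {
         throw new IllegalArgumentException("Expected {period, quota}");
       }
       return new CpuLimits(limits[0], limits[1]);
     }

     int getPeriodUs() {
       return periodUs;
     }

     int getQuotaUs() {
       return quotaUs;
     }

     @Override
     public String toString() {
       return "CpuLimits{periodUs=" + periodUs + ", quotaUs=" + quotaUs + "}";
     }
   }
   ```
   
   A caller could then write `CpuLimits.fromArray(getOverallLimits(yarnProcessors))` and read `getQuotaUs()` and `getPeriodUs()` instead of indexing into the array, while the existing `int[]` method stays in place for the deprecated code path.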





> Update CpuResourceHandler implementation for cgroup v2 support
> --------------------------------------------------------------
>
>                 Key: YARN-11674
>                 URL: https://issues.apache.org/jira/browse/YARN-11674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>
> cgroup v2 has some changes in various controllers (some changed their 
> functionality, some were removed). This task is about checking if 
> CpuResourceHandler's 
> [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java#L60]
>  needs any updates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
