[ https://issues.apache.org/jira/browse/YARN-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840843#comment-17840843 ]
ASF GitHub Bot commented on YARN-11674:
---------------------------------------

tomicooler commented on code in PR #6751:
URL: https://github.com/apache/hadoop/pull/6751#discussion_r1579625169


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2CpuResourceHandlerImpl.java:
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+
+/**
+ * An implementation for using CGroups V2 to restrict CPU usage on Linux. The
+ * implementation supports 3 different controls - restrict usage of all YARN
+ * containers, restrict relative usage of individual YARN containers and
+ * restrict usage of individual YARN containers. Admins can set the overall CPU
+ * to be used by all YARN containers - this is implemented by setting
+ * cpu.max to the value desired. If strict resource usage mode is not enabled,
+ * cpu.weight is set for individual containers - this prevents containers from
+ * exceeding the overall limit for YARN containers but individual containers
+ * can use as much of the CPU as available (under the YARN limit). If strict
+ * resource usage is enabled, then containers can only use the percentage of
+ * CPU allocated to them and this is again implemented using cpu.max.
+ */
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public class CGroupsV2CpuResourceHandlerImpl extends AbstractCGroupsCpuResourceHandler {
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int CPU_DEFAULT_WEIGHT = 100; // cgroup v2 default
+  static final int CPU_DEFAULT_WEIGHT_OPPORTUNISTIC = 1;
+  static final int CPU_MAX_WEIGHT = 10000;
+  static final String NO_LIMIT = "max";
+
+  CGroupsV2CpuResourceHandlerImpl(CGroupsHandler cGroupsHandler) {
+    super(cGroupsHandler);
+  }
+
+  @Override
+  protected void updateCgroupMaxCpuLimit(String cgroupId, String max, String period)
+      throws ResourceHandlerException {
+    String cpuMaxLimit = cGroupsHandler.getCGroupParam(CPU, cgroupId,

Review Comment:
   Maybe a small doc here about the file format: https://docs.kernel.org/admin-guide/cgroup-v2.html#cpu-interface-files

   ```
   A read-write two value file which exists on non-root cgroups.
   The default is “max 100000”.

   The maximum bandwidth limit. It’s in the following format:

     $MAX $PERIOD

   which indicates that the group may consume up to $MAX in each
   $PERIOD duration. “max” for $MAX indicates no limit. If only
   one number is written, $MAX is updated.
   ```

   rename: cpuMaxLimit -> currentCpuMax

##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/AbstractCGroupsCpuResourceHandler.java:
##########
@@ -0,0 +1,219 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.ExecutionType;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.util.NodeManagerHardwareUtils;
+import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public abstract class AbstractCGroupsCpuResourceHandler implements CpuResourceHandler {
+
+  static final Logger LOG =
+      LoggerFactory.getLogger(AbstractCGroupsCpuResourceHandler.class);
+
+  protected CGroupsHandler cGroupsHandler;
+  private boolean strictResourceUsageMode = false;
+  private float yarnProcessors;
+  private int nodeVCores;
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int MAX_QUOTA_US = 1000 * 1000;
+  @VisibleForTesting
+  static final int MIN_PERIOD_US = 1000;
+
+  AbstractCGroupsCpuResourceHandler(CGroupsHandler cGroupsHandler) {
+    this.cGroupsHandler = cGroupsHandler;
+  }
+
+  @Override
+  public List<PrivilegedOperation> bootstrap(Configuration conf)
+      throws ResourceHandlerException {
+    return bootstrap(
+        ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf), conf);
+  }
+
+  @VisibleForTesting
+  List<PrivilegedOperation> bootstrap(
+      ResourceCalculatorPlugin plugin, Configuration conf)
+      throws ResourceHandlerException {
+    this.strictResourceUsageMode = conf.getBoolean(
+        YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE,
+        YarnConfiguration.DEFAULT_NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE);
+    this.cGroupsHandler.initializeCGroupController(CPU);
+    nodeVCores = NodeManagerHardwareUtils.getVCores(plugin, conf);
+
+    // cap overall usage to the number of cores allocated to YARN
+    yarnProcessors = NodeManagerHardwareUtils.getContainersCPUs(plugin, conf);
+    int systemProcessors = NodeManagerHardwareUtils.getNodeCPUs(plugin, conf);
+    boolean existingCpuLimits;
+    existingCpuLimits = cpuLimitExists(
+        cGroupsHandler.getPathForCGroup(CPU, ""));
+
+    if (systemProcessors != (int) yarnProcessors) {
+      LOG.info("YARN containers restricted to " + yarnProcessors + " cores");
+      int[] limits = getOverallLimits(yarnProcessors);
+      updateCgroupMaxCpuLimit("", String.valueOf(limits[1]), String.valueOf(limits[0]));
+    } else if (existingCpuLimits) {
+      LOG.info("Removing CPU constraints for YARN containers.");
+      updateCgroupMaxCpuLimit("", String.valueOf(-1), null);
+    }
+    return null;
+  }
+
+  protected abstract void updateCgroupMaxCpuLimit(String cgroupId, String quota, String period)
+      throws ResourceHandlerException;
+  protected abstract boolean cpuLimitExists(String path) throws ResourceHandlerException;
+
+  @VisibleForTesting
+  @InterfaceAudience.Private
+  public static int[] getOverallLimits(float yarnProcessors) {

Review Comment:
   NIT. For readability we could introduce a wrapper over this that returns an object with explicit `period` and `quota` fields. The newly introduced `updateCgroupMaxCpuLimit` and similar methods could take this object as a parameter. Since `getOverallLimits` is used in the deprecated LCE code as well, we must keep it too, so I'm fine with keeping it as is.

> Update CpuResourceHandler implementation for cgroup v2 support
> --------------------------------------------------------------
>
>                 Key: YARN-11674
>                 URL: https://issues.apache.org/jira/browse/YARN-11674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>
> cgroup v2 has some changes in various controllers (some changed their
> functionality, some were removed). This task is about checking if
> CpuResourceHandler's
> [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java#L60]
> needs any updates.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
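[Editor's note] To make the `$MAX $PERIOD` format from the kernel documentation quoted in the first review comment concrete, here is a small illustrative sketch of formatting and parsing a `cpu.max` value. The class and helper names (`CpuMaxValue`, `format`, `parse`) are hypothetical and not part of the PR; only the value format itself comes from the kernel docs quoted above.

```java
// Illustrative only: builds/parses the cgroup v2 cpu.max value,
// which is "$MAX $PERIOD" where $MAX is either a microsecond quota
// or the literal "max" (no limit). Helper names are hypothetical.
public class CpuMaxValue {
    static final String NO_LIMIT = "max";
    static final long DEFAULT_PERIOD_US = 100000; // kernel default period

    // quotaUs < 0 means "no limit", mirroring the -1 convention
    // used in bootstrap() above.
    static String format(long quotaUs, long periodUs) {
        return (quotaUs < 0 ? NO_LIMIT : String.valueOf(quotaUs)) + " " + periodUs;
    }

    // Returns {quota, period}; quota == -1 encodes "max". If only one
    // field is present, the period falls back to the kernel default.
    static long[] parse(String raw) {
        String[] parts = raw.trim().split("\\s+");
        long quota = NO_LIMIT.equals(parts[0]) ? -1L : Long.parseLong(parts[0]);
        long period = parts.length > 1 ? Long.parseLong(parts[1]) : DEFAULT_PERIOD_US;
        return new long[]{quota, period};
    }

    public static void main(String[] args) {
        System.out.println(format(-1, 100000));   // the kernel default: "max 100000"
        long[] v = parse("500000 1000000");       // 0.5 CPU over a 1 s period
        System.out.println(v[0] + " " + v[1]);
    }
}
```

This is also why the review suggests reading `cpu.max` before writing it: writing a single number updates only `$MAX` and leaves the current `$PERIOD` untouched.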
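[Editor's note] The wrapper object suggested in the second review comment could look roughly like this. The class and field names are hypothetical; the sketch assumes the `int[]` returned by `getOverallLimits` holds the period at index 0 and the quota at index 1, which matches the `updateCgroupMaxCpuLimit("", limits[1], limits[0])` call in `bootstrap` above.

```java
// Hypothetical sketch of the reviewer's NIT: give the two values that
// getOverallLimits returns positionally some explicit names, instead of
// passing a bare int[] around.
final class CpuLimits {
    final int periodUs; // written as $PERIOD in cpu.max
    final int quotaUs;  // written as $MAX in cpu.max

    CpuLimits(int periodUs, int quotaUs) {
        this.periodUs = periodUs;
        this.quotaUs = quotaUs;
    }

    // Adapts the existing int[]{period, quota} return value, so the
    // deprecated LCE callers of getOverallLimits stay untouched.
    static CpuLimits fromArray(int[] limits) {
        return new CpuLimits(limits[0], limits[1]);
    }
}
```

With such a wrapper, `updateCgroupMaxCpuLimit` and similar methods could accept a `CpuLimits` instead of two loosely ordered `String` arguments.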