[
https://issues.apache.org/jira/browse/YARN-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Teke updated YARN-11743:
---------------------------------
Description:
Cgroup v1/v2 mixed mode support was introduced in YARN-11692, but it does not
cover the edge case where the NM has cgroup v2 support enabled
({{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to true)
while only cgroup v1 controllers are mounted. In larger clusters, some nodes may
already run newer OSes that default to cgroup v2 while others still use v1.
Currently, launching an NM with cgroup v2 support enabled fails if no
cgroup.controllers file is present:
{code:java}
Failed to initialize controller paths! Exception:
java.io.IOException: No cgroup controllers file found in the directory specified: /var/lib/yarn-ce/cgroups
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.readControllersFile(CGroupsV2HandlerImpl.java:130)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.parsePreConfiguredMountPath(CGroupsV2HandlerImpl.java:101)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.initializeControllerPaths(AbstractCGroupsHandler.java:133)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.init(AbstractCGroupsHandler.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.<init>(AbstractCGroupsHandler.java:103)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:71)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupV2Handler(ResourceHandlerModule.java:106)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupHandlers(ResourceHandlerModule.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initCGroupsCpuResourceHandler(ResourceHandlerModule.java:177)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeConfiguredResourceHandlerChain(ResourceHandlerModule.java:334)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.getConfiguredResourceHandlerChain(ResourceHandlerModule.java:383)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:314)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:427)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
{code}
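The failing condition can be checked outside the NodeManager with a small probe. This is only a minimal sketch, not the code in CGroupsV2HandlerImpl: the class CgroupControllersProbe and the method readV2Controllers are hypothetical names, and the sketch assumes nothing beyond the fact that a cgroup v2 hierarchy exposes a whitespace-separated cgroup.controllers file in its root directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop.
class CgroupControllersProbe {

  // Returns the controller names listed in <mountPath>/cgroup.controllers.
  // Throws IOException when the file is missing, which is the situation
  // on a host that only has cgroup v1 controllers mounted.
  static List<String> readV2Controllers(Path mountPath) throws IOException {
    Path controllersFile = mountPath.resolve("cgroup.controllers");
    if (!Files.isRegularFile(controllersFile)) {
      throw new IOException(
          "No cgroup controllers file found in the directory specified: " + mountPath);
    }
    String content =
        new String(Files.readAllBytes(controllersFile), StandardCharsets.UTF_8).trim();
    List<String> controllers = new ArrayList<>();
    for (String name : content.split("\\s+")) {
      if (!name.isEmpty()) {
        controllers.add(name);
      }
    }
    return controllers;
  }

  public static void main(String[] args) {
    // The default path is an assumption; pass the configured mount path instead.
    Path mountPath = Paths.get(args.length > 0 ? args[0] : "/sys/fs/cgroup");
    try {
      System.out.println("cgroup v2 controllers: " + readV2Controllers(mountPath));
    } catch (IOException e) {
      System.out.println("cgroup v2 not usable here: " + e.getMessage());
    }
  }
}
```

Running it against the mount path from the log above (/var/lib/yarn-ce/cgroups) on an affected node would be expected to take the catch branch.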
In short, the exception thrown by
[readControllersFile|https://github.com/apache/hadoop/blob/950b2ff773fa828eb13bed7c3fe6b3d52c7fff18/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2HandlerImpl.java#L127]
should be handled when the required controllers are mounted in v1, so that the
NM falls back to v1 instead of failing to start.
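One possible shape of the requested fallback is sketched here. It is an illustration under stated assumptions, not the actual patch for this issue: CgroupVersionFallback, CgroupVersion and selectVersion are hypothetical names that do not exist in Hadoop, and the set of v1 controllers is passed in by the caller rather than discovered from the system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Hypothetical sketch of the fallback decision; not Hadoop code.
class CgroupVersionFallback {

  enum CgroupVersion { V1, V2 }

  // Decides which cgroup version to use.
  //   v2Enabled           value of the cgroups.v2.enabled property
  //   v2MountPath         directory expected to contain cgroup.controllers
  //   requiredControllers controllers the NM needs (for example "cpu")
  //   v1Controllers       controllers found mounted as cgroup v1
  static CgroupVersion selectVersion(boolean v2Enabled, Path v2MountPath,
      Set<String> requiredControllers, Set<String> v1Controllers)
      throws IOException {
    if (!v2Enabled) {
      return CgroupVersion.V1;
    }
    if (Files.isRegularFile(v2MountPath.resolve("cgroup.controllers"))) {
      return CgroupVersion.V2;
    }
    // v2 was requested but is not available: fall back only if v1 can
    // provide every required controller, otherwise keep failing loudly.
    if (v1Controllers.containsAll(requiredControllers)) {
      System.err.println("cgroup v2 enabled but no cgroup.controllers file in "
          + v2MountPath + "; falling back to cgroup v1");
      return CgroupVersion.V1;
    }
    throw new IOException(
        "No cgroup controllers file found in the directory specified: "
            + v2MountPath + ", and required v1 controllers are not mounted");
  }
}
```

Keeping the exception for the case where v1 cannot supply the required controllers preserves the current failure mode for genuinely misconfigured nodes.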
was:
Cgroup v1/v2 mixed mode support was introduced in YARN-11692, however it does
not support an edgecase where NM has the cgroup v2 support enabled (using
{{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to true),
but there are only cgroup v1 controllers mounted. In larger clusters there is a
chance that some part of the cluster is already on newer OSes with cgroup v2 as
a default, and others are still using v1.
Currently trying to launch an NM with cgroup v2 support enabled will fail if
there are no cgroup.controllers file present.
> Cgroup v2 support should fall back to v1 when there are no v2 controllers
> -------------------------------------------------------------------------
>
> Key: YARN-11743
> URL: https://issues.apache.org/jira/browse/YARN-11743
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Benjamin Teke
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)