Dear Yarn Community,

I hope this email finds you well. I am encountering an issue with Yarn and Cgroups on CentOS 8 and would greatly appreciate your insights and guidance in resolving it.

I am using Hadoop version 3.3.4 and I'm trying to activate Cgroups in order to utilize GPUs within a Docker environment. Following the official documentation at Hadoop NodeManager Cgroups Documentation<https://hadoop.apache.org/docs/r3.3.4/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html>, I set the parameter yarn.nodemanager.linux-container-executor.cgroups.mount to false to manage Cgroups myself for security reasons.

As CentOS 8 uses Cgroup v1, I configured the following parameters:

* yarn.nodemanager.linux-container-executor.cgroups.hierarchy to /hadoop-yarn
* yarn.nodemanager.linux-container-executor.cgroups.mount-path to /sys/fs/cgroup

Yarn requires three Cgroups: cpu, cpuacct, and devices. To ensure that /hadoop-yarn remains persistent, I installed the libcgroup RPM package and updated /etc/cgconfig.conf as follows:

    group hadoop-yarn {
        perm {
            admin {
                uid = yarn;
                gid = hadoop;
            }
            task {
                uid = yarn;
                gid = hadoop;
            }
        }
        cpu { }
        cpuacct { }
        devices { }
    }

I started the cgconfig service, and the three directories were successfully created:

    $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
    drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
    drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
    drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/

However, my problem arises when someone executes systemctl daemon-reload.
It seems that the devices directory is deleted:

    $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
    ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
    drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
    drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/

I have checked the logs, but there is no indication of why this directory is being deleted. Unfortunately, this causes the Yarn NodeManager to stop functioning, and it has to be restarted once the directory has been recreated.

As an alternative to the cgconfig service, I also tried creating my own service to manage these directories, but the behavior remains the same:

    [Unit]
    Description=Custom cgroup for Hadoop YARN

    [Service]
    ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
    ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
    ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
    ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
    ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
    ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
    ExecStart=/bin/true
    Slice=hadoop-yarn.slice
    MemoryAccounting=yes
    MemoryLimit=1G

    [Install]
    WantedBy=multi-user.target

I'm currently at a loss as to how to resolve this issue. Any help, insights, or suggestions from the Yarn community would be greatly appreciated.

Thank you in advance for your assistance.

Best regards,
Jean-Baptiste Guet

P.S.: I have also posted my issue on Stack Overflow, without success so far<https://stackoverflow.com/questions/77067383/the-hadoop-yarn-cgroup-directory-is-deleted-after-each-systemctl-daemon-reload>.
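
For reference, the three yarn-site.xml settings mentioned above would look roughly like this. This is only a sketch reconstructed from the property names and values given in this message; the rest of the file, including any container-executor or GPU-related settings, is omitted:

    <!-- Cgroups are pre-created outside of Yarn, so Yarn must not mount them -->
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
      <value>false</value>
    </property>
    <!-- Hierarchy expected under each controller directory -->
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
      <value>/hadoop-yarn</value>
    </property>
    <!-- Location where the Cgroup v1 controllers are already mounted -->
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
      <value>/sys/fs/cgroup</value>
    </property>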