Dear YARN community,

I hope this email finds you well. I am encountering an issue with YARN and
cgroups on CentOS 8 and would greatly appreciate your insights and guidance in
resolving it.

I am using Hadoop 3.3.4 and am trying to enable cgroups in order to use GPUs
in a Docker environment. Following the official Hadoop NodeManager cgroups
documentation
(https://hadoop.apache.org/docs/r3.3.4/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html),
I set yarn.nodemanager.linux-container-executor.cgroups.mount to false so that,
for security reasons, I manage the cgroups myself.


As CentOS 8 uses cgroup v1, I configured the following parameters:

  *   yarn.nodemanager.linux-container-executor.cgroups.hierarchy to /hadoop-yarn
  *   yarn.nodemanager.linux-container-executor.cgroups.mount-path to /sys/fs/cgroup

YARN needs three cgroup controllers: cpu, cpuacct, and devices (the devices
controller is the one used for GPU isolation).

To make the /hadoop-yarn hierarchy persistent, I installed the libcgroup RPM
package and added the following to /etc/cgconfig.conf:

group hadoop-yarn {
    perm {
        admin {
            uid = yarn;
            gid = hadoop;
        }
        task {
            uid = yarn;
            gid = hadoop;
        }
    }
    cpu {
    }
    cpuacct {
    }
    devices {
    }
}


I started the cgconfig service, and the three directories were successfully 
created:

$ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
drwxr-xr-x 2 yarn hadoop 0 Sep  8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep  8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep  8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/
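For anyone trying to reproduce this, the check done with `ll` above can be
scripted. The following is a minimal bash sketch, assuming the
<mount-path>/<controller>/<hierarchy> layout shown in that listing and the
values configured above; the function name check_yarn_cgroups and its
argument order are hypothetical, chosen for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: check that each YARN cgroup directory exists and is owned by the
# NodeManager user. The <mount-path>/<controller>/<hierarchy> layout is an
# assumption taken from the `ll` listing in this email, not read from YARN.

check_yarn_cgroups() {
  # Hypothetical argument order: MOUNT_PATH HIERARCHY OWNER.
  # Defaults mirror the values configured in this email.
  local mount_path="${1:-/sys/fs/cgroup}"
  local hierarchy="${2:-/hadoop-yarn}"
  local owner="${3:-yarn}"
  local status=0 controller dir actual

  hierarchy="${hierarchy#/}"   # "/hadoop-yarn" -> "hadoop-yarn"
  for controller in cpu cpuacct devices; do
    dir="${mount_path}/${controller}/${hierarchy}"
    if [ ! -d "$dir" ]; then
      echo "MISSING ${dir}"
      status=1
      continue
    fi
    actual="$(stat -c %U "$dir")"   # owner name (GNU coreutils stat)
    if [ "$actual" != "$owner" ]; then
      echo "BAD-OWNER ${dir} (owner: ${actual})"
      status=1
    else
      echo "OK ${dir}"
    fi
  done
  return "$status"
}

# Run only when executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  check_yarn_cgroups "$@"
fi
```

Saved under a hypothetical name such as check-yarn-cgroups.sh, it can be run as
`./check-yarn-cgroups.sh /sys/fs/cgroup /hadoop-yarn yarn`; it prints one line
per controller and exits non-zero if a directory is missing or has the wrong
owner.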


However, the problem appears whenever anyone runs systemctl daemon-reload: the
devices directory is deleted, while the cpu and cpuacct directories remain:

$ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
drwxr-xr-x 2 yarn hadoop 0 Sep  8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
drwxr-xr-x 2 yarn hadoop 0 Sep  8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
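To capture exactly which directories disappear across a reload, the before and
after listings above can be compared automatically. This is a minimal bash
sketch, assuming the same <mount-path>/<controller>/<hierarchy> layout; the
function name vanished_after is hypothetical, and the command to test is passed
as arguments so the script itself never runs systemctl unless asked to.

```shell
#!/usr/bin/env bash
# Sketch: note which YARN cgroup directories exist, run a command (for example
# `systemctl daemon-reload`), then report any directory that no longer exists.
# The <mount-path>/<controller>/<hierarchy> layout is an assumption taken from
# the listings in this email.

vanished_after() {
  # Hypothetical usage: vanished_after MOUNT_PATH HIERARCHY COMMAND [ARGS...]
  local mount_path="$1" hierarchy="${2#/}"
  shift 2
  local controller dir
  local -a before=()

  # Snapshot the directories that exist before the command runs.
  for controller in cpu cpuacct devices; do
    dir="${mount_path}/${controller}/${hierarchy}"
    if [ -d "$dir" ]; then
      before+=("$dir")
    fi
  done

  "$@"   # the command under test; a no-op when no command is given

  # Report every previously present directory that is now gone.
  for dir in "${before[@]}"; do
    if [ ! -d "$dir" ]; then
      echo "VANISHED ${dir}"
    fi
  done
}

# Run only when executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  vanished_after /sys/fs/cgroup /hadoop-yarn "$@"
fi
```

Saved under a hypothetical name such as reload-check.sh, running
`sudo ./reload-check.sh systemctl daemon-reload` should print
`VANISHED /sys/fs/cgroup/devices/hadoop-yarn` if the behavior described here
reproduces, and nothing if all three directories survive.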


I have checked the logs, but found no indication of why this directory is
deleted. Unfortunately, this causes the YARN NodeManager to stop working, and it
has to be restarted once the directory has been recreated.

As an alternative to the cgconfig service, I also tried creating my own systemd
service to manage these directories, but the behavior is the same:

[Unit]
Description=Custom cgroup for Hadoop YARN

[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
ExecStart=/bin/true
Slice=hadoop-yarn.slice
MemoryAccounting=yes
MemoryLimit=1G

[Install]
WantedBy=multi-user.target


I'm currently at a loss as to how to resolve this issue. Any help, insights, or
suggestions from the YARN community would be greatly appreciated.


Thank you in advance for your assistance.

Best regards,


Jean-Baptiste Guet


P.S. I have also posted this issue on Stack Overflow, so far without success:
https://stackoverflow.com/questions/77067383/the-hadoop-yarn-cgroup-directory-is-deleted-after-each-systemctl-daemon-reload
