Chris Riccomini created YARN-799:
------------------------------------

             Summary: CgroupsLCEResourcesHandler tries to write to cgroup.procs
                 Key: YARN-799
                 URL: https://issues.apache.org/jira/browse/YARN-799
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Chris Riccomini

The implementation of

bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

tells the container-executor to write PIDs to cgroup.procs:

    public String getResourcesOption(ContainerId containerId) {
      String containerName = containerId.toString();
      StringBuilder sb = new StringBuilder("cgroups=");

      if (isCpuWeightEnabled()) {
        sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
        sb.append(",");
      }

      if (sb.charAt(sb.length() - 1) == ',') {
        sb.deleteCharAt(sb.length() - 1);
      }
      return sb.toString();
    }

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a cgroup module with a non-writeable cgroup.procs file.

bq. $ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

As a result, when the container-executor tries to run, it fails with this error message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the resource option that CgroupsLCEResourcesHandler hands to the executor includes cgroup.procs, which is non-writeable:

bq. $ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check the permissions on /cgroup.procs before writing to it, and fall back to /tasks if it is not writeable.
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?
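
To make option 3 concrete, here is a minimal sketch of what the fallback might look like. It is not the actual patch. The class name CgroupPidFileChooser and both method names are hypothetical, and the check relies on java.nio.file.Files.isWritable, which reports whether the calling JVM can write the file. The container-executor, not the NodeManager JVM, performs the real write, so the two may disagree (for example when the check runs as root). Treat the result as a heuristic.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of YARN: picks which cgroup file should receive PIDs.
class CgroupPidFileChooser {
    static final String PROCS = "cgroup.procs";
    static final String TASKS = "tasks";

    // Returns "cgroup.procs" when that file exists and is writable by this JVM,
    // otherwise falls back to "tasks".
    static String choosePidFile(Path cgroupDir) {
        Path procs = cgroupDir.resolve(PROCS);
        return Files.isWritable(procs) ? PROCS : TASKS;
    }

    // Builds a "cgroups=<path>" option in the same shape as getResourcesOption(),
    // for a single controller directory (assumption: one controller only).
    static String resourcesOption(Path cgroupDir) {
        return "cgroups=" + cgroupDir.resolve(choosePidFile(cgroupDir));
    }
}
```

With the directory listing above (cgroup.procs mode -r--r--r--, checked as a non-root user), choosePidFile would be expected to pick tasks. Option 4 could reuse the same shape by reading the file name from configuration instead of probing the filesystem.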