[ https://issues.apache.org/jira/browse/YARN-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bob updated YARN-4508:
----------------------
    Description:
In one scenario, mount_cgroup can return success even though mounting the requested cgroup controller actually failed. The condition check in the code below should be strengthened:
{code}
} else {
  fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
          controller, mount_path, strerror(errno));
  // if controller is already mounted, don't stop trying to mount others
  if (errno != EBUSY) {
    result = -1;
  }
}
{code}
The issue can be reproduced as follows:
1. Start the NM; it mounts the cgroups normally.
2. Manually unmount the cgroups used by the NM.
3. Restart the NM. The NM starts successfully, but containers cannot be started because the cgroups were not mounted successfully.

> The mount_cgroup method in container-executor.c should enhance mount check
> when mount the request cgroup controller.
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4508
>                 URL: https://issues.apache.org/jira/browse/YARN-4508
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.6.1, 2.7.1
>            Reporter: Bob
>            Priority: Minor
>
> In one scenario, mount_cgroup can return success even though mounting the
> requested cgroup controller actually failed.
> The condition check in the code below should be strengthened:
> {code}
> } else {
>   fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
>           controller, mount_path, strerror(errno));
>   // if controller is already mounted, don't stop trying to mount others
>   if (errno != EBUSY) {
>     result = -1;
>   }
> }
> {code}
> The issue can be reproduced as follows:
> 1. Start the NM; it mounts the cgroups normally.
> 2. Manually unmount the cgroups used by the NM.
> 3. Restart the NM. The NM starts successfully, but containers cannot be
> started because the cgroups were not mounted successfully.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
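In the excerpt above, a mount failure with EBUSY leaves `result` unchanged, so mount_cgroup reports success without confirming that the controller is actually mounted at `mount_path`. One possible way to strengthen the check is sketched below, for illustration only. It assumes a Linux host with cgroup v1 hierarchies, where /proc/mounts lists each cgroup mount with its controllers among the mount options. The helpers `has_mount_option`, `is_cgroup_controller_mounted` and `ebusy_mount_result` are hypothetical and are not part of container-executor.c. The idea is to treat EBUSY as success only when the mount table shows the requested controller mounted at the requested path.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the comma-separated option list `opts`
 * contains `opt` as a whole token (so "cpu" does not match "cpuacct"). */
static int has_mount_option(const char *opts, const char *opt) {
  size_t opt_len = strlen(opt);
  const char *p = opts;
  for (;;) {
    const char *comma = strchr(p, ',');
    size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);
    if (tok_len == opt_len && strncmp(p, opt, opt_len) == 0) {
      return 1;
    }
    if (comma == NULL) {
      return 0;
    }
    p = comma + 1;
  }
}

/* Hypothetical helper: returns 1 if a "cgroup" filesystem that lists
 * `controller` among its mount options is mounted exactly at `mount_path`
 * according to the mount table `mtab_path` (normally "/proc/mounts"),
 * 0 if not, and -1 if the mount table cannot be opened. */
int is_cgroup_controller_mounted(const char *mtab_path, const char *controller,
                                 const char *mount_path) {
  FILE *mtab = setmntent(mtab_path, "r");
  if (mtab == NULL) {
    return -1;
  }
  int found = 0;
  struct mntent *ent;
  while ((ent = getmntent(mtab)) != NULL) {
    if (strcmp(ent->mnt_type, "cgroup") == 0 &&
        strcmp(ent->mnt_dir, mount_path) == 0 &&
        has_mount_option(ent->mnt_opts, controller)) {
      found = 1;
      break;
    }
  }
  endmntent(mtab);
  return found;
}

/* Hypothetical replacement for the EBUSY decision: given the errno saved
 * after a failed mount(2), return 0 only if the failure was EBUSY AND the
 * controller is verifiably mounted at mount_path; otherwise return -1. */
int ebusy_mount_result(int saved_errno, const char *mtab_path,
                       const char *controller, const char *mount_path) {
  if (saved_errno != EBUSY) {
    return -1;
  }
  return is_cgroup_controller_mounted(mtab_path, controller, mount_path) == 1
             ? 0
             : -1;
}
```

Inside mount_cgroup, the `else` branch could then save `errno` into a local right after the failed mount call (stdio functions such as `fprintf` are allowed to modify `errno`) and set `result = -1` whenever `ebusy_mount_result(saved_errno, "/proc/mounts", controller, mount_path)` is nonzero. Matching the exact mount path, rather than accepting the controller mounted anywhere, is a deliberate choice: the NM later uses `mount_path` to create container cgroups.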