[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309920#comment-14309920 ]
Jason Lowe commented on YARN-2809:
----------------------------------

+1 lgtm. Will commit this early next week if there are no objections.

> Implement workaround for linux kernel panic when removing cgroup
> ----------------------------------------------------------------
>
> Key: YARN-2809
> URL: https://issues.apache.org/jira/browse/YARN-2809
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0
> Environment: RHEL 6.4
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
> Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch
>
>
> Some older versions of linux have a bug that can cause a kernel panic when
> the LCE attempts to remove a cgroup. It is a race condition so it's a bit
> rare but on a few thousand node cluster it can result in a couple of panics
> per day.
> This is the commit that likely (haven't verified) fixes the problem in linux:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
> Details will be added in comments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)