[ 
https://issues.apache.org/jira/browse/YARN-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated YARN-4508:
----------------------
    Description: 
In one scenario, mount_cgroup can return success even though mounting the 
requested cgroup controller actually failed.
The code below treats an EBUSY failure as "already mounted" and does not set 
an error; this condition check should be strengthened:
{code}
    } else {
      fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
                controller, mount_path, strerror(errno));
      // if controller is already mounted, don't stop trying to mount others
      if (errno != EBUSY) {
        result = -1;
      }
    }
{code}
The issue can be reproduced as follows:
1. Start the NM; it mounts the cgroups normally.
2. Manually unmount the cgroups used by the NM.
3. Restart the NM. The NM starts successfully, but containers cannot be 
started because the cgroups were not mounted successfully.
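
One possible way to strengthen the check, offered as a sketch under stated assumptions and not as the actual patch: when mount() fails with EBUSY, consult the mount table to confirm that the controller really is mounted at mount_path before treating the failure as harmless. The helper names is_controller_mounted and mount_failure_is_fatal and the mounts_file parameter are hypothetical; on a real node the file would presumably be /proc/mounts. The sketch assumes Linux with the <mntent.h> functions and cgroup v1 mounts that list the controller as a mount option.

```c
#include <errno.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a cgroup filesystem listing `controller`
 * among its options is mounted at `mount_path` according to `mounts_file`,
 * 0 if no such entry exists, and -1 if the file cannot be opened. */
int is_controller_mounted(const char *mounts_file, const char *controller,
                          const char *mount_path) {
  FILE *mounts = setmntent(mounts_file, "r");
  if (mounts == NULL) {
    return -1;
  }
  int found = 0;
  struct mntent *entry;
  while ((entry = getmntent(mounts)) != NULL) {
    if (strcmp(entry->mnt_type, "cgroup") == 0 &&
        strcmp(entry->mnt_dir, mount_path) == 0 &&
        hasmntopt(entry, controller) != NULL) {
      found = 1;
      break;
    }
  }
  endmntent(mounts);
  return found;
}

/* Hypothetical helper: decides whether a failed mount() should set
 * result = -1. EBUSY is tolerated only when the mount table confirms
 * that the controller is mounted at the expected path. */
int mount_failure_is_fatal(int mount_errno, const char *mounts_file,
                           const char *controller, const char *mount_path) {
  if (mount_errno != EBUSY) {
    return 1;
  }
  return is_controller_mounted(mounts_file, controller, mount_path) == 1 ? 0 : 1;
}
```

In the fragment quoted above, the `if (errno != EBUSY)` test could then become something like `if (mount_failure_is_fatal(errno, "/proc/mounts", controller, mount_path))`, with errno saved before the fprintf call so that logging cannot overwrite it.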




> The mount_cgroup method in container-executor.c should enhance mount check 
> when mount the request cgroup controller.
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4508
>                 URL: https://issues.apache.org/jira/browse/YARN-4508
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.6.1, 2.7.1
>            Reporter: Bob
>            Priority: Minor
>
> In one scenario, mount_cgroup can return success even though mounting the 
> requested cgroup controller actually failed.
> The code below treats an EBUSY failure as "already mounted" and does not set 
> an error; this condition check should be strengthened:
> {code}
>     } else {
>       fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
>                 controller, mount_path, strerror(errno));
>       // if controller is already mounted, don't stop trying to mount others
>       if (errno != EBUSY) {
>         result = -1;
>       }
>     }
> {code}
> The issue can be reproduced as follows:
> 1. Start the NM; it mounts the cgroups normally.
> 2. Manually unmount the cgroups used by the NM.
> 3. Restart the NM. The NM starts successfully, but containers cannot be 
> started because the cgroups were not mounted successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
