[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876144#comment-16876144 ]
Peter Bacsko commented on YARN-9660:
------------------------------------

cc [~shaneku...@gmail.com] [~eyang] [~snemeth] - what do you guys think?

I believe some of these problems could be detected and even reported to the user. The hard-coded {{/bin/bash}} could be made overridable in {{UnixShellScriptBuilder}}. We have options here.

> Enhance documentation of Docker on YARN support
> -----------------------------------------------
>
>                 Key: YARN-9660
>                 URL: https://issues.apache.org/jira/browse/YARN-9660
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation, nodemanager
>            Reporter: Peter Bacsko
>            Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these
> requirements are not met, then launching the containers will fail and an
> error message will be printed. Depending on how familiar the user is with
> Docker, it might or might not be easy for them to understand what went wrong
> and how to fix the underlying problem.
> It is important to document these requirements explicitly, along with
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the
> following error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon:
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can
> document a {{systemctl}} example.
>
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}.
> This is because all commands under {{/bin}} are linked to {{/bin/busybox}} and
> the only shell available is {{/bin/sh}}.
> If we try to use this kind of image, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci
> runtime error: container_linux.go:235: starting container process caused
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
> line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
> line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
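
To make the "detect and print to the user" idea more concrete, here is a rough sketch that matches the three error signatures quoted in this issue against a container's diagnostics text and returns a short hint for each match. It is a standalone Python illustration, not NodeManager code: the {{diagnose}} function, the {{KNOWN_FAILURES}} table and the hint wording are all hypothetical, and the signatures are simply substrings copied from the error output above.

```python
# Hypothetical helper: maps known Docker-on-YARN failure signatures to hints.
# The signatures are substrings copied from the error output quoted in YARN-9660;
# the hint wording and all names here are assumptions made for illustration.
KNOWN_FAILURES = [
    (
        "cgroup-parent for systemd cgroup should be a valid slice",
        "Docker daemon uses the systemd cgroups handler; switch to cgroupfs.",
    ),
    (
        # Raw string: the quoted log contains literal backslashes before the quotes.
        r'exec: \"bash\": executable file not found in $PATH',
        "The image does not provide bash (busybox and alpine only have /bin/sh).",
    ),
    (
        "find: command not found",
        "The image lacks the find command; it has to be installed separately.",
    ),
]


def diagnose(diagnostics: str) -> list:
    """Return one hint for every known signature found in the diagnostics text."""
    return [hint for signature, hint in KNOWN_FAILURES if signature in diagnostics]


if __name__ == "__main__":
    sample = "launch_container.sh: line 44: find: command not found"
    for hint in diagnose(sample):
        print(hint)
```

Plain substring matching keeps the sketch simple, but it would break if Docker or YARN changed the wording of these messages, so a real check would probably have to be more defensive.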