[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876165#comment-16876165
 ] 

Szilard Nemeth commented on YARN-9660:
--------------------------------------

Hi [~pbacsko]!
Thanks for these improvement proposal of the documentation!
I think it's obvious that all of these points about docker image requirements 
should be documented properly with some examples on compatible images.

1. As we discussed offline, the active cgroup handler could be easily printable 
with running "docker info". I guess this is an OS-independent way to detect the 
active handler.
However, if we want to detect it, we need to run docker info before running any 
container and we would also rely on the output of docker info. 
I don't know how likely the output of docker info changes, but in the end, it's 
a dependency anyway.

2. and 3. I think we could detect this easily by creating some 
"image-validation" phase where we would check the availability of the bash 
commands that we are utilizing with the container executor script.
If we agree on having such a validation phase, point #1 also could be the part 
of the validation process.

All in all, I'm voting for updating the doc and having as many validations as 
possible, as it makes the Docker feature more easy and straightforward to use.

> Enhance documentation of Docker on YARN support
> -----------------------------------------------
>
>                 Key: YARN-9660
>                 URL: https://issues.apache.org/jira/browse/YARN-9660
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation, nodemanager
>            Reporter: Peter Bacsko
>            Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and and 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If docker deamon runs with systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemcl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kind of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} requires that we install it separately.
> If we don't have {{find}} available, then {{launcher_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to