[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680660#comment-16680660 ]
Eric Yang edited comment on YARN-8983 at 11/9/18 12:30 AM:
-----------------------------------------------------------

[~oliverhuh...@gmail.com] Docker overlay network can work without swarm. RPC call works as you indicated in the first diagram. The setup instruction is written in this [link|https://issues.apache.org/jira/browse/YARN-8619?focusedCommentId=16612715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612715]. It is a straight forward process.

was (Author: eyang):
[~oliverhuh...@gmail.com] Docker overlay network can work without swarm. RPC call works as you indicated in the first diagram. The setup instruction is written in this link. It is a straight forward process.

> YARN container with docker: hostname entry not in /etc/hosts
> ------------------------------------------------------------
>
>                 Key: YARN-8983
>                 URL: https://issues.apache.org/jira/browse/YARN-8983
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Keqiu Hu
>            Priority: Critical
>              Labels: Docker
>
> I'm experimenting to use Hadoop 2.9.1 to launch applications with docker containers.
> Inside the container task, we try to get the hostname of the container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC; however, it throws the following exception when I enable Docker containers using:
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
>     at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>     at com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>     at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109)
> Caused by: java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
>     ... 2 more
> {noformat}
>
> I did some research online, and it seems to be related to a missing entry for the hostname in /etc/hosts. So I took a look at /etc/hosts inside the container, and the entry is indeed missing:
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID   IMAGE   COMMAND                    CREATED        STATUS                  PORTS   NAMES
> 71e3e9df8bc6   test4   "/entrypoint.sh bash..."   1 second ago   Up Less than a second           container_1541488751855_0028_01_000001
> 29d31f0327d1   test3   "/entrypoint.sh bash"      18 hours ago   Up 18 hours                     blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo <command>".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$ cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$
> {noformat}
> If I launch the image without YARN, I see the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631 {noformat}
> Here is my container-executor.cfg:
> {code:java}
> min.user.id=100
> yarn.nodemanager.linux-container-executor.group=hadoop
> [docker]
> module.enabled=true
> docker.binary=/usr/bin/docker
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
> docker.allowed.networks=bridge,host,none
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code}
> Since I'm using an older version, Hadoop 2.9.1, let me know if this is something already fixed in a later version :)
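
As an application-side stopgap, the `InetAddress.getLocalHost().getHostName()` call from the report could be guarded so that a missing /etc/hosts entry does not abort the task. This is only a minimal sketch and not part of the original report: the class name `HostnameProbe`, the fallback to a `HOSTNAME` environment variable, and the final `localhost` default are all assumptions made for illustration. It does not address why the entry is missing under YARN.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not taken from the report or from Hadoop.
final class HostnameProbe {

    // Tries the lookup from the report first; if name resolution fails,
    // falls back to the HOSTNAME environment variable (assumed to be set
    // by the container runtime), and finally to "localhost".
    static String resolveHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            String fromEnv = System.getenv("HOSTNAME");
            if (fromEnv != null && !fromEnv.isEmpty()) {
                return fromEnv;
            }
            return "localhost";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveHostname());
    }
}
```

A name recovered this way may still not be resolvable by other containers, so the fallback only helps code that needs a local identifier rather than a reachable address.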