I've made a patch that works for me. I'm not sure whether I should open a JIRA issue for it. The hack is attached.
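For anyone who wants to see the difference outside of Giraph, here is a minimal sketch using only java.net. It is not Giraph code: the class name BindAddressSketch and its helper methods are made up for illustration. It binds a throwaway socket first to the loopback address and then to 0.0.0.0, and reports what the local hostname resolves to. I am assuming the hostname resolving to 127.0.0.1 on the node is what makes the Child listen on loopback.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

/** Hypothetical standalone sketch, not part of Giraph. */
public class BindAddressSketch {
  /** Same literal the attached patch uses for ALL_INTERFACE_ADDRESS. */
  public static final String ALL_INTERFACES = "0.0.0.0";

  /**
   * Binds a throwaway server socket to the given host on an ephemeral
   * port (port 0) and returns the local address it was bound to.
   */
  public static InetAddress boundAddress(String host) throws IOException {
    try (ServerSocket server = new ServerSocket()) {
      server.bind(new InetSocketAddress(host, 0));
      return server.getInetAddress();
    }
  }

  /** True if the given host name or literal resolves to a loopback address. */
  public static boolean resolvesToLoopback(String host)
      throws UnknownHostException {
    return InetAddress.getByName(host).isLoopbackAddress();
  }

  public static void main(String[] args) throws IOException {
    // Bound to loopback: reachable only from the same machine.
    System.out.println("127.0.0.1 is loopback-only: " +
        boundAddress("127.0.0.1").isLoopbackAddress());
    // Bound to the wildcard address: reachable on every interface.
    System.out.println("0.0.0.0 is wildcard: " +
        boundAddress(ALL_INTERFACES).isAnyLocalAddress());
    try {
      // If this prints true, binding to the hostname means binding to loopback.
      String name = InetAddress.getLocalHost().getHostName();
      System.out.println(name + " resolves to loopback: " +
          resolvesToLoopback(name));
    } catch (UnknownHostException e) {
      System.out.println("local hostname does not resolve: " + e.getMessage());
    }
  }
}
```

If the last line reports true on a worker node, fixing the hostname mapping on that node might be an alternative to patching the bind address, but I have not verified that on my cluster.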

On Fri, Oct 17, 2014 at 5:52 PM, Bojan Babic <gba...@gmail.com> wrote:

> I'm using giraph 1.1.0-SNAPSHOT for hadoop 1.2.1
>
> On Fri, Oct 17, 2014 at 4:01 PM, Bojan Babic <gba...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I'm risking posting an issue that has already been raised, but I'll take
>> the risk of being ridiculed :)
>>
>> I have a small hadoop cluster on Digital Ocean (1 master, 4 nodes). I was
>> able to set up the cluster and run the word count example as well as the
>> single node sample from Quick start.
>>
>> As I introduce more nodes into play, I get an issue where the Task Tracker
>> spawns a Child process
>>
>> hduser@hdnode-2:~# jps
>>> 13839 TaskTracker
>>> 13697 DataNode
>>> 14067 Jps
>>> 13962 Child
>>
>> *13961 Child*
>>
>> that listens on the loopback interface
>>
>>> Proto Recv-Q Send-Q Local Address     Foreign Address  State   User    Inode     PID/Program name
>>> tcp   0      0      127.0.0.1:1337    0.0.0.0:*        LISTEN  root    21544925  29912/python
>>> tcp   0      0      0.0.0.0:50010     0.0.0.0:*        LISTEN  hduser  21691552  13697/java
>>> tcp   0      0      127.0.0.1:30011   0.0.0.0:*        LISTEN  hduser  21693578  13962/java
>>> tcp   0      0      0.0.0.0:50075     0.0.0.0:*        LISTEN  hduser  21691554  13697/java
>>> tcp   0      0      0.0.0.0:50020     0.0.0.0:*        LISTEN  hduser  21691557  13697/java
>>> tcp   0      0      127.0.0.1:50118   0.0.0.0:*        LISTEN  hduser  21691870  13839/java
>>> tcp   0      0      0.0.0.0:41640     0.0.0.0:*        LISTEN  hduser  21691296  13697/java
>>> tcp   0      0      127.0.0.1:31337   0.0.0.0:*        LISTEN  root    20432660  1514/python
>>> tcp   0      0      0.0.0.0:50060     0.0.0.0:*        LISTEN  hduser  21692144  13839/java
>>> tcp   0      0      0.0.0.0:http-alt  0.0.0.0:*        LISTEN  root    20431897  1421/python
>>> *tcp  0      0      127.0.0.1:30001   0.0.0.0:*        LISTEN  hduser  21370004  7856/ssh
>>> tcp   0      0      127.0.0.1:30003   0.0.0.0:*        LISTEN  hduser  21693562  13961/java*
>>> tcp   0      0      127.0.0.1:58741   0.0.0.0:*        LISTEN  hduser  21370000  7856/ssh
>>> tcp   0      0      127.0.0.1:58742   0.0.0.0:*        LISTEN  hduser  21369982  7845/autossh
>>> tcp   0      0      0.0.0.0:ssh       0.0.0.0:*        LISTEN  root    9130      834/sshd
>>> tcp6  0      0      ::1:30001         :::*             LISTEN  hduser  21370003  7856/ssh
>>> tcp6  0      0      ::1:58741         :::*             LISTEN  hduser  21369999  7856/ssh
>>> tcp6  0      0      :::ssh            :::*             LISTEN  root    9165      834/sshd
>>
>> instead of all interfaces (0.0.0.0)
>>
>> This results in the node being unreachable from other nodes, i.e. hdnode02:
>>
>>> 2014-10-17 14:10:31,146 WARN org.apache.giraph.comm.netty.NettyClient:
>>> 2014-10-17 14:10:31,159 WARN org.apache.giraph.comm.netty.NettyClient:
>>> connectAllAddresses: Future failed to connect with
>>> hdnode-2/XXX.XXX.XXX.XXX:30003 with 1 failures because of
>>> java.net.ConnectException: Connection refused:
>>> *hdnode-2/XXX.XXX.XXX.XXX:30003*
>>> 2014-10-17 14:10:31,159 INFO org.apache.giraph.comm.netty.NettyClient:
>>> connectAllAddresses: Successfully added 1 connections, (1 total connected)
>>> 2 failed, 2 failures total.
>>
>> If I stop all processes and start nc on 30003, I can telnet to hdnode2.
>>
>> My question is whether there is any setting that will configure the Child
>> process to listen on 0.0.0.0 instead of the loopback interface.
>>
>> Thanks in advance
>>
>
> --
> --------------------------------
> Bojan Babic, M.Sc.E.E
> Software developer
> twitter: @bojanbabic
> mobile: +1312 8602944
>

--
--------------------------------
Bojan Babic, M.Sc.E.E
Software developer
twitter: @bojanbabic
mobile: +1312 8602944

diff --git a/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java b/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
index 454232a..6910d90 100644
--- a/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
+++ b/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
@@ -58,6 +58,7 @@ import io.netty.channel.AdaptiveRecvByteBufAllocator;
 import java.net.InetSocketAddress;
 import java.net.UnknownHostException;
 
+import static org.apache.giraph.conf.GiraphConstants.ALL_INTERFACE_ADDRESS;
 import static org.apache.giraph.conf.GiraphConstants.MAX_IPC_PORT_BIND_ATTEMPTS;
 
 /**
@@ -87,6 +88,8 @@ public class NettyServer {
   private final String localHostname;
   /** Address of the server */
   private InetSocketAddress myAddress;
+  /** Address of all interface of the server */
+  private InetSocketAddress bindAddress;
   /** Current task info */
   private TaskInfo myTaskInfo;
   /** Maximum number of threads */
@@ -343,6 +346,7 @@ public class NettyServer {
     // it as a constant to increase the port number with.
     while (bindAttempts < maxIpcPortBindAttempts) {
       this.myAddress = new InetSocketAddress(localHostname, bindPort);
+      bindAddress = new InetSocketAddress(ALL_INTERFACE_ADDRESS, bindPort);
       if (failFirstPortBindingAttempt && bindAttempts == 0) {
         if (LOG.isInfoEnabled()) {
           LOG.info("start: Intentionally fail first " +
@@ -355,7 +359,7 @@ public class NettyServer {
       }
 
       try {
-        ChannelFuture f = bootstrap.bind(myAddress).sync();
+        ChannelFuture f = bootstrap.bind(bindAddress).sync();
         accepted.add(f.channel());
         break;
       } catch (InterruptedException e) {
diff --git a/giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java b/giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
index e78eb42..5be1987 100644
--- a/giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
+++ b/giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
@@ -541,6 +541,9 @@ public interface GiraphConstants {
   /** Local ZooKeeper directory to use */
   String ZOOKEEPER_DIR = "giraph.zkDir";
 
+  /** all interface address */
+  String ALL_INTERFACE_ADDRESS = "0.0.0.0";
+
   /** Max attempts for handling ZooKeeper connection loss */
   IntConfOption ZOOKEEPER_OPS_MAX_ATTEMPTS =
       new IntConfOption("giraph.zkOpsMaxAttempts", 3,