Hi Michael, I asked for hosts file because there seems to be some loopback prob to me. The log shows that call is going at 0.0.0.0. Apart from what you have said, I think disabling IPv6 and making sure that there is no prob with the DNS resolution is also necessary. Please correct me if I am wrong. Thank you.
Regards, Mohammad Tariq On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <michael_se...@hotmail.com>wrote: > Based on your /etc/hosts output, why aren't you using DNS? > > Outside of MapR, multihomed machines can be problematic. Hadoop doesn't > generally work well when you're not using the FQDN or its alias. > > The issue isn't the SSH, but if you go to the node which is having trouble > connecting to another node, then try to ping it, or some other general > communication, if it succeeds, your issue is that the port you're trying > to communicate with is blocked. Then its more than likely an ipconfig or > firewall issue. > > On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <e...@cs.uni-kassel.de> > wrote: > > Hi Michael, > > well i can ssh from any node to any other without being prompted. The > reason for this is, that my home dir is mounted in every server in the > cluster. > > If the machines are multihomed: i dont know. i could ask if this would be > of importance. > > Shall i? > > Regards, > Elmar > > Am 13.08.12 14:59, schrieb Michael Segel: > > If the nodes can communicate and distribute data, then the odds are that > the issue isn't going to be in his /etc/hosts. > > A more relevant question is if he's running a firewall on each of these > machines? > > A simple test... ssh to one node, ping other nodes and the control nodes > at random to see if they can see one another. Then check to see if there is > a firewall running which would limit the types of traffic between nodes. > > One other side note... are these machines multi-homed? > > On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <donta...@gmail.com> wrote: > > Hello there, > > Could you please share your /etc/hosts file, if you don't mind. > > Regards, > Mohammad Tariq > > > > On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <ma...@cs.uni-kassel.de > > wrote: > >> Hi, >> >> i am currently trying to run my hadoop program on a cluster. Sadly though >> my datanodes and tasktrackers seem to have difficulties with their >> communication as their logs say: >> * Some datanodes and tasktrackers seem to have portproblems of some kind >> as it can be seen in the logs below. I wondered if this might be due to >> reasons correllated with the localhost entry in /etc/hosts as you can read >> in alot of posts with similar errors, but i checked the file neither >> localhost nor 127.0.0.1/127.0.1.1 is bound there. (although you can ping >> localhost... the technician of the cluster said he'd be looking for the >> mechanics resolving localhost) >> * The other nodes can not speak with the namenode and jobtracker >> (its-cs131). Although it is absolutely not clear, why this is happening: >> the "dfs -put" i do directly before the job is running fine, which seems to >> imply that communication between those servers is working flawlessly. >> >> Is there any reason why this might happen? >> >> >> Regards, >> Elmar >> >> LOGS BELOW: >> >> \____Datanodes >> >> After successfully putting the data to hdfs (at this point i thought >> namenode and datanodes have to communicate), i get the following errors >> when starting the job: >> >> There are 2 kinds of logs i found: the first one is big (about 12MB) and >> looks like this: >> ############################### LOG TYPE 1 >> ############################################################ >> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 0 >> time(s). >> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 1 >> time(s). >> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 2 >> time(s). >> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 3 >> time(s). >> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 4 >> time(s). >> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 5 >> time(s). >> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 6 >> time(s). >> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 7 >> time(s). >> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 8 >> time(s). >> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 9 >> time(s). >> 2012-08-13 08:23:36,335 WARN >> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: >> Call to its-cs131/141.51.205.41:35554 failed on connection exception: >> java.net.ConnectException: Connection refused >> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095) >> at org.apache.hadoop.ipc.Client.call(Client.java:1071) >> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) >> at $Proxy5.sendHeartbeat(Unknown Source) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458) >> at java.lang.Thread.run(Thread.java:619) >> Caused by: java.net.ConnectException: Connection refused >> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >> at >> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) >> at org.apache.hadoop.net >> .SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) >> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) >> at >> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434) >> at >> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560) >> at >> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184) >> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202) >> at org.apache.hadoop.ipc.Client.call(Client.java:1046) >> ... 5 more >> >> ... (this continues til the end of the log) >> >> The second is short kind: >> ########################### LOG TYPE 2 >> ############################################################ >> 2012-08-13 00:59:19,038 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: >> /************************************************************ >> STARTUP_MSG: Starting DataNode >> STARTUP_MSG: host = its-cs133.its.uni-kassel.de/141.51.205.43 >> STARTUP_MSG: args = [] >> STARTUP_MSG: version = 1.0.2 >> STARTUP_MSG: build = >> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r >> 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012 >> ************************************************************/ >> 2012-08-13 00:59:19,203 INFO >> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from >> hadoop-metrics2.properties >> 2012-08-13 00:59:19,216 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> MetricsSystem,sub=Stats registered. >> 2012-08-13 00:59:19,217 INFO >> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot >> period at 10 second(s). >> 2012-08-13 00:59:19,218 INFO >> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system >> started >> 2012-08-13 00:59:19,306 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi >> registered. >> 2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: >> Loaded the native-hadoop library >> 2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35554. Already tried 0 >> time(s). >> 2012-08-13 00:59:21,584 INFO >> org.apache.hadoop.hdfs.server.common.Storage: Storage directory >> /home/work/bmacek/hadoop/hdfs/slave is not formatted. >> 2012-08-13 00:59:21,584 INFO >> org.apache.hadoop.hdfs.server.common.Storage: Formatting ... >> 2012-08-13 00:59:21,787 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered >> FSDatasetStatusMBean >> 2012-08-13 00:59:21,897 INFO >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting >> down all async disk service threads... >> 2012-08-13 00:59:21,897 INFO >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async >> disk service threads have been shut down. >> 2012-08-13 00:59:21,898 ERROR >> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: >> Problem binding to /0.0.0.0:50010 : Address already in use >> at org.apache.hadoop.ipc.Server.bind(Server.java:227) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665) >> at >> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682) >> Caused by: java.net.BindException: Address already in use >> at sun.nio.ch.Net.bind(Native Method) >> at sun.nio.ch >> .ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) >> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) >> at org.apache.hadoop.ipc.Server.bind(Server.java:225) >> ... 7 more >> >> 2012-08-13 00:59:21,899 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: >> /************************************************************ >> SHUTDOWN_MSG: Shutting down DataNode at >> its-cs133.its.uni-kassel.de/141.51.205.43 >> ************************************************************/ >> >> >> >> >> >> \_____TastTracker >> With TaskTrackers it is the same: there are 2 kinds. >> ############################### LOG TYPE 1 >> ############################################################ >> 2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: >> Resending 'status' to 'its-cs131' with reponseId '879 >> 2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 0 >> time(s). >> 2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 1 >> time(s). >> 2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 2 >> time(s). >> 2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 3 >> time(s). >> 2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 4 >> time(s). >> 2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 5 >> time(s). >> 2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 6 >> time(s). >> 2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 7 >> time(s). >> 2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 8 >> time(s). >> 2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 9 >> time(s). >> 2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: >> Caught exception: java.net.ConnectException: Call to its-cs131/ >> 141.51.205.41:35555 failed on connection exception: >> java.net.ConnectException: Connection refused >> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095) >> at org.apache.hadoop.ipc.Client.call(Client.java:1071) >> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) >> at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source) >> at >> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857) >> at >> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653) >> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503) >> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744) >> Caused by: java.net.ConnectException: Connection refused >> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >> at >> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) >> at org.apache.hadoop.net >> .SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) >> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) >> at >> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434) >> at >> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560) >> at >> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184) >> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202) >> at org.apache.hadoop.ipc.Client.call(Client.java:1046) >> ... 6 more >> >> >> ########################### LOG TYPE 2 >> ############################################################ >> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: >> STARTUP_MSG: >> /************************************************************ >> STARTUP_MSG: Starting TaskTracker >> STARTUP_MSG: host = its-cs133.its.uni-kassel.de/141.51.205.43 >> STARTUP_MSG: args = [] >> STARTUP_MSG: version = 1.0.2 >> STARTUP_MSG: build = >> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r >> 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012 >> ************************************************************/ >> 2012-08-13 00:59:24,569 INFO >> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from >> hadoop-metrics2.properties >> 2012-08-13 00:59:24,626 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> MetricsSystem,sub=Stats registered. >> 2012-08-13 00:59:24,627 INFO >> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot >> period at 10 second(s). >> 2012-08-13 00:59:24,627 INFO >> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics >> system started >> 2012-08-13 00:59:24,950 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi >> registered. >> 2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to >> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via >> org.mortbay.log.Slf4jLog >> 2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added >> global filtersafety >> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) >> 2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: >> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 >> 2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: >> Starting tasktracker with owner as bmacek >> 2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good >> mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local >> 2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: >> Loaded the native-hadoop library >> 2012-08-13 00:59:25,255 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm >> registered. >> 2012-08-13 00:59:25,256 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> TaskTrackerMetrics registered. >> 2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting >> SocketReader >> 2012-08-13 00:59:25,282 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> RpcDetailedActivityForPort54850 registered. >> 2012-08-13 00:59:25,282 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> RpcActivityForPort54850 registered. >> 2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server >> Responder: starting >> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server >> listener on 54850: starting >> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server >> handler 0 on 54850: starting >> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server >> handler 1 on 54850: starting >> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: >> TaskTracker up at: localhost/127.0.0.1:54850 >> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server >> handler 3 on 54850: starting >> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server >> handler 2 on 54850: starting >> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: >> Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/ >> 127.0.0.1:54850 >> 2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying >> connect to server: its-cs131/141.51.205.41:35555. Already tried 0 >> time(s). >> 2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: >> Starting thread: Map-events fetcher for all reduce tasks on >> tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850 >> 2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid >> exited with exit code 0 >> 2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using >> ResourceCalculatorPlugin : >> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228 >> 2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: >> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is >> disabled. >> 2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: >> IndexCache created with max memory = 10485760 >> 2012-08-13 00:59:38,158 INFO >> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source >> ShuffleServerMetrics registered. >> 2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port >> returned by webServer.getConnectors()[0].getLocalPort() before open() is >> -1. Opening the listener on 50060 >> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can >> not start task tracker because java.net.BindException: Address already in >> use >> at sun.nio.ch.Net.bind(Native Method) >> at sun.nio.ch >> .ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) >> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) >> at >> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) >> at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581) >> at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502) >> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742) >> >> 2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: >> SHUTDOWN_MSG: >> /************************************************************ >> SHUTDOWN_MSG: Shutting down TaskTracker at >> its-cs133.its.uni-kassel.de/141.51.205.43 >> ************************************************************/ >> > > > > >