Hey guys,

> Can you see the slave logs to find out what is happening there? E.g.,
> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log and
> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.log.
>
> There is not enough information to recognise the exact problem.
> Check the last 50-60 log lines of the datanode logs first.
>
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.log
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log
>
> You can see a lot of useful information about the problem. Maybe the
> datanodes cannot write to the dedicated directories, or previous HDFS
> information (for example bad version info) already exists in the
> datanode directories.
> These are just tips, so first check the logs, and if the logs cannot
> help you, post to the list.

OK, thanks for pointing me in the right direction! Part of my confusion in
solving this was that I'm so new at this that I wasn't really sure which
logs to consult. I've made a note of what to check.

And I was able to sort out the issue on one of the data nodes straight
away, when I saw this:

-1416723865810 (Datanode Uuid null) service to
hadoop1.mydomain.com/10.10.10.5:9000 Datanode denied communication with
namenode because hostname cannot be resolved (ip=54.164.203.179,
hostname=54.164.203.179): DatanodeRegistration(0.0.0.0,
datanodeUuid=1576b716-8841-46dd-b5fe-fab000bce4f3, infoPort=50075,
ipcPort=50020,
storageInfo=lv=-55;cid=CID-6205ad99-30de-4d2e-925c-d7853991d376;nsid=646941884;c=0)
2014-11-23 18:09:49,146 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM

I realized that what I was seeing was a name resolution failure to the
primary node. I'm using elastic IPs on the 3 Hadoop nodes (1 master, 2
data), but I realized that maybe that's part of my issue. So what I did
was put the Amazon private IP into the /etc/hosts file on the two
datanodes.

As soon as I fired up Hadoop using start-dfs.sh and start-yarn.sh I had
success, but only with the first node:

2014-11-23 18:21:54,021 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE
Namenode Block pool BP-1107819734-10.10.10.5-1416723865810 (Datanode Uuid
1576b716-8841-46dd-b5fe-fab000bce4f3) service to
hadoop1.mydomain.com/10.10.10.5:9000
2014-11-23 18:21:54,064 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1 blockreports 0
blocks total. Took 1 msec to generate and 42 msecs for RPC and NN
processing.
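By the way, for anyone following along, the /etc/hosts change I made on
each datanode was roughly this. The master's private IP (10.10.10.5) and
hadoop3's (10.10.10.7) come from the logs in this thread; hadoop2's entry
is a placeholder, and I print the entries rather than apply them, since
editing /etc/hosts needs root:

```shell
# Sketch of the /etc/hosts fix on each datanode: map the master's hostname
# to its EC2-internal (private) IP instead of the elastic/public IP, so
# the datanode registers with an address the namenode can resolve.
cat <<'EOF'
10.10.10.5   hadoop1.mydomain.com   hadoop1
10.10.10.6   hadoop2.mydomain.com   hadoop2   # placeholder private IP
10.10.10.7   hadoop3.mydomain.com   hadoop3
EOF
```

The same mapping should also exist on the master, so that both directions
of the datanode/namenode handshake resolve to private addresses.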
Got back commands
org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@4a052c9e
2014-11-23 18:21:54,064 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for
block pool BP-1107819734-10.10.10.5-1416723865810

And when I looked at the connected nodes in the Hadoop web interface,
there was the node that I saw connecting in the logs! Partial success, as
I had one node but not the other.

Now when I start up the services, it looks like everything is starting up
as normal, even for the node that is still not connecting (hadoop3):

bash-4.2$ start-dfs.sh
Starting namenodes on [hadoop1.mydomain.com]
hadoop1.mydomain.com: starting namenode, logging to
/home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
hadoop2.mydomain.com: starting datanode, logging to
/home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop3.mydomain.com: starting datanode, logging to
/home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to
/home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out

bash-4.2$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to
/home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
hadoop2.mydomain.com: starting nodemanager, logging to
/home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop3.mydomain.com: starting nodemanager, logging to
/home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out

But even though everything is starting up OK on both nodes according to
the master, I see no new activity in the logs at all, even though there
are still entries in the logs from before. If I do a jps command on node3,
this is what I see:

[root@hadoop3:~] # jps
2037 NodeManager
2143 Jps

The NodeManager is active. But there is no new activity in either of the
datanode logs. This is the last thing I see in the datanode log:

2014-11-23 18:38:03,030 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: Incorrect configuration: namenode address
dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not
configured.
        at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddresses(DFSUtil.java:796)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:791)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:292)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1895)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1782)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1829)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2005)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2029)
2014-11-23 18:38:03,032 INFO org.apache.hadoop.util.ExitUtil: Exiting
with status 1
2014-11-23 18:38:03,033 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop3.mydomain.com/10.10.10.7
************************************************************/

In the yarn logs, however, I do see new entries:

2014-11-23 18:35:08,279 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
2014-11-23 18:35:09,280 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-11-23 18:35:10,281 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-11-23 18:35:11,281 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)

I see entries like that repeating over and over again in the yarn logs.
So now all I need to do is figure out why node3 still isn't working. Then
I should be all set, in terms of getting started learning how to use
Hadoop!

Thanks for your help and input so far!

Tim

On Sun, Nov 23, 2014 at 3:02 PM, Andras POTOCZKY <[email protected]> wrote:

> Hi
>
> There is not enough information to recognise the exact problem.
> Check the last 50-60 log lines of the datanode logs first.
>
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.log
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log
>
> You can see a lot of useful information about the problem. Maybe the
> datanodes cannot write to the dedicated directories, or previous HDFS
> information (for example bad version info) already exists in the
> datanode directories.
> These are just tips, so first check the logs, and if the logs cannot
> help you, post to the list.
>
> Other tip: ssh to the datanodes and run the jps command to check whether
> the datanode is alive or not.
>
> bye,
> Andras
>
>
> On 2014.11.23. 19:24, Tim Dunphy wrote:
>> Hey all,
>>
>> OK, thanks for your advice on setting up a hadoop test environment to
>> get started in learning how to use hadoop! I'm very excited to be able
>> to start to take this plunge!
>>
>> Although rather than using BigTop or Cloudera, I just decided to go
>> for a straight Apache Hadoop install. I set up 3 t2.micro instances on
>> EC2 for my training purposes. And that seemed to go alright, as far as
>> installing hadoop and starting the services goes.
>>
>> I went so far as to set up the ssh access that the nodes will need.
>> And the services seem to start without issue:
>>
>> bash-4.2$ whoami
>> hadoop
>>
>> bash-4.2$ start-dfs.sh
>> Starting namenodes on [hadoop1.mydomain.com]
>> hadoop1.mydomain.com: starting namenode, logging to
>> /home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
>> hadoop2.mydomain.com: starting datanode, logging to
>> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
>> hadoop3.mydomain.com: starting datanode, logging to
>> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
>> Starting secondary namenodes [0.0.0.0]
>> 0.0.0.0: starting secondarynamenode, logging to
>> /home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out
>>
>> bash-4.2$ start-yarn.sh
>> starting yarn daemons
>> starting resourcemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
>> hadoop2.mydomain.com: starting nodemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
>> hadoop3.mydomain.com: starting nodemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out
>>
>> And I opened up these ports on the security groups for the two data
>> nodes:
>>
>> [root@hadoop2:~] # netstat -tulpn | grep -i listen | grep java
>> tcp    0    0 0.0.0.0:50010    0.0.0.0:*    LISTEN    21405/java
>> tcp    0    0 0.0.0.0:50075    0.0.0.0:*    LISTEN    21405/java
>> tcp    0    0 0.0.0.0:50020    0.0.0.0:*    LISTEN    21405/java
>>
>> But when I go to the hadoop web interface at
>> http://hadoop1.mydomain.com:50070 and click on the datanode tab, I see
>> no nodes are connected!
>>
>> I see that the hosts are listening on all interfaces. I also put all
>> hosts into the /etc/hosts file on the master node.
>>
>> Using the first data node as an example, I can telnet into each port
>> on both datanodes from the master node:
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50010
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50075
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50020
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> So apparently I've hit my first snag in setting up a hadoop cluster.
>> Can anyone give me some tips as to how I can get the data nodes to
>> show as connected to the master?
>>
>> Thanks
>> Tim
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
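P.S. For the archives, in case anyone hits the same two errors on node3:
both symptoms above usually point at missing address configuration on that
node. The FATAL "dfs.namenode.rpc-address is not configured" typically
means fs.defaultFS is missing from core-site.xml on that datanode, and a
nodemanager endlessly retrying 0.0.0.0:8031 typically means
yarn.resourcemanager.hostname isn't set in its yarn-site.xml. Here's a
sketch of the two properties; the hostname and port 9000 come from my
logs, and in a real install these files live in $HADOOP_CONF_DIR rather
than the current directory:

```shell
# Minimal sketch of the config node3 appears to be missing. Written to the
# current directory for illustration only.
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Without this, the DataNode dies with "dfs.namenode.rpc-address is
       not configured", as in the datanode log above. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1.mydomain.com:9000</value>
  </property>
</configuration>
EOF

cat > yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Without this, the NodeManager falls back to the default 0.0.0.0
       for the ResourceManager's resource-tracker address (port 8031)
       and retries forever, as in the yarn log above. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1.mydomain.com</value>
  </property>
</configuration>
EOF

# Show the values that were written
grep -h '<value>' core-site.xml yarn-site.xml
```

After putting these in place on node3 and restarting the daemons, the
datanode should register with the namenode the same way hadoop2 did.
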
