Hey guys,

> Can you see the slave logs to find out what is happening there? E.g.,
> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log and
> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.log.
>
> There is not enough information to recognise the exact problem.
> Check the last 50-60 log lines of the datanode logs first.
>
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.log
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log
>
> You can see a lot of useful information about the problem. Maybe the
> datanodes cannot write to the dedicated directories, or previous HDFS
> information (for example bad version info) already exists in the
> datanode directories.
> These are just tips, so first check the logs, and if the logs cannot
> help you, post to the list.

OK, thanks for pointing me in the right direction! Part of my confusion in
solving this was that I'm so new at this that I wasn't really sure which
logs to consult. I've made a note of what to check.

And I was able to sort out the issue on one of the data nodes straight
away, when I saw this:

-1416723865810 (Datanode Uuid null) service to
hadoop1.mydomain.com/10.10.10.5:9000 Datanode denied communication with
namenode because hostname cannot be resolved (ip=54.164.203.179,
hostname=54.164.203.179): DatanodeRegistration(0.0.0.0,
datanodeUuid=1576b716-8841-46dd-b5fe-fab000bce4f3, infoPort=50075,
ipcPort=50020,
storageInfo=lv=-55;cid=CID-6205ad99-30de-4d2e-925c-d7853991d376;nsid=646941884;c=0)
2014-11-23 18:09:49,146 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM

I realized that what I was seeing was a name resolution failure to the
primary node. I'm using elastic IPs on the 3 Hadoop nodes (1 master, 2
data), but I realized that maybe that's part of my issue. So what I did
was put the Amazon private IP into the /etc/hosts file on the two
datanodes.

As soon as I fired up Hadoop using start-dfs.sh and start-yarn.sh I had
success, but only with the first node:

2014-11-23 18:21:54,021 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE
Namenode Block pool BP-1107819734-10.10.10.5-1416723865810 (Datanode Uuid
1576b716-8841-46dd-b5fe-fab000bce4f3) service to
hadoop1.mydomain.com/10.10.10.5:9000
2014-11-23 18:21:54,064 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1 blockreports 0
blocks total. Took 1 msec to generate and 42 msecs for RPC and NN
processing.
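By the way, for anyone following along, the /etc/hosts change I made on
each datanode was roughly this. The master's private IP (10.10.10.5) and
hadoop3's (10.10.10.7) come from the logs in this thread; hadoop2's entry
is a placeholder, and I print the entries rather than apply them, since
editing /etc/hosts needs root:

```shell
# Sketch of the /etc/hosts fix on each datanode: map the master's hostname
# to its EC2-internal (private) IP instead of the elastic/public IP, so
# the datanode registers with an address the namenode can resolve.
cat <<'EOF'
10.10.10.5   hadoop1.mydomain.com   hadoop1
10.10.10.6   hadoop2.mydomain.com   hadoop2   # placeholder private IP
10.10.10.7   hadoop3.mydomain.com   hadoop3
EOF
```

The same mapping should also exist on the master, so that both directions
of the datanode/namenode handshake resolve to private addresses.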
Got back commands
org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@4a052c9e
2014-11-23 18:21:54,064 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for
block pool BP-1107819734-10.10.10.5-1416723865810

And when I looked at the connected nodes in the Hadoop web interface,
there was the node that I saw connecting in the logs! Partial success, as
I had one node but not the other.

Now when I start up the services, it looks like everything is starting up
as normal, even for the node that is still not connecting (hadoop3):

bash-4.2$ start-dfs.sh
Starting namenodes on [hadoop1.mydomain.com]
hadoop1.mydomain.com: starting namenode, logging to
/home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
hadoop2.mydomain.com: starting datanode, logging to
/home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop3.mydomain.com: starting datanode, logging to
/home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to
/home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out

bash-4.2$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to
/home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
hadoop2.mydomain.com: starting nodemanager, logging to
/home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop3.mydomain.com: starting nodemanager, logging to
/home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out

But even though everything is starting up OK on both nodes according to
the master, I see no new activity in the logs at all, even though there
are still entries in the logs from before. If I do a jps command on node3,
this is what I see:

[root@hadoop3:~] # jps
2037 NodeManager
2143 Jps

The NodeManager is active. But there is no new activity in either of the
datanode logs. This is the last thing I see in the datanode log:

2014-11-23 18:38:03,030 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: Incorrect configuration: namenode address
dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not
configured.
        at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddresses(DFSUtil.java:796)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:791)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:292)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1895)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1782)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1829)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2005)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2029)
2014-11-23 18:38:03,032 INFO org.apache.hadoop.util.ExitUtil: Exiting
with status 1
2014-11-23 18:38:03,033 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop3.mydomain.com/10.10.10.7
************************************************************/

In the yarn logs, however, I do see new entries:

2014-11-23 18:35:08,279 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
2014-11-23 18:35:09,280 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-11-23 18:35:10,281 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-11-23 18:35:11,281 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)

I see entries like that repeating over and over again in the yarn logs.
So now all I need to do is figure out why node3 still isn't working. Then
I should be all set, in terms of getting started learning how to use
Hadoop!

Thanks for your help and input so far!

Tim

On Sun, Nov 23, 2014 at 3:02 PM, Andras POTOCZKY <[email protected]> wrote:

> Hi
>
> There is not enough information to recognise the exact problem.
> Check the last 50-60 log lines of the datanode logs first.
>
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.log
> tail -fn 60 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log
>
> You can see a lot of useful information about the problem. Maybe the
> datanodes cannot write to the dedicated directories, or previous HDFS
> information (for example bad version info) already exists in the
> datanode directories.
> These are just tips, so first check the logs, and if the logs cannot
> help you, post to the list.
>
> Other tip: ssh to the datanodes and run the jps command to check whether
> the datanode is alive or not.
>
> bye,
> Andras
>
>
> On 2014.11.23. 19:24, Tim Dunphy wrote:
>> Hey all,
>>
>> OK, thanks for your advice on setting up a hadoop test environment to
>> get started in learning how to use hadoop! I'm very excited to be able
>> to start to take this plunge!
>>
>> Although rather than using BigTop or Cloudera, I just decided to go
>> for a straight Apache Hadoop install. I set up 3 t2.micro instances on
>> EC2 for my training purposes. And that seemed to go alright, as far as
>> installing hadoop and starting the services goes.
>>
>> I went so far as to set up the ssh access that the nodes will need.
>> And the services seem to start without issue:
>>
>> bash-4.2$ whoami
>> hadoop
>>
>> bash-4.2$ start-dfs.sh
>> Starting namenodes on [hadoop1.mydomain.com]
>> hadoop1.mydomain.com: starting namenode, logging to
>> /home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
>> hadoop2.mydomain.com: starting datanode, logging to
>> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
>> hadoop3.mydomain.com: starting datanode, logging to
>> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
>> Starting secondary namenodes [0.0.0.0]
>> 0.0.0.0: starting secondarynamenode, logging to
>> /home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out
>>
>> bash-4.2$ start-yarn.sh
>> starting yarn daemons
>> starting resourcemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
>> hadoop2.mydomain.com: starting nodemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
>> hadoop3.mydomain.com: starting nodemanager, logging to
>> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out
>>
>> And I opened up these ports on the security groups for the two data
>> nodes:
>>
>> [root@hadoop2:~] # netstat -tulpn | grep -i listen | grep java
>> tcp    0    0 0.0.0.0:50010    0.0.0.0:*    LISTEN    21405/java
>> tcp    0    0 0.0.0.0:50075    0.0.0.0:*    LISTEN    21405/java
>> tcp    0    0 0.0.0.0:50020    0.0.0.0:*    LISTEN    21405/java
>>
>> But when I go to the hadoop web interface at
>> http://hadoop1.mydomain.com:50070 and click on the datanode tab, I see
>> no nodes are connected!
>>
>> I see that the hosts are listening on all interfaces. I also put all
>> hosts into the /etc/hosts file on the master node.
>>
>> Using the first data node as an example, I can telnet into each port
>> on both datanodes from the master node:
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50010
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50075
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> bash-4.2$ telnet hadoop2.mydomain.com 50020
>> Trying 172.31.63.42...
>> Connected to hadoop2.mydomain.com.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> So apparently I've hit my first snag in setting up a hadoop cluster.
>> Can anyone give me some tips as to how I can get the data nodes to
>> show as connected to the master?
>>
>> Thanks
>> Tim
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
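P.S. For the archives, in case anyone hits the same two errors on node3:
both symptoms above usually point at missing address configuration on that
node. The FATAL "dfs.namenode.rpc-address is not configured" typically
means fs.defaultFS is missing from core-site.xml on that datanode, and a
nodemanager endlessly retrying 0.0.0.0:8031 typically means
yarn.resourcemanager.hostname isn't set in its yarn-site.xml. Here's a
sketch of the two properties; the hostname and port 9000 come from my
logs, and in a real install these files live in $HADOOP_CONF_DIR rather
than the current directory:

```shell
# Minimal sketch of the config node3 appears to be missing. Written to the
# current directory for illustration only.
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Without this, the DataNode dies with "dfs.namenode.rpc-address is
       not configured", as in the datanode log above. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1.mydomain.com:9000</value>
  </property>
</configuration>
EOF

cat > yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Without this, the NodeManager falls back to the default 0.0.0.0
       for the ResourceManager's resource-tracker address (port 8031)
       and retries forever, as in the yarn log above. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1.mydomain.com</value>
  </property>
</configuration>
EOF

# Show the values that were written
grep -h '<value>' core-site.xml yarn-site.xml
```

After putting these in place on node3 and restarting the daemons, the
datanode should register with the namenode the same way hadoop2 did.
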
