Hi,

I was able to run a two-node cluster and crawl a 60 MB index inside two virtual machines. However, when I switched to real machines where I don't have root access, I wasn't able to run the Hadoop cluster.
I am using 4 nodes for the new cluster. The nodes do not have local disks, so I am using a shared disk on which I recreated the directory layout exactly as I needed it, except that local, filesystem, and home are symbolic links.
Here is what the nutch directory looks like:
drwxr-xr-x 2 volos parsa 4096 Jan 28 17:58 dis_search
lrwxrwxrwx 1 volos parsa 21 Jan 28 18:01 filesystem -> /tmp/volos/filesystem
lrwxrwxrwx 1 volos parsa 15 Jan 28 18:05 home -> /tmp/volos/home
lrwxrwxrwx 1 volos parsa 16 Jan 28 18:04 local -> /tmp/volos/local
drwxr-xr-x 5 volos parsa 4096 Jan 28 17:58 parsanode042
drwxr-xr-x 5 volos parsa 4096 Jan 28 17:58 parsanode043
drwxr-xr-x 5 volos parsa 4096 Jan 28 17:58 parsanode044
drwxr-xr-x 5 volos parsa 4096 Jan 28 17:58 parsanode045
drwxr-xr-x 11 volos parsa 4096 Jan 28 19:48 search
lrwxrwxrwx 1 volos parsa 34 Jan 25 18:12 tomcat -> ../apache-tomcat-7.0.6-src/output/
The parsanode042-045 directories contain the actual filesystem, local, and home folders for each node, respectively. On each node, /tmp/volos/XXX is a symlink back to the corresponding parsanode0YY/XXX directory. For example, on parsanode042:
lrwxrwxrwx 1 volos parsa 57 Jan 28 18:02 filesystem -> /home/parsacom/users/volos/nutch/parsanode042/filesystem/
lrwxrwxrwx 1 volos parsa 51 Jan 28 18:05 home -> /home/parsacom/users/volos/nutch/parsanode042/home/
lrwxrwxrwx 1 volos parsa 52 Jan 28 18:04 local -> /home/parsacom/users/volos/nutch/parsanode042/local/
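In case it helps, this is roughly how I set up those per-node links (a minimal sketch; the loop and variable names are just for illustration, run once per node with that node's own directory name):

  # sketch: recreate the per-node link layout described above
  NODE=parsanode042                       # this node's directory name
  BASE=/home/parsacom/users/volos/nutch
  mkdir -p /tmp/volos
  for d in filesystem home local; do
      ln -sfn "$BASE/$NODE/$d" "/tmp/volos/$d"   # /tmp/volos/<d> -> parsanode0YY/<d>
  done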
Note that search is the same directory for every node. dis_search is the distribution that I am going to use for each distributed search server.
When I run bin/start-all.sh from parsanode042 (the master node), everything appears to start properly, since none of the .out files show anything:
starting namenode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-namenode-parsanode042.out
192.168.10.44: starting datanode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-datanode-parsanode044.out
192.168.10.43: starting datanode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-datanode-parsanode043.out
192.168.10.42: starting datanode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-datanode-parsanode042.out
192.168.10.45: starting datanode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-datanode-parsanode045.out
192.168.10.42: starting secondarynamenode, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-secondarynamenode-parsanode042.out
starting jobtracker, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-jobtracker-parsanode042.out
192.168.10.42: starting tasktracker, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-tasktracker-parsanode042.out
192.168.10.43: starting tasktracker, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-tasktracker-parsanode043.out
192.168.10.45: starting tasktracker, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-tasktracker-parsanode045.out
192.168.10.44: starting tasktracker, logging to /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-tasktracker-parsanode044.out
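To double-check which daemons actually stay up a few seconds after startup, I assume I could run something like this on each node (jps ships with the Sun JDK; the grep pattern is just illustrative):

  # sketch: list the Hadoop daemon JVMs actually running on this node
  jps | egrep 'NameNode|SecondaryNameNode|DataNode|JobTracker|TaskTracker'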
However, I cannot access parsanode042:50070, and when I execute bin/stop-all.sh I get the following output:
stopping jobtracker
192.168.10.42: stopping tasktracker
192.168.10.43: stopping tasktracker
192.168.10.45: stopping tasktracker
192.168.10.44: stopping tasktracker
stopping namenode
192.168.10.44: stopping datanode
192.168.10.43: stopping datanode
192.168.10.45: stopping datanode
192.168.10.42: stopping datanode
192.168.10.42: no secondarynamenode to stop
The log of the secondarynamenode contains:
Exception in thread "main" java.io.IOException: Call to /192.168.10.42:9000 failed on local exception: java.io.IOException: Connection reset by peer
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:134)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
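Since the exception points at 192.168.10.42:9000, my guess is that the secondary namenode cannot talk to the namenode's RPC port. A quick check I could run (standard tools; addresses as above):

  # sketch: is the namenode RPC port bound and reachable?
  netstat -tln | grep 9000        # on parsanode042: is anything listening on 9000?
  telnet 192.168.10.42 9000       # from any slave: can we connect to it?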
What could be the reason that the secondary namenode does not run? Why does the namenode not work properly (http://parsanode042:50070 does not respond, although the port is open)?
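Side note: I assume the .out files only capture stdout, and that any real startup errors would land in the matching .log files, so I will also look there, e.g.:

  # sketch: check the namenode's .log file (not just .out) for bind/startup errors
  tail -n 50 /home/parsacom/users/volos/nutch/search/logs/hadoop-volos-namenode-parsanode042.log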
Thanks,
-Stavros