Hi Ferdy,

Thank you so much. Your suggestions solved the problem. See my reply below.
On Wed, Jul 4, 2012 at 12:48 AM, Ferdy Galema <[email protected]> wrote:

> Hi,
>
> What distro are you running? Do you have firewall and SELinux disabled?
>
> The ZooKeeperConnectionException message is a bit misleading. Although it
> could mean that zookeeper has too many connections, it could also mean that
> zookeeper is not running at all (at the interface:port that the client is
> trying to connect to). The latter must be your case. I presume you did not
> change a setting in your HBase or nutchclient regarding zookeeper. If you
> did, try to restore all port settings.
>
> HBase 0.90.5 definitely works with the rc3. What I've tried is the
> following and I suggest you do the same. Download the official HBase 0.90.5
> release from Apache. Untar and simply start it using the bin/start-hbase.sh
> command. Try some Nutch commands. If you still get the error, look in your
> HBase logs or paste them over here.
>

Exactly as you said: I downloaded a clean HBase release and started it, and now Nutch can successfully create the webpage table there.

I double-checked my previous HBase. I believe the cause is that I was using an HBase instance set up by my colleague, which we have used for a long time. Previously we just used scrapy to crawl pages and thrift to store the HTML pages in HBase. Recently we found that Nutch may serve our crawling goals better (mainly because we want to leverage MapReduce and HBase), so we want to try it out.

In our HBase, my colleague set the ZooKeeper-related settings as:

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2222</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xxxxxx</value>
</property>
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>200</value>
  <description>Limit on number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble.
  Set high to avoid zk connection issues running standalone and pseudo-distributed.
  </description>
</property>

I believe these changed settings may not be consistent with the settings that Nutch assumes. I will double-check and figure it out.

Thanks again.

Tianwei

> Good luck.
>
> On Wed, Jul 4, 2012 at 7:31 AM, Tianwei <[email protected]> wrote:
>
>> Hi, all,
>>
>> I am trying to build the 2.0 rc3, but can't make it work. I strictly
>> followed the wiki page (http://wiki.apache.org/nutch/Nutch2Tutorial).
>>
>> Before that, I also ensured that hbase works well, as:
>>
>> hbase(main):004:0> create 'test1', 'cf'
>> 0 row(s) in 1.3080 seconds
>>
>> The following is what I did and the error I got. I hope you can give me
>> suggestions about where I went wrong.
>>
>> 1. checkout the code
>> svn co http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc3
>>
>> 2. modify the nutch-default.xml, gora.properties and ivy/ivy.xml as
>> the wiki said.
>>
>> 3. build the code: ant
>>
>> 4. test the code as:
>> tianwei@132:~/nutch-src/release-2.0rc3/runtime/local$ ./bin/nutch inject urls/urls.txt
>> InjectorJob: starting
>> InjectorJob: urlDir: urls/urls.txt
>> InjectorJob: org.apache.gora.util.GoraException:
>> org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
>> connect to ZooKeeper but the connection closes immediately. This could
>> be a sign that the server has too many connections (30 is the
>> default). Consider inspecting your ZK server logs for that error and
>> then make sure you are reusing HBaseConfiguration as often as you can.
>> See HTable's javadoc for more information.
>>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
>>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
>>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:288)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:298)
>> Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase
>> is able to connect to ZooKeeper but the connection closes immediately.
>> This could be a sign that the server has too many connections (30 is
>> the default). Consider inspecting your ZK server logs for that error
>> and then make sure you are reusing HBaseConfiguration as often as you
>> can. See HTable's javadoc for more information.
>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
>> ...........
>>
>>
>> I also tried to switch to use mysql, but also met the IO connection
>> exception. I guess there must be something wrong with my setting.
>> Could you give some suggestions to diagnose and solve this problem?
>>
>> PS, my hbase version is hbase-0.90.5, and in nutch's lib/ directory,
>> there is "hbase-0.90.4", I don't know if it matters or not. The hbase
>> is installed by user "hadoop" and I ran nutch with another user
>> "tianwei", don't know if I need to add something into CLASSPATH or
>> not?
>>
>> Thanks very much.
>>
>>
>> Tianwei
>>
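
For anyone hitting the same error: if the non-default ZooKeeper port in the old cluster is indeed what tripped up Nutch (this is a guess, not yet confirmed in this thread), one way to test it would be to give the Nutch client the same ZooKeeper settings as the cluster. Below is a minimal sketch, assuming the Nutch runtime picks up an hbase-site.xml from its conf/ directory on the classpath. The file location is an assumption about this setup, and the quorum value is the same redacted placeholder as in the reply above.

```xml
<!-- Hypothetical runtime/local/conf/hbase-site.xml for the Nutch client.
     Assumption: the client reads this file from its classpath. -->
<configuration>
  <property>
    <!-- Same redacted placeholder as in the cluster settings. -->
    <name>hbase.zookeeper.quorum</name>
    <value>xxxxxx</value>
  </property>
  <property>
    <!-- Must match the port the cluster's ZooKeeper listens on. -->
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
</configuration>
```

The default value of hbase.zookeeper.property.clientPort is 2181, so a client without such an override would presumably look for ZooKeeper on 2181 rather than 2222.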

