Glad to hear it is working now. Looking at your previous configuration, I notice the ZooKeeper client port is set to 2222. The default port is 2181, and that is what Nutchgora with default settings tries to connect to. Problem solved :)
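For anyone who hits the same mismatch, here is a minimal sketch of how one might compare the port a server-side hbase-site.xml declares against what a client without overrides would use. The function name and the two sample configurations are made up for illustration; only the property name and the 2222 / 2181 values come from this thread.

```python
import xml.etree.ElementTree as ET

ZK_DEFAULT_CLIENT_PORT = 2181  # ZooKeeper's default client port


def zookeeper_client_port(hbase_site_xml: str) -> int:
    """Return the ZooKeeper client port set in an hbase-site.xml string.

    Falls back to the default port when the property is absent.
    (Hypothetical helper, not part of HBase or Nutch.)
    """
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", "").strip()
        if name == "hbase.zookeeper.property.clientPort":
            return int(prop.findtext("value", "").strip())
    return ZK_DEFAULT_CLIENT_PORT


# Sample server-side config, mirroring the setting quoted in this thread.
server_conf = """
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
</configuration>
"""

# A client with no hbase-site.xml overrides behaves like an empty configuration.
client_conf = "<configuration/>"

server_port = zookeeper_client_port(server_conf)
client_port = zookeeper_client_port(client_conf)
if server_port != client_port:
    print(f"Port mismatch: server listens on {server_port}, "
          f"client tries {client_port}")
```

If the two ports differ, either restore the default on the HBase side or make the same clientPort property visible to the Nutch client. As far as I know, the usual way to do the latter is an hbase-site.xml on the client's classpath, but please verify that against your own setup.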
On Wed, Jul 4, 2012 at 10:44 AM, Tianwei <[email protected]> wrote:
> Hi, Ferdy,
>
> Thank you so so much. The problem has been solved based on your
> suggestions. See my reply below.
>
> On Wed, Jul 4, 2012 at 12:48 AM, Ferdy Galema <[email protected]> wrote:
> > Hi,
> >
> > What distro are you running? Do you have firewall and SELinux disabled?
> >
> > The ZooKeeperConnectionException message is a bit misleading. Although it
> > could mean that zookeeper has too many connections, it could also mean the
> > zookeeper is not running at all (at the interface:port that the client is
> > trying to connect to). The latter must be your case. I presume you did not
> > change a setting in your HBase or nutchclient regarding zookeeper. If you
> > did, try to restore all port settings.
> >
> > HBase 0.90.5 definitely works with the rc3. What I've tried is the
> > following and I suggest you do the same. Download the official HBase 0.90.5
> > release from Apache. Untar and simply start it using the bin/start-hbase.sh
> > command. Try some Nutch commands. If you still get the error, look in your
> > HBase logs or paste them over here.
> >
> Exactly as you said, I downloaded a clean hbase version and started it,
> and now nutch can successfully create the webpage table there.
>
> I double-checked my previous hbase. I believe the reason is that I used
> the hbase which was set up by my colleague; we have used it for a long
> time. Previously we just used scrapy to crawl pages and used thrift to
> store those html pages into hbase. Recently we found nutch may
> better serve our crawling aim (mainly because we want to leverage
> mapreduce and hbase) and want to try it out.
>
> In our hbase, my colleague set the zookeeper related settings as:
> <property>
>   <name>hbase.zookeeper.property.clientPort</name>
>   <value>2222</value>
> </property>
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>xxxxxx</value>
> </property>
> <property>
>   <name>hbase.zookeeper.property.maxClientCnxns</name>
>   <value>200</value>
>   <description>Limit on number of concurrent connections (at the socket level)
>   that a single client, identified by IP address, may make to a single member
>   of the ZooKeeper ensemble. Set high to avoid zk connection issues running
>   standalone and pseudo-distributed.
>   </description>
> </property>
>
> I believe those changed settings may not be consistent with the settings
> which are assumed by nutch. I will double check and figure it out.
>
> Thanks again.
>
> Tianwei
>
> > Good luck.
> >
> > On Wed, Jul 4, 2012 at 7:31 AM, Tianwei <[email protected]> wrote:
> >
> >> Hi, all,
> >>
> >> I am trying to build the 2.0 rc3, but can't make it work. I strictly
> >> followed the wiki page (http://wiki.apache.org/nutch/Nutch2Tutorial).
> >>
> >> Before that, I also ensured that hbase works well, as:
> >>
> >> hbase(main):004:0> create 'test1', 'cf'
> >> 0 row(s) in 1.3080 seconds
> >>
> >> The following is what I did and the error I got. I hope you can give me
> >> suggestions on where I went wrong.
> >>
> >> 1. checkout the code
> >> svn co http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc3
> >>
> >> 2. modify the nutch-default.xml, gora.properties and ivy/ivy.xml as
> >> the wiki said.
> >>
> >> 3. build the code: ant
> >>
> >> 4. test the code as:
> >> tianwei@132:~/nutch-src/release-2.0rc3/runtime/local$ ./bin/nutch inject urls/urls.txt
> >> InjectorJob: starting
> >> InjectorJob: urlDir: urls/urls.txt
> >> InjectorJob: org.apache.gora.util.GoraException:
> >> org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
> >> connect to ZooKeeper but the connection closes immediately. This could
> >> be a sign that the server has too many connections (30 is the
> >> default). Consider inspecting your ZK server logs for that error and
> >> then make sure you are reusing HBaseConfiguration as often as you can.
> >> See HTable's javadoc for more information.
> >>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
> >>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
> >>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
> >>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:288)
> >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:298)
> >> Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase
> >> is able to connect to ZooKeeper but the connection closes immediately.
> >> This could be a sign that the server has too many connections (30 is
> >> the default). Consider inspecting your ZK server logs for that error
> >> and then make sure you are reusing HBaseConfiguration as often as you
> >> can. See HTable's javadoc for more information.
> >>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
> >> ...........
> >>
> >>
> >> I also tried to switch to mysql, but also met the IO connection
> >> exception. I guess there must be something wrong with my setting.
> >> Could you give some suggestions to diagnose and solve this problem?
> >>
> >> PS, my hbase version is hbase-0.90.5, and in nutch's lib/ directory,
> >> there is "hbase-0.90.4", I don't know if it matters or not. The hbase
> >> is installed by user "hadoop" and I ran nutch with another user
> >> "tianwei", don't know if I need to add something into CLASSPATH or
> >> not?
> >>
> >> Thanks very much.
> >>
> >>
> >> Tianwei
> >>

