Glad to hear it is working now.

Looking at your previous configuration, I notice the zookeeper port is set
to 2222. The default port is 2181, and that is what Nutchgora with default
settings tries to connect to. Problem solved :)
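
For anyone who wants to check their own setup for the same mismatch, here is a
minimal sketch in Python. It is not part of Nutch or HBase. The function name
and the sample XML are made up for illustration; only the property name
hbase.zookeeper.property.clientPort and the 2181 default come from this thread.
It reads the client port out of the text of an hbase-site.xml and falls back to
the default when the property is absent.

```python
import xml.etree.ElementTree as ET

DEFAULT_ZK_CLIENT_PORT = 2181  # ZooKeeper's default client port
PORT_PROPERTY = "hbase.zookeeper.property.clientPort"


def zookeeper_client_port(site_xml: str) -> int:
    """Return the ZooKeeper client port configured in hbase-site.xml text.

    Hypothetical helper: falls back to the default port when the property
    is missing or has an empty value.
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == PORT_PROPERTY and value:
            return int(value)
    return DEFAULT_ZK_CLIENT_PORT


# Sample input modelled on the configuration quoted in this thread.
SAMPLE = """<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    port = zookeeper_client_port(SAMPLE)
    if port != DEFAULT_ZK_CLIENT_PORT:
        print(f"clientPort is {port}, not the default {DEFAULT_ZK_CLIENT_PORT}")
```

If the port reported for the HBase side differs from 2181, a client running
with default settings will look in the wrong place. Presumably the client then
needs the same hbase-site.xml setting on its own classpath, but I have not
verified that for this particular setup.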

On Wed, Jul 4, 2012 at 10:44 AM, Tianwei <[email protected]> wrote:

> Hi, Ferdy,
>
> Thank you so, so much. The problem has been solved based on your
> suggestions. See my reply below.
>
> On Wed, Jul 4, 2012 at 12:48 AM, Ferdy Galema <[email protected]>
> wrote:
> > Hi,
> >
> > What distro are you running? Do you have firewall and SELinux disabled?
> >
> > The ZooKeeperConnectionException message is a bit misleading. Although it
> > could mean that zookeeper has too many connections, it could also mean that
> > zookeeper is not running at all (at the interface:port that the client is
> > trying to connect to). The latter must be your case. I presume you did not
> > change a setting in your HBase or Nutch client regarding zookeeper. If you
> > did, try to restore all port settings.
> >
> > HBase 0.90.5 definitely works with the rc3. What I've tried is the
> > following and I suggest you do the same. Download the official HBase 0.90.5
> > release from Apache. Untar and simply start it using the bin/start-hbase.sh
> > command. Try some Nutch commands. If you still get the error, look in your
> > HBase logs or paste them over here.
> >
> Exactly as you said, I downloaded a clean HBase release and started it;
> now Nutch can successfully create the webpage table there.
>
> I double-checked my previous HBase. I believe the reason is that I used
> an HBase instance that was set up by my colleague and that we have used
> for a long time. Previously we just used Scrapy to crawl pages and Thrift
> to store those HTML pages in HBase. Recently we found that Nutch may serve
> our crawling needs better (mainly because we want to leverage MapReduce
> and HBase) and want to try it out.
>
> In our HBase, my colleague set the zookeeper-related settings as:
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2222</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>xxxxxx</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.maxClientCnxns</name>
>     <value>200</value>
>     <description>Limit on number of concurrent connections (at the socket
>       level) that a single client, identified by IP address, may make to a
>       single member of the ZooKeeper ensemble. Set high to avoid zk
>       connection issues running standalone and pseudo-distributed.
>     </description>
>   </property>
>
> I believe those changed settings may not be consistent with the settings
> that Nutch assumes. I will double-check and figure it out.
>
>
> Thanks again.
>
>
> Tianwei
> > Good luck.
> >
> > On Wed, Jul 4, 2012 at 7:31 AM, Tianwei <[email protected]> wrote:
> >
> >> Hi, all,
> >>
> >> I am trying to build the 2.0 rc3, but can't make it work. I strictly
> >> followed the wiki page (http://wiki.apache.org/nutch/Nutch2Tutorial).
> >>
> >> Before that, I also ensured that HBase works well:
> >>
> >> hbase(main):004:0> create 'test1', 'cf'
> >> 0 row(s) in 1.3080 seconds
> >>
> >> The following is what I did and the error I got. I hope you can suggest
> >> where I went wrong.
> >>
> >> 1. checkout the code
> >> svn co http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc3
> >>
> >> 2. modify the nutch-default.xml, gora.properties and ivy/ivy.xml as
> >> the wiki said.
> >>
> >> 3. build the code: ant
> >>
> >> 4. test the code as:
> >> tianwei@132:~/nutch-src/release-2.0rc3/runtime/local$ ./bin/nutch
> >> inject urls/urls.txt
> >> InjectorJob: starting
> >> InjectorJob: urlDir: urls/urls.txt
> >> InjectorJob: org.apache.gora.util.GoraException:
> >> org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
> >> connect to ZooKeeper but the connection closes immediately. This could
> >> be a sign that the server has too many connections (30 is the
> >> default). Consider inspecting your ZK server logs for that error and
> >> then make sure you are reusing HBaseConfiguration as often as you can.
> >> See HTable's javadoc for more information.
> >>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >>         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
> >>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
> >>         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
> >>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:288)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:298)
> >> Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase
> >> is able to connect to ZooKeeper but the connection closes immediately.
> >> This could be a sign that the server has too many connections (30 is
> >> the default). Consider inspecting your ZK server logs for that error
> >> and then make sure you are reusing HBaseConfiguration as often as you
> >> can. See HTable's javadoc for more information.
> >>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
> >> ...........
> >>
> >>
> >> I also tried switching to MySQL, but hit an IO connection exception
> >> there as well. I guess there must be something wrong with my setup.
> >> Could you give some suggestions to diagnose and solve this problem?
> >>
> >> PS: my HBase version is hbase-0.90.5, and in Nutch's lib/ directory
> >> there is "hbase-0.90.4"; I don't know whether that matters. HBase
> >> is installed by user "hadoop" and I ran Nutch as another user,
> >> "tianwei"; I don't know whether I need to add something to CLASSPATH.
> >>
> >> Thanks very much.
> >>
> >>
> >> Tianwei
> >>
>
