Hi Brian,

On Thursday, July 18, 2013, brian4 <[email protected]> wrote:
> On one machine, nutch just suddenly started freezing during the generator
> job.

Are these continuous crawls? What value do you have set for generate.max.count? I ask because calls must be made to the backend to determine a limit for the URLs to generate into batches... I suppose if you're running with a value of -1 for this setting, the call could be expensive as well.

>
> I can also run the same crawl (using all of the same programs and files)
> from another machine and it runs fine. Although it is one machine for now,
> I am worried that it might randomly happen on other machines at some point
> as well, so I can't rely on it for regular crawling.

Mmm. So maybe you are not doing continuous large-scale crawls as I thought above?

> Looking at the dumps, it looks like it may be due to / related to a deadlock
> caused by a zookeeper/hbase issue listed at the following link, but maybe it
> can be avoided in the nutch generator itself.
>
> https://issues.apache.org/jira/browse/HBASE-2966

Yep

>
> However even if that is the cause we would have to wait for gora to be
> updated to use the fixed hbase once it's fixed and then for nutch to be
> updated to use the updated gora, so I am hoping maybe someone has an idea of
> a workaround I could use now.

I've not heard of anyone coming here with a similar problem! I am confused by this one.

>
> Otherwise I am thinking of trying to switch to another data store. Which
> data store is most reliable and does not have such deadlock issues?

If this is a problem with a zookeeper server then it may not be linked to Gora. There is not one line of zookeeper code within Gora. I would check your hbase/zk installation before you think about ditching everything and jumping ship.

> It seems like maybe a lot of people use Cassandra, but I had the impression
> there were more issues getting it to work correctly than with HBase.

Each to their own, I suppose. There are a number of *stable* backends which can be used. If getting things working easily is your primary criterion then I wouldn't say there is much between the available options.

hth

--
*Lewis*

