On one machine, Nutch suddenly started freezing during the generator
job.  The same files and scripts that previously worked fine now freeze
every time.  I checked with the system administrator, and no changes were
made to the machine.

I can also run the same crawl (using all of the same programs and files)
from another machine, and there it runs fine.  Although only one machine is
affected for now, I am worried the freeze might appear on other machines at
some point as well, so I can't rely on this setup for regular crawling.

I attached one thread dump; the others are identical.  From the JVM thread
dumps:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.1-b03 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000adb2800 nid=0x7a07 waiting
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1-EventThread" daemon prio=10 tid=0x000000000a68a000
nid=0x793b waiting on condition [0x00000000427aa000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000bce90078> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

...

"main" prio=10 tid=0x000000000a14b800 nid=0x7913 waiting on condition
[0x0000000040971000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at
org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1387)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:583)
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
        at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:223)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:279)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:287)
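
In case it helps with reproducing: a simple way to take several dumps spaced
apart and confirm it is a real hang (the same stacks in every dump, rather
than just a slow job) is a small loop like the sketch below.  The pid 12345
is a placeholder; I'd get the real one from `jps -l`:

```shell
#!/bin/sh
# Take a few thread dumps spaced apart; if the "main" stack is identical
# in every dump, the job is genuinely stuck rather than just slow.
# 12345 is a placeholder pid -- substitute the GeneratorJob pid from `jps -l`.
PID="${PID:-12345}"
count=0
for i in 1 2 3; do
    out="generator-dump-$i.txt"
    # jstack exits non-zero if the pid is wrong or the process is gone
    jstack "$PID" > "$out" 2>/dev/null || echo "jstack failed for pid $PID (dump $i)"
    count=$((count + 1))
    [ "$i" -lt 3 ] && sleep 5
done
echo "attempted $count dumps"
```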


Looking at the dumps, the freeze may be caused by, or related to, a
ZooKeeper/HBase deadlock described in the issue linked below, but perhaps
it can be avoided in the Nutch generator itself.

https://issues.apache.org/jira/browse/HBASE-2966


However, even if that is the cause, we would have to wait for HBase to be
fixed, then for Gora to be updated to use the fixed HBase, and then for
Nutch to be updated to use the updated Gora.  So I am hoping someone has an
idea of a workaround I could use now.
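
One thing I may try in the meantime (purely a guess on my part, not a
confirmed fix for HBASE-2966) is raising the client-side ZooKeeper session
timeout and retry count in hbase-site.xml, in case the hang follows a
dropped ZooKeeper session.  The values below are illustrative:

```xml
<!-- hbase-site.xml; values are illustrative guesses, not tested -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value> <!-- milliseconds -->
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>20</value>
</property>
```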

Otherwise I am thinking of switching to another data store.  Which data
store is the most reliable and free of such deadlock issues?  It seems like
a lot of people use Cassandra, but I had the impression that it is harder
to get working correctly than HBase.
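
If I do switch, my understanding from the Gora docs (untested on my side)
is that pointing Nutch 2.x at Cassandra is mostly a gora.properties change
plus enabling the gora-cassandra dependency in ivy/ivy.xml and rebuilding;
something like:

```
# conf/gora.properties -- sketch; the server address is a placeholder
gora.datastore.default=org.apache.gora.cassandra.store.CassandraStore
gora.cassandrastore.servers=localhost:9160
```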


Version info:

HBase: 0.90.6
Nutch: 2.2.1 (also happened with 2.1)
JDK: jdk1.7.0_05 (also happened with 1.6)

Machine info:

bash-3.2$ uname -a
Linux sc-d01-bh 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012
x86_64 x86_64 x86_64 GNU/Linux

bash-3.2$ cat /proc/version
Linux version 2.6.18-308.4.1.el5 ([email protected]) (gcc
version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Apr 17 17:08:00 EDT
2012





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-2-2-1-Freezing-Deadlocked-During-Generator-Job-tp4078894.html
Sent from the Nutch - User mailing list archive at Nabble.com.
