Hi all,

Yesterday I tried to crawl a Chinese website using seed links such as this one:
http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml
but the crawl failed with the following error:
fetching http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml (queue crawl delay=5000ms)
fetch of http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml failed with: java.io.IOException: unzipBestEffort returned null
At first I used nutch-1.5.1 to crawl the website and hit the problem above.
I then switched to nutch-1.7 and tried again, but the crawl failed again.
I have no idea how to handle this problem.
I would really appreciate any feedback!

-Yan Wang
