I debugged the issue. You are hitting
https://issues.apache.org/jira/browse/NUTCH-1647
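The error text suggests that the fetcher passed a response body to a best-effort gzip helper and got `null` back. The Java sketch below shows that pattern. It is not taken from the Nutch source. It assumes the helper returns `null` only when nothing at all can be decompressed, for example an empty or non-gzip body sent with a gzip `Content-Encoding` header, and that the caller turns that `null` into the `IOException` seen in the log. `GzipSketch`, `decodeGzipBody` and `gzip` are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {

    // Hypothetical best-effort gunzip: returns the bytes it managed to
    // decompress, or null if the input is not readable gzip at all.
    static byte[] unzipBestEffort(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // An empty or non-gzip body fails while reading the header,
            // so nothing was recovered: signal that with null.
            if (out.size() == 0) {
                return null;
            }
            // Otherwise keep the partial output (the "best effort" part).
        }
        return out.toByteArray();
    }

    // Hypothetical caller that turns null into the error seen in the log.
    static byte[] decodeGzipBody(byte[] body) throws IOException {
        byte[] content = unzipBestEffort(body);
        if (content == null) {
            throw new IOException("unzipBestEffort returned null");
        }
        return content;
    }

    // Helper that gzip-compresses bytes for the demo below.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] ok = decodeGzipBody(gzip("hello".getBytes()));
        System.out.println(new String(ok)); // prints "hello"
        try {
            decodeGzipBody(new byte[0]); // e.g. an empty body labelled as gzip
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "unzipBestEffort returned null"
        }
    }
}
```

Under these assumptions, the useful information is whether the server actually sent gzip bytes. The exception message alone does not show whether the body was empty, truncated, or not compressed at all.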

Thanks,
Tejas


On Fri, Dec 27, 2013 at 8:54 AM, yan wang <dayank...@gmail.com> wrote:

> Hi, guys
>       Yesterday, I tried to crawl a Chinese website using seed URLs such as
> http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml
> but the crawl failed with the following error:
> fetching http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml (queue
> crawl delay=5000ms)
> fetch of http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml failed
> with: java.io.IOException: unzipBestEffort returned null
> I first hit this problem with nutch-1.5.1, then tried again with
> nutch-1.7, but the crawl failed in the same way.
> I have no idea how to handle this problem now.
> I would really appreciate any feedback!
>
> -Yan Wang
