I debugged the issue. You are hitting https://issues.apache.org/jira/browse/NUTCH-1647
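
In case it helps to see how a message like "unzipBestEffort returned null" can come about, here is a minimal Python sketch. It is not Nutch's code. It only assumes that the fetcher tries to gunzip the response body and reports a failure when nothing at all could be decoded. The names looks_like_gzip and unzip_best_effort are hypothetical and chosen for illustration.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(body: bytes) -> bool:
    """Cheap check: does the body start with the gzip magic number?"""
    return body[:2] == GZIP_MAGIC


def unzip_best_effort(body: bytes, chunk_size: int = 4096):
    """Hypothetical helper: decode as much of a gzip body as possible.

    Returns the bytes decoded so far, or None when nothing could be
    decoded (empty body, or a body that is not gzip at all).
    """
    # 16 + MAX_WBITS tells zlib to expect gzip framing.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = bytearray()
    try:
        # Feed the body in chunks so output decoded before an error is kept.
        for start in range(0, len(body), chunk_size):
            out += decoder.decompress(body[start:start + chunk_size])
    except zlib.error:
        pass  # best effort: keep whatever was decoded before the error
    return bytes(out) or None


if __name__ == "__main__":
    good = gzip.compress(b"<html>ok</html>")
    print(looks_like_gzip(good), unzip_best_effort(good))
    # A body that is labelled gzip but is really plain HTML decodes to nothing.
    plain = b"<html>not compressed</html>"
    print(looks_like_gzip(plain), unzip_best_effort(plain))
```

If the server sends an empty or uncompressed body while the response is treated as gzip-encoded, a helper of this shape has nothing to return, which would surface as the error you saw.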

Thanks,
Tejas

On Fri, Dec 27, 2013 at 8:54 AM, yan wang <dayank...@gmail.com> wrote:

> Hi, guys
>
> Yesterday, I tried to crawl a website (a Chinese website) with some
> seed links like this:
> http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml
> but the crawl process failed because of a problem shown as following:
>
> fetching http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml (queue crawl delay=5000ms)
> fetch of http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml failed with: java.io.IOException: unzipBestEffort returned null
>
> At first, I used nutch-1.5.1 to crawl the website and had the above
> problem, then I changed to use nutch-1.7 to do it again but it failed again.
> Now, I totally have no idea how to handle the problem!
> I would really appreciate any feedback!
>
> -Yan Wang