Hi,

I am currently using Nutch 1.8. While re-crawling a URL, I keep getting
errors of this kind: "*fetcher caught:java.io.IOException: Spill failed*".
The error shows up repeatedly during fetching, as shown below, but it does
not stop the fetching process. I am not sure how to handle it.

People who reported this error on various forums were running a Hadoop
cluster, whereas in my case I am *not* using any cluster. All I have is
*Nutch 1.8* with the local filesystem on a single server.


2015-04-06 14:57:39,160 INFO  fetcher.Fetcher - fetching
http://examplenutch/pics/pages/p0000108.shtml (queue crawl delay=5000ms)
2015-04-06 14:57:39,340 INFO  fetcher.Fetcher - fetching
http://examplenutch/pages/forest%20underburn.htm (queue crawl delay=5000ms)
2015-04-06 14:57:39,368 INFO  fetcher.Fetcher - fetching
http://examplenutch/elevation_contours_20.e00 (queue crawl delay=5000ms)
2015-04-06 14:57:39,488 ERROR fetcher.Fetcher - fetcher
caught:java.io.IOException: *Spill failed*
2015-04-06 14:57:39,591 ERROR fetcher.Fetcher - fetcher
caught:java.io.IOException: *Spill failed*
2015-04-06 14:57:39,592 INFO  fetcher.Fetcher - fetching
http://examplenutch/newsletters/2010-mar.shtml (queue crawl delay=5000ms)
2015-04-06 14:57:39,840 ERROR fetcher.Fetcher - fetcher
caught:java.io.IOException: *Spill failed*
2015-04-06 14:57:39,841 INFO  fetcher.Fetcher - fetching
http://examplenutch/scifi83.pdf (queue crawl delay=5000ms)
2015-04-06 14:57:39,979 INFO  fetcher.Fetcher - fetching
http://example.com/playground.pdf(queue crawl delay=5000ms)
2015-04-06 14:57:40,033 INFO  fetcher.Fetcher - -activeThreads=50,
spinWaiting=46, fetchQueues.totalSize=2498
2015-04-06 14:57:40,172 ERROR fetcher.Fetcher - fetcher
caught:java.io.IOException: *Spill failed*
2015-04-06 14:57:40,722 ERROR fetcher.Fetcher - fetcher
caught:java.io.IOException: *Spill failed*
2015-04-06 14:57:40,722 INFO  fetcher.Fetcher - fetching
http://examplenutch/watershed_scale.pdf (queue crawl delay=5000ms)

Please advise.

Thanks a bunch!!
