RE: Nutch fetching times out at 3 hours, not sure why.

Chip Calhoun Tue, 17 Apr 2018 12:27:44 -0700

I'm on 1.12, and mine also defaulted at -1. It does not fail at the same URL, 
or even at the same point in a URL's fetcher loop; it really seems to be time 
based.

-----Original Message-----
From: Sadiki Latty [mailto:sla...@uottawa.ca] 
Sent: Tuesday, April 17, 2018 1:43 PM
To: user@nutch.apache.org
Subject: RE: Nutch fetching times out at 3 hours, not sure why.

Which version are you running? That value is defaulted to -1 in my current 
version (1.14)  so shouldn't be something you should have needed to change. My 
crawls, by default, go for as much as even 12 hours with little to no tweaking 
necessary from the nutch-default. Something else is causing it. Is it always 
the same URL that it fails at?

-----Original Message-----
From: Chip Calhoun [mailto:ccalh...@aip.org] 
Sent: April-17-18 10:45 AM
To: user@nutch.apache.org
Subject: Nutch fetching times out at 3 hours, not sure why.

I crawl a list of roughly 2600 URLs all on my local server, and I'm only 
crawling around 1000 of them. The fetcher quits after exactly 3 hours (give or 
take a few milliseconds) with this message in the log:

2018-04-13 15:50:48,885 INFO  fetcher.FetchItemQueues - * queue: 
https://history.aip.org >> dropping!

I've seen that 3 hours is the default in some Nutch installations, but I've got 
my fetcher.timelimit.mins set to -1. I'm sure I'm missing something obvious. 
Any thoughts would be greatly appreciated. Thank you.

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740-3840  USA
Tel: +1 301-209-3180
Email: ccalh...@aip.org
https://www.aip.org/history-programs/niels-bohr-library

RE: Nutch fetching times out at 3 hours, not sure why.

Reply via email to