I'm still experimenting with this. I had been crawling with a depth of 1 
because I don't need anything outside my URL list, but I tried a depth of 10. 
It went through a crawl loop that ended after 3 hours, then a second 3-hour 
crawl loop, then a third, shorter loop. It still stopped 5 URLs short of 
crawling every URL in my list, though it crawled a few I hadn't included. 

Are these 3-hour loops standard for large crawls?

-----Original Message-----
From: Chip Calhoun [mailto:ccalh...@aip.org] 
Sent: Tuesday, April 17, 2018 3:27 PM
To: user@nutch.apache.org
Subject: RE: Nutch fetching times out at 3 hours, not sure why.

I'm on 1.12, and mine also defaulted to -1. It does not fail at the same URL, 
or even at the same point in a URL's fetcher loop; it really seems to be 
time-based. 

-----Original Message-----
From: Sadiki Latty [mailto:sla...@uottawa.ca] 
Sent: Tuesday, April 17, 2018 1:43 PM
To: user@nutch.apache.org
Subject: RE: Nutch fetching times out at 3 hours, not sure why.

Which version are you running? That value defaults to -1 in my current 
version (1.14), so it shouldn't be something you needed to change. My crawls 
run for as long as 12 hours with little to no tweaking of nutch-default. 
Something else is causing it. Is it always the same URL that it fails at?

-----Original Message-----
From: Chip Calhoun [mailto:ccalh...@aip.org] 
Sent: April-17-18 10:45 AM
To: user@nutch.apache.org
Subject: Nutch fetching times out at 3 hours, not sure why.

I crawl a list of roughly 2600 URLs, all on my local server, but the crawl 
only gets through around 1000 of them. The fetcher quits after exactly 3 hours 
(give or take a few milliseconds) with this message in the log:

2018-04-13 15:50:48,885 INFO  fetcher.FetchItemQueues - * queue: https://history.aip.org >> dropping!

I've seen that 3 hours is the default in some Nutch installations, but I've got 
my fetcher.timelimit.mins set to -1. I'm sure I'm missing something obvious. 
Any thoughts would be greatly appreciated. Thank you.
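
One way to check that the cutoff really is time-based is to measure the gap 
between the first timestamped log line and the first "dropping!" line. The 
following is only a minimal sketch: it assumes every log line starts with a 
timestamp in the same format as the message above, the function names are 
hypothetical, and the first sample line is made up for illustration (only the 
"dropping!" line comes from the actual log).

```python
from datetime import datetime, timedelta

# Timestamp layout taken from the log line quoted above, e.g.
# "2018-04-13 15:50:48,885" (23 characters, milliseconds after the comma).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def elapsed_until_drop(lines):
    """Time from the first timestamped line to the first 'dropping!' line."""
    start = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines such as stack traces
        if start is None:
            start = ts
        if "dropping!" in line:
            return ts - start
    return None  # no queue was dropped in these lines


# The first line is a made-up placeholder, not real Nutch output.
sample = [
    "2018-04-13 12:50:48,880 INFO  hypothetical.Start - fetch begins",
    "2018-04-13 15:50:48,885 INFO  fetcher.FetchItemQueues - * queue: "
    "https://history.aip.org >> dropping!",
]

if __name__ == "__main__":
    print(elapsed_until_drop(sample))
```

If the gap comes out at almost exactly 3 hours on every run regardless of 
which URL was being fetched, that points to a time limit being applied 
somewhere rather than to a problem with a particular page.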

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740-3840  USA
Tel: +1 301-209-3180
Email: ccalh...@aip.org
https://www.aip.org/history-programs/niels-bohr-library
