Which version are you running? That value defaults to -1 in my current 
version (1.14), so you shouldn't have needed to change it. With the default 
settings, my crawls run for as long as 12 hours with little to no tweaking 
of nutch-default. Something else is causing it. Is it always the same URL 
that it fails at?
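
If it helps to rule out an override, here is a rough Python sketch that reports which value of fetcher.timelimit.mins wins when one Hadoop-style config file is layered over another, the way nutch-site.xml is layered over nutch-default.xml. The helper names and the two inline XML samples are made up for illustration (the 180 is just a stand-in for a 3-hour override, not something I know to be in your files). It also only looks at XML content, so it would not catch a value passed on the command line with -D.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_value(key, *xml_texts):
    """Later documents override earlier ones (site file after default file)."""
    value = None
    for text in xml_texts:
        value = read_properties(text).get(key, value)
    return value


# Hypothetical sample contents, standing in for the real files in conf/.
DEFAULT_XML = """<configuration>
  <property><name>fetcher.timelimit.mins</name><value>-1</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>fetcher.timelimit.mins</name><value>180</value></property>
</configuration>"""

if __name__ == "__main__":
    print(effective_value("fetcher.timelimit.mins", DEFAULT_XML, SITE_XML))
```

To run it against a real install, read the text of conf/nutch-default.xml and conf/nutch-site.xml and pass them in that order.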

-----Original Message-----
From: Chip Calhoun [mailto:ccalh...@aip.org] 
Sent: April-17-18 10:45 AM
To: user@nutch.apache.org
Subject: Nutch fetching times out at 3 hours, not sure why.

I'm crawling a list of roughly 2600 URLs, all on my local server, but only 
around 1000 of them get crawled. The fetcher quits after exactly 3 hours (give 
or take a few milliseconds) with this message in the log:

2018-04-13 15:50:48,885 INFO  fetcher.FetchItemQueues - * queue: https://history.aip.org >> dropping!

I've seen that 3 hours is the default in some Nutch installations, but I've got 
my fetcher.timelimit.mins set to -1. I'm sure I'm missing something obvious. 
Any thoughts would be greatly appreciated. Thank you.

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740-3840  USA
Tel: +1 301-209-3180
Email: ccalh...@aip.org
https://www.aip.org/history-programs/niels-bohr-library
