I have sites with a crawl delay of 20, others of 720, but in both cases it
should still fetch some pages, and it couldn't.
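For context, the Crawl-delay lines I mean look roughly like this in those
sites' robots.txt (the user-agent line is only illustrative; 720 is one of
the real values, and the units are seconds as far as I understand the
property description):

    User-agent: *
    # value is in seconds; Nutch compares it against fetcher.max.crawl.delay
    Crawl-delay: 720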

----- Original Message -----
From: Lewis John Mcgibbney
Sent: 15.02.12 23:11
To: [email protected]
Subject: Re: Re: Re: fetcher.max.crawl.delay = -1 doesn't work?

Another question I should have asked is how long the crawl delay in
robots.txt is. If you read the fetcher.max.crawl.delay property description,
it explicitly notes that the fetcher will wait however long is required by
robots.txt until it fetches the page. Do you have this information? Thanks

On Wed, Feb 15, 2012 at 9:08 AM, Danicela nutch <[email protected]> wrote:

> I don't think I configured such things; how can I be sure?
>
> ----- Original Message -----
> From: Lewis John Mcgibbney
> Sent: 14.02.12 19:18
> To: [email protected]
> Subject: Re: fetcher.max.crawl.delay = -1 doesn't work?
>
> Hi Danicela, before I try this, have you configured any other overrides
> for generating or fetching in nutch-site.xml? Thanks
>
> On Tue, Feb 14, 2012 at 3:10 PM, Danicela nutch <[email protected]> wrote:
>
> > Hi,
> >
> > I have in my nutch-site.xml the value fetcher.max.crawl.delay = -1.
> >
> > When I try to fetch a site whose robots.txt has a Crawl-Delay, it
> > doesn't work.
> >
> > If I put fetcher.max.crawl.delay = 10000, it works.
> >
> > I use Nutch 1.2, but according to the changelog, nothing has been
> > changed about that since then.
> >
> > Is this a Nutch bug, or did I misuse something?
> >
> > Another thing: in hadoop.log, the pages which couldn't be fetched are
> > still marked as "fetching". Is this normal? Shouldn't they be marked as
> > "dropped" or something?
> >
> > Thanks.
>
> --
> *Lewis*

--
*Lewis*
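To make it easier to see what I changed: the override sits in my
nutch-site.xml as a normal property block, roughly like this (only the
property name and the -1 / 10000 values are from the messages above; the
rest is just the standard layout):

    <configuration>
      <property>
        <name>fetcher.max.crawl.delay</name>
        <!-- -1 is documented to mean: never skip a page because of a long
             Crawl-Delay, just wait however long robots.txt requires.
             A large positive value such as 10000 (seconds) is the setting
             that does work for me in Nutch 1.2. -->
        <value>-1</value>
      </property>
    </configuration>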
