Nutch checks how long it has been since each URL was last fetched, so that 
check adds a natural delay. 

But 7 minutes for 50 URLs might be too much. Did you investigate which URLs 
they are? Could they be large PDF files, or could your bandwidth be limited? 
Could you identify the bottleneck, other than the check for already-seen URLs?
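
One rough way to narrow this down is to time each step of the script separately. The sketch below wraps the four commands from your message in a small helper; `timed` is a hypothetical name used only for illustration (it is not part of Nutch), and the script assumes it is run from the directory that contains bin/nutch.

```shell
#!/bin/sh
# Run a command and report how many whole seconds it took.
# `timed` is a hypothetical helper name, not part of Nutch.
timed() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return $status
}

# Only run the crawl steps when a Nutch launcher is present in this directory.
if [ -x bin/nutch ]; then
    timed generate bin/nutch generate -topN 50
    timed fetch    bin/nutch fetch -all
    timed parse    bin/nutch parse -all
    timed updatedb bin/nutch updatedb
fi
```

If nearly all of the time goes to the fetch step, the network or the size of the documents is the more likely cause; if it is spread across the other steps, the overhead is probably elsewhere.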



----- Original Message -----
From: "Weder Carlos Vieira" <[email protected]>
To: [email protected]
Sent: Wednesday, 31 July 2013 19:26:55
Subject: Re: Revaluation

I am running the commands below inside a Linux script.

bin/nutch generate -topN 50
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb

This takes 7 minutes to run...


Tks



On Wed, Jul 31, 2013 at 1:19 PM, Weder Carlos Vieira <[email protected]> wrote:

> Hello
>
> Testing Nutch today, I could see that it is a little slow. Is this
> because it is re-checking the URLs it has already reviewed, looking for updates?
>
> Does anyone know if I can change this? Can Nutch be set to find only new URLs
> to parse?
>
>
> Thanks
> Weder
>
