Hello, I have a problem...
I'm trying to index a small domain using org.apache.nutch.crawl.Crawler. The problem is that after the crawler has indexed all the pages of the domain, running the crawler again re-fetches every page, even though the fetch interval has not expired. This is wrong because it generates a lot of connections.

I'm using the default configuration, and this is the command I execute:

org.apache.nutch.crawl.Crawler -depth 1 -threads 1 -topN 5

Can you help me, please? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/re-Crawl-re-fetch-all-pages-each-time-tp4020464.html
Sent from the Nutch - User mailing list archive at Nabble.com.
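P.S. For reference, the fetch interval I mean is the one governed by the db.fetch.interval.default property (2592000 seconds, i.e. 30 days, by default in nutch-default.xml). I have not overridden it; if I had, the override would go in conf/nutch-site.xml, something like:

```xml
<!-- conf/nutch-site.xml — local overrides of nutch-default.xml -->
<property>
  <name>db.fetch.interval.default</name>
  <!-- seconds a page must wait before it is due for re-fetch; 2592000 = 30 days -->
  <value>2592000</value>
</property>
```

Since I'm on the defaults, none of the pages should be due for re-fetch yet when I run the crawl a second time.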