Re: Crawling process - Fetching

Markus Jelsma Thu, 28 Apr 2011 08:07:51 -0700

It depends on many settings. Please read the parameter descriptions in
nutch-site; they'll help you understand why.


And don't forget, as Nutch fetches a page, it discovers new URLs. For each of 
those new URLs, new URLs will be discovered. It would go on forever if there 
were no useful settings or urlfilters to keep the flow of URLs under control.
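
To illustrate the growth described above, here is a minimal toy simulation, not Nutch code. It assumes a made-up link graph in which every page links to three new pages; the names crawl, outlinks, top_n and url_filter are hypothetical. top_n loosely plays the role of the -topN option of the generate step, and url_filter stands in for a urlfilter rule.

```python
import re


def outlinks(url, fanout=3):
    # Hypothetical link graph: every page links to `fanout` new child pages.
    return [f"{url}/{i}" for i in range(fanout)]


def crawl(seed, rounds, top_n=None, url_filter=None, fanout=3):
    """Return how many URLs were fetched in each generate/fetch/update round."""
    seen = {seed}          # every URL ever added to the toy crawl db
    unfetched = [seed]     # URLs known but not yet fetched
    per_round = []
    for _ in range(rounds):
        # "generate": select the next batch, optionally capped at top_n
        batch = unfetched if top_n is None else unfetched[:top_n]
        unfetched = unfetched[len(batch):]
        # "fetch" and "parse": each fetched page yields new outlinks
        for url in batch:
            for link in outlinks(url, fanout):
                # "urlfilter": drop links the filter does not accept
                if url_filter is not None and not url_filter.search(link):
                    continue
                if link not in seen:
                    seen.add(link)
                    unfetched.append(link)
        per_round.append(len(batch))
    return per_round


if __name__ == "__main__":
    seed = "http://example.com"
    # No limits: the batch size grows with every round.
    print(crawl(seed, 4))
    # Cap each batch, roughly what -topN does for the generate step.
    print(crawl(seed, 4, top_n=5))
    # Accept only URLs at most two path segments deep.
    shallow = re.compile(r"^http://example\.com(/\d+){0,2}$")
    print(crawl(seed, 4, url_filter=shallow))
```

Without limits the batch grows every round; with a cap it levels off, and with a restrictive filter it eventually runs dry. Real crawls behave less regularly because pages share links and hosts differ, but the mechanism is the same.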

On Thursday 28 April 2011 10:20:01 jotta wrote:
> Hi!
> 
> I have a question about fetching, one stage of the crawling process.
> When I use the commands to inject URLs and fetch content, Nutch gets a
> different number of records to fetch each time, e.g. during the first fetch
> it usually gets one record (the main page), then about 50 records, then 100
> records, and so on.
> 
> What does the number of records to fetch depend on, and can I change it?
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Crawling-process-Fetching-tp2873786p2873786.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
