As I understand it, those properties only limit the number of URLs that are
generated per site on each individual run of the generate step. But since
Nutch works in such a way that you need to run generate/fetch in an
open-ended loop in order to recrawl sites, the total number of URLs crawled
from a single site will not be limited by the generate.max.count parameter.
Am I right?
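To make that concrete, here is roughly the cycle I am referring to, written
out as a script. This is only a sketch: the crawl/crawldb and urls paths, the
three rounds, and the -topN value are just example choices, and the separate
parse step assumes fetcher.parse is left at false.

  # Standard Nutch 1.x generate/fetch/updatedb cycle (example values only)
  ROUNDS=3
  bin/nutch inject crawl/crawldb urls
  for i in $(seq 1 $ROUNDS); do
    # generate.max.count (and -topN) only cap the URLs put into THIS fetchlist
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    segment=$(ls -d crawl/segments/* | tail -1)   # newest segment
    bin/nutch fetch "$segment"
    bin/nutch parse "$segment"
    bin/nutch updatedb crawl/crawldb "$segment"
  done
  # After ROUNDS iterations a single host can still have contributed up to
  # ROUNDS * generate.max.count fetched pages in total.

So the cap is applied per round, and every additional round lets each site
grow by up to generate.max.count pages again.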
Best regards,
--Anders Rask
www.findwise.com

On 11 April 2012 at 17:14, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Check these properties:
>
> <property>
>   <name>generate.max.count</name>
>   <value>-1</value>
>   <description>The maximum number of urls in a single
>   fetchlist. -1 if unlimited. The urls are counted according
>   to the value of the parameter generator.count.mode.
>   </description>
> </property>
>
> <property>
>   <name>generate.count.mode</name>
>   <value>host</value>
>   <description>Determines how the URLs are counted for generator.max.count.
>   Default value is 'host' but can be 'domain'. Note that we do not count
>   per IP in the new version of the Generator.
>   </description>
> </property>
>
> On Wednesday 11 April 2012 17:05:04 Anders Rask wrote:
> > Hi!
> >
> > I would like to be able to limit how many pages Nutch crawls from a
> > specific site, either by specifying the total number of pages to crawl
> > from one site or by specifying a depth of how many links should be
> > followed from the initial seed.
> >
> > I've been working with Nutch for some time now but haven't been able to
> > figure out how this can be achieved. So my question is: is there any way
> > to configure Nutch for this, and if not, are there any plans to implement
> > this functionality?
> >
> > Best regards,
> > --Anders Rask
> > www.findwise.com
>
> --
> Markus Jelsma - CTO - Openindex
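PS: regarding the depth part of my original question (quoted above), my
understanding is that the number of generate/fetch rounds is itself the depth
bound, since each round can only generate URLs discovered in earlier rounds.
The one-shot crawl command exposes both knobs; a rough example with
placeholder paths and values:

  bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

Here -depth 3 limits the crawl to three rounds, i.e. at most three link hops
from the seeds, and -topN caps each round's fetchlist as a whole, so the two
together bound the total number of pages, but per round rather than per site.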