Hi Marek,

With your settings the generator should select all records that are _eligible_ for fetch because their fetch time has expired. I suspect that you generate, fetch, update and generate again. In the meantime the DB may have changed, which would explain this behaviour.
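To illustrate the idea, here is a rough Python sketch of a generate/update cycle. It is not Nutch's actual code: the `Record` class, the `generate` and `update` functions and the `top_n` cap are hypothetical names made up for this example, and the real generator also takes scores, host partitioning and `generate.max.count` into account. The 30-day re-fetch interval is just an assumed value.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class Record:
    url: str
    fetch_time: int  # epoch seconds at which the record becomes due


def generate(db, now, top_n):
    """Return up to top_n URLs whose fetch time has expired."""
    due = [r for r in db if r.fetch_time <= now]
    return [r.url for r in due[:top_n]]


def update(db, fetched_urls, now, interval=30 * DAY):
    """Push the fetch time of fetched records into the future."""
    for r in db:
        if r.url in fetched_urls:
            r.fetch_time = now + interval


now = 1_000_000
db = [Record(f"http://example.org/{i}", fetch_time=0) for i in range(5)]

first = generate(db, now, top_n=3)   # all 5 records are due, capped at 3
again = generate(db, now, top_n=3)   # DB not updated, so the same list
update(db, set(first), now)          # fetched records are no longer due
second = generate(db, now, top_n=3)  # only the 2 remaining records are left
```

In this toy model the second fetch list is shorter than the cap simply because fewer records are still due after the update, which is the effect I suspect you are seeing.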
If you do not update the DB it will (by default) always generate identical fetch lists under similar circumstances. I think it sometimes generates only ~1k because you already fetched all other records.

Cheers

On Wednesday 02 November 2011 14:03:08 Marek Bachmann wrote:
> Hello people,
>
> can someone explain to me how the generator generates the fetch lists?
>
> In particular:
>
> I don't understand why it generates fetch lists with very different
> amounts of urls.
>
> Sometimes it generates > 25k urls and sometimes > 1k.
>
> In every case there were more than 25k urls unfetched in the crawldb.
> So I was expecting that it always generates ~ 25k urls. But as I said
> before, sometimes only ~ 1k.
>
> In my nutch-site.xml I have defined following values:
>
> <property>
>   <name>generate.max.count</name>
>   <value>-1</value>
>   <description>The maximum number of urls in a single
>   fetchlist. -1 if unlimited. The urls are counted according
>   to the value of the parameter generator.count.mode.
>   </description>
> </property>
>
> Any ideas?
>
> Thanks

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

