Re: topN value in crawl

2009-08-20 Thread Marko Bauhardt
On Aug 19, 2009, at 8:42 PM, alx...@aim.com wrote: Hi, thanks. What if the URLs in my seed file do not have outlinks, let's say .pdf files? Should I still specify the topN variable? All I need is to index all the URLs in my seed file, and there are about 1M of them. topN means that your generated

Re: topN value in crawl

2009-08-20 Thread alxsss
@lucene.apache.org Sent: Thu, Aug 20, 2009 12:17 am Subject: Re: topN value in crawl On Aug 19, 2009, at 8:42 PM, alx...@aim.com wrote: Hi, thanks. What if the URLs in my seed file do not have outlinks, let's say .pdf files? Should I still specify the topN variable? All I need

topN value in crawl

2009-08-19 Thread alxsss
Hi, I have read a few tutorials on running Nutch to crawl the web. However, I still do not understand the meaning of the topN variable in the crawl command. The tutorials suggest creating 3 segments and fetching them with topN=1000. What if I create 100 segments, or only one? What would be

Re: topN value in crawl

2009-08-19 Thread alxsss
@lucene.apache.org Sent: Wed, Aug 19, 2009 11:02 am Subject: Re: topN value in crawl On Wed, Aug 19, 2009 at 12:13 PM, alx...@aim.com wrote: Hi, I have read a few tutorials on running Nutch to crawl the web. However, I still do not understand the meaning of the topN variable in the crawl command
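
For readers following this thread, here is a rough sketch of the arithmetic behind the question. It is not Nutch code. It assumes that topN acts as an upper bound on how many URLs are selected for each generated segment, that the seed URLs have no outlinks (as with the .pdf files mentioned above), and that every fetch succeeds. The names simulate_crawl and rounds_needed are made up for this illustration.

```python
import math


def rounds_needed(num_urls: int, top_n: int) -> int:
    """Rounds required to cover num_urls when each round takes at most top_n URLs."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    return math.ceil(num_urls / top_n)


def simulate_crawl(seed_count: int, top_n: int, rounds: int):
    """Toy model of repeated generate/fetch rounds over a seed list.

    Assumption: no new URLs are discovered, so the pool of unfetched
    URLs only shrinks. Returns (segment sizes, URLs left unfetched).
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    unfetched = seed_count
    segments = []
    for _ in range(rounds):
        if unfetched == 0:
            break
        batch = min(top_n, unfetched)  # topN caps the size of each segment
        segments.append(batch)
        unfetched -= batch
    return segments, unfetched


if __name__ == "__main__":
    # Tutorial-style settings applied to a seed list of about 1M URLs.
    print(simulate_crawl(1_000_000, 1000, 3))
    # A single round with topN at least as large as the seed list.
    print(simulate_crawl(1_000_000, 1_000_000, 1))
    print(rounds_needed(1_000_000, 1000))
```

Under these assumptions, three rounds at topN=1000 would cover only 3000 of the seed URLs, while a single round with topN set to at least the size of the seed list would cover all of them.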